Histogram visualizer
The Histogram visualizer shows the distribution of a feature over its range. This is done by dividing the range into a number of bins and counting how many data points fall within the boundaries of each bin.
| Histogram | |
| Name | Histogram |
| Can modify data | Yes |
| Produces filters | Yes |
| Can show multivariate data | No |
| Supports selection | Yes |
| Supports clipboard operations | Yes |
| Can export data | No |
| Can export images | Yes |
Operation
This visualizer shows the histogram of one feature at a time. To select which feature, use the Feature drop-down in the visualizer menu. You can change the number of bins (displayed as vertical bars) by changing the number in the Bins box, also found in the visualizer menu.
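The binning behind each bar can be illustrated with a short sketch. It assumes equal-width bins spanning the minimum to the maximum of the feature, with the maximum value counted in the last bin; the visualizer's own edge handling is not documented here, and the function name `histogram_counts` is hypothetical.

```python
def histogram_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    # Guard against a zero-width range when all values are identical.
    width = (hi - lo) / bins if hi > lo else 1.0
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin instead of overflowing.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    # Bin boundaries: bins + 1 edges from lo to hi.
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


values = [1.0, 1.5, 2.0, 2.5, 3.0, 9.0]
counts, edges = histogram_counts(values, bins=4)
print(counts)  # [4, 1, 0, 1]
print(edges)   # [1.0, 3.0, 5.0, 7.0, 9.0]
```

With four bins the single value at 9.0 stands apart from the rest; changing the number of bins changes how clearly such a point separates from the bulk of the data.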
The histogram helps in understanding what kind of feature we are dealing with, and what values we can expect this feature to have. We can also detect a kind of erroneous data called outliers, which appear as single data points lying very far away from the rest of the data. Other groups of data might form lumps a little off to one side. Such lumps can be of special significance and are worth a closer look.
If a feature can be seen as a random variable, then with enough data and sufficiently narrow bins the shape of the histogram will approach that of the probability density function of the probability distribution associated with that variable.
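To compare bar heights with a probability density function directly, the counts have to be rescaled so that the total area of the bars is 1. The sketch below shows that rescaling under the assumption that counts and bin edges are already available; whether the visualizer itself rescales its bars is not stated here, and `histogram_density` is a hypothetical name.

```python
def histogram_density(counts, edges):
    """Rescale bin counts so that the bars integrate to 1."""
    total = sum(counts)
    return [
        c / (total * (edges[i + 1] - edges[i]))
        for i, c in enumerate(counts)
    ]


counts = [4, 1, 0, 1]
edges = [1.0, 3.0, 5.0, 7.0, 9.0]
density = histogram_density(counts, edges)

# The area under the rescaled bars: sum of height times width.
area = sum(d * (edges[i + 1] - edges[i]) for i, d in enumerate(density))
print(area)
```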
By default, the probability density function of a Gaussian is drawn on top of the histogram, along with a vertical line at the mean value. This does not mean the feature actually has this distribution; the curve is there only as a reference. The overlay can be deactivated in the Options menu.
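A Gaussian reference curve of this kind can be sketched by estimating the mean and standard deviation of the data and evaluating the normal density on a grid. This is only an illustration: the use of the population standard deviation is an assumption, and the visualizer may estimate its parameters differently.

```python
import math
import statistics


def gaussian_pdf(x, mean, std):
    """Probability density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


values = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]
mean = statistics.fmean(values)
# Assumption: population standard deviation; a sample estimate
# (statistics.stdev) would be an equally plausible choice.
std = statistics.pstdev(values)

# Evaluate the curve on an evenly spaced grid across the data range.
n_points = 50
lo, hi = min(values), max(values)
xs = [lo + i * (hi - lo) / (n_points - 1) for i in range(n_points)]
curve = [gaussian_pdf(x, mean, std) for x in xs]
```

To draw such a curve over bars that show raw counts instead of densities, each curve value would be multiplied by the number of data points times the bin width.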
The Options menu offers the following:
- Distribution Overlays - Display the probability density function of a distribution fitted to the data.
- Gaussian - Displays the probability density function of a Gaussian and the mean value over the histogram.
- Make Filter - If you have selected any bins, you can choose between removing all samples that fall within them or removing all other samples. The removal is applied as a filter on the filter stack of the data unit being displayed by the visualizer.
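The two Make Filter choices amount to keeping either the samples outside the selected bins or the samples inside them. The following sketch shows that logic on a list of rows; the names `bin_index` and `make_filter`, the row layout, and the edge handling are all assumptions made for illustration, not the application's filter stack.

```python
import bisect


def bin_index(value, edges):
    """Index of the bin containing value; the last edge is inclusive."""
    idx = bisect.bisect_right(edges, value) - 1
    return min(max(idx, 0), len(edges) - 2)


def make_filter(rows, feature, edges, selected, remove_selected=True):
    """Return the rows that survive the filter.

    remove_selected=True  drops samples inside the selected bins.
    remove_selected=False drops all other samples.
    """
    kept = []
    for row in rows:
        in_selection = bin_index(row[feature], edges) in selected
        if in_selection != remove_selected:
            kept.append(row)
    return kept


rows = [{"x": 1.0}, {"x": 3.5}, {"x": 9.0}]
edges = [1.0, 3.0, 5.0, 7.0, 9.0]
selected = {0}  # the first bin is selected

without_selected = make_filter(rows, "x", edges, selected, remove_selected=True)
only_selected = make_filter(rows, "x", edges, selected, remove_selected=False)
print(without_selected)  # [{'x': 3.5}, {'x': 9.0}]
print(only_selected)     # [{'x': 1.0}]
```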
To see the coordinates of the mouse position, right-click on the plot and choose Show World Coordinates to toggle their display.
Zooming and Panning
To zoom to a specific region, use the magnifying glass tool. If you just want to stretch or compress one axis, you can do so by dragging the axis just outside the plot area (similar to the same action depicted in the Value vs. Sample visualizer). To pan, use the hand tool. To get back to the default scales and positions of the axes, right-click and select Original Dimensions.
Selection
The histogram visualizer lets you select bins, and with them any data that falls within them. Selected bins are colored yellow. To select a bin, use the selection tool. (For a bin to be selected, the selection must reach across its center.) To add to a selection, hold [Shift]; to subtract from a selection, hold [Alt]. To select everything or to clear the selection, use the two buttons in the visualizer menu.
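The center rule for selecting bins can be sketched as a test on each bin's midpoint. The example assumes that only the horizontal extent of the selection matters and models the [Shift] and [Alt] modifiers as set union and set difference; `bins_in_selection` is a hypothetical helper, not part of the application.

```python
def bins_in_selection(edges, sel_lo, sel_hi):
    """Indices of bins whose center lies within the selected x-range."""
    lo, hi = sorted((sel_lo, sel_hi))
    return {
        i
        for i in range(len(edges) - 1)
        if lo <= (edges[i] + edges[i + 1]) / 2 <= hi
    }


edges = [1.0, 3.0, 5.0, 7.0, 9.0]  # bin centers at 2, 4, 6 and 8

selection = bins_in_selection(edges, 2.5, 6.5)  # covers centers 4 and 6
# Adding to the selection (as with [Shift]) modeled as a union.
added = selection | bins_in_selection(edges, 7.5, 8.5)
# Subtracting from the selection (as with [Alt]) modeled as a difference.
subtracted = added - bins_in_selection(edges, 3.5, 4.5)

print(sorted(selection))   # [1, 2]
print(sorted(added))       # [1, 2, 3]
print(sorted(subtracted))  # [2, 3]
```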
Clipboard, Save and Print
There is a dedicated menu for these tasks, featuring a button for each action.
The Save button saves the plot in any of a number of image formats. Print lets you print an image of the plot, and Copy copies the data in the selected bins to the Windows clipboard in CSV text format. When data is copied to the clipboard, all features are copied, not just the one displayed in the plot.
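The effect of Copy, where every feature of the selected samples ends up in the CSV text, can be illustrated as follows. The sketch assumes a header row and a selection given as a value interval on the displayed feature; the exact layout the application places on the clipboard is not specified here, and `selected_rows_as_csv` is a hypothetical name.

```python
import csv
import io


def selected_rows_as_csv(rows, feature, lo, hi):
    """CSV text for rows whose `feature` value lies in [lo, hi).

    All columns are written, not only the displayed feature.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()  # assumption: a header row is included
    for row in rows:
        if lo <= row[feature] < hi:
            writer.writerow(row)
    return buf.getvalue()


rows = [
    {"height": 1.5, "weight": 60},
    {"height": 1.8, "weight": 80},
]
# The histogram shows "height"; the selected bins cover 1.0 to 1.6.
text = selected_rows_as_csv(rows, "height", 1.0, 1.6)
print(text)
```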
