SOM View visualizer
|Capability|Supported|
|---|---|
|Can modify data|Yes|
|Can show multivariate data|Yes|
|Supports clipboard operations|No|
|Can export data|No|
|Can export images|No|
The SOM View is based on hexagonal self-organizing maps (SOMs).
One important difference between the SOM View and static visualizers such as the Grid view is that the SOM View is an adaptive component: it needs to be trained. To train it, press the "Train" button in the toolbar.
Apart from the toolbar and the status bar, the SOM View consists of a number of regions called "Maplets". Two of them are special: "Clusters" and "Unified Distance Matrix". The remaining maplets represent the features in the data, one maplet per feature. The "Clusters" maplet shows the automatic clustering of the data. The "Unified Distance Matrix" maplet shows the average distance between neighboring nodes in the SOM; high values mark boundaries between groups of similar nodes.
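The unified distance matrix can be illustrated with a short sketch. It assumes that each node stores a vector (`weights[r][c]`) and that the hexagonal grid uses an "odd-r" layout, where odd rows are shifted half a cell to the right. Each node's value is taken as the mean Euclidean distance to its immediate hexagonal neighbors. This follows the common U-matrix definition and is not necessarily the SOM View's exact internal computation. The function names are hypothetical.

```python
import math

def hex_neighbors(r, c, rows, cols):
    """Neighbors of (r, c) in an 'odd-r' hexagonal layout (odd rows shifted right)."""
    if r % 2 == 0:
        offsets = [(-1, -1), (-1, 0), (0, -1), (0, 1), (1, -1), (1, 0)]
    else:
        offsets = [(-1, 0), (-1, 1), (0, -1), (0, 1), (1, 0), (1, 1)]
    return [(r + dr, c + dc) for dr, dc in offsets
            if 0 <= r + dr < rows and 0 <= c + dc < cols]

def u_matrix(weights):
    """weights[r][c] is a node vector; returns each node's mean distance to its neighbors."""
    rows, cols = len(weights), len(weights[0])
    umat = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = [math.dist(weights[r][c], weights[nr][nc])
                     for nr, nc in hex_neighbors(r, c, rows, cols)]
            # A lone node (1x1 map) has no neighbors; report 0 instead of dividing by zero.
            umat[r][c] = sum(dists) / len(dists) if dists else 0.0
    return umat
```

On a 2x2 map where only the bottom-right node differs from the rest, that node receives the largest value. Its neighbors receive intermediate values, which is the "ridge" that separates clusters in the maplet.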
Each hexagonal cell in a maplet represents a SOM node. Each node is associated with the data points for which it is the closest node in feature space. A node is a vector with the same dimension as the feature space: if the data has three coordinates (say X, Y, Z), each node also has three coordinates. The feature maplets, also called component planes, show the values of the nodes along each dimension.
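The node-to-data association and the component planes can be sketched as follows, assuming plain Euclidean distance in feature space. Each data point is assigned to its closest node (its "best-matching node"). A component plane is simply one coordinate of every node vector, laid out on the grid. The function names are illustrative and are not part of the SOM View.

```python
import math

def best_matching_node(sample, weights):
    """Return (row, col) of the node whose vector is closest (Euclidean) to sample."""
    best, best_dist = None, math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = math.dist(sample, w)
            if d < best_dist:
                best, best_dist = (r, c), d
    return best

def associate(data, weights):
    """Map each node to the indices of the data points for which it is the closest node."""
    hits = {}
    for i, x in enumerate(data):
        hits.setdefault(best_matching_node(x, weights), []).append(i)
    return hits

def component_plane(weights, k):
    """Value of feature k at every node: what a single feature maplet colors."""
    return [[w[k] for w in row] for row in weights]
```

For example, with a 1x2 map whose nodes sit at (0, 0, 0) and (10, 10, 10), the points (1, 1, 1) and (2, 0, 0) are associated with the first node and (9, 9, 9) with the second.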
When you select nodes in one maplet, the selection is shown in all maplets. For a practical example, consider the cow data set covered in Tutorial 4.
We are looking here at three variables from the data set (number of cows, number of churches and number of schools):
The three maplets share the same topological mapping, so a node (and implicitly its group of data points) occupies the same position in each maplet. For instance, if we select the red area in the "Cows" maplet (the area where the "Cows" variable has high values), we get this selection across all maplets:
The same nodes are selected, and hence the same data points. A single glance at the three maplets shows that where there are many cows there are also many churches, but few schools.
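Linked selection works because every maplet indexes the same node grid. The sketch below picks nodes by a condition on one component plane, reads the other planes at the same positions, and maps the selected nodes back to data points through the node-to-data association. The plane values and the `hits` mapping are made-up toy numbers, not the actual cow data set, and the helper names are hypothetical.

```python
def select_nodes(plane, predicate):
    """Positions of all nodes whose value in this component plane satisfies predicate."""
    return {(r, c) for r, row in enumerate(plane) for c, v in enumerate(row) if predicate(v)}

def selected_points(selection, hits):
    """Data point indices associated with the selected nodes (hits: node -> indices)."""
    return sorted(i for node in selection for i in hits.get(node, []))

def mean_over(plane, selection):
    """Average value of another component plane over the same node positions."""
    return sum(plane[r][c] for r, c in selection) / len(selection)

# Toy 2x2 component planes (hypothetical values).
cows = [[9, 1], [8, 2]]
churches = [[7, 1], [6, 2]]
schools = [[1, 5], [0, 6]]
hits = {(0, 0): [3], (1, 0): [0, 5], (0, 1): [1]}

high_cows = select_nodes(cows, lambda v: v > 5)
print(mean_over(churches, high_cows), mean_over(schools, high_cows))
print(selected_points(high_cows, hits))
```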
A SOM literally gives you a map of your data, which you can use to quickly understand how it all fits together.
Maplet GUI interaction
The basic interaction is as follows: left-click or left-drag (press the left mouse button and drag) on the SOM nodes to select nodes. Right-click or right-drag on the maplet nodes to deselect nodes. The same principle applies to the maplet spectrum: left-drag on the spectrum selects the range of values between where you pressed the mouse button and where you released it, and right-drag deselects a range.
For instance, in the example maplet above, to select the nodes where HousePrice has high values, press the mouse button on the spectrum somewhere to the right of the yellow region and drag to the right, releasing the button at the end of the spectrum. This selects all nodes with colors between yellow and red, i.e. where the HousePrice variable has high values.
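A spectrum drag can be modeled as a value range on one component plane. The sketch assumes that the spectrum spans linearly from the plane's minimum to its maximum value, and that drag positions are fractions between 0 (left end) and 1 (right end). A left-drag adds the matching nodes to the selection and a right-drag removes them. The function names, the linear mapping and the toy HousePrice values are all assumptions made for illustration.

```python
def spectrum_range(plane, start_frac, end_frac):
    """Nodes whose value lies in the part of the spectrum between two drag positions."""
    values = [v for row in plane for v in row]
    lo, hi = min(values), max(values)
    a, b = sorted((start_frac, end_frac))  # dragging right-to-left gives the same range
    lo_v, hi_v = lo + a * (hi - lo), lo + b * (hi - lo)
    return {(r, c) for r, row in enumerate(plane)
            for c, v in enumerate(row) if lo_v <= v <= hi_v}

def apply_drag(selection, nodes, button):
    """Left button adds nodes to the selection; right button removes them."""
    return selection | nodes if button == "left" else selection - nodes

house_price = [[0, 25], [50, 100]]  # hypothetical component plane
sel = apply_drag(set(), spectrum_range(house_price, 0.4, 1.0), "left")
sel = apply_drag(sel, spectrum_range(house_price, 0.9, 1.0), "right")
print(sel)
```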
The selection is reflected in the other maplets. It is an ordinary visualizer selection, so you can view it in any other visualizer, including a new SOM View (allowing hierarchical data exploration).
Each hexagon contains a dot whose size is approximately proportional to the number of data points associated with that node.
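The hit-count dots can be sketched as a scaling of per-node counts. This assumes that dot area, rather than radius, grows with the count (radius proportional to the square root of count / maximum count). The SOM View's exact scaling is not documented here, and `dot_radii` and `max_radius` are hypothetical names.

```python
import math

def dot_radii(hits, max_radius):
    """Radius per node so that dot area scales with its hit count (assumed scaling)."""
    counts = {node: len(indices) for node, indices in hits.items()}
    peak = max(counts.values(), default=0)
    if peak == 0:
        return {}
    return {node: max_radius * math.sqrt(n / peak) for node, n in counts.items()}
```

With one node holding four points and another holding one, the second dot gets half the radius and therefore a quarter of the area.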
For customization of the maplet GUI, see the section called "Customizing the GUI".
Control and Customization
SOM control and customization
To train the SOM, press the "Train" button. To change the dimensions of the SOM (15x15 by default), change the settings in the toolbar. After changing the SOM size, you need to retrain it. The SOM algorithm can be fine-tuned by changing the parameters under "Options->Advanced Settings", where each setting has a brief description of what it does.
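The training step can be sketched with the classic online SOM algorithm on a hexagonal grid. For each sample, the best-matching node is found, and every node is pulled toward the sample with a Gaussian weight that depends on its grid distance to that node. The learning rate and the neighborhood radius shrink linearly over training. This is a generic textbook SOM, not the SOM View's actual implementation. The parameters `epochs`, `lr0` and `radius0` are hypothetical stand-ins for whatever "Options->Advanced Settings" exposes. Only the 15x15 default comes from this manual.

```python
import math
import random

def hex_position(r, c):
    """Planar center of node (r, c); odd rows are shifted half a cell to the right."""
    return (c + 0.5 * (r % 2), r * math.sqrt(3) / 2)

def train_som(data, rows=15, cols=15, epochs=20, lr0=0.5, radius0=None, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = radius0 or max(rows, cols) / 2
    # Initialize every node with a copy of a randomly chosen data point.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    pos = {n: hex_position(*n) for n in nodes}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total
            lr = lr0 * (1 - frac)                     # learning rate decays linearly
            radius = max(radius0 * (1 - frac), 0.5)   # neighborhood shrinks, never to zero
            bmu = min(nodes, key=lambda n: math.dist(x, weights[n[0]][n[1]]))
            for n in nodes:
                # Gaussian neighborhood measured on the hexagonal grid, not in feature space.
                g = math.exp(-math.dist(pos[n], pos[bmu]) ** 2 / (2 * radius ** 2))
                w = weights[n[0]][n[1]]
                for k in range(dim):
                    w[k] += lr * g * (x[k] - w[k])
            step += 1
    return weights
```

Because each update moves a node only part of the way toward a sample, every trained node stays inside the range spanned by the data. Changing `rows` or `cols` produces a new grid that has to be trained from scratch, which matches the need to retrain after resizing.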
To select which features are included in training, select "Options->Feature Configuration". Here you can also set a multiplier per feature. The multiplier sets how much weight the SOM gives each feature: a higher multiplier means that the feature is considered more important, so the SOM grid will be sorted more strongly according to that feature. If you uncheck the "Use" checkbox, the feature is not used for training at all. Note that the feature is still visualized with a maplet.
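One plausible way to realize the multipliers is a weighted distance for finding the best-matching node. In this sketch, each feature's squared difference is scaled by its multiplier, and an unchecked "Use" box contributes zero. Node vectors are still updated in all components, which would explain why an excluded feature keeps its maplet. This is an assumption about the mechanism, not the documented formula, and all names are hypothetical.

```python
import math

def weighted_distance(x, w, multipliers, use):
    """Matching distance: squared differences scaled by multipliers; unused features ignored."""
    total = 0.0
    for a, b, m, u in zip(x, w, multipliers, use):
        if u:
            total += m * (a - b) ** 2
    return math.sqrt(total)

def best_node(x, nodes, multipliers, use):
    """Index of the node closest to x under the weighted distance."""
    return min(range(len(nodes)), key=lambda i: weighted_distance(x, nodes[i], multipliers, use))

def update(w, x, rate):
    """Move every component toward the sample, including unused features,
    so an excluded feature still has a component plane to display."""
    return [b + rate * (a - b) for a, b in zip(x, w)]

nodes = [[1.0, 0.0], [0.0, 2.0]]
print(best_node([0.0, 0.0], nodes, [1, 1], [True, True]))   # equal weights
print(best_node([0.0, 0.0], nodes, [10, 1], [True, True]))  # first feature weighted heavily
```

Raising the first feature's multiplier flips the best match to the node that agrees on that feature. This is the sense in which the grid becomes "more sorted" by the heavily weighted feature.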
Clustering control and customization
The SOM View component provides automatic clustering by applying a neural gas to the trained SOM. Clustering runs automatically after the SOM has been trained, and the resulting clusters are shown in the "Clusters" maplet.
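Clustering the trained SOM with a neural gas can be sketched as follows. The inputs are the SOM node vectors, not the raw data. For each presented vector, all cluster prototypes are ranked by distance, and each is moved toward the vector by an amount that decays exponentially with its rank. The step size and the rank-decay range both shrink over time, following the standard neural gas of Martinetz and Schulten. The schedule values, the parameter names and the explicit `n_clusters` argument are assumptions. How the "Auto" button chooses the number of clusters is not shown.

```python
import math
import random

def neural_gas(points, n_clusters, iterations=2000, eps=(0.5, 0.01), lam=None, seed=0):
    """Return n_clusters prototype vectors fitted to points with the neural gas rule."""
    rng = random.Random(seed)
    lam = lam or (n_clusters / 2, 0.01)
    protos = [list(p) for p in rng.sample(points, n_clusters)]
    for t in range(iterations):
        frac = t / iterations
        e = eps[0] * (eps[1] / eps[0]) ** frac  # step size: exponential decay
        l = lam[0] * (lam[1] / lam[0]) ** frac  # neighborhood range: exponential decay
        x = rng.choice(points)
        ranked = sorted(range(n_clusters), key=lambda j: math.dist(x, protos[j]))
        for k, j in enumerate(ranked):
            h = math.exp(-k / l)  # closest prototype (rank 0) moves the most
            protos[j] = [p + e * h * (xi - p) for p, xi in zip(protos[j], x)]
    return protos

def assign(points, protos):
    """Cluster label of each point = index of its nearest prototype."""
    return [min(range(len(protos)), key=lambda j: math.dist(p, protos[j])) for p in points]

# Node vectors of a trained SOM would go here; these are two toy groups.
node_vectors = [[0, 0], [0.1, 0], [0, 0.1], [10, 10], [10.1, 10], [10, 10.1]]
labels = assign(node_vectors, neural_gas(node_vectors, 2))
```

Each SOM node then inherits its prototype's label, and that labeling is what a "Clusters" maplet would color.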
You can manually override the number of clusters by changing the number in the "Clusters" textbox in the toolbar and pressing Enter. To run a new, fully automatic clustering, press the "Auto" button.
The clustering algorithm can be fine-tuned by changing the parameters under "Options->Advanced Settings", where each setting has a brief description of what it does.
Under "Options->Advanced Settings" you can find various parameters for customizing the GUI.
You can select which types of maplets are to be shown and what color maps are to be used. You can also set the size of the maplet elements as well as limit the total number of maplets to be shown in the visualizer.
Creating a custom filter
You can keep the results of a clustering permanently by creating a filter, which is added to the stack. The filter adds a set of new features to the data unit that record, for each sample, which cluster it belongs to.
To create the filter, select "Options->Apply as Filter". Note that while recursive and hierarchical use is possible, it can lead to meaningless results.
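The filter's per-sample cluster features can be sketched by combining two steps. First, each sample is mapped to its best-matching node. Second, the cluster label of that node is recorded as new columns. The exact encoding the filter uses is not described here; this sketch assumes one indicator feature per cluster (1.0 for the sample's cluster, 0.0 otherwise). `cluster_features` and `node_cluster` are hypothetical names.

```python
import math

def cluster_features(data, weights, node_cluster, n_clusters):
    """Append one indicator column per cluster to each sample (assumed encoding).

    weights[r][c] is a node vector; node_cluster maps (r, c) -> cluster index."""
    out = []
    for x in data:
        # The sample's cluster is the cluster of its closest SOM node.
        bmu = min(node_cluster, key=lambda n: math.dist(x, weights[n[0]][n[1]]))
        onehot = [1.0 if node_cluster[bmu] == j else 0.0 for j in range(n_clusters)]
        out.append(list(x) + onehot)
    return out

weights = [[[0.0], [10.0]]]
node_cluster = {(0, 0): 0, (0, 1): 1}
print(cluster_features([[1.0], [9.0]], weights, node_cluster, 2))
```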