Filter
From Piki
A Synapse filter is a processing element that operates on a data unit. It is used for preprocessing data.
General
Filters are an integral part of most systems as they allow for filtering, formatting and reshaping of the data before it is used in a model. Like most other Synapse components, a filter consists of two levels: a base level that performs the calculations and a GUI level that presents an interface to the user. When a filter is deployed, the base level is used. The GUI level contains a settings object used by the settings browser and icons used by the filter stack and bar (see below).
Preprocessing GUI
Filter stack
A filter is attached to a data unit's filter stack. The stack is interactive, meaning that filters can be added, removed, modified and reordered. It operates on the data in a non-destructive way: the data unit always keeps the original data in its input buffer, and the filters operate on the output buffer in the order in which they appear on the stack. Initially the output buffer is just a copy of the input buffer; the filters are then applied to it in order.
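The buffer handling described above can be sketched in a few lines of Python. This is only an illustration of the non-destructive idea, not Synapse's implementation: the names DataUnit, StackEntry, input_buffer and output_buffer are hypothetical, and the enabled flag stands in for the checkbox next to each filter icon in the stack.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Samples = List[float]


@dataclass
class StackEntry:
    """One filter on the stack (hypothetical structure)."""
    name: str
    apply: Callable[[Samples], Samples]
    enabled: bool = True  # a disabled filter stays on the stack but is skipped


@dataclass
class DataUnit:
    """Holds the original data and an ordered filter stack (hypothetical)."""
    input_buffer: Samples
    stack: List[StackEntry] = field(default_factory=list)

    def output_buffer(self) -> Samples:
        out = list(self.input_buffer)      # start from a copy of the input
        for entry in self.stack:           # apply the filters in stack order
            if entry.enabled:
                out = entry.apply(out)
        return out


unit = DataUnit([1.0, 2.0, 3.0])
unit.stack.append(StackEntry("scale", lambda xs: [x * 2 for x in xs]))
unit.stack.append(StackEntry("offset", lambda xs: [x + 1 for x in xs]))
result = unit.output_buffer()              # [3.0, 5.0, 7.0]
```

Because the input buffer is never written to, removing, reordering or disabling a filter only requires recomputing the output buffer from the original data.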
In order to avoid having to calculate all the filters every time a change is made to the stack, filters have meta-data that in broad terms describe the operation of the filter.
The GUI representation of the filter stack can be found in preprocessing mode. The GUI shows the filter stack of the selected data unit. Filters are added by drag-drop from the filter bar. Filters can be removed by selecting them and pressing the DELETE key. They can be reordered by dragging them into the desired position. The stack is calculated when the "Apply" button is pressed. Until then the filters just use the meta-data. In order to view the changes made by a filter (in a visualizer or the statistic pane), they must be submitted by pressing the "Apply" button.
When you select a filter in the GUI, its settings are shown in the Settings browser where they can be modified. To disable a filter, uncheck the checkbox next to the filter icon in the stack. This will keep the filter object with all its settings, but it will not be used in the calculations.
Filter bar
The filter bar is the GUI component that contains all available filter components. The filters are divided into categories. Adding a filter is done by drag-dropping a filter from the bar on to the filter stack. When you hover your mouse pointer over a filter you get a short description of the filter's operation.
The filters in the bar can be reordered and moved into other categories. The latter is not generally recommended as the categories are generally sorted by
Filtering operation
While filters can do almost anything to the data there are a few broad categories that one should be aware of as they may have consequences for the system as a whole.
Per sample filtering
This is the most flexible type of filter, where a certain operation is performed for each sample. Such filters can be used in deployed systems and in both static and dynamic models. The category can include filters that collect information from multiple samples but don't need to update that information dynamically. Examples of per sample filters are the expression filter and the normalizer filter.
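As a rough illustration of why such a filter is deployable, the sketch below collects statistics once and then processes one sample at a time. The Normalizer class is hypothetical and uses a plain zero-mean, unit-variance scaling as an assumption; it is not the Synapse normalizer filter.

```python
from statistics import mean, pstdev


class Normalizer:
    """Hypothetical per sample filter: statistics are collected once, then frozen."""

    def __init__(self, reference_values):
        self.mean = mean(reference_values)
        # Fall back to 1.0 so a constant feature does not divide by zero.
        self.std = pstdev(reference_values) or 1.0

    def apply(self, sample: float) -> float:
        # Only the stored statistics and the current sample are needed.
        return (sample - self.mean) / self.std


norm = Normalizer([2.0, 4.0, 6.0])
centre = norm.apply(4.0)   # the reference mean maps to 0.0
```

Since apply needs nothing but the frozen statistics, the same object can handle a single incoming sample in a deployed system.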
Per feature filtering
This type of filter uses information from an entire feature (or multiple features) to modify the samples. The samples in the affected feature are potentially all subject to change. There are many very useful filters of this type, such as the outlier filter and the equalizer filter. They are however unsuitable for deployment as they can't operate on individual samples. Their results also often depend on the number of samples used. For instance, an equalizer filter will produce different results depending on whether 2 or 200 samples are used.
The per feature filtering operations are useful for re-shaping the data in a form suitable for training. Usually they are not applied on the validation set unless special care is taken.
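The dependence on the number of samples can be seen in a small sketch. The equalize function below is a hypothetical rank-based stand-in, assumed here purely for illustration; the actual equalizer filter may work differently.

```python
def equalize(feature):
    """Map each value to its rank scaled into [0, 1] (ties share the lowest rank)."""
    n = len(feature)
    if n < 2:
        return [0.0 for _ in feature]
    ordered = sorted(feature)
    return [ordered.index(value) / (n - 1) for value in feature]


small = equalize([10.0, 20.0])
large = equalize([10.0, 20.0, 30.0, 40.0, 50.0])
# The value 20.0 maps to 1.0 in the small set but to 0.25 in the large set.
```

The same input value receives a different output depending on which other samples are present, which is why a filter of this kind cannot be applied to a single sample in deployment.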
Time series filtering
Most time series filters are a combination of the per sample filtering and the per feature filtering paradigms. Since a time series can never consist of one sample, the deployment requirements are changed. It is possible to have standard per feature filtering (such as the DFT filter) which can essentially only be used for visualization purposes in preprocessing.
One important thing to think about when filtering time series is causality. For each sample in the time series, a causal filter uses only historic data. An anti-causal filter uses future samples, making it unsuitable for any type of prediction or system identification task. Examples of causal filters are the Kalman filter and the moving average filter. An example of an anti-causal filter is the Wavelet filter.
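The difference can be made concrete with two moving averages. Both functions are illustrative sketches with hypothetical names: the first looks only backwards, while the second uses a centered window and therefore reads future samples.

```python
def causal_moving_average(series, window):
    """Average of the current sample and up to window - 1 earlier samples."""
    out = []
    for i in range(len(series)):
        start = max(0, i - window + 1)
        chunk = series[start:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def centered_moving_average(series, half_width):
    """Average over a window centered on the sample, so it reads future samples."""
    out = []
    for i in range(len(series)):
        start = max(0, i - half_width)
        chunk = series[start:i + half_width + 1]
        out.append(sum(chunk) / len(chunk))
    return out


series = [1.0, 2.0, 3.0, 4.0]
changed = [1.0, 2.0, 3.0, 100.0]   # only the last (future) sample differs

causal_a = causal_moving_average(series, 2)
causal_b = causal_moving_average(changed, 2)
centered_a = centered_moving_average(series, 1)
centered_b = centered_moving_average(changed, 1)
```

Changing the last sample leaves the first three causal outputs untouched, whereas the centered output at index 2 changes because its window includes the sample that follows it.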
Index based filtering
Some visualizers can create a special type of filter that should be used with great caution. These are index based filters, in other words filters that use absolute sample indexes to select which samples to operate on. Typically they deal with removing or adding samples.
An index based filter is marked in the filter stack as "FilterFrame". Visualizers that can create index based filters are the scatter plot, the Grid View, the Value vs Sample visualizer and the histogram visualizer.
The problem with index based filters is that they are incompatible with any changes in the data. An index based filter that has instructions to remove samples 1, 2 and 5 will always remove samples 1, 2 and 5 regardless of the data found at those sample indices. Such filters should be seen primarily as visualization aids and in some limited cases as preprocessing of the training set. They should be avoided when possible. They are not deployable.
Although currently several visualizers use them, they are being phased out and will in the future be replaced with query based filters that operate on ranges of values.
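A short sketch shows why a filter tied to sample positions is fragile and how a filter based on a range of values avoids the problem. Both functions are hypothetical illustrations, not the FilterFrame or the planned query based filters, and the indexes are zero-based as is usual in Python.

```python
def remove_by_index(samples, indexes):
    """Index based: drop the samples at fixed positions, whatever they contain."""
    drop = set(indexes)
    return [s for i, s in enumerate(samples) if i not in drop]


def keep_in_range(samples, low, high):
    """Query based: keep only the samples whose value lies in [low, high]."""
    return [s for s in samples if low <= s <= high]


original = [5.0, 900.0, 6.0, 7.0]   # the outlier sits at index 1
reloaded = [900.0, 5.0, 6.0, 7.0]   # same values, different order

by_index_original = remove_by_index(original, [1])    # outlier removed
by_index_reloaded = remove_by_index(reloaded, [1])    # a valid sample removed
by_range_reloaded = keep_in_range(reloaded, 0.0, 100.0)
```

Once the data changes, the index based filter removes a valid sample and keeps the outlier, while the range based filter still selects by what the samples contain.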
Meta filtering
Meta-filters are a special type of filter that doesn't operate on the data, but on the structure of the data. Tasks for a meta-filter may be removing a feature or rearranging the order of the features in a data unit. Examples of meta filters are the extract filter, the reorder filter and the rename filter.
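To illustrate operating on structure rather than on values, the sketch below models a data unit as a dictionary that maps feature names to lists of samples. This representation and the three function names are assumptions made for the example only; they are not the interfaces of the extract, reorder or rename filters.

```python
def extract(unit, names):
    """Keep only the named features."""
    return {name: unit[name] for name in names}


def reorder(unit, order):
    """Return the same features in a new order."""
    if sorted(order) != sorted(unit):
        raise ValueError("order must list every feature exactly once")
    return {name: unit[name] for name in order}


def rename(unit, mapping):
    """Rename features; names missing from the mapping are kept as they are."""
    return {mapping.get(name, name): values for name, values in unit.items()}


unit = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
extracted = extract(unit, ["a", "c"])
reordered = reorder(unit, ["c", "a", "b"])
renamed = rename(unit, {"a": "x"})
```

In all three cases the sample values pass through unchanged; only the set, order or names of the features differ.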
Script based filter
Main article: Synapse:Script filter
The most powerful filter in Synapse that can perform any of the above filtering types is the Synapse:Script filter. It allows the user to write a filter in C# directly from the environment.
General advice
- When filtering time series, remember that the validation set's logical position is after the training set. When a system is trained, the control system shows the system the training set and then the validation set. Unless the two sets are treated as one continuous entity in preprocessing, your validation results will be incorrect.
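The effect of the advice above can be sketched with a causal moving average, used here only as an assumed example of a time series filter. Filtering the validation set on its own discards the history that the end of the training set would have provided.

```python
def causal_moving_average(series, window):
    """Average of the current sample and up to window - 1 earlier samples."""
    out = []
    for i in range(len(series)):
        start = max(0, i - window + 1)
        chunk = series[start:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


training = [1.0, 2.0, 3.0, 4.0]
validation = [5.0, 6.0]

# Validation filtered in isolation: the window starts empty at its first sample.
separate = causal_moving_average(validation, 3)

# Validation filtered as the continuation of the training set.
continuous = causal_moving_average(training + validation, 3)[len(training):]
```

Here separate is [5.0, 5.5] while continuous is [4.0, 5.0], so the filtered validation samples differ depending on whether the two sets were preprocessed as one sequence.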
See also
- List of Filter components - List of all Synapse filter components.
