Outlier filter

From Piki

The outlier remover filter eliminates outliers from data.

Outlier remover
Image:Outlier filter icon.jpg
Name: Outlier remover
Deployable: No
Static: Yes
Sample modifier: Yes
Feature modifier: No

Usage

The outlier remover is used to purge the data of samples that are outside the range of normal values. This filter can only be used with static systems and should not be used with time series data.

Operation

The outlier remover has two basic operating modes: statistical and histogram. The statistical (or sigma) mode assumes that the data are normally distributed and removes samples that lie more than a set number of standard deviations from the mean. This works very well when the data roughly follow a bell-shaped distribution.

Image:Outlier remover sigma mode.jpg
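The sigma mode can be illustrated with a minimal sketch. This is not the filter's actual implementation: the function name `sigma_keep_mask`, the per-feature processing, the use of the population standard deviation, and the `selected` argument are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def sigma_keep_mask(samples, x_sigma=2.0, selected=None):
    """Return one boolean per sample: True means keep, False means remove.

    Hypothetical helper: a sample is removed if any checked feature lies
    more than x_sigma standard deviations from that feature's mean.
    """
    n_features = len(samples[0])
    # Features to check; by default every feature is checked.
    selected = range(n_features) if selected is None else selected
    keep = [True] * len(samples)
    for j in selected:
        column = [row[j] for row in samples]
        mu, sd = mean(column), pstdev(column)
        if sd == 0:
            continue  # constant feature: no value can be an outlier
        for i, value in enumerate(column):
            if abs(value - mu) > x_sigma * sd:
                keep[i] = False
    return keep

# Seven samples with two features; the last sample has an extreme first feature.
data = [[1.0, 10.0], [1.1, 10.2], [0.9, 9.8], [1.0, 10.1],
        [1.2, 9.9], [0.8, 10.0], [9.0, 10.0]]
mask = sigma_keep_mask(data, x_sigma=2.0)
filtered = [row for row, k in zip(data, mask) if k]
print(mask)  # only the last sample is flagged for removal
```

Note that a single extreme value also inflates the standard deviation, so with very few samples a large multiplier may never flag anything.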

The histogram mode builds a histogram of the features of the data unit and removes samples that fall in bins whose count is below a specified cutoff. This works with any distribution but is sensitive to the histogram settings.

Image:Outlier remover histogram mode.jpg
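A minimal sketch of the histogram mode follows. It is an illustration under assumptions, not the filter's actual code: the function name `histogram_keep_mask`, equal-width bins spanning the minimum to the maximum of each feature, and per-feature histograms are all hypothetical choices.

```python
def histogram_keep_mask(samples, bins=10, cutoff=2, selected=None):
    """Return one boolean per sample: True means keep, False means remove.

    Hypothetical helper: a sample is removed if, for any checked feature,
    it falls into a histogram bin holding fewer than `cutoff` samples.
    """
    n_features = len(samples[0])
    selected = range(n_features) if selected is None else selected
    keep = [True] * len(samples)
    for j in selected:
        column = [row[j] for row in samples]
        lo, hi = min(column), max(column)
        if hi == lo:
            continue  # constant feature: all samples share one bin
        width = (hi - lo) / bins
        # Bin index per sample; the maximum value is placed in the last bin.
        idx = [min(int((v - lo) / width), bins - 1) for v in column]
        counts = [0] * bins
        for b in idx:
            counts[b] += 1
        for i, b in enumerate(idx):
            if counts[b] < cutoff:
                keep[i] = False
    return keep

# One feature with two clusters (near 0 and near 10) and a stray value at 5.
data = [[0.0], [0.1], [0.2], [0.3], [0.4], [5.0], [9.8], [9.9], [10.0], [10.0]]
mask = histogram_keep_mask(data, bins=5, cutoff=2)
print(mask)  # only the sample at 5.0 is flagged for removal
```

The stray value sits almost exactly at the mean of this two-cluster data, so a distance-from-the-mean test would not flag it, while the bin count does.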

The two modes can be combined.

Settings

The settings can be modified using the settings browser.


Outlier remover settings


  • (Mode)
  • Mode: The operation mode.
      • Sigma - Uses the statistical mode for removal.
      • Histogram - Uses the histogram mode for removal.
      • Full - Uses both modes for removal.


  • (Operation)


  • (Operator)
  • Invert: Flips the selection status of all features.
  • Clear: Clears the selection status of all features (sets them to false).


  • (Settings)
  • (If Sigma or Full) xSigma: Sigma multiplier for normal distributions. Samples farther from the mean than xSigma standard deviations are removed. xSigma = 2 keeps about 95% of a normal distribution.
  • (If Histogram or Full) Bins: Number of bins in the histogram. A higher number of bins gives higher resolution, but also fewer samples per bin.
  • (If Histogram or Full) Cutoff: Cutoff level. Samples in bins that hold fewer samples than the cutoff level are removed.


  • (Selected)
  • Show Count: The number of features to show in the list. Useful if there is a very large number of features.
  • Show From: The feature number to start the list from. Only used if the number of features exceeds Show Count.
  • <Feature name>: The selection status of a feature. If this is set to true the feature will be affected by the filter.
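To help choose a value for the xSigma setting above, the fraction of an ideal normal distribution that lies within a given number of standard deviations of the mean can be computed with the error function. The helper name `normal_coverage` is hypothetical. Real data are only approximately normal, so the figures are a guide rather than a guarantee.

```python
import math

def normal_coverage(x_sigma):
    """Fraction of a normal distribution within x_sigma standard deviations of the mean."""
    return math.erf(x_sigma / math.sqrt(2.0))

for k in (1.0, 2.0, 3.0):
    print(f"xSigma = {k}: keeps {normal_coverage(k):.2%} of a normal distribution")
# xSigma = 1.0: keeps 68.27% of a normal distribution
# xSigma = 2.0: keeps 95.45% of a normal distribution
# xSigma = 3.0: keeps 99.73% of a normal distribution
```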


General advice

  • Be careful not to remove too much with the outlier remover. It is advisable to keep the validation set intact.
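One way to follow this advice is to split the data first and filter only the training part, then check how much was removed. The sketch below is an assumption about workflow, not a description of how the software handles data sets. The helper `remove_sigma_outliers`, the 7/3 split, and the single-feature data are all made up for illustration.

```python
from statistics import mean, pstdev

def remove_sigma_outliers(values, x_sigma=2.0):
    """Hypothetical helper: drop values more than x_sigma standard deviations from the mean."""
    mu, sd = mean(values), pstdev(values)
    return [v for v in values if sd == 0 or abs(v - mu) <= x_sigma * sd]

samples = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 9.0, 1.05, 0.95, 8.5]

# Split first, so the validation samples are never passed through the filter.
training, validation = samples[:7], samples[7:]
training_filtered = remove_sigma_outliers(training)

# Report how much the filter removed before accepting the result.
removed_fraction = 1 - len(training_filtered) / len(training)
print(f"removed {removed_fraction:.0%} of the training samples")
print(f"validation set untouched: {len(validation)} samples")
```

If the removed fraction is larger than expected, a higher xSigma or a lower Cutoff may be worth trying.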

This page was last modified 22:15, 16 January 2008.