Principal component analysis
From Piki
Principal Component Analysis (PCA) is a linear transformation that can be used to reduce, compress or simplify a data set. It transforms the data to a new coordinate system in which the projection of the data with the greatest variance lies on the first component (coordinate), the projection with the next greatest variance lies on the second component, and so on. This way one can choose not to use all the components and still capture the most important part of the data.
Basic concept
To understand what this means, we can take a look at a 2D example. Suppose we have some X-Y data that looks something like this:
[Figure: Sample 2D data]
To see how the data is spread, we enclose the data set in an ellipse and look at its major and minor axes, which form the vectors P1 and P2:
[Figures: Ellipse fitted to the data; Ellipse axes]
These are the principal component axes: the basis vectors, ordered by the variance of the data along them. PCA finds these vectors for you and gives you a [X,Y] -> [P1, P2] transformation. While this example is in 2D, PCA works for N-dimensional data, and it is generally used on high-dimensional problems.
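As a minimal sketch of the [X,Y] -> [P1, P2] transformation, the axes can be computed as the eigenvectors of the covariance matrix of the data. This is one common way to compute PCA, not necessarily the one used elsewhere on this site. The function name `pca_axes` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def pca_axes(data):
    """Return (mean, variances, axes); axes are columns ordered by decreasing variance."""
    mean = data.mean(axis=0)
    centered = data - mean
    cov = np.cov(centered, rowvar=False)       # sample covariance of the features
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # reorder to descending variance
    return mean, eigvals[order], eigvecs[:, order]

# Hypothetical correlated X-Y cloud, standing in for the sample data in the figure
rng = np.random.default_rng(0)
x = rng.normal(0.0, 3.0, size=500)
y = 0.5 * x + rng.normal(0.0, 0.5, size=500)
data = np.column_stack([x, y])

mean, variances, axes = pca_axes(data)
projected = (data - mean) @ axes               # [X, Y] -> [P1, P2]

print("variance along P1, P2:", variances)
print("P1 direction:", axes[:, 0])
```

After the projection the two new coordinates are uncorrelated, and the variance along each one equals the corresponding eigenvalue.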
Why PCA?
Most problems with data intended for statistical analysis reduce to two cases: too few samples or too many features. PCA helps with the latter. Having too many features often gives the problem too many degrees of freedom, which leads to poor statistical coverage and thus poor generalization. In addition, each feature adds to the computational burden in terms of processing and storage.
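To make the feature-reduction idea concrete, here is a rough sketch that keeps only the first k components and reports how much of the total variance they retain. The helper `reduce_dimensions` and the synthetic 10-feature data set (built from two hidden factors) are assumptions made for illustration. The sketch uses the singular value decomposition, which yields the same principal axes as the covariance eigenvectors.

```python
import numpy as np

def reduce_dimensions(data, k):
    """Project data onto its first k principal components.

    Returns the reduced data and the fraction of total variance retained.
    """
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return centered @ vt[:k].T, explained[:k].sum()

# Hypothetical data: 10 observed features driven by only 2 underlying factors
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
features = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

reduced, kept = reduce_dimensions(features, k=2)
print("reduced shape:", reduced.shape)
print("fraction of variance kept:", kept)
```

In this constructed case two components carry almost all of the variance, so eight of the ten features can be dropped with little loss. Real data rarely separates this cleanly.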
Shouldn’t a supervised neural network be able to do PCA on its own? Yes, it can, but there are three good reasons why we should avoid dumping that task on a neural net:
- Increasing the degrees of freedom (features) drastically increases the search space that the adaptation algorithm has to cover.
- Increasing the degrees of freedom also increases the complexity of the search space. A more complex search space has more local minima for the optimization to get stuck in, which gives suboptimal solutions.
- For most problems a neural net will do something similar to PCA anyway. Taking that task away lets the neural net do the things that can’t be done with PCA, so its computational power is put to better use.
Limitations of PCA
There are, however, some limitations of PCA that we should take into consideration. First of all, it is a linear method. Basically, the problem involves rotating the ellipsoid that we saw earlier so that the direction of greatest variance in the data becomes the first component. Simplified, PCA does basically this:
[Figure: Basic PCA transformation]
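The rotation view can be checked numerically. The sketch below, under assumed values (a 30 degree tilt and standard deviations of 4 and 1), builds an axis-aligned cloud, rotates it, and then recovers the tilt from the first principal axis. The matrix of principal axes is orthogonal, so applying it is a rotation, possibly combined with a reflection, because the sign of each axis is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed setup: elongated cloud along X, then tilted by 30 degrees
theta = np.deg2rad(30.0)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
cloud = rng.normal(size=(2000, 2)) * np.array([4.0, 1.0])
data = cloud @ rotation.T                      # rotate every sample by theta

centered = data - data.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
axes = eigvecs[:, ::-1]                        # columns in descending variance order

# Direction of P1 as an angle in [0, 180); the sign of an eigenvector is arbitrary
p1 = axes[:, 0]
angle = np.degrees(np.arctan2(p1[1], p1[0])) % 180.0

print("estimated tilt of P1 (degrees):", angle)
print("axes orthogonal:", np.allclose(axes.T @ axes, np.eye(2)))
```

The estimated tilt comes out close to the 30 degrees used to generate the data, with a small error from the finite sample.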
This works fine as long as the X/Y relation is fairly linear. If we have a situation like this, we have a problem:
[Figure: Non-linear problem]
While PCA still tries to produce components ordered by variance, it fails because the largest variance is not along a single vector but along a non-linear path. Neural networks, on the other hand, are perfectly capable of dealing with non-linear problems on their own. In addition, they can do scaling directly, so that the principal components can be scaled by their importance (eigenvalues):
[Figure: Neural network result]
In this case [X,Y] -> [g(P1), f(P2)] is a non-linear transformation. So while PCA is in theory an optimal linear feature extractor, it can perform poorly on non-linear problems.
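The linear limitation can be illustrated with a small sketch. The data set here is an assumption: points along a parabola with a little noise, standing in for the non-linear case in the figure. PCA removes the linear correlation between the components, but the second component is still almost completely determined by the first, which a linear method cannot see. The sketch only demonstrates the PCA side and does not include a neural network.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical non-linear data: essentially one-dimensional, but bent along a parabola
x = rng.uniform(-1.0, 1.0, size=1000)
y = x**2 + rng.normal(0.0, 0.02, size=1000)
data = np.column_stack([x, y])

centered = data - data.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
axes = eigvecs[:, ::-1]                        # descending variance order
p1, p2 = (centered @ axes).T                   # coordinates along P1 and P2

linear_corr = np.corrcoef(p1, p2)[0, 1]        # removed by PCA
hidden_corr = np.corrcoef(p1**2, p2)[0, 1]     # non-linear dependence that remains
share_p1 = eigvals[::-1][0] / eigvals.sum()    # variance captured by P1 alone

print("correlation of P1 and P2:", linear_corr)
print("correlation of P1^2 and P2:", hidden_corr)
print("share of variance on P1:", share_p1)
```

Although the data follows a single curve, the first component does not capture all of the variance, and dropping P2 would throw away the shape of the curve.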
See also
- Generalized Hebbian Algorithm (GHA) - Hebbian type learning that performs PCA.
- The talented Dr.Hebb Part 2 - Blog tutorial on the subject.






