Explain PCA and 2 ways to calculate it
There are plenty of resources about Principal Component Analysis (PCA), yet it remains a piece of abstract knowledge that many ML engineers never fully understand. One may be able to call the Scikit-learn API in an ML pipeline without knowing the underlying maths. This post will walk through the intuition of PCA as if explaining it to a young kid, and then calculate PCA manually so we can learn the insights behind the API.
Intuition
What is PCA?
PCA is a method of summarising a wide table using fewer characteristics. For example, say there are 10 kinds of fizzy drinks. Each has its own flavour, colour, clarity, sweetness, smoothness, acidity, price and so on. You need to place them on one shelf with 4 levels, keeping similar drinks close together. With so many characteristics, many of which overlap, which ones do you choose? PCA lets you discover a new way to group similar drinks together.
It lets you summarise (represent) higher-dimensional data in a lower-dimensional space (3D –> 2D, 2D –> 1D).
Note that PCA does not simply discard a dimension (or several) to achieve this. Instead, it constructs new characteristics for the drinks out of the existing ones. For example, a new property might be "sweet and colourless at a similar price". PCA finds the best possible characteristics among all conceivable linear combinations.
Calculating PCA
Let’s use the Iris dataset as an example.
First, prepare the dataset.
Let’s plot it to see what the classes look like.
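A minimal sketch of loading and plotting the Iris data with scikit-learn and matplotlib; the choice of which two raw features to plot is illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this locally
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data    # shape (150, 4): four measurements per flower
y = iris.target  # three classes: 0, 1, 2

# Plot the first two raw features, coloured by class, to see the overlap.
plt.scatter(X[:, 0], X[:, 1], c=y, cmap="gray")
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.savefig("iris_raw.png")
```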
We can see that two of the classes are not very well separated (black and grey).
Method 1: Calculate PCA through eigenvalues
Data preprocessing (Common to both)

Mean normalisation. Mean normalisation is essential for PCA: subtracting each feature's mean centres the data at the origin, so the covariance matrix — and hence the directions of maximum variance that PCA looks for — are computed correctly.

Optional feature scaling. If your features have very different scales — say, a housing dataset with both the number of rooms and the size in square metres — divide each feature by its standard deviation so that no single feature dominates the variance.
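The two preprocessing steps can be sketched in NumPy; the toy housing numbers below are made up for illustration.

```python
import numpy as np

# Toy data: (number of rooms, size in square metres) — illustrative values only.
X = np.array([[2.0, 120.0],
              [3.0, 150.0],
              [4.0, 210.0]])

X_centred = X - X.mean(axis=0)                 # mean normalisation: each column now has mean 0
X_scaled = X_centred / X_centred.std(axis=0)   # optional scaling: each column now has std 1
```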
We will then perform the following steps:

1. Compute the covariance matrix

\displaystyle cov\left(X\right)=\frac{\sum_{i=1}^{n}\left(X_i-\overline{X}\right)\left(X_i-\overline{X}\right)^{T}}{n-1}

2. Compute the eigenvectors of the covariance matrix

\displaystyle \Sigma \vec{v}=\lambda \vec{v}

3. Select the N eigenvectors with the largest eigenvalues

4. Project the dataset onto the subspace they span
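The steps above can be sketched in NumPy, keeping two components for the Iris data (variable names are mine, not from the post):

```python
import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data
X = X - X.mean(axis=0)                   # step 0: mean normalisation

cov = np.cov(X, rowvar=False)            # step 1: 4x4 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # step 2: eigh suits symmetric matrices

order = np.argsort(eigvals)[::-1]        # step 3: sort eigenvalues, descending
top2 = eigvecs[:, order[:2]]             # keep the two largest components

X_proj = X @ top2                        # step 4: project 4-D data onto the 2-D subspace
```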
Method 2: Calculate PCA through SVD
In Professor Andrew Ng’s lecture videos, Singular Value Decomposition is used instead of eigendecomposition of the covariance matrix because SVD is numerically more robust: there are cases where explicitly forming the covariance matrix leads to numerical problems.
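A sketch of the SVD route: decompose the centred data matrix directly. The rows of `Vt` are the principal directions, and the singular values relate to the covariance eigenvalues by `eigval = s**2 / (n - 1)`.

```python
import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data
X = X - X.mean(axis=0)                    # centre the data first

U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_proj = X @ Vt[:2].T                     # project onto the top two directions

eigvals_from_svd = s**2 / (X.shape[0] - 1)  # same eigenvalues as Method 1
```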
Use Scikit API
Of course, in practice one won’t write all of this by hand — PCA is available directly in the Scikit-learn API.
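The same two-component reduction with scikit-learn, which centres the data internally before projecting:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=2)
X_proj = pca.fit_transform(X)            # centre, decompose, and project in one call

print(pca.explained_variance_ratio_)     # share of variance kept by each component
```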