BINF301 Machine Learning for Systems Biology


PCA

Tip: Book sections
  • Pattern Recognition and Machine Learning: Chapter 12
  • Elements of Statistical Learning: Section 14.5.1
  • An Introduction to Statistical Learning: Section 6.3.1

Dimensionality reduction

Assume we have a set of \(N\) observations (e.g. cells) of \(p\) variables (e.g. genes).

Dimensionality reduction consists of representing the observations in a lower-dimensional space with dimension \(q\ll p\) while preserving some characteristics (e.g. relative distance) of the original \(p\)-dimensional representation.

For visualizing relationships among observations, we typically use \(q=2\).

Principal component analysis (PCA)

Assume we have a set of \(N\) observations \(x_1,\dots,x_N\in \mathbb{R}^p\) of \(p\) variables, collected in the \(N\times p\) matrix \(\mathbf{X}\).

The principal components of \(\mathbf{X}\) provide a sequence of best linear approximations to \(\mathbf{X}\), representing \(\mathbf{X}\) by a line (\(q=1\)), plane (\(q=2\)), etc. in \(p\)-dimensional space.

A rank-\(q\) linear model in \(p\)-dimensional space takes the form

\[ f(z) = \mu + z_1 v_1 + \dots + z_q v_q = \mu + \mathbf{V}_q z, \]

where \(\mu\in\mathbb{R}^p\) is a location vector, \(z\in\mathbb{R}^q\) is a (low-dimensional) coordinate vector, and \(\mathbf{V}_q=(v_1,\dots,v_q)\) is a \(p\times q\) matrix whose columns are \(q\) orthonormal vectors,

\[ v_k^T v_{k'} = \sum_{j=1}^p v_{kj}v_{k'j} = \delta_{kk'}. \]

Fitting this model to the data using least squares amounts to minimizing the reconstruction error:

\[ \min_{\mu,\{z_{i}\},\mathbf{V}_q} \sum_{i=1}^N \Bigl\| x_i - \mu - \mathbf{V}_q z_i \Bigr\|^2 \]

Partial optimization over \(\mu\) and the \(z_i\) gives

\[ \begin{aligned} \hat{\mu} &= \bar x = \frac1{N}\sum_{i=1}^N x_i\\ \hat{z}_i &= \mathbf{V}_q^T(x_i-\bar{x}) \end{aligned} \]
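We can check this partial solution numerically. The following is a minimal NumPy sketch on synthetic data (the matrix, seed, and dimensions are made up for illustration): for an arbitrary fixed \(\mathbf{V}_q\) with orthonormal columns, and with each \(z_i\) set to its optimum \(\mathbf{V}_q^T(x_i-\mu)\), the sample mean \(\bar x\) attains the minimal reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, q = 50, 4, 2
X = rng.normal(size=(N, p))          # synthetic data, for illustration only

# Fix an arbitrary V_q with orthonormal columns (QR of a random matrix)
V_q, _ = np.linalg.qr(rng.normal(size=(p, q)))

def recon_error(mu):
    """Reconstruction error with each z_i set to its optimum V_q^T (x_i - mu)."""
    Z = (X - mu) @ V_q               # rows are z_i = V_q^T (x_i - mu)
    R = X - mu - Z @ V_q.T           # residuals x_i - mu - V_q z_i
    return np.sum(R ** 2)

mu_hat = X.mean(axis=0)              # the sample mean, x-bar
# x-bar attains the minimum: any other choice of mu does no better
assert recon_error(mu_hat) <= recon_error(mu_hat + 0.1)
assert recon_error(mu_hat) <= recon_error(np.zeros(p))
```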

Substituting these into the objective above, it remains to find the \(p\times q\) matrix \(\mathbf{V}_q\) with orthonormal columns:

\[ \min_{\mathbf{V}_q} \sum_{i=1}^{N} \Bigl\| (x_i - \bar{x}) - \mathbf{V}_q\mathbf{V}_q^T(x_i-\bar{x}) \Bigr\|^2 \]

It can be shown that the optimal solution is found when \(\mathbf{V}_q=(v_1,\dots,v_q)\) contains the \(q\) eigenvectors with largest eigenvalues of the \(p\times p\) matrix \(\mathbf{X}^T\mathbf{X}\) (if the data is centred such that \(\bar x=0\)).
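As a sketch of this eigenvector construction in NumPy (synthetic data, made-up dimensions; `eigh` returns eigenvalues in ascending order, so we re-sort):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, q = 100, 5, 2
X = rng.normal(size=(N, p))          # synthetic data, for illustration only
Xc = X - X.mean(axis=0)              # centre so that x-bar = 0

# Eigendecomposition of X^T X; keep the q eigenvectors with largest eigenvalues
evals, evecs = np.linalg.eigh(Xc.T @ Xc)
V_q = evecs[:, np.argsort(evals)[::-1][:q]]

Z = Xc @ V_q                         # rows are the optimal z_i = V_q^T (x_i - x-bar)
```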

The solution can be expressed using the singular value decomposition of the (centred) observed data matrix \(\mathbf{X}\):

\[ \mathbf{X} = \mathbf{U} \Lambda \mathbf{V}^T \]

where \(\mathbf{U}\) is an \(N\times p\) orthogonal matrix (\(\mathbf{U}^T\mathbf{U}=\mathbf{I}_p\)) whose columns are called the left singular vectors, \(\mathbf{V}\) is a \(p\times p\) orthogonal matrix (\(\mathbf{V}^T\mathbf{V}=\mathbf{I}_p\)) whose columns are called the right singular vectors, and \(\Lambda\) is a \(p\times p\) diagonal matrix with diagonal elements \(\lambda_1\geq \lambda_2 \geq \dots \geq \lambda_p\geq 0\) called the singular values.

The solution \(\mathbf{V}_q\) above consists of the first \(q\) columns of \(\mathbf{V}\).

The columns of \(\mathbf{U}\Lambda\) are called the principal components of \(\mathbf{X}\). The first \(q\) principal components are \(\mathbf{U}_q\Lambda_q\). Each principal component is a vector in \(\mathbb{R}^N\), that is, it takes a value for each observation. The solution \(\hat{z}_i\) above consists of the rows of \(\mathbf{U}_q\Lambda_q\), that is, for each observation \(i\), \(\hat{z}_i\) consists of the values of the first \(q\) principal components for observation \(i\).
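The SVD route can be sketched in NumPy as follows (again on made-up synthetic data); it also confirms that the scores \(\mathbf{U}_q\Lambda_q\) coincide with projecting the centred data onto the first \(q\) right singular vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, q = 100, 5, 2
X = rng.normal(size=(N, p))            # synthetic data, for illustration only
Xc = X - X.mean(axis=0)                # centre the data

# Thin SVD: U is N x p, lam holds the singular values, Vt = V^T is p x p
U, lam, Vt = np.linalg.svd(Xc, full_matrices=False)
PCs = U * lam                          # columns of U Lambda: the principal components
Z = PCs[:, :q]                         # rows of U_q Lambda_q, i.e. the z_i

# Identical to projecting the centred data on the top-q right singular vectors
assert np.allclose(Z, Xc @ Vt[:q].T)
```

Note that `np.linalg.svd` already returns the singular values in descending order, matching the convention \(\lambda_1\geq\dots\geq\lambda_p\geq 0\) above.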
