Path-breaking paper: Unbiased classification of sensory neuron types by large-scale single-cell RNA sequencing
Usoskin D et al. Unbiased classification of sensory neuron types by large-scale single-cell RNA sequencing. Nature Neuroscience 18:145 (2015).
Tasic B et al. Shared and distinct transcriptomic cell types across neocortical areas. Nature 563:72 (2018).
Siletti K et al. Transcriptomic diversity of cell types across the adult human brain. Science 382:eadd7046 (2023). See also the Brain Cell Census collection.
Motivation
Reducing high-dimensional data to a low-dimensional representation for visualization and pattern discovery is a classical approach in statistics, with principal component analysis (PCA) probably the best-known method. PCA and other dimensionality reduction methods have been used in systems biology for as long as (relatively) large data sets have been around. Nowadays, dimensionality reduction is mostly associated with the analysis of single-cell sequencing data. Given the importance of single-cell sequencing technologies, and the relative lack of understanding of the non-linear dimensionality reduction methods that are regularly employed in this context, this will be our angle for learning about dimensionality reduction.
For our path-breaking paper, we select a relatively early study of cell types in the brain using single-cell sequencing of 622 mouse neurons. In this study, PCA is used to visualize data in two and three dimensions and to identify distinct types of sensory neurons.
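To make the idea of a PCA projection concrete, here is a minimal sketch in Python with NumPy. It is not the pipeline of the paper: the count matrix is synthetic, the two "cell types" are made up, and the log2(x + 1) transform and the helper name `pca_scores` are assumptions chosen for illustration only.

```python
import numpy as np

def pca_scores(counts, n_components=2):
    """Project cells (rows) onto the leading principal components."""
    logged = np.log2(counts + 1.0)            # assumption: log2(x + 1) transform
    centered = logged - logged.mean(axis=0)   # center each gene (column)
    # SVD of the centered matrix: rows of vt are the principal axes
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # cell coordinates
    explained = s**2 / np.sum(s**2)                   # variance fraction per PC
    return scores, explained[:n_components]

# Synthetic data (hypothetical): 60 cells x 200 genes, two groups of cells
# that differ in the expression of the first 20 genes.
rng = np.random.default_rng(0)
n_cells, n_genes = 60, 200
labels = np.repeat([0, 1], n_cells // 2)
counts = rng.poisson(5.0, size=(n_cells, n_genes)).astype(float)
counts[labels == 1, :20] += rng.poisson(50.0, size=(n_cells // 2, 20))

scores, explained = pca_scores(counts, n_components=2)
print(scores.shape)   # one 2D point per cell
print(explained)      # fraction of variance captured by PC1 and PC2
```

Plotting the two columns of `scores` against each other gives the kind of two-dimensional scatter plot in which groups of similar cells can show up as separate clouds of points. With real data the separation is usually far less clean than in this toy example.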
To illustrate how far the field of single-cell sequencing has advanced in the last decade, we select a paper from 2018, which sequenced more than 23 thousand cells and whose data we will use in our practical analysis, and a paper from the BRAIN Initiative Cell Census Network, which sequenced more than three million human brain cells, including more than two million neurons. In these papers, t-SNE and UMAP are used for dimensionality reduction, as is the case in most current studies in the field.
Questions for discussion
What is single-cell transcriptomics? We will look in more detail at this review paper: Revealing the vectors of cellular identity with single-cell genomics.
What is the basic idea for discovering and classifying cell types?
How many expressed genes were detected on average in each cell? How many genes were detected in total across all cells (see Methods)? What could explain these numbers?
How did the authors decide on the initial classification of cells into neuronal and non-neuronal clusters? How did they define further subclusters of the initial neuronal clusters?
What is shown in Figure 1?
What is shown in Figure 4? How is the “fraction of positive cells” defined for a given gene?