Path-breaking papers: Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region & SpatialDE: identification of spatially variable genes
Data & Technology:
- Ståhl PL, et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics, Science 353:78 (2016).
- Moffitt JR, et al. Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region, Science 362:eaau5324 (2018).
Methodology:
- Svensson V, et al. SpatialDE: identification of spatially variable genes, Nature Methods 15:343 (2018).
- Arnol D, et al. Modeling Cell-Cell Interactions from Spatial Molecular Data with Spatial Variance Component Analysis, Cell Reports 29:202 (2019).
- Kang HM, et al. Efficient Control of Population Structure in Model Organism Association Mapping, Genetics 178(3):1709–1723 (2008).
Motivation
Most statistical and machine learning methods for analyzing genome-scale data assume that the data consist of independent samples from an unknown distribution or data-generating process. However, in important applications, the assumption of independence does not hold: in genetic studies, the correlation structure of sampled data (e.g. transcriptome profiles) may reflect underlying population structure, while in studies of spatially inhomogeneous structures (e.g. tissues or organs) or temporal processes, the correlation structure of sampled data will reflect their spatial or temporal proximity.
Variance component models and Gaussian processes have emerged as powerful methods for modelling data with an underlying correlation structure. Historically, the first application of this type of model to genome-wide data appears to be Kang et al.’s paper on correcting for population structure in genetic association studies, and related approaches are now the standard in the field (see this paper for an important methodological innovation and this paper for an application to UK Biobank data). Because we already work with population genomics data in the causal inference and Bayesian networks modules, we present this line of research as a “reverse test of time” paper.
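As a reminder of what such a model looks like, a generic variance component model can be written as follows. The notation is ours and is not taken from any of the papers above: $y$ is the vector of measurements across samples, $X\beta$ collects fixed effects, $K$ is a known similarity matrix between samples (a kinship matrix in genetics, or a kernel evaluated on spatial coordinates in spatial transcriptomics), and $\sigma_g^2$, $\sigma_e^2$ are the variance components to be estimated.

```latex
\begin{align}
  y &= X\beta + u + \varepsilon, \\
  u &\sim \mathcal{N}\!\left(0,\; \sigma_g^2 K\right), \qquad
  \varepsilon \sim \mathcal{N}\!\left(0,\; \sigma_e^2 I\right), \\
  \text{so that}\quad
  y &\sim \mathcal{N}\!\left(X\beta,\; \sigma_g^2 K + \sigma_e^2 I\right).
\end{align}
```

Setting $\sigma_g^2 = 0$ recovers the usual independent-samples model, which is why the comparison between the two covariance structures is a natural test for correlation structure in the data.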
For many computational biologists, the challenge of working with data with an underlying correlation structure is nowadays most prominent in the field of spatial transcriptomics. Because there is no single paper that was path-breaking in terms of both biology and computational methodology, we study a combination of papers. Ståhl et al. combined RNA sequencing with spatial barcodes and are credited on Wikipedia with being the first to develop a spatial transcriptomics method, while Moffitt et al. was one of the first papers to apply the MERFISH technology to measure the spatial distribution of gene expression in single cells. We will use data from both these papers in our own analysis.
Svensson et al. and Arnol et al. show how Gaussian process models can be applied to this type of data to identify genes with spatial structure in their expression profile.
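To make the idea concrete, the following is a minimal sketch of a Gaussian process test for spatial structure, written in the spirit of these methods but not reproducing either paper's implementation. It compares the likelihood of one gene's expression under a covariance with a spatial component against a noise-only covariance. Everything here is an assumption chosen for illustration: the squared-exponential kernel, the fixed length scale, the grid over the spatial variance fraction, the simulated data, and the function names `se_kernel`, `gaussian_loglik` and `spatial_llr`.

```python
import numpy as np

# Candidate fractions of the total variance explained by the spatial term (assumed grid).
FRACTIONS = np.linspace(0.05, 0.95, 19)

def se_kernel(coords, lengthscale):
    """Squared-exponential covariance between all pairs of spatial locations."""
    diffs = coords[:, None, :] - coords[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))

def gaussian_loglik(y, cov):
    """Log density of a zero-mean multivariate normal N(0, cov) evaluated at y."""
    n = y.shape[0]
    chol = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(chol, y)           # alpha @ alpha = y^T cov^{-1} y
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (alpha @ alpha + logdet + n * np.log(2.0 * np.pi))

def spatial_llr(y, coords, lengthscale=1.0):
    """Log-likelihood ratio: best spatial model on the grid vs. noise-only model."""
    y = y - y.mean()                            # remove the mean (intercept-only fixed effect)
    total_var = y.var()
    n = y.shape[0]
    K = se_kernel(coords, lengthscale)
    null_ll = gaussian_loglik(y, total_var * np.eye(n))
    spatial_ll = max(
        gaussian_loglik(y, total_var * (f * K + (1.0 - f) * np.eye(n)))
        for f in FRACTIONS
    )
    return spatial_ll - null_ll

# Simulated example: one gene with a smooth spatial pattern, one with pure noise.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(150, 2))
gene_spatial = np.sin(coords[:, 0]) + 0.3 * rng.normal(size=150)
gene_flat = rng.normal(size=150)

llr_spatial = spatial_llr(gene_spatial, coords)
llr_flat = spatial_llr(gene_flat, coords)
print(f"spatial gene LLR: {llr_spatial:.1f}, flat gene LLR: {llr_flat:.1f}")
```

A large log-likelihood ratio suggests that nearby locations have more similar expression than the noise-only model allows. A real analysis would also optimise the length scale, assess significance, and correct for testing many genes, all of which this sketch omits.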
Video lecture on spatial transcriptomics
Spatial transcriptome profiling lecture from the Linnarsson lab.