My overall research aim is to understand how genetic, epigenetic and gene expression variation causes variation in health and disease traits in humans, using statistical and machine learning approaches. Current projects in my group are organized around the following themes:
Theme 1: Gene regulatory networks
I am interested in the development of models, algorithms and software to infer causal gene regulatory networks from multi-omics data. The key challenge here is causal inference: to transform patterns of co-expression among transcripts, proteins, metabolites and phenotypes into truly predictive models of biological systems.
In genetics, the random segregation of alleles effectively results in massively parallel randomized experiments, where the direction of causality between co-expressed genes can be inferred from their joint genetic linkage to cis-regulatory DNA sequences. While the basic principle of causal inference in this field of “systems genetics” is well-established, important challenges remain. My group is working on three inter-related problems:
- Development of efficient software for handling deep RNA-sequencing data from hundreds to thousands of individuals.
- Development of more sensitive statistical models to account for multiple levels of known and unknown confounding factors and noise in the data.
- Development of methods for the inference of global causal networks involving thousands of genes.
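
As an illustration of the basic principle described above, the following is a minimal sketch on simulated data, not the software developed in my group. It assumes a hypothetical genotype `e` that acts in cis on gene A, a true causal direction A → B, and no hidden confounders; all variable names, effect sizes and the sample size are illustrative assumptions. If A mediates the effect of the genotype on B, then `e` is correlated with B, but that correlation vanishes once A is conditioned on, while the reverse conditioning does not remove the link between `e` and A.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after linearly regressing z out of both."""
    design = np.column_stack([np.ones_like(z), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]

def simulate(n=20000, seed=0):
    """Simulate genotype -> gene A -> gene B (all parameters are assumptions)."""
    rng = np.random.default_rng(seed)
    # Genotype coded as 0/1/2 copies of the minor allele (allele frequency 0.3).
    e = rng.binomial(2, 0.3, size=n).astype(float)
    a = 1.0 * e + rng.normal(size=n)   # cis effect of the genotype on gene A
    b = 0.8 * a + rng.normal(size=n)   # causal effect of gene A on gene B
    return e, a, b

e, a, b = simulate()
r_eb = np.corrcoef(e, b)[0, 1]          # genotype is linked to B ...
r_eb_given_a = partial_corr(e, b, a)    # ... but only through A
r_ea_given_b = partial_corr(e, a, b)    # conditioning on B does not break e-A

print(f"corr(e, B)     = {r_eb:.3f}")
print(f"corr(e, B | A) = {r_eb_given_a:.3f}")
print(f"corr(e, A | B) = {r_ea_given_b:.3f}")
```

In real data this simple conditional-independence check breaks down when unobserved factors affect both genes, which is one reason more sensitive models that account for confounding are needed.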