My overall research aim is to understand how genetic, epigenetic and gene expression variation cause variation in health and disease traits in humans, using statistical and machine learning approaches. Current projects in my group are organized around the following themes:

Theme 1: Gene regulatory networks

I am interested in the development of models, algorithms and software to infer causal gene regulatory networks from multi-omics data. The key challenge here is causal inference: to transform patterns of co-expression among transcripts, proteins, metabolites and phenotypes into truly predictive models of biological systems.

In genetics, the random segregation of alleles effectively results in massively parallel randomized experiments, where the direction of causality between co-expressed genes can be inferred from their joint genetic linkage to cis-regulatory DNA sequences. While this basic principle is well-established, important challenges remain. My group is building an expanded toolkit for causal inference in systems genetics, based on:

  1. Development of efficient software for handling deep RNA-sequencing data from hundreds to thousands of individuals.
  2. Development of more sensitive statistical models to account for multiple levels of known and unknown confounding factors and noise in the data.
  3. Development of methods for the inference of global causal networks involving thousands of genes.
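
The basic principle stated above, that a cis-regulatory variant can orient the causal direction between two co-expressed genes, can be illustrated with a small simulation. This is only a minimal sketch under strong assumptions, not the group's actual methodology: the function names, sample size, allele frequency and effect sizes are all hypothetical, and the simulated data contain no hidden confounders, which is exactly the complication that item 2 of the list refers to. If gene A mediates the effect of its cis-variant on gene B, the variant should be correlated with B, but that correlation should vanish once A is regressed out; conditioning on B instead should leave the variant's association with A in place.

```python
import numpy as np


def partial_corr(x, y, z):
    """Correlation between x and y after linearly regressing z out of both."""
    design = np.column_stack([np.ones_like(z), z])
    resid_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    resid_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(resid_x, resid_y)[0, 1]


def simulate(n=5000, seed=0):
    """Simulate the chain: cis-variant -> gene A -> gene B (all values hypothetical)."""
    rng = np.random.default_rng(seed)
    # Allele counts (0, 1 or 2) at a cis-regulatory variant of gene A.
    genotype = rng.binomial(2, 0.3, size=n).astype(float)
    # Gene A expression depends on the variant; gene B depends only on gene A.
    gene_a = 0.8 * genotype + rng.normal(size=n)
    gene_b = 0.7 * gene_a + rng.normal(size=n)
    return genotype, gene_a, gene_b


genotype, gene_a, gene_b = simulate()

# The variant is associated with the downstream gene B ...
r_gb = np.corrcoef(genotype, gene_b)[0, 1]
# ... but not once the mediator A is accounted for (supports A -> B).
r_gb_given_a = partial_corr(genotype, gene_b, gene_a)
# Conditioning on B does not remove the variant's association with A.
r_ga_given_b = partial_corr(genotype, gene_a, gene_b)

print(f"corr(variant, B)     = {r_gb:.3f}")
print(f"corr(variant, B | A) = {r_gb_given_a:.3f}")
print(f"corr(variant, A | B) = {r_ga_given_b:.3f}")
```

With real data this simple conditional-independence check can fail, for example when an unobserved factor influences both genes or when expression is measured with substantial noise, which is why more sensitive statistical models are needed in practice.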

Theme 2: Causal variants

Theme 3: Machine learning