Path-breaking paper: Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks
Zhu J et al. Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks. Nature Genetics 40:854 (2008).
Zhang B, et al. Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer’s disease. Cell 153:707–720 (2013).
Talukdar HA, et al. Cross-tissue regulatory gene networks in coronary artery disease. Cell Systems 2:196 (2016).
Beckmann ND, et al. Multiscale causal networks identify VGF as a key regulator of Alzheimer’s disease. Nature Communications 11:3942 (2020).
Questions for discussion
What was the aim of the study?
Which data types did the authors combine?
Several networks are constructed and analyzed. What does each of them mean, and what does it represent?
- Coexpression network
- PPI network
- Probabilistic causal network / Bayesian network (BN)
- BN raw
- BN qtl
- BN full
How did the authors test if their Bayesian networks were predictive?
Other approaches to gene regulatory network inference
Bayesian networks are important models for gene regulatory networks because they make it possible to integrate heterogeneous data, include prior information in a principled manner, and make concrete predictions about the response to experimental perturbations. When the number of available data types is limited (for instance, when only expression data are available), other approaches may have advantages, for instance in computational efficiency. Common to these approaches is that the only prior information they use is which genes can act as regulators (typically transcription factors). See also a recent perspective on network inference in Nature: “Smart software untangles gene regulation in cells”.
Two important methodological directions are:
Information-theoretic methods
Similar to coexpression networks, methods in this class reconstruct networks from pairwise associations, but they use mutual information as the association measure. The main challenge in this line of work is choosing significance thresholds that are appropriate across genes with potentially very different biological functions. A gene-specific background correction known as “context likelihood of relatedness” (CLR) is probably the most successful approach (and it works well with linear correlation measures too):
J Faith et al. Large-Scale Mapping and Validation of E. coli Transcriptional Regulation from a Compendium of Expression Profiles. PLOS Biol 5:e8 (2007).
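The idea behind the context likelihood of relatedness correction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of Faith et al.: it uses absolute Pearson correlation as the association measure in place of mutual information, takes all other genes as the background for each gene, and combines the two clipped z-scores of a pair as the square root of their summed squares, as the method is commonly described. The function names `abs_correlation_matrix` and `clr_scores` and the toy matrix are hypothetical.

```python
import math
from statistics import mean, pstdev


def abs_correlation_matrix(profiles):
    """Absolute Pearson correlation between every pair of expression profiles."""
    def pearson(x, y):
        mx, my = mean(x), mean(y)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        denom = math.sqrt(sxx * syy)
        return sxy / denom if denom > 0 else 0.0  # constant profiles carry no signal

    n = len(profiles)
    return [[abs(pearson(profiles[i], profiles[j])) if i != j else 0.0
             for j in range(n)] for i in range(n)]


def clr_scores(assoc):
    """Apply a CLR-style background correction to a symmetric association matrix."""
    n = len(assoc)
    # Gene-specific background: mean and standard deviation of each gene's
    # associations with all other genes (the diagonal is ignored).
    background = []
    for i in range(n):
        others = [assoc[i][j] for j in range(n) if j != i]
        background.append((mean(others), pstdev(others)))

    def z(i, j):
        mu, sd = background[i]
        # Negative z-scores are clipped to zero; a flat background gives zero.
        return max(0.0, (assoc[i][j] - mu) / sd) if sd > 0 else 0.0

    # Combine the z-scores seen from both genes of each pair.
    return {(i, j): math.hypot(z(i, j), z(j, i))
            for i in range(n) for j in range(i + 1, n)}


# Toy association matrix (assumed values): genes 0 and 1 are strongly
# associated, every other pair only weakly.
toy = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]
scores = clr_scores(toy)
best_pair = max(scores, key=scores.get)
print(best_pair, scores[best_pair])
```

Because each association is judged against the background of both genes involved, a single global threshold on the corrected scores is more defensible than one on the raw associations. In practice `clr_scores` would be applied to the output of `abs_correlation_matrix` (or to a mutual information matrix) computed over many genes; with only a handful of genes the background estimates are too noisy to be meaningful.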
Regression-based methods
In regression-based methods, regularized regression is used to construct models that predict a gene’s expression level given the expression levels of a set of candidate regulators. Regulators with the highest weights or feature importance in these models are then declared to be the gene’s upstream regulators. Essentially any regression model could be used, but over the years an approach based on random forest regression has been the most popular:
V Huynh-Thu et al. Inferring Regulatory Networks from Expression Data Using Tree-Based Methods. PLOS One 5:e12776 (2010).
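The general recipe (fit a regularized model per target gene, then rank candidate regulators by their weight) can be illustrated with a small self-contained sketch. Note the substitution: this example uses lasso regression fitted by coordinate descent, chosen only because it fits in a few lines without external libraries; it is not the tree-based method of Huynh-Thu et al. The function names, the penalty `lam`, and the toy profiles are all hypothetical.

```python
import math


def standardize(values):
    """Center a profile and scale it to unit variance (constant profiles become zeros)."""
    n = len(values)
    mu = sum(values) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly zero."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_regulators(target, candidates, lam=0.1, sweeps=100):
    """Fit target ~ sum_j w_j * candidate_j with an L1 penalty.

    candidates maps regulator names to expression profiles.
    Returns a dict mapping each regulator name to its fitted weight.
    """
    n = len(target)
    mu_y = sum(target) / n
    y = [v - mu_y for v in target]
    X = {name: standardize(profile) for name, profile in candidates.items()}
    weights = {name: 0.0 for name in X}
    residual = list(y)  # all weights start at zero, so the residual equals y
    for _ in range(sweeps):
        for name, x in X.items():
            w_old = weights[name]
            # Covariance of this regulator with the residual, after adding
            # its own current contribution back in.
            rho = sum(xi * (ri + w_old * xi) for xi, ri in zip(x, residual)) / n
            w_new = soft_threshold(rho, lam)
            if w_new != w_old:
                residual = [ri - (w_new - w_old) * xi for xi, ri in zip(x, residual)]
                weights[name] = w_new
    return weights


# Toy data (assumed): the target depends strongly on tfA, weakly on tfB,
# and not at all on tfC.
tfs = {
    "tfA": [1, -1, 1, -1, 1, -1, 1, -1],
    "tfB": [1, 1, -1, -1, 1, 1, -1, -1],
    "tfC": [1, 1, 1, 1, -1, -1, -1, -1],
}
target = [2 * a + 0.5 * b for a, b in zip(tfs["tfA"], tfs["tfB"])]
weights = lasso_regulators(target, tfs)
ranking = sorted(weights, key=lambda name: abs(weights[name]), reverse=True)
print(ranking)
```

Repeating this fit with every gene in turn as the target, and pooling the resulting regulator rankings, gives a genome-wide network. A tree-based model would replace the lasso weights with feature importances but leave the rest of the procedure unchanged.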