Bayesian networks

Book sections

Pattern Recognition and Machine Learning (Bishop): Chapter 8

A crash course in Bayesian networks

In Bayesian networks, the joint distribution over a set \(\{X_1,\dots, X_p\}\) of random variables is represented by:

  • a directed acyclic graph (DAG) \(\mathcal{G}\) with the variables as vertices,
  • a set of conditional probability distributions, \[\begin{aligned} P\bigl(X_i \mid \{X_j \mid j\in \mathrm{Pa}_i\}\bigr), \end{aligned}\] where \(\mathrm{Pa}_i\) is the set of parents of vertex \(i\) in \(\mathcal{G}\),

such that \[\begin{aligned} P(X_1,\dots,X_p) = \prod_{i=1}^p P\bigl(X_i \mid \{X_j \mid j\in \mathrm{Pa}_i\}\bigr). \end{aligned}\]
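As a toy illustration (not from the lecture, and with made-up numbers), consider a binary network \(X_1 \rightarrow X_2 \rightarrow X_3\): the factorization replaces the \(2^3\) joint probabilities by one marginal and two small conditional tables.

```julia
# Binary Bayesian network X1 -> X2 -> X3 with hypothetical probability tables.
p_x1 = 0.3                                  # P(X1 = 1)
p_x2_given_x1 = Dict(0 => 0.2, 1 => 0.9)    # P(X2 = 1 | X1)
p_x3_given_x2 = Dict(0 => 0.1, 1 => 0.7)    # P(X3 = 1 | X2)

bern(p, x) = x == 1 ? p : 1 - p             # P(X = x) for X ~ Bernoulli(p)

# The joint distribution is the product of the per-vertex conditionals.
joint(x1, x2, x3) = bern(p_x1, x1) *
                    bern(p_x2_given_x1[x1], x2) *
                    bern(p_x3_given_x2[x2], x3)

# Sanity check: the factorized joint sums to one over all 8 configurations.
println(sum(joint(x1, x2, x3) for x1 in 0:1, x2 in 0:1, x3 in 0:1))  # 1.0
```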

In gene regulatory network (GRN) reconstruction:

  • \(X_i\) represents the expression level of gene \(i\),
  • \(P(X_1,\dots,X_p)\) represents the joint distribution from which (independent) experimental samples are drawn,
  • \(\mathcal{G}\) represents the unknown GRN.

Assume we have a set of \(N\) observations \(x_1,\dots,x_N\in \mathbb{R}^p\) of \(p\) variables, collected in the \(N\times p\) matrix \(\mathbf{X}\).
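To have something concrete to work with below, this snippet simulates such a matrix from a hypothetical three-gene linear-Gaussian network \(X_1 \rightarrow X_2 \rightarrow X_3\); sampling in topological order ensures every variable is drawn after its parents.

```julia
using Random

# Simulate N samples from a made-up linear-Gaussian chain X1 -> X2 -> X3:
# each gene is a noisy linear function of its parent.
rng = MersenneTwister(42)
N = 200
X = zeros(N, 3)
X[:, 1] = randn(rng, N)                             # X1: no parents
X[:, 2] = 0.8 .* X[:, 1] .+ 0.5 .* randn(rng, N)    # X2 | X1
X[:, 3] = -0.6 .* X[:, 2] .+ 0.5 .* randn(rng, N)   # X3 | X2
```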

Assume we know \(\mathcal{G}\) and the conditional distributions. Then the likelihood of observing the data is:

\[\begin{aligned} P(\mathbf{X}\mid \mathcal{G}) &= \prod_{k=1}^N P(x_{k1},\dots,x_{kp}\mid \mathcal{G})\\ &= \prod_{i=1}^p \prod_{k=1}^N P\bigl(x_{ki} \mid \{x_{kj} \mid j\in \mathrm{Pa}_i\}\bigr) \end{aligned}\]
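Here is a minimal sketch of this likelihood, assuming linear-Gaussian conditional distributions (each gene a noisy linear function of its parents, as in the simulation above); the function name and the representation of parent sets are our own choices, not fixed by the theory.

```julia
using LinearAlgebra

# Log-likelihood of an N×p data matrix X given parent sets, with each CPD
# X_i | Pa_i fitted by least squares (the per-gene regression used below).
function gaussian_loglik(X::Matrix{Float64}, parents::Vector{Vector{Int}})
    N, p = size(X)
    ll = 0.0
    for i in 1:p
        D = hcat(ones(N), X[:, parents[i]])   # intercept + parent columns
        β = D \ X[:, i]                       # least-squares fit of the CPD
        r = X[:, i] - D * β                   # residuals
        σ2 = sum(abs2, r) / N                 # ML estimate of the noise variance
        ll += -N / 2 * (log(2π * σ2) + 1)     # Gaussian log-likelihood at the MLE
    end
    return ll
end

# Example with the simulated chain above: parent sets of X1 -> X2 -> X3.
gaussian_loglik(X, [Int[], [1], [2]])
```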

We can now use an iterative algorithm to optimize \(\mathcal{G}\) and the conditional distributions:

  • Start with a random graph \(\mathcal{G}\).
  • Given \(\mathcal{G}\), the likelihood decomposes into a product of independent likelihoods, one for each gene, and the conditional distributions can be optimized by standard regression analysis.
  • In subsequent iterations, randomly add, delete, or reverse edges in \(\mathcal{G}\), keeping a change only if the likelihood improves (see the sketch after this list).
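The following is a sketch of that greedy search, reusing `gaussian_loglik` from above. For simplicity it starts from the empty graph rather than a random one; acyclicity is checked by depth-first search before a move is scored.

```julia
using Random

# Does the directed graph with adjacency matrix A (A[j, i] = 1 ⇔ j -> i)
# contain no cycles? Depth-first search with three-state marking.
function is_acyclic(A::Matrix{Int})
    p = size(A, 1)
    state = zeros(Int, p)   # 0 = unvisited, 1 = on the DFS stack, 2 = done
    function dfs(v)
        state[v] = 1
        for w in 1:p
            if A[v, w] == 1
                state[w] == 1 && return false        # back edge: cycle
                state[w] == 0 && !dfs(w) && return false
            end
        end
        state[v] = 2
        return true
    end
    return all(v -> state[v] != 0 || dfs(v), 1:p)
end

# Parent set of each vertex, in the format gaussian_loglik expects.
parent_sets(A::Matrix{Int}) = [findall(==(1), A[:, i]) for i in 1:size(A, 2)]

function hill_climb(X::Matrix{Float64}; iters = 5000, rng = MersenneTwister(1))
    p = size(X, 2)
    A = zeros(Int, p, p)                      # start from the empty graph
    best = gaussian_loglik(X, parent_sets(A))
    for _ in 1:iters
        j, i = rand(rng, 1:p), rand(rng, 1:p)
        j == i && continue
        B = copy(A)
        if B[j, i] == 1 && rand(rng, Bool)
            B[j, i] = 0                       # delete j -> i
        elseif B[j, i] == 1
            B[j, i] = 0; B[i, j] = 1          # reverse to i -> j
        else
            B[j, i] = 1                       # add j -> i
        end
        is_acyclic(B) || continue
        score = gaussian_loglik(X, parent_sets(B))
        if score > best
            A, best = B, score                # accept only improving moves
        end
    end
    return A, best
end
```

Note that adding a parent never decreases the fitted maximum likelihood, so this raw score drives the search towards dense graphs; in practice one uses a penalized score such as BIC, or the prior introduced next.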

A more formal approach to optimizing \(\mathcal{G}\) uses Bayes’ theorem: \[\begin{aligned} P(\mathcal{G}\mid \mathbf{X}) = \frac{P(\mathbf{X}\mid \mathcal{G}) P(\mathcal{G})}{P(\mathbf{X})} \end{aligned}\]

  • \(P(\mathcal{G})\) represents the prior distribution: even without seeing any data, not all graphs need to be equally likely a priori.
  • \(P(\mathbf{X})\) represents the marginal likelihood (evidence): \(P(\mathbf{X}) = \sum_{\mathcal{G}'} P(\mathbf{X}\mid \mathcal{G}') P(\mathcal{G}')\). It does not depend on \(\mathcal{G}\) and can be ignored when comparing graphs.
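For instance, under a hypothetical sparsity prior \(P(\mathcal{G}) \propto e^{-\lambda \cdot \#\text{edges}}\), the search criterion becomes an unnormalized log-posterior; swapping it for `gaussian_loglik` in the hill-climbing sketch above yields MAP structure search, and it also counteracts the overfitting noted there.

```julia
# Unnormalized log-posterior under an assumed edge-count (sparsity) prior:
# log P(G | X) = log P(X | G) + log P(G) + const, with log P(G) = -λ * #edges.
log_prior(A::Matrix{Int}; λ = 2.0) = -λ * sum(A)

log_posterior(X::Matrix{Float64}, A::Matrix{Int}; λ = 2.0) =
    gaussian_loglik(X, parent_sets(A)) + log_prior(A; λ = λ)
```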

We can use \(P(\mathcal{G})\) to encode evidence for causal interactions obtained by integrating genomics and transcriptomics data. This is the main idea of Zhu et al. (2004).
