Welcome

This is the website for the second part of the course BINF301 Genome-scale Algorithms. This part of the course focuses on machine learning algorithms for systems biology.

The lectures are divided into modules, each focusing on a specific class of methods:

Clustering
Statistical significance
Regularized regression
(Classification)
Dimensionality reduction
Causal inference
Graphical models
Gaussian processes
Neural networks
(Network propagation)

Each module follows the same structure:

A classic or path-breaking biological or biomedical research paper in which the algorithm (or class of algorithms) of interest was first used is studied. One or more “test of time” papers illustrate recent applications of the same algorithms.
The method used in the paper(s) is studied in detail, along with additional methods to solve the same type of problem.
The methods are put into practice using original or similar data from the papers studied in the first part.

The computational demonstrations for the course were originally developed in Julia, using reactive Pluto notebooks. Python versions of the notebooks are under construction. It should be possible to run most notebooks on a standard laptop. Students at UiB can also make use of HubroHub, the university’s JupyterHub service.

The processed data for the course are available on OneDrive. Details about the raw data sources and preprocessing steps are in the BINF301-code repository.

An appendix contains an introduction to Julia and the minimum required background knowledge on gene regulation, probability theory, linear algebra, and optimization (under construction).

The theoretical sections contain the basic information needed to understand a method, pointing to relevant sections of the following textbooks (with free PDFs!) for details:

Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, Jonathan Taylor. An Introduction to Statistical Learning (2023).
Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning (second edition) (2009).
Christopher Bishop. Pattern Recognition and Machine Learning (2006).
Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong. Mathematics for Machine Learning (2020).

The use of path-breaking papers is motivated by Back to the future: education for systems-level biologists. Since the field of genome-scale data analysis is still relatively young, the choice of papers for study is a bit open and likely to evolve as the course matures.

If you want to stay up-to-date on what is happening in the field now, consider joining the Machine Learning in Computational and Systems Biology community.