Welcome

This is the website for the second half of the course BINF301 Genome-scale Algorithms. This part of the course focuses on machine learning algorithms for systems biology.

The lectures are divided in modules, each focusing on a specific class of methods:

Clustering
Statistical significance
Regularized regression
(Classification)
Dimensionality reduction
Causal inference
Graphical models
Gaussian processes
Neural networks
(Network propagation)

Each module follows the same structure:

A classic or path-breaking biological or biomedical research paper is studied where the algorithm (or class of algorithms) of interest was first used. One or more “test of time” papers illustrate recent applications of the same algorithms.
The method used in the paper(s) is studied in detail, along with additional methods to solve the same type of problem.
The methods are put in practice using original or similar data from the papers studied in the first part.

The computational demonstrations for the course were developed in Julia, and the course starts with an introduction to Julia. For the assignments, you can use any programming language.

An appendix contains the minimum required background knowledge on gene regulation, probability theory, linear algebra, and optimization.

The theoretical sections contain the basic information to understand a method, pointing to relevant sections of the following textbooks (with free pdfs!) for details:

Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning (second edition) (2009).
Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, Jonathan Taylor. An Introduction to Statistical Learning (2023).
Christopher Bishop. Pattern Recognition and Machine Learning (2006).
Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong. Mathematics for Machine Learning (2020).

The use of path-breaking papers is motivated by Back to the future: education for systems-level biologists. Since the field of genome-scale data analysis is still relatively young, the choice of papers for study is still a bit open and likely to evolve as the course matures.

It is sometimes said that we overestimate the change that will happen in 5 years and underestimate the change that will happen in 10 years. To show the evolution in the field, each path-breaking paper is accompanied by one or more “test of time” papers, illustrating more recent applications of similar algorithms as the original one. Often we will see that the major change is in the size and types of genome-scale data that can be generated nowadays, while the algorithms have mostly stood the test of time.

If you want to stay up-to-date on what is happening in the field now, consider joining the Machine Learning in Computational and Systems Biology community.