Path-breaking paper: The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity

Path-breaking paper

Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).

Test of time paper

Ghandi, M., Huang, F.W., Jané-Valbuena, J. et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019).

Test of time paper

Dixit, A. et al. Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens. Cell 167, 1853–1866 (2016).

Motivation

Fitting linear prediction models using elastic net regularization is highly popular when working with genome-scale data because it serves (at least) three important modelling goals:

Elastic net regularization handles data where the number of predictive features is much larger than the number of samples.
Elastic net regularization performs variable selection (such that only a subset of features contribute to the model) while also encouraging grouping (such that strongly correlated features are in or out of the model together).
Elastic net regularized models are linear, and hence the contribution of each feature to the model is easily measurable by its regression weight.
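
To make the three goals concrete, here is a minimal sketch in pure Python, not the CCLE analysis itself. It assumes the common parameterisation of the objective, (1/(2n))·||y − Xw||² + λ·(α·||w||₁ + ((1 − α)/2)·||w||²), and fits it by cyclic coordinate descent. The function names, the simulated data, and the values of `lam` and `alpha` are all illustrative assumptions.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; this is what produces exact zeros."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, lam, alpha, n_sweeps=200):
    """Cyclic coordinate descent for elastic net regression without intercept.

    X is a list of n rows with p features each, y a list of n responses.
    lam is the overall penalty strength, alpha the L1 share of the penalty.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw because w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


# Simulated toy data (an assumption for illustration): 20 samples, 50 features.
# Features 0 and 1 are near-copies of one latent signal; the rest are noise.
rng = random.Random(0)
n, p = 20, 50
latent = [rng.gauss(0, 1) for _ in range(n)]
X = [[0.0] * p for _ in range(n)]
for i in range(n):
    X[i][0] = latent[i] + 0.01 * rng.gauss(0, 1)
    X[i][1] = latent[i] + 0.01 * rng.gauss(0, 1)
    for j in range(2, p):
        X[i][j] = rng.gauss(0, 1)
y = [2.0 * latent[i] + 0.1 * rng.gauss(0, 1) for i in range(n)]

w = elastic_net(X, y, lam=0.5, alpha=0.5)
selected = [j for j in range(p) if w[j] != 0.0]
for j in selected:
    print(f"feature {j}: weight {w[j]:.3f}")
```

The fit runs although there are more features than samples, most weights end up exactly zero, the two correlated features enter together with similar weights, and each printed weight can be read directly as that feature's contribution to the linear prediction.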

It is hard to determine exactly when elastic net regression entered mainstream computational biology. Here we selected the first paper of the CCLE project, in which elastic net regression played an important role. The paper lets us introduce an important, still-running large-scale project (as shown by the first test-of-time paper) and contrast perturbational with observational data. This provides a natural connection to a more modern approach to perturbational experiments that combines CRISPR and single-cell sequencing technologies (second test-of-time paper) while still depending heavily on elastic net regression for the data analysis.

Questions for discussion

Abstract - What is the study about and why was it performed?

What is a cancer cell line? See

Which genomic profiles were generated?

Are cell lines good models for real tumours? See Figure 1.

What are pharmacological profiles, how is drug sensitivity measured?

What is predictive modelling, how were robust models derived, what is shown in Figure 2?

How were hyperparameters optimized? Does the procedure follow current best practice in machine learning with regard to splitting data into training, validation, and test sets?
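
As a reference point for this question, the sketch below shows one common way to keep training, validation, and test data separate: nested cross-validation. It is a generic illustration and makes no claim about the procedure used in the paper. The helper names `k_fold_indices` and `nested_cv_splits` and the fold counts are assumptions.

```python
import random


def k_fold_indices(indices, k, seed):
    """Shuffle the given sample indices and deal them into k disjoint folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=4, seed=0):
    """Yield (test, inner) pairs, where inner is a list of (train, validation).

    Hyperparameters would be chosen on the inner validation folds only;
    the outer test fold is used once, to assess the chosen model.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for o, test in enumerate(outer_folds):
        # Everything outside the current test fold is available for model building.
        rest = [i for f, fold in enumerate(outer_folds) if f != o for i in fold]
        inner_folds = k_fold_indices(rest, inner_k, seed + 1 + o)
        inner = []
        for v, validation in enumerate(inner_folds):
            train = [i for f, fold in enumerate(inner_folds) if f != v for i in fold]
            inner.append((train, validation))
        yield test, inner


for test, inner in nested_cv_splits(20):
    train, validation = inner[0]
    print(len(train), len(validation), len(test))
```

When reading the methods, it is worth checking whether any sample used to choose the regularization parameters also contributes to the reported performance estimate.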