Path-breaking paper: Gene expression profiling predicts clinical outcome of breast cancer
van ’t Veer L et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415:530 (2002).
See also: The molecular outlook.
The Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
Motivation
We start our sequel to Back to the future: education for systems-level biologists where the first one ended: with cluster analysis of transcriptome data. The credit for being the first to use cluster analysis on gene expression data (from yeast) probably goes to Eisen et al.’s Cluster analysis and display of genome-wide expression patterns, the paper selected by Wingreen & Botstein for their course. To avoid repetition, we go for van ’t Veer et al.’s Gene expression profiling predicts clinical outcome of breast cancer.
The study by van ’t Veer et al. was one of the first to use microarrays, a brand-new technology at the time, to profile genome-wide gene expression in surgically removed tumour samples, breast tumours in this case. Another paper from around the same time is: Perou et al. Molecular portraits of human breast tumours. A perspective on these papers from the time of publication is in The molecular outlook.
Another reason for choosing van ’t Veer et al is the interesting story that followed: the authors went on a 20-year journey to translate the gene expression signature they identified to a commercial diagnostic tool, see Section 0.3 below.
The success of these initial studies using cluster analysis to detect meaningful and clinically relevant patterns in genome-scale data arguably paved the way for large-scale studies such as The Cancer Genome Atlas (TCGA) Program. We take the 2012 TCGA paper on breast tumours as our test-of-time paper, see Section 0.4.
Questions for discussion
Why is classification of diseases important? How were breast tumours classified before molecular profiles became available?
How many tumour samples were analyzed by van ’t Veer et al.? How many genes were used for the cluster analysis, and how were these genes selected?
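To make the gene-selection and clustering step concrete, here is a minimal sketch in plain Python (standard library only) of the general recipe: keep genes whose expression changes strongly in enough tumours, then group tumours by average-linkage hierarchical clustering with a 1 - Pearson correlation distance. The thresholds (`min_abs_log2`, `min_samples`), the function names and the toy data are hypothetical illustrations, not the criteria van ’t Veer et al. used (check their Methods for those), and the sketch clusters only tumours, whereas a two-way clustering would also group the genes.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def filter_genes(log2_ratios, min_abs_log2=1.0, min_samples=3):
    """Keep genes (rows) with |log2 ratio| >= min_abs_log2 (1.0 = two-fold)
    in more than min_samples tumours. Hypothetical thresholds."""
    return [g for g, row in enumerate(log2_ratios)
            if sum(abs(v) >= min_abs_log2 for v in row) > min_samples]


def average_linkage(profiles, k):
    """Agglomerative clustering with distance 1 - Pearson r and average
    linkage, stopping when k clusters remain."""
    n = len(profiles)
    dist = [[1.0 - pearson(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)  # average pairwise distance
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical toy data: rows = genes, columns = tumours (log2 ratio vs reference).
rng = random.Random(0)
n_tumours = 10
log2_ratios = []
for g in range(60):
    if g < 10:    # up in tumours 0-4, down in tumours 5-9
        row = [(1.5 if t < 5 else -1.5) + rng.gauss(0, 0.3) for t in range(n_tumours)]
    elif g < 20:  # the opposite pattern
        row = [(-1.5 if t < 5 else 1.5) + rng.gauss(0, 0.3) for t in range(n_tumours)]
    else:         # low-amplitude noise genes that the filter should drop
        row = [rng.gauss(0, 0.2) for _ in range(n_tumours)]
    log2_ratios.append(row)

selected = filter_genes(log2_ratios)
tumour_profiles = [[log2_ratios[g][t] for g in selected] for t in range(n_tumours)]
clusters = average_linkage(tumour_profiles, k=2)
print(len(selected), sorted(sorted(c) for c in clusters))
```

The filter keeps the 20 informative genes, and clustering on them separates the two toy tumour groups. Correlation distance compares the shape of profiles rather than their overall level, which is why the toy signal genes are given opposite directions.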
The most striking finding is in Figure 1. What does this figure show?
Clinical features are used to annotate and understand the observed separation of gene expression profiles in distinct clusters. What does each of the features measure and how do the authors characterize the overall classification of tumours? Starting points to read more about the clinical features:
- BRCA1 germline mutation: harmful variants in the BRCA1 or BRCA2 genes that markedly increase risk for developing breast cancer.
- Estrogen receptor (ER) status: breast tumour cells that express ER depend on estrogen to grow, and such tumours are therefore more likely to respond to hormone therapy.
- Tumour grade: a measure of degree of abnormality of cancer cells.
- Lymphocyte infiltration: an indication whether immune cells (lymphocytes) have infiltrated the tumour tissue.
- Angioinvasion: an indication whether tumour cells have invaded blood vessels.
- Metastatic status: an indication whether the cancer has spread to other organs (distant metastases).
The authors identified a minimal prognostic signature from their data using a supervised approach. How does this approach work, and how many marker genes were in the final, optimal set? For those with a machine learning background: can you think of other (better?) supervised approaches to achieve the same goal?
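As a starting point for the supervised question, here is a minimal sketch in plain Python of one way such a signature search can work: rank genes by the absolute correlation of their expression with a binary outcome, classify each held-out tumour by whether its profile correlates more strongly with the poor- or the good-outcome centroid, and grow the gene set in fixed steps while tracking the leave-one-out error. The two-centroid rule, the step size of 5 genes, the function names and the toy cohort are our own assumptions for illustration; the paper's Methods describe the exact classifier and stopping criterion van ’t Veer et al. used.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_genes(expr, outcome):
    """Gene indices sorted by |correlation| between expression and a 0/1 outcome."""
    scores = [(abs(pearson(row, outcome)), g) for g, row in enumerate(expr)]
    return [g for _, g in sorted(scores, reverse=True)]


def centroid(expr, genes, samples):
    """Mean expression of the given genes over the given samples."""
    return [sum(expr[g][s] for s in samples) / len(samples) for g in genes]


def loocv_errors(expr, outcome, n_genes):
    """Leave-one-out errors of a correlation-based nearest-centroid classifier.
    Genes are re-ranked inside every fold, so the held-out tumour never
    influences which genes are selected."""
    n = len(outcome)
    errors = 0
    for held in range(n):
        train = [s for s in range(n) if s != held]
        train_expr = [[row[s] for s in train] for row in expr]
        genes = rank_genes(train_expr, [outcome[s] for s in train])[:n_genes]
        good = centroid(expr, genes, [s for s in train if outcome[s] == 0])
        poor = centroid(expr, genes, [s for s in train if outcome[s] == 1])
        profile = [expr[g][held] for g in genes]
        predicted = 1 if pearson(profile, poor) > pearson(profile, good) else 0
        errors += predicted != outcome[held]
    return errors


# Hypothetical toy cohort: 10 good-outcome (0) and 10 poor-outcome (1) tumours,
# 100 genes, of which 0-4 are higher and 5-9 lower in poor-outcome tumours.
rng = random.Random(1)
outcome = [0] * 10 + [1] * 10
expr = []
for g in range(100):
    if g < 5:
        shift = [1.0 if y else -1.0 for y in outcome]
    elif g < 10:
        shift = [-1.0 if y else 1.0 for y in outcome]
    else:
        shift = [0.0] * len(outcome)
    expr.append([m + rng.gauss(0, 0.5) for m in shift])

# Grow the signature in steps of 5 genes; keep the smallest size with the fewest errors.
errors = {k: loocv_errors(expr, outcome, k) for k in (5, 10, 15, 20)}
best = min(errors, key=lambda k: (errors[k], k))
print(errors, best)
```

Re-ranking genes inside each fold is the main design choice here: selecting genes on all tumours first and only then cross-validating tends to give optimistic error estimates.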
Translation to the clinic
Once a strong gene expression signature to predict clinical outcome of breast cancer had been identified, the race was on to bring it to the clinic. That this is far from trivial can be seen by tracing the follow-up studies and clinical trials:
- van de Vijver MJ et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 347:1999 (2002).
- Buyse M et al. Validation and clinical utility of a 70-gene prognostic signature for women with node-negative breast cancer. J Natl Cancer Inst 98:1183 (2006).
- Mook S et al. Individualization of therapy using MammaPrint: from development to the MINDACT trial. Cancer Genomics Proteomics 4:147 (2007).
- Cardoso F et al. 70-gene signature as an aid to treatment decisions in early-stage breast cancer. N Engl J Med 375:717 (2016).
- Brandão M, Pondé N, Piccart-Gebhart M. MammaPrint: a comprehensive review. Future Oncol 15:207 (2019).
They got there eventually: the gene expression signature is now commercially available under the name MammaPrint.
Test of time: The Cancer Genome Atlas
Although the results of van ’t Veer et al. were obtained from a small sample (by current standards!), they have been reproduced consistently in larger studies and arguably spawned a search for similar signatures in other cancer types through large-scale projects such as The Cancer Genome Atlas (TCGA) Program.
The amount of data and the number of publications produced by TCGA are far too large to survey here in detail.
To see how far the field progressed in the decade after van ’t Veer et al, read the 2012 TCGA paper on breast tumours.
Questions for discussion
What are the main differences between van ’t Veer et al and the TCGA paper?
How is cluster analysis used in the TCGA paper?