Path-breaking paper: Statistical significance for genomewide studies

Path-breaking paper

Storey, John D., and Robert Tibshirani. “Statistical significance for genomewide studies.” Proceedings of the National Academy of Sciences 100.16 (2003): 9440-9445.

Test of time paper

The GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).

Test of time paper

Chen X, Robinson DG, Storey JD, The functional false discovery rate with applications to genomics, Biostatistics, Volume 22, Issue 1, January 2021, Pages 68–81,

Motivation

Is it statistics or is it machine learning? Does it matter? Storey and Tibshirani’s paper can rightfully claim to have changed the field of statistical learning. It has been cited more than 10,000 times and its results are now included in all textbooks in the field. Every student wishing to analyze genome-scale data must understand what q-values and false discovery rate mean.

With a paper of this standing, it can seem exaggerated to ask for a test of time. We look at a paper from the GTEx project to show how the multiple testing problem has exploded far beyond the scale considered in Storey and Tibshirani’s paper, but can still be adequately addressed using their method. We also look at a recent methodological paper, to show there is always scope for new ideas, even for something as well established as Storey and Tibshirani’s method.

Questions for discussion

We cover the following parts:

Abstract
Example 1, differentially expressed genes
Table 1
What is a p-value?
What is the FDR?
Derivation that p-values are uniformly distributed under the null hypothesis
Figure 1
FDR estimation
Figure 2
What is a q-value?