Path-breaking paper: Statistical significance for genomewide studies
Storey, John D., and Robert Tibshirani. “Statistical significance for genomewide studies.” Proceedings of the National Academy of Sciences 100.16 (2003): 9440-9445.
The GTEx Consortium. “The GTEx Consortium atlas of genetic regulatory effects across human tissues.” Science 369 (2020): 1318-1330.
Chen, Xiongzhi, David G. Robinson, and John D. Storey. “The functional false discovery rate with applications to genomics.” Biostatistics 22.1 (2021): 68-81.
Motivation
Is it statistics or is it machine learning? Does it matter? Storey and Tibshirani’s paper can rightfully claim to have changed the field of statistical learning. It has been cited more than 10,000 times, and its results are now part of standard textbooks in the field. Every student wishing to analyze genome-scale data must understand what q-values and the false discovery rate mean.
With a paper of this standing, asking whether it has stood the test of time may seem superfluous. We look at a paper from the GTEx project to show how the multiple testing problem has grown far beyond the scale considered in Storey and Tibshirani’s paper, yet can still be adequately addressed using their method. We also look at a recent methodological paper to show that there is always scope for new ideas, even for something as well established as Storey and Tibshirani’s method.
Questions for discussion
We cover the following parts of Storey and Tibshirani’s paper:
- Abstract
- Example 1, differentially expressed genes
- Table 1
- What is a p-value?
- What is the FDR?
- Derivation that p-values are uniformly distributed under the null hypothesis
- Figure 1
- FDR estimation
- Figure 2
- What is a q-value?
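
To make the items on FDR estimation and q-values concrete, the following is a minimal sketch of how q-values can be computed from a list of p-values. It rests on stated assumptions and is not the authors' implementation: the proportion of true nulls is estimated with a single fixed tuning parameter (lam = 0.5), whereas the paper smooths this estimate over a range of values; the function names estimate_pi0 and qvalues are hypothetical; and the simulated p-values at the end are purely illustrative.

```python
import numpy as np


def estimate_pi0(pvalues, lam=0.5):
    """Estimate the proportion of true null hypotheses (pi0).

    Under the null, p-values are uniform on [0, 1], so p-values above
    `lam` are assumed to come mostly from true nulls.
    Simplification: a single fixed `lam` instead of smoothing over a grid.
    """
    p = np.asarray(pvalues, dtype=float)
    pi0 = np.mean(p > lam) / (1.0 - lam)
    return min(pi0, 1.0)


def qvalues(pvalues, lam=0.5):
    """Return one q-value per p-value, in the original order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    pi0 = estimate_pi0(p, lam)

    order = np.argsort(p)          # indices that sort p in ascending order
    ranks = np.arange(1, m + 1)    # number of rejections at each threshold

    # Estimated FDR when calling the i smallest p-values significant:
    # (expected number of false positives) / (number of rejections).
    fdr = pi0 * m * p[order] / ranks

    # q-value: the smallest estimated FDR over all thresholds that would
    # still call this test significant (running minimum from the right).
    q_sorted = np.minimum.accumulate(fdr[::-1])[::-1]
    q_sorted = np.minimum(q_sorted, 1.0)

    q = np.empty(m)
    q[order] = q_sorted
    return q


# Illustrative usage on simulated data (not from the paper):
# 900 null tests with uniform p-values, 100 tests with p-values near 0.
rng = np.random.default_rng(0)
p_sim = np.concatenate([rng.uniform(size=900), rng.beta(0.1, 5.0, size=100)])
q_sim = qvalues(p_sim)
print("estimated pi0:", estimate_pi0(p_sim))
print("tests with q <= 0.05:", int((q_sim <= 0.05).sum()))
```

Calling all tests with q-value at most 0.05 significant then corresponds to an estimated FDR of about 5% among those calls, which is a different guarantee from thresholding p-values at 0.05.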