BINF301 Machine Learning for Systems Biology
Path-breaking paper: Statistical significance for genomewide studies

Path-breaking paper

Storey, John D., and Robert Tibshirani. “Statistical significance for genomewide studies.” Proceedings of the National Academy of Sciences 100.16 (2003): 9440-9445.

Test of time paper

The GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).

Test of time paper

Chen X, Robinson DG, Storey JD. The functional false discovery rate with applications to genomics. Biostatistics 22(1), 68–81 (2021).

Motivation

Is it statistics or is it machine learning? Does it matter? Storey and Tibshirani's paper can rightfully claim to have changed the field of statistical learning. It has been cited more than 10,000 times, and its results are now covered in the standard textbooks of the field. Every student who wants to analyze genome-scale data must understand what q-values and the false discovery rate (FDR) mean.

With a paper of this standing, asking whether it has stood the test of time may seem unnecessary. We look at a paper from the GTEx project to show how the multiple testing problem has grown far beyond the scale considered by Storey and Tibshirani, yet can still be adequately addressed with their method. We also look at a recent methodological paper to show that there is always scope for new ideas, even for a method as well established as Storey and Tibshirani's.

Questions for discussion

We cover the following parts:

  • Abstract
  • Example 1, differentially expressed genes
  • Table 1
  • What is a p-value?
  • What is the FDR?
  • Derivation that p-values are uniformly distributed under the null hypothesis
  • Figure 1
  • FDR estimation
  • Figure 2
  • What is a q-value?
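Several of the topics above (uniform p-values under the null, estimating the proportion of true nulls π0, FDR estimation and q-values) can be explored with a small simulation. The Python/NumPy sketch below is illustrative only: it assumes one-sided z-tests, a made-up number of tests and a made-up effect size, and it estimates π0 with a single fixed λ = 0.5, whereas Storey and Tibshirani smooth the estimate π̂0(λ) over a grid of λ values. The function names `estimate_pi0` and `qvalues` are hypothetical and do not come from any package.

```python
import math
import numpy as np

def estimate_pi0(pvals, lam=0.5):
    """Estimate pi0, the proportion of true null hypotheses.

    Null p-values are uniform on [0, 1], so about pi0 * m * (1 - lam)
    of them exceed lam; solving for pi0 gives the estimator below.
    Assumption: a single fixed lam instead of smoothing over a grid.
    """
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    pi0 = np.sum(pvals > lam) / (m * (1.0 - lam))
    return min(pi0, 1.0)  # pi0 is a proportion, so cap it at 1

def qvalues(pvals, pi0=None, lam=0.5):
    """Return q-values in the same order as the input p-values."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    if pi0 is None:
        pi0 = estimate_pi0(pvals, lam)
    order = np.argsort(pvals)
    p_sorted = pvals[order]
    ranks = np.arange(1, m + 1)
    # Estimated FDR when all tests with p <= p_(i) are called significant
    fdr = pi0 * m * p_sorted / ranks
    # q_(i) = min over k >= i of fdr_(k): running minimum from the largest p down
    q_sorted = np.minimum.accumulate(fdr[::-1])[::-1]
    q = np.empty(m)
    q[order] = q_sorted  # undo the sort
    return q

# Simulation with made-up numbers: 10,000 tests, 1,000 true alternatives
rng = np.random.default_rng(1)
m, m1 = 10_000, 1_000
# One-sided upper-tail p-value for a standard normal statistic: P(Z >= z)
upper_tail = np.vectorize(lambda z: 0.5 * math.erfc(z / math.sqrt(2.0)))
p_null = upper_tail(rng.standard_normal(m - m1))      # null: z ~ N(0, 1)
p_alt = upper_tail(rng.standard_normal(m1) + 3.0)     # hypothetical shift of 3
pvals = np.concatenate([p_null, p_alt])

counts, _ = np.histogram(p_null, bins=10, range=(0.0, 1.0))  # roughly flat
pi0_hat = estimate_pi0(pvals)  # close to the true value 0.9 in this setup
q = qvalues(pvals, pi0_hat)
n_called = int(np.sum(q <= 0.05))  # tests called significant at q <= 0.05
print(counts, round(pi0_hat, 3), n_called)
```

Setting `pi0=1.0` makes `qvalues` return Benjamini–Hochberg adjusted p-values; the q-value approach gains power by scaling these by an estimated π̂0 < 1.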

Test of time
