G3 paper

Our paper “Restricted maximum-likelihood method for learning latent variance components in gene expression data with known and unknown confounders” has been published in G3. In this paper we analyse the mathematical structure of a class of statistical models for learning hidden factors influencing gene expression data and show that a new algorithm based on the analytical results is orders of magnitude faster than the standard algorithms for solving this class of models.
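The G3 algorithm is not reproduced here. A simpler, well-known model shows why an analytical solution can be so much faster than iterative optimisation: probabilistic PCA, which has hidden factors but no known confounders. Its maximum-likelihood latent factors follow in closed form from one eigendecomposition of the sample covariance (Tipping and Bishop). The sketch below assumes rows of `Y` are observations, and the function name `ppca_closed_form` is hypothetical. It illustrates the idea and is not the method from the paper.

```python
import numpy as np

def ppca_closed_form(Y, q):
    """Closed-form ML fit of probabilistic PCA (a simpler model than the paper's).

    Y: (n_obs, n_features) data matrix; q: number of hidden factors.
    Returns loadings W (n_features, q) and noise variance sigma2.
    """
    Yc = Y - Y.mean(axis=0)
    S = Yc.T @ Yc / Yc.shape[0]              # ML sample covariance (divides by n)
    evals, evecs = np.linalg.eigh(S)         # eigenvalues in ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]
    sigma2 = evals[q:].mean()                # noise = mean of discarded eigenvalues
    # Scale the top-q eigenvectors so that W W^T + sigma2 I matches S on those directions.
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return W, sigma2
```

No iterations are needed. The fitted covariance `W @ W.T + sigma2 * I` keeps the top `q` eigenvalues of the sample covariance and sets the rest to `sigma2`.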

Two preprints by Ramin

Congrats to Ramin for posting no fewer than two preprints on arXiv.

Both papers result from a collaboration between Ramin and Dariush Salami at the Ambient Intelligence group at Aalto University, and introduce new graph neural network-based methods for analyzing point cloud data.

It's great to see students independently initiate projects and push them through to completion (exposing the limits of their supervisor’s knowledge in the process 🙂).

Frontiers Genetics paper

Lingfei’s paper, “High-dimensional Bayesian network inference from systems genetics data using genetic node ordering” has been published in Frontiers in Genetics, in a Special Topic on Machine Learning and Network-Driven Integrative Genomics.

In this paper, we present a highly efficient approach for reconstructing Bayesian gene regulatory networks when prior information for the inclusion of edges exists or can be inferred from the available data. The method is implemented in the Findr software.
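Suppose prior probabilities for individual edges are available. One simple heuristic for turning them into an acyclic network works like this: add edges in order of decreasing probability, and skip any edge that would close a cycle. Any topological order of the resulting DAG is then a node ordering consistent with those priors. The helper `greedy_dag` and its `threshold` parameter are hypothetical. This is not Findr's API, and Findr's own procedure may differ.

```python
from collections import defaultdict

def greedy_dag(edge_probs, threshold=0.5):
    """Hypothetical helper: build a DAG from {(parent, child): probability}.

    Edges are considered from most to least probable; an edge is kept only if
    its probability reaches `threshold` and it does not create a cycle.
    """
    children = defaultdict(set)

    def reachable(src, dst):
        # Iterative depth-first search over the edges accepted so far.
        stack, seen = [src], {src}
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            for nxt in children[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    edges = []
    for (u, v), p in sorted(edge_probs.items(), key=lambda kv: kv[1], reverse=True):
        if p < threshold:
            break  # remaining edges are even less probable
        if u == v or reachable(v, u):
            continue  # adding u -> v would close a cycle
        children[u].add(v)
        edges.append((u, v))
    return edges
```

For example, with probabilities A→B 0.9, B→C 0.8, C→A 0.7 and A→C 0.6, the edge C→A is rejected because A already reaches C. The other three edges are kept.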

Bioinformatics paper

Pau’s paper “Model-based clustering of multi-tissue gene expression data” has been published in Bioinformatics. The paper introduces a method, called “revamp”, for finding clusters (groups of genes with shared activity patterns) in multi-tissue data, where gene expression profiles are available from multiple tissues or organs sampled from the same group of individuals. Revamp improves on existing methods in two ways. It can incorporate prior information on physiological tissue similarity, and it identifies a set of clusters, each consisting of a core set of genes conserved across tissues together with differential sets of genes specific to one or more subsets of tissues. Revamp is implemented in the Lemon-Tree software.
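For readers new to the term, the most common textbook form of model-based clustering is a Gaussian mixture fitted by expectation-maximisation (EM). The sketch below shows this for a single expression matrix. It has none of revamp's multi-tissue structure or tissue-similarity priors. The function `gmm_em`, the spherical covariances and the farthest-point initialisation are illustrative choices, not taken from the paper or from Lemon-Tree.

```python
import numpy as np

def gmm_em(X, k, n_iter=100, seed=0):
    """Illustrative only: cluster rows of X (genes x samples) with a spherical GMM via EM."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Farthest-point initialisation: start from a random gene, then repeatedly
    # add the gene farthest from all centres chosen so far.
    centres = [X[rng.integers(n)]]
    for _ in range(k - 1):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(X[dist.argmax()])
    means = np.array(centres)
    var = np.full(k, X.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior cluster responsibilities for every gene.
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)  # (n, k)
        logp = np.log(weights) - 0.5 * d * np.log(2 * np.pi * var) - 0.5 * sq / var
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, cluster means and per-cluster variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        var = (resp * sq).sum(axis=0) / (d * nk) + 1e-6
    return resp.argmax(axis=1), means
```

A full model-based method such as revamp adds further structure on top of this basic likelihood. Examples are cluster membership shared across tissues and priors linking similar tissues.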

RSOS paper

Lingfei’s paper “Accurate wisdom of the crowd from unsupervised dimension reduction” has been published in Royal Society Open Science. The paper shows that wisdom of the crowd, the collective intelligence derived from the responses of multiple individuals to the same questions, is analogous to one-dimensional unsupervised dimension reduction in machine learning. This means that many off-the-shelf dimension reduction methods, such as good old PCA, can be repurposed as crowd-wisdom methods, usually with (much) better performance than existing default crowd-wisdom methods. Perhaps one of the more surprising results concerned classifying skin images as cancerous or not. Amid the hype surrounding deep learning, it was recently found that a deep neural network trained on 130,000 images was better at classifying a test set of 111 skin images than 21 individual dermatologists. However, we found that after a simple PCA of the predictions of these 21 dermatologists, they collectively outperformed the deep neural network. As The Economist put it in a recent ad, “not all intelligence is artificial”. In fact, some of it is collective.
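To make the PCA idea concrete, here is a minimal sketch. It assumes the responses form a raters × items matrix. Each rater is standardised, the leading eigenvector of the rater-by-rater correlation matrix gives one weight per rater, and the weighted sum of responses gives a consensus score per item. The function name `pca_crowd_score` and the standardisation step are our illustrative assumptions. The paper's exact procedure may differ in such details.

```python
import numpy as np

def pca_crowd_score(responses):
    """Sketch: consensus score per item from a (n_raters, n_items) response matrix."""
    X = np.asarray(responses, dtype=float)
    # Standardise each rater so that raters on different scales count equally.
    X = X - X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # a rater who gives identical answers contributes nothing
    X = X / sd
    # Leading eigenvector of the rater-by-rater correlation matrix = rater weights.
    corr = X @ X.T / X.shape[1]
    _, eigvecs = np.linalg.eigh(corr)  # eigenvalues ascending, so take the last column
    w = eigvecs[:, -1]
    # Eigenvectors are defined only up to sign; orient weights to be mostly positive.
    if w.sum() < 0:
        w = -w
    return w @ X, w
```

Unlike a plain average, this gives raters whose answers agree with the shared signal large weights. Raters whose answers look like noise get weights close to zero.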