4 PhD positions in computer science

Four PhD positions in computer science are available at the Department of Informatics. These positions can be held in any of the department’s research areas (algorithms, bioinformatics, machine learning, optimization, programming theory, security, and visualization). Applicants with an interest in computational biology and machine learning are welcome to contact me prior to submitting their application.

For more details and application instructions, see here.

VSG PacBio preprint

It’s been a long time in the making, but Sid’s work on using long-read sequencing to determine expressed antigen diversity in Trypanosoma brucei infections has finally been posted on bioRxiv!

In this collaboration with Liam Morrison, we applied long-read sequencing (PacBio) to VSG amplicons generated from blood extracted from mice infected with T. brucei. We found that long-read sequencing reliably resolves allelic differences between VSGs, that there is significant expressed diversity (449 VSGs detected across 20 mice), and that there is a striking semi-reproducible pattern of expressed diversity over the course of the study.

Well done, Sid, and everyone else who contributed to this study!

Enhancer-based causal inference preprint

A large body of research shows that genetic information combined with gene expression data can be used to predict causal interactions between genes, because genetic variation among individuals causes gene expression variation but not vice versa (this PLOS CompBio article is a contribution to the field from our group and has links to earlier work). Anagha Joshi’s group asked whether this principle could be extended to other contexts. Our joint preprint, “Causal gene regulatory network inference using enhancer activity as a causal anchor”, gives an affirmative answer: variation in epigenetic activity at enhancer elements across multiple cell types or experimental treatments, together with gene expression data, also predicts causal interactions. The accompanying statistical methods have been implemented in our Findr software.
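
For readers new to the causal anchor idea, here is a toy illustration in Python. It is only a sketch under simple assumptions (simulated linear Gaussian data, partial correlations as the independence check) and is not the likelihood-ratio methodology implemented in Findr. The variable names `e`, `a`, `b` and the functions `simulate` and `partial_corr` are made up for this example. If an anchor E (a genetic variant or an enhancer activity) drives gene A, and A drives gene B, then E and B should be correlated, but that correlation should vanish once A is accounted for.

```python
import numpy as np


def partial_corr(x, y, z):
    """Correlation between x and y after linearly regressing z out of both."""
    design = np.column_stack([np.ones_like(z), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])


def simulate(n=2000, seed=0):
    """Simulate the chain E -> A -> B (hypothetical effect sizes)."""
    rng = np.random.default_rng(seed)
    e = rng.normal(size=n)            # causal anchor, e.g. enhancer activity
    a = 0.8 * e + rng.normal(size=n)  # gene A is driven by the anchor
    b = 0.8 * a + rng.normal(size=n)  # gene B is driven by gene A only
    return e, a, b


e, a, b = simulate()

# The anchor is correlated with both genes ...
print("corr(E, A)     =", round(float(np.corrcoef(e, a)[0, 1]), 3))
print("corr(E, B)     =", round(float(np.corrcoef(e, b)[0, 1]), 3))
# ... but only the true direction A -> B makes E independent of B given A.
print("pcorr(E, B | A) =", round(partial_corr(e, b, a), 3))  # close to zero
print("pcorr(E, A | B) =", round(partial_corr(e, a, b), 3))  # clearly non-zero
```

The asymmetry between the last two numbers is what lets an anchor orient the edge between A and B. Real data also need a treatment of hidden confounders and proper significance testing, which this toy example leaves out.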

Multi-tissue clustering preprint

A preprint on “Model-based clustering of multi-tissue gene expression data” is available on arXiv. In this paper we present a Bayesian model-based clustering algorithm for large-scale, multi-tissue gene expression data, where expression profiles are obtained from multiple tissues or organs sampled from dozens to hundreds of individuals. Our model can incorporate prior information on physiological tissue similarity, and produces a set of clusters, each consisting of a core set of genes conserved across tissues as well as differential sets of genes specific to one or more subsets of tissues. The algorithm has been implemented in the Lemon-Tree software as a new “task”, revamp.