DAG construction

Introduction

The posterior probabilities computed by the findr function can be used as a graph structure prior in Bayesian network learning using the model introduced in the paper High-dimensional Bayesian network inference from systems genetics data using genetic node ordering. This involves converting the pairwise posterior probabilities to a directed acyclic graph (DAG).

BioFindr implements the original “greedy edges” algorithm where edges are added one-by-one in decreasing order of probability, and only if they do not create a cycle in the graph, using an incremental cycle detection algorithm. Two additional algorithms from the paper Maximal acyclic subgraph optimization for gene regulatory networks are also implemented: the heuristic sort algorithm where vertices are sorted by their ratio of out-degree to in-degree, and edges are added only if their source vertex precedes their target vertex in the sorted list, and the greedy insertion algorithm where vertices are iteratively inserted in the position in the current ordering that yields the maximum possible gain of edge weights, where the gain is counted as the difference between the sum of new edge weights included and the sum of old edge weights lost, and edges are counted only if their source vertex precedes their target vertex in the ordering.

Set up the environment

using DrWatson
quickactivate(@__DIR__)

using DataFrames
using Arrow
using GLMakie
using GraphMakie

using BioFindr
WARNING: using GraphMakie.Arrow in module Main conflicts with an existing identifier.

Load data

For illustration we use the GEUVADIS microRNA data:

dm = DataFrame(Arrow.Table(datadir("exp_pro","findr-data-geuvadis", "dm.arrow")));
dgm = DataFrame(Arrow.Table(datadir("exp_pro","findr-data-geuvadis", "dgm.arrow")));
dpm = DataFrame(SNP_ID = names(dgm), GENE_ID=names(dm)[1:ncol(dgm)]);

Run BioFindr

We perform causal inference to compute posterior probabilities from all microRNAs with an eQTL to the total set of microRNAs:

dP = findr(dm, dgm, dpm; FDR=0.25)
84×4 DataFrame
59 rows omitted
Row Source Target Probability qvalue
String String Float64 Float64
1 hsa-miR-550a-2-5p hsa-miR-550a-3-5p 1.0 0.0
2 hsa-miR-96-5p hsa-miR-182-5p 1.0 4.81948e-13
3 hsa-miR-550a-2-5p hsa-miR-550a-2-3p 1.0 2.53784e-11
4 hsa-miR-3130-1-5p hsa-miR-3130-2-5p 1.0 1.04578e-10
5 hsa-miR-550a-2-3p hsa-miR-550a-3-5p 1.0 1.5224e-10
6 hsa-miR-550a-2-3p hsa-miR-550a-2-5p 1.0 4.87261e-10
7 hsa-miR-183-5p hsa-miR-182-5p 1.0 5.45614e-9
8 hsa-miR-30a-3p hsa-miR-30a-5p 1.0 4.29665e-8
9 hsa-miR-574-5p hsa-miR-574-3p 0.999994 6.6706e-7
10 hsa-miR-335-5p hsa-miR-335-3p 0.999991 1.51939e-6
11 hsa-miR-30a-5p hsa-miR-30a-3p 0.999986 2.61404e-6
12 hsa-miR-200a-3p hsa-miR-200b-3p 0.999981 3.95989e-6
13 hsa-miR-2116-5p hsa-miR-2116-3p 0.999885 1.24838e-5
73 hsa-miR-641-5p hsa-miR-26a-2-3p 0.578209 0.218109
74 hsa-miR-1304-3p hsa-miR-619-5p 0.574428 0.220912
75 hsa-miR-3176-3p hsa-miR-143-3p 0.572199 0.223671
76 hsa-miR-96-5p hsa-miR-155-3p 0.57146 0.226367
77 hsa-miR-3176-3p hsa-miR-30b-5p 0.55876 0.229157
78 hsa-miR-3176-3p hsa-miR-4524a-5p 0.557952 0.231886
79 hsa-miR-1304-3p hsa-miR-4421-3p 0.556114 0.23457
80 hsa-miR-641-5p hsa-miR-629-5p 0.550412 0.237258
81 hsa-miR-641-5p hsa-let-7d-5p 0.541004 0.239995
82 hsa-miR-4482-1-3p hsa-miR-30a-3p 0.537326 0.242711
83 hsa-miR-3176-3p hsa-miR-138-1-3p 0.537207 0.245362
84 hsa-miR-193b-3p hsa-miR-181c-3p 0.528992 0.248049

To construct a DAG from the DataFrame dP with posterior probabilities using the default “greedy edges” algorithm, run:

G, name2idx = dagfindr!(dP);

The ! in the dagfindr! function name indicates that the function modifies its input argument. We can see that dP indeed contains some new columns:

dP
84×7 DataFrame
59 rows omitted
Row Source Target Probability qvalue Source_idx Target_idx inDAG_greedy_edges
String String Float64 Float64 Int64 Int64 Bool
1 hsa-miR-550a-2-5p hsa-miR-550a-3-5p 1.0 0.0 1 34 true
2 hsa-miR-96-5p hsa-miR-182-5p 1.0 4.81948e-13 2 16 true
3 hsa-miR-550a-2-5p hsa-miR-550a-2-3p 1.0 2.53784e-11 1 4 true
4 hsa-miR-3130-1-5p hsa-miR-3130-2-5p 1.0 1.04578e-10 3 15 true
5 hsa-miR-550a-2-3p hsa-miR-550a-3-5p 1.0 1.5224e-10 4 34 true
6 hsa-miR-550a-2-3p hsa-miR-550a-2-5p 1.0 4.87261e-10 4 1 false
7 hsa-miR-183-5p hsa-miR-182-5p 1.0 5.45614e-9 5 16 true
8 hsa-miR-30a-3p hsa-miR-30a-5p 1.0 4.29665e-8 6 9 true
9 hsa-miR-574-5p hsa-miR-574-3p 0.999994 6.6706e-7 7 14 true
10 hsa-miR-335-5p hsa-miR-335-3p 0.999991 1.51939e-6 8 13 true
11 hsa-miR-30a-5p hsa-miR-30a-3p 0.999986 2.61404e-6 9 6 false
12 hsa-miR-200a-3p hsa-miR-200b-3p 0.999981 3.95989e-6 10 35 true
13 hsa-miR-2116-5p hsa-miR-2116-3p 0.999885 1.24838e-5 11 12 true
73 hsa-miR-641-5p hsa-miR-26a-2-3p 0.578209 0.218109 31 80 true
74 hsa-miR-1304-3p hsa-miR-619-5p 0.574428 0.220912 26 81 true
75 hsa-miR-3176-3p hsa-miR-143-3p 0.572199 0.223671 20 82 true
76 hsa-miR-96-5p hsa-miR-155-3p 0.57146 0.226367 2 83 true
77 hsa-miR-3176-3p hsa-miR-30b-5p 0.55876 0.229157 20 84 true
78 hsa-miR-3176-3p hsa-miR-4524a-5p 0.557952 0.231886 20 85 true
79 hsa-miR-1304-3p hsa-miR-4421-3p 0.556114 0.23457 26 86 true
80 hsa-miR-641-5p hsa-miR-629-5p 0.550412 0.237258 31 87 true
81 hsa-miR-641-5p hsa-let-7d-5p 0.541004 0.239995 31 88 true
82 hsa-miR-4482-1-3p hsa-miR-30a-3p 0.537326 0.242711 29 6 true
83 hsa-miR-3176-3p hsa-miR-138-1-3p 0.537207 0.245362 20 89 true
84 hsa-miR-193b-3p hsa-miR-181c-3p 0.528992 0.248049 18 90 true

The Source_idx and Target_idx are numerical IDs for the Source and Target nodes, respectively, and the inDAG_greedy_edges indicates whether the edge represented by a row of dP in included in the output DAG G. The mapping from node names to IDs is also returned as a dictionary object name2idx.

name2idx
Dict{Symbol, Int64} with 90 entries:
  Symbol("hsa-let-7b-5p")     => 25
  Symbol("hsa-miR-143-3p")    => 82
  Symbol("hsa-miR-629-5p")    => 87
  Symbol("hsa-miR-625-5p")    => 65
  Symbol("hsa-miR-3667-5p")   => 21
  Symbol("hsa-miR-582-3p")    => 67
  Symbol("hsa-miR-103a-2-5p") => 71
  Symbol("hsa-miR-1307-3p")   => 30
  Symbol("hsa-miR-29a-3p")    => 54
  Symbol("hsa-miR-500a-5p")   => 75
  Symbol("hsa-miR-150-3p")    => 56
  Symbol("hsa-miR-1908-3p")   => 17
  Symbol("hsa-miR-200b-3p")   => 35
  Symbol("hsa-miR-194-2-5p")  => 52
  Symbol("hsa-miR-200a-3p")   => 10
  Symbol("hsa-miR-221-5p")    => 62
  Symbol("hsa-miR-2116-3p")   => 12
  Symbol("hsa-miR-4482-1-3p") => 29
  Symbol("hsa-miR-130b-3p")   => 68
  Symbol("hsa-miR-335-5p")    => 8
  Symbol("hsa-miR-128-2-3p")  => 41
  Symbol("hsa-miR-181c-3p")   => 90
  Symbol("hsa-miR-3150b-3p")  => 32
  Symbol("hsa-miR-1304-3p")   => 26
  Symbol("hsa-miR-642a-5p")   => 33
  ⋮                           => ⋮

The output G is a directed graph object from the Graphs package:

G
{90, 73} directed simple Int64 graph

This is a fairly simple datastructure, which only supports numerical node IDs, hence the need to create the name2idx map. One useful thing one can do with a Graph object is to draw it:

graphplot(G)

For more details, see the documentation.

To run the dagfindr! function with the other DAG construction algorithms mentioned in the Introduction,

G, name2idx = dagfindr!(dP; method="heuristic sort");
G, name2idx = dagfindr!(dP; method="greedy insertion");

The results of these dagfindr! calls are added to dP allowing easy comparison of the methods:

dP
84×9 DataFrame
59 rows omitted
Row Source Target Probability qvalue Source_idx Target_idx inDAG_greedy_edges inDAG_heuristic_sort inDAG_greedy_insertion
String String Float64 Float64 Int64 Int64 Bool Bool Bool
1 hsa-miR-550a-2-5p hsa-miR-550a-3-5p 1.0 0.0 1 34 true true true
2 hsa-miR-96-5p hsa-miR-182-5p 1.0 4.81948e-13 2 16 true false true
3 hsa-miR-550a-2-5p hsa-miR-550a-2-3p 1.0 2.53784e-11 1 4 true true true
4 hsa-miR-3130-1-5p hsa-miR-3130-2-5p 1.0 1.04578e-10 3 15 true true true
5 hsa-miR-550a-2-3p hsa-miR-550a-3-5p 1.0 1.5224e-10 4 34 true true true
6 hsa-miR-550a-2-3p hsa-miR-550a-2-5p 1.0 4.87261e-10 4 1 false false false
7 hsa-miR-183-5p hsa-miR-182-5p 1.0 5.45614e-9 5 16 true true true
8 hsa-miR-30a-3p hsa-miR-30a-5p 1.0 4.29665e-8 6 9 true false true
9 hsa-miR-574-5p hsa-miR-574-3p 0.999994 6.6706e-7 7 14 true true true
10 hsa-miR-335-5p hsa-miR-335-3p 0.999991 1.51939e-6 8 13 true true true
11 hsa-miR-30a-5p hsa-miR-30a-3p 0.999986 2.61404e-6 9 6 false true false
12 hsa-miR-200a-3p hsa-miR-200b-3p 0.999981 3.95989e-6 10 35 true true true
13 hsa-miR-2116-5p hsa-miR-2116-3p 0.999885 1.24838e-5 11 12 true true true
73 hsa-miR-641-5p hsa-miR-26a-2-3p 0.578209 0.218109 31 80 true true true
74 hsa-miR-1304-3p hsa-miR-619-5p 0.574428 0.220912 26 81 true true true
75 hsa-miR-3176-3p hsa-miR-143-3p 0.572199 0.223671 20 82 true true true
76 hsa-miR-96-5p hsa-miR-155-3p 0.57146 0.226367 2 83 true true true
77 hsa-miR-3176-3p hsa-miR-30b-5p 0.55876 0.229157 20 84 true true true
78 hsa-miR-3176-3p hsa-miR-4524a-5p 0.557952 0.231886 20 85 true true true
79 hsa-miR-1304-3p hsa-miR-4421-3p 0.556114 0.23457 26 86 true true true
80 hsa-miR-641-5p hsa-miR-629-5p 0.550412 0.237258 31 87 true true true
81 hsa-miR-641-5p hsa-let-7d-5p 0.541004 0.239995 31 88 true true true
82 hsa-miR-4482-1-3p hsa-miR-30a-3p 0.537326 0.242711 29 6 true true true
83 hsa-miR-3176-3p hsa-miR-138-1-3p 0.537207 0.245362 20 89 true true true
84 hsa-miR-193b-3p hsa-miR-181c-3p 0.528992 0.248049 18 90 true true true