using DrWatson
quickactivate(@__DIR__)
using DataFrames
using Arrow
using GLMakie
using GraphMakie
using BioFindrWARNING: using GraphMakie.Arrow in module Main conflicts with an existing identifier.
The posterior probabilities computed by the findr function can be used as a graph structure prior in Bayesian network learning using the model introduced in the paper High-dimensional Bayesian network inference from systems genetics data using genetic node ordering. This involves converting the pairwise posterior probabilities to a directed acyclic graph (DAG).
BioFindr implements the original “greedy edges” algorithm where edges are added one-by-one in decreasing order of probability, and only if they do not create a cycle in the graph, using an incremental cycle detection algorithm. Two additional algorithms from the paper Maximal acyclic subgraph optimization for gene regulatory networks are also implemented: the heuristic sort algorithm where vertices are sorted by their ratio of out-degree to in-degree, and edges are added only if their source vertex precedes their target vertex in the sorted list, and the greedy insertion algorithm where vertices are iteratively inserted in the position in the current ordering that yields the maximum possible gain of edge weights, where the gain is counted as the difference between the sum of new edge weights included and the sum of old edge weights lost, and edges are counted only if their source vertex precedes their target vertex in the ordering.
For illustration we use the GEUVADIS microRNA data:
We perform causal inference to compute posterior probabilities from all microRNAs with an eQTL to the total set of microRNAs:
| Row | Source | Target | Probability | qvalue |
|---|---|---|---|---|
| String | String | Float64 | Float64 | |
| 1 | hsa-miR-550a-2-5p | hsa-miR-550a-3-5p | 1.0 | 0.0 |
| 2 | hsa-miR-96-5p | hsa-miR-182-5p | 1.0 | 4.81948e-13 |
| 3 | hsa-miR-550a-2-5p | hsa-miR-550a-2-3p | 1.0 | 2.53784e-11 |
| 4 | hsa-miR-3130-1-5p | hsa-miR-3130-2-5p | 1.0 | 1.04578e-10 |
| 5 | hsa-miR-550a-2-3p | hsa-miR-550a-3-5p | 1.0 | 1.5224e-10 |
| 6 | hsa-miR-550a-2-3p | hsa-miR-550a-2-5p | 1.0 | 4.87261e-10 |
| 7 | hsa-miR-183-5p | hsa-miR-182-5p | 1.0 | 5.45614e-9 |
| 8 | hsa-miR-30a-3p | hsa-miR-30a-5p | 1.0 | 4.29665e-8 |
| 9 | hsa-miR-574-5p | hsa-miR-574-3p | 0.999994 | 6.6706e-7 |
| 10 | hsa-miR-335-5p | hsa-miR-335-3p | 0.999991 | 1.51939e-6 |
| 11 | hsa-miR-30a-5p | hsa-miR-30a-3p | 0.999986 | 2.61404e-6 |
| 12 | hsa-miR-200a-3p | hsa-miR-200b-3p | 0.999981 | 3.95989e-6 |
| 13 | hsa-miR-2116-5p | hsa-miR-2116-3p | 0.999885 | 1.24838e-5 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 73 | hsa-miR-641-5p | hsa-miR-26a-2-3p | 0.578209 | 0.218109 |
| 74 | hsa-miR-1304-3p | hsa-miR-619-5p | 0.574428 | 0.220912 |
| 75 | hsa-miR-3176-3p | hsa-miR-143-3p | 0.572199 | 0.223671 |
| 76 | hsa-miR-96-5p | hsa-miR-155-3p | 0.57146 | 0.226367 |
| 77 | hsa-miR-3176-3p | hsa-miR-30b-5p | 0.55876 | 0.229157 |
| 78 | hsa-miR-3176-3p | hsa-miR-4524a-5p | 0.557952 | 0.231886 |
| 79 | hsa-miR-1304-3p | hsa-miR-4421-3p | 0.556114 | 0.23457 |
| 80 | hsa-miR-641-5p | hsa-miR-629-5p | 0.550412 | 0.237258 |
| 81 | hsa-miR-641-5p | hsa-let-7d-5p | 0.541004 | 0.239995 |
| 82 | hsa-miR-4482-1-3p | hsa-miR-30a-3p | 0.537326 | 0.242711 |
| 83 | hsa-miR-3176-3p | hsa-miR-138-1-3p | 0.537207 | 0.245362 |
| 84 | hsa-miR-193b-3p | hsa-miR-181c-3p | 0.528992 | 0.248049 |
To construct a DAG from the DataFrame dP with posterior probabilities using the default “greedy edges” algorithm, run:
The ! in the dagfindr! function name indicates that the function modifies its input argument. We can see that dP indeed contains some new columns:
| Row | Source | Target | Probability | qvalue | Source_idx | Target_idx | inDAG_greedy_edges |
|---|---|---|---|---|---|---|---|
| String | String | Float64 | Float64 | Int64 | Int64 | Bool | |
| 1 | hsa-miR-550a-2-5p | hsa-miR-550a-3-5p | 1.0 | 0.0 | 1 | 34 | true |
| 2 | hsa-miR-96-5p | hsa-miR-182-5p | 1.0 | 4.81948e-13 | 2 | 16 | true |
| 3 | hsa-miR-550a-2-5p | hsa-miR-550a-2-3p | 1.0 | 2.53784e-11 | 1 | 4 | true |
| 4 | hsa-miR-3130-1-5p | hsa-miR-3130-2-5p | 1.0 | 1.04578e-10 | 3 | 15 | true |
| 5 | hsa-miR-550a-2-3p | hsa-miR-550a-3-5p | 1.0 | 1.5224e-10 | 4 | 34 | true |
| 6 | hsa-miR-550a-2-3p | hsa-miR-550a-2-5p | 1.0 | 4.87261e-10 | 4 | 1 | false |
| 7 | hsa-miR-183-5p | hsa-miR-182-5p | 1.0 | 5.45614e-9 | 5 | 16 | true |
| 8 | hsa-miR-30a-3p | hsa-miR-30a-5p | 1.0 | 4.29665e-8 | 6 | 9 | true |
| 9 | hsa-miR-574-5p | hsa-miR-574-3p | 0.999994 | 6.6706e-7 | 7 | 14 | true |
| 10 | hsa-miR-335-5p | hsa-miR-335-3p | 0.999991 | 1.51939e-6 | 8 | 13 | true |
| 11 | hsa-miR-30a-5p | hsa-miR-30a-3p | 0.999986 | 2.61404e-6 | 9 | 6 | false |
| 12 | hsa-miR-200a-3p | hsa-miR-200b-3p | 0.999981 | 3.95989e-6 | 10 | 35 | true |
| 13 | hsa-miR-2116-5p | hsa-miR-2116-3p | 0.999885 | 1.24838e-5 | 11 | 12 | true |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 73 | hsa-miR-641-5p | hsa-miR-26a-2-3p | 0.578209 | 0.218109 | 31 | 80 | true |
| 74 | hsa-miR-1304-3p | hsa-miR-619-5p | 0.574428 | 0.220912 | 26 | 81 | true |
| 75 | hsa-miR-3176-3p | hsa-miR-143-3p | 0.572199 | 0.223671 | 20 | 82 | true |
| 76 | hsa-miR-96-5p | hsa-miR-155-3p | 0.57146 | 0.226367 | 2 | 83 | true |
| 77 | hsa-miR-3176-3p | hsa-miR-30b-5p | 0.55876 | 0.229157 | 20 | 84 | true |
| 78 | hsa-miR-3176-3p | hsa-miR-4524a-5p | 0.557952 | 0.231886 | 20 | 85 | true |
| 79 | hsa-miR-1304-3p | hsa-miR-4421-3p | 0.556114 | 0.23457 | 26 | 86 | true |
| 80 | hsa-miR-641-5p | hsa-miR-629-5p | 0.550412 | 0.237258 | 31 | 87 | true |
| 81 | hsa-miR-641-5p | hsa-let-7d-5p | 0.541004 | 0.239995 | 31 | 88 | true |
| 82 | hsa-miR-4482-1-3p | hsa-miR-30a-3p | 0.537326 | 0.242711 | 29 | 6 | true |
| 83 | hsa-miR-3176-3p | hsa-miR-138-1-3p | 0.537207 | 0.245362 | 20 | 89 | true |
| 84 | hsa-miR-193b-3p | hsa-miR-181c-3p | 0.528992 | 0.248049 | 18 | 90 | true |
The Source_idx and Target_idx are numerical IDs for the Source and Target nodes, respectively, and the inDAG_greedy_edges indicates whether the edge represented by a row of dP in included in the output DAG G. The mapping from node names to IDs is also returned as a dictionary object name2idx.
Dict{Symbol, Int64} with 90 entries:
Symbol("hsa-let-7b-5p") => 25
Symbol("hsa-miR-143-3p") => 82
Symbol("hsa-miR-629-5p") => 87
Symbol("hsa-miR-625-5p") => 65
Symbol("hsa-miR-3667-5p") => 21
Symbol("hsa-miR-582-3p") => 67
Symbol("hsa-miR-103a-2-5p") => 71
Symbol("hsa-miR-1307-3p") => 30
Symbol("hsa-miR-29a-3p") => 54
Symbol("hsa-miR-500a-5p") => 75
Symbol("hsa-miR-150-3p") => 56
Symbol("hsa-miR-1908-3p") => 17
Symbol("hsa-miR-200b-3p") => 35
Symbol("hsa-miR-194-2-5p") => 52
Symbol("hsa-miR-200a-3p") => 10
Symbol("hsa-miR-221-5p") => 62
Symbol("hsa-miR-2116-3p") => 12
Symbol("hsa-miR-4482-1-3p") => 29
Symbol("hsa-miR-130b-3p") => 68
Symbol("hsa-miR-335-5p") => 8
Symbol("hsa-miR-128-2-3p") => 41
Symbol("hsa-miR-181c-3p") => 90
Symbol("hsa-miR-3150b-3p") => 32
Symbol("hsa-miR-1304-3p") => 26
Symbol("hsa-miR-642a-5p") => 33
⋮ => ⋮
The output G is a directed graph object from the Graphs package:
This is a fairly simple datastructure, which only supports numerical node IDs, hence the need to create the name2idx map. One useful thing one can do with a Graph object is to draw it:
For more details, see the documentation.
To run the dagfindr! function with the other DAG construction algorithms mentioned in the Introduction,
The results of these dagfindr! calls are added to dP allowing easy comparison of the methods:
| Row | Source | Target | Probability | qvalue | Source_idx | Target_idx | inDAG_greedy_edges | inDAG_heuristic_sort | inDAG_greedy_insertion |
|---|---|---|---|---|---|---|---|---|---|
| String | String | Float64 | Float64 | Int64 | Int64 | Bool | Bool | Bool | |
| 1 | hsa-miR-550a-2-5p | hsa-miR-550a-3-5p | 1.0 | 0.0 | 1 | 34 | true | true | true |
| 2 | hsa-miR-96-5p | hsa-miR-182-5p | 1.0 | 4.81948e-13 | 2 | 16 | true | false | true |
| 3 | hsa-miR-550a-2-5p | hsa-miR-550a-2-3p | 1.0 | 2.53784e-11 | 1 | 4 | true | true | true |
| 4 | hsa-miR-3130-1-5p | hsa-miR-3130-2-5p | 1.0 | 1.04578e-10 | 3 | 15 | true | true | true |
| 5 | hsa-miR-550a-2-3p | hsa-miR-550a-3-5p | 1.0 | 1.5224e-10 | 4 | 34 | true | true | true |
| 6 | hsa-miR-550a-2-3p | hsa-miR-550a-2-5p | 1.0 | 4.87261e-10 | 4 | 1 | false | false | false |
| 7 | hsa-miR-183-5p | hsa-miR-182-5p | 1.0 | 5.45614e-9 | 5 | 16 | true | true | true |
| 8 | hsa-miR-30a-3p | hsa-miR-30a-5p | 1.0 | 4.29665e-8 | 6 | 9 | true | false | true |
| 9 | hsa-miR-574-5p | hsa-miR-574-3p | 0.999994 | 6.6706e-7 | 7 | 14 | true | true | true |
| 10 | hsa-miR-335-5p | hsa-miR-335-3p | 0.999991 | 1.51939e-6 | 8 | 13 | true | true | true |
| 11 | hsa-miR-30a-5p | hsa-miR-30a-3p | 0.999986 | 2.61404e-6 | 9 | 6 | false | true | false |
| 12 | hsa-miR-200a-3p | hsa-miR-200b-3p | 0.999981 | 3.95989e-6 | 10 | 35 | true | true | true |
| 13 | hsa-miR-2116-5p | hsa-miR-2116-3p | 0.999885 | 1.24838e-5 | 11 | 12 | true | true | true |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 73 | hsa-miR-641-5p | hsa-miR-26a-2-3p | 0.578209 | 0.218109 | 31 | 80 | true | true | true |
| 74 | hsa-miR-1304-3p | hsa-miR-619-5p | 0.574428 | 0.220912 | 26 | 81 | true | true | true |
| 75 | hsa-miR-3176-3p | hsa-miR-143-3p | 0.572199 | 0.223671 | 20 | 82 | true | true | true |
| 76 | hsa-miR-96-5p | hsa-miR-155-3p | 0.57146 | 0.226367 | 2 | 83 | true | true | true |
| 77 | hsa-miR-3176-3p | hsa-miR-30b-5p | 0.55876 | 0.229157 | 20 | 84 | true | true | true |
| 78 | hsa-miR-3176-3p | hsa-miR-4524a-5p | 0.557952 | 0.231886 | 20 | 85 | true | true | true |
| 79 | hsa-miR-1304-3p | hsa-miR-4421-3p | 0.556114 | 0.23457 | 26 | 86 | true | true | true |
| 80 | hsa-miR-641-5p | hsa-miR-629-5p | 0.550412 | 0.237258 | 31 | 87 | true | true | true |
| 81 | hsa-miR-641-5p | hsa-let-7d-5p | 0.541004 | 0.239995 | 31 | 88 | true | true | true |
| 82 | hsa-miR-4482-1-3p | hsa-miR-30a-3p | 0.537326 | 0.242711 | 29 | 6 | true | true | true |
| 83 | hsa-miR-3176-3p | hsa-miR-138-1-3p | 0.537207 | 0.245362 | 20 | 89 | true | true | true |
| 84 | hsa-miR-193b-3p | hsa-miR-181c-3p | 0.528992 | 0.248049 | 18 | 90 | true | true | true |