Bayesian network learning
BioFindr implements the method described in the paper "High-dimensional Bayesian network inference from systems genetics data using genetic node ordering" [Wang2019] to learn a Bayesian network, using a dataframe of posterior probabilities as prior edge weights.
DAG reconstruction
The first step in the algorithm is to convert a dataframe of posterior probabilities to a directed acyclic graph (DAG). BioFindr implements the original greedy algorithm, in which edges are added one by one in descending order of posterior probability and edges that would introduce a cycle are skipped, in the dagfindr_greedy_edges! function. Two additional methods, dagfindr_greedy_insertion! and dagfindr_heuristic_sort!, developed by Kenneth Stoop and Pieter Audenaert (Stoop et al., 2023), are also implemented. The dagfindr! function is the main user interface.
BioFindr.dagfindr! — Function
dagfindr!(dP::T; method="greedy edges") where T<:AbstractDataFrame
Convert a DataFrame dP of findr results (list of edges) to a directed acyclic graph (DAG) using the specified method. The output is a directed graph represented as a SimpleDiGraph from the Graphs package. The method can be any of:
- "greedy edges" (default), see dagfindr_greedy_edges!.
- "heuristic sort", see dagfindr_heuristic_sort!.
- "greedy insertion", see dagfindr_greedy_insertion!.
BioFindr.dagfindr_greedy_edges! — Function
dagfindr_greedy_edges!(dP::T) where T<:AbstractDataFrame
Convert a DataFrame dP of findr results (list of edges) to a directed acyclic graph (DAG) represented as a SimpleDiGraph from the Graphs package. This function implements the method of Wang et al. (2019), where edges are added one by one in decreasing order of probability, and only if they do not create a cycle in the graph. Cycles are detected with the incremental cycle detection algorithm from the Graphs package.
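The greedy edge rule described above can be illustrated with a minimal, self-contained sketch. It is not BioFindr's implementation: the names reachable and greedy_edges_sketch are hypothetical, edges are plain (source, target, probability) tuples rather than a DataFrame, and a simple depth-first reachability check stands in for the incremental cycle detection from the Graphs package.

```julia
# Hypothetical helper (not part of BioFindr): is `target` reachable from
# `start` by following the directed edges stored in `adj`?
function reachable(adj::Dict{String,Vector{String}}, start::String, target::String)
    stack = [start]
    seen = Set{String}()
    while !isempty(stack)
        v = pop!(stack)
        v == target && return true
        v in seen && continue
        push!(seen, v)
        append!(stack, get(adj, v, String[]))
    end
    return false
end

# Add edges in decreasing order of probability, skipping any edge that
# would close a cycle. Plain DFS is used here for clarity only.
function greedy_edges_sketch(edges::Vector{Tuple{String,String,Float64}})
    adj = Dict{String,Vector{String}}()
    kept = Tuple{String,String,Float64}[]
    for (src, dst, p) in sort(edges; by = e -> e[3], rev = true)
        # src -> dst closes a cycle exactly when src is already reachable from dst
        reachable(adj, dst, src) && continue
        push!(get!(adj, src, String[]), dst)
        push!(kept, (src, dst, p))
    end
    return kept
end

# Toy data (made up for illustration): C -> A would close the cycle A -> B -> C -> A.
toy_edges = [("A", "B", 0.9), ("B", "C", 0.8), ("C", "A", 0.7), ("A", "C", 0.6)]
dag_edges = greedy_edges_sketch(toy_edges)
```

In this toy example the edge C -> A is rejected because A -> B -> C is already in the graph, while the lower-probability edge A -> C is still accepted.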
BioFindr.dagfindr_heuristic_sort! — Function
dagfindr_heuristic_sort!(dP::T) where T<:AbstractDataFrame
Convert a DataFrame dP of findr results (list of edges) to a directed acyclic graph (DAG) represented as a SimpleDiGraph from the Graphs package. This function implements the heuristic sort method of Stoop et al. (2023), where vertices are sorted by their ratio of out-degree to in-degree, and edges are added only if their source vertex precedes their target vertex in the sorted list. The output is a directed graph and a dictionary to map vertex names to numbers.
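As a rough illustration of the heuristic sort idea, the sketch below orders vertices by their out-degree to in-degree ratio and keeps only the edges that point forward in that ordering. Several details are assumptions rather than facts about BioFindr: degrees are taken to be weighted (sums of edge probabilities), vertices are sorted in decreasing order of the ratio, and the function name heuristic_sort_sketch and the tuple-based edge list are hypothetical.

```julia
# Sketch only: weighted degrees and descending ratio order are assumptions.
function heuristic_sort_sketch(edges::Vector{Tuple{String,String,Float64}})
    out_w = Dict{String,Float64}()   # weighted out-degree per vertex
    in_w = Dict{String,Float64}()    # weighted in-degree per vertex
    for (src, dst, p) in edges
        out_w[src] = get(out_w, src, 0.0) + p
        in_w[dst] = get(in_w, dst, 0.0) + p
    end
    vertices = sort(unique(vcat([e[1] for e in edges], [e[2] for e in edges])))
    # A vertex without incoming weight gets ratio Inf and is placed first.
    ratio(v) = get(out_w, v, 0.0) / get(in_w, v, 0.0)
    order = sort(vertices; by = ratio, rev = true)
    position = Dict(v => i for (i, v) in enumerate(order))
    # Keep an edge only if its source precedes its target in the ordering.
    kept = [e for e in edges if position[e[1]] < position[e[2]]]
    return order, kept
end

# Toy data (made up for illustration).
toy_edges = [("A", "B", 0.9), ("B", "C", 0.8), ("C", "A", 0.7), ("A", "C", 0.6)]
vertex_order, forward_edges = heuristic_sort_sketch(toy_edges)
```

Because every retained edge points from an earlier to a later vertex in a single linear ordering, the resulting graph is acyclic by construction, with no cycle check needed.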
BioFindr.dagfindr_greedy_insertion! — Function
dagfindr_greedy_insertion!(dP::T) where T<:AbstractDataFrame
Convert a DataFrame dP of findr results (list of edges) to a directed acyclic graph (DAG) represented as a SimpleDiGraph from the Graphs package. This function implements the greedy insertion method of Stoop et al. (2023), where vertices are sorted iteratively by inserting each vertex at the position in the current ordering that yields the maximum possible gain in edge weight. The gain is the difference between the sum of the weights of the newly included edges and the sum of the weights of the edges that are lost, where an edge is counted only if its source vertex precedes its target vertex in the ordering. The output is a directed graph and a dictionary to map vertex names to numbers.
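The insertion step can be sketched as a simple local search over vertex orderings. This is one possible reading of the description above, not BioFindr's code: the names forward_weight and greedy_insertion_sketch are hypothetical, the weights are assumed to be given as a dense matrix W with W[i, j] the weight of edge i -> j, the starting order is arbitrary, and the gain is evaluated by recomputing the total forward weight rather than by an incremental update.

```julia
# Total weight of edges whose source precedes their target in `order`.
function forward_weight(order::Vector{Int}, W::Matrix{Float64})
    total = 0.0
    for a in 1:length(order), b in (a + 1):length(order)
        total += W[order[a], order[b]]
    end
    return total
end

# Repeatedly remove each vertex and reinsert it at the position with the
# largest total forward weight; stop when a full pass makes no move.
function greedy_insertion_sketch(order::Vector{Int}, W::Matrix{Float64})
    order = copy(order)
    improved = true
    while improved
        improved = false
        for v in copy(order)
            rest = filter(!=(v), order)
            best_order, best_score = order, forward_weight(order, W)
            for pos in 1:(length(rest) + 1)
                candidate = vcat(rest[1:pos-1], [v], rest[pos:end])
                score = forward_weight(candidate, W)
                # Small tolerance so that only strict gains trigger a move.
                if score > best_score + 1e-12
                    best_order, best_score = candidate, score
                end
            end
            if best_order !== order
                order = best_order
                improved = true
            end
        end
    end
    return order
end

# Toy weights (made up): vertices 1, 2, 3 with edges 1->2, 2->3, 3->1, 1->3.
W = zeros(3, 3)
W[1, 2] = 0.9; W[2, 3] = 0.8; W[3, 1] = 0.7; W[1, 3] = 0.6
final_order = greedy_insertion_sketch([3, 2, 1], W)
```

Each accepted move strictly increases the total forward weight, so the loop terminates; the result is a local optimum of the ordering, from which the DAG is obtained by keeping only forward edges.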
BioFindr.greedy_insertions! — Function
greedy_insertions!(sorted_vertices, weights)
TBW
BioFindr.edge_weights — Function
edge_weights(dP::T) where T<:AbstractDataFrame
TBW
BioFindr.names_to_index! — Function
names_to_index!(dP::T) where T<:AbstractDataFrame
Add columns with vertex numbers to a DataFrame dP of edges. The columns Source_idx and Target_idx are added to dP with the vertex numbers corresponding to the names in the Source and Target columns, respectively. The function returns a dictionary name2num to map vertex names to numbers.
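The name-to-number mapping can be pictured with the following sketch, which works on plain vectors instead of a DataFrame. The function name names_to_index_sketch is hypothetical, and numbering vertices in order of first appearance is an assumption; the actual numbering used by BioFindr may differ.

```julia
# Sketch only: assigns numbers in order of first appearance (an assumption).
function names_to_index_sketch(sources::Vector{String}, targets::Vector{String})
    name2num = Dict{String,Int}()
    for (s, t) in zip(sources, targets), name in (s, t)
        # get! leaves an existing entry untouched and otherwise stores the next number
        get!(name2num, name, length(name2num) + 1)
    end
    source_idx = [name2num[s] for s in sources]
    target_idx = [name2num[t] for t in targets]
    return name2num, source_idx, target_idx
end

# Toy edge list (made up for illustration).
name2num, source_idx, target_idx =
    names_to_index_sketch(["A", "B", "C", "A"], ["B", "C", "A", "C"])
```

The two index vectors play the role of the Source_idx and Target_idx columns, and the dictionary can be inverted to translate vertex numbers in the resulting graph back to names.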
- [Wang2019] Wang L, Audenaert P, Michoel T (2019). High-dimensional Bayesian network inference from systems genetics data using genetic node ordering. Frontiers in Genetics, Special Topic Machine Learning and Network-Driven Integrative Genomics, 10, 1196.