Matrix-based data
Introduction
Internally, all BioFindr functions work with matrices or array-based data, and the DataFrame-based findr methods used in the coexpression analysis, association analysis, and causal inference tutorials are wrapper functions provided for convenience. If you prefer matrix-based data over DataFrames, you can use the matrix-based findr methods directly, without having to create DataFrames first.
Set up the environment
using DrWatson
quickactivate(@__DIR__)
using DataFrames
using Arrow
using BioFindr
Load data
Let’s pretend our GEUVADIS data is in a matrix-based format:
Xt = Matrix(DataFrame(Arrow.Table(datadir("exp_pro","findr-data-geuvadis", "dt.arrow"))));
Xm = Matrix(DataFrame(Arrow.Table(datadir("exp_pro","findr-data-geuvadis", "dm.arrow"))));
Gm = Matrix(DataFrame(Arrow.Table(datadir("exp_pro","findr-data-geuvadis", "dgm.arrow"))));
We also need the microRNA eQTL mapping (see the causal inference tutorial), in this case in the form of an array where each row corresponds to a cis-eQTL/eGene pair, represented by a column index of Gm (i.e. a SNP) and a column index of Xm (i.e. a microRNA). Recall that, due to the preprocessing of the findr-geuvadis data, the column indices are identical, but this will not be the case in general:
mirpairs = zeros(Int32,size(Gm,2),2);
for k = 1:size(mirpairs,1)
    mirpairs[k,:] = [k k]
end
Note that data must be stored in matrices where columns correspond to variables (genes, SNPs, etc.) and rows correspond to observations (samples).
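The layout convention and the pair array can be illustrated with a small sketch in base Julia. The toy matrix `X_raw`, the sizes, and the names `pairs_identity` and `pairs_general` are made up for illustration and are not part of the tutorial data; the column order of the pair array (SNP index first, gene index second) follows the description above.

```julia
# Toy data, purely illustrative: 3 variables measured in 5 samples,
# stored the "wrong" way round (variables in rows, samples in columns).
X_raw = reshape(collect(1.0:15.0), 3, 5)

# The matrix-based methods expect samples in rows and variables in columns,
# so swap the two dimensions (permutedims returns a plain Matrix).
X = permutedims(X_raw)
@show size(X)          # (5, 3): 5 samples, 3 variables

# Identity SNP/gene pairing for n pairs, built without an explicit loop.
# Column 1 holds column indices into the genotype matrix,
# column 2 the matching column indices into the expression matrix.
n = 3
pairs_identity = Int32.(hcat(1:n, 1:n))

# A hypothetical non-identity pairing: SNP 1 -> gene 3, SNP 2 -> gene 1.
pairs_general = Int32[1 3; 2 1]
```

If your own data are stored as variables by samples, a single `permutedims` call like the one above should be enough to bring them into the expected orientation.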
Run BioFindr
Below, we only show the relevant findr commands. Check the corresponding tutorials and BioFindr documentation for more details.
Coexpression analysis
All-vs-all
Coexpression analysis on a single matrix returns a square matrix with dimensions equal to the number of variables (columns) in the input matrix:
P = findr(Xm)
674×674 Matrix{Float64}:
1.0 0.0453818 0.225592 … 0.0332217 0.107904 0.105772
0.0889502 1.0 0.102236 0.0474569 0.731827 0.248213
0.1729 0.0403721 1.0 0.777618 0.106922 0.1002
0.0159859 0.106602 0.0893431 0.0699576 0.098629 0.122935
0.120573 0.112145 0.238959 0.0346163 0.164631 0.297973
0.0624062 0.0586858 0.0694624 … 0.569015 0.104281 0.186142
0.423979 0.0332279 0.0971337 0.0435612 0.124816 0.094099
0.0533054 0.0266426 0.559493 0.0550043 0.34664 0.0981003
0.114737 0.0143626 0.1419 0.0439354 0.192045 0.131555
0.176766 0.48804 1.0 0.0337898 0.213006 0.133968
0.360143 0.153889 0.425408 … 0.0332874 0.163973 0.187538
0.182787 0.0493317 0.0386118 0.0538748 0.105971 0.160988
0.676394 0.0264916 0.444864 0.11139 0.281354 0.0970358
⋮ ⋱
0.0763606 0.0436318 0.868105 0.0379098 0.0992675 0.157452
0.190505 0.137327 0.113494 0.0334964 0.379698 0.516604
0.378913 0.00636714 0.969245 0.0333967 0.218768 0.0972078
0.0925918 0.00949055 0.11976 … 0.044673 0.111653 0.0953303
0.1283 0.0292623 0.789323 0.0366376 0.190591 0.0937124
0.115916 0.0566189 0.0567839 0.0330524 0.243382 0.123887
0.0371264 0.0516759 0.177314 0.0620753 0.111248 0.817595
0.286305 0.132744 0.106446 0.0361477 0.112938 0.152742
0.162885 0.178503 0.105606 … 0.999703 0.116572 0.0951264
0.0429484 0.123023 0.905047 1.0 0.122408 0.912701
0.0920713 0.715456 0.108091 0.0425472 1.0 1.0
0.0982375 0.312023 0.0995635 0.7971 1.0 1.0
In the output, columns correspond to A-genes (causal factors) and rows to B-genes (targets), that is:
\[ P_{i,j} = P(X_j \to X_i) \]
Note that the diagonal is arbitrarily set to one: BioFindr cannot make any inferences about the presence or absence of self-regulation.
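As a rough sketch of how to read such a matrix, the snippet below ranks the targets of one A-gene while skipping the uninformative diagonal. The 4×4 matrix is invented and merely stands in for the output of an all-vs-all analysis; only base Julia functions are used.

```julia
# Stand-in for the output of an all-vs-all analysis on 4 variables.
# The numbers are invented; only the layout matters here.
P = [1.0  0.2   0.9  0.1;
     0.7  1.0   0.3  0.4;
     0.1  0.8   1.0  0.6;
     0.5  0.05  0.2  1.0]

# Column j holds the probabilities for variable j as the A-gene,
# row i for variable i as the B-gene: P[i, j] = P(X_j -> X_i).
j = 2
targets_of_j = P[:, j]

# Exclude the diagonal entry before ranking the targets of variable j.
candidates = [i for i in axes(P, 1) if i != j]
ranked = sort(candidates; by = i -> P[i, j], rev = true)
@show ranked           # targets of variable 2, strongest first

# The matrix is not symmetric in general: compare both directions.
@show P[3, 2], P[2, 3]
```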
Bipartite
Analyse coexpression from a subset of variables to the whole set:
P = findr(Xm; cols=[1,3,7,50])
674×4 Matrix{Float64}:
1.0 0.225592 0.346674 0.245078
0.0889502 0.102236 0.113262 0.162557
0.1729 1.0 0.11442 0.073312
0.0159859 0.0893431 0.146108 0.473624
0.120573 0.238959 0.239501 0.0620736
0.0624062 0.0694624 0.0989481 0.0511453
0.423979 0.0971337 1.0 0.203824
0.0533054 0.559493 0.201162 0.124401
0.114737 0.1419 0.0934371 0.23661
0.176766 1.0 0.133559 0.116917
0.360143 0.425408 0.114545 0.288854
0.182787 0.0386118 0.123711 0.146996
0.676394 0.444864 0.135674 0.260814
⋮
0.0763606 0.868105 0.265134 0.513929
0.190505 0.113494 0.260401 0.114172
0.378913 0.969245 0.213968 0.130775
0.0925918 0.11976 0.197852 0.32143
0.1283 0.789323 0.132733 0.283125
0.115916 0.0567839 0.513517 0.239496
0.0371264 0.177314 0.140021 0.298428
0.286305 0.106446 0.158433 0.39212
0.162885 0.105606 0.104283 0.347775
0.0429484 0.905047 0.139504 0.0675872
0.0920713 0.108091 0.139412 0.0202383
0.0982375 0.0995635 0.0917764 0.375652
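In the output above, the entries equal to one sit at rows 1, 3 and 7 of the first three columns, which suggests that column k of the result belongs to the variable `cols[k]`. The sketch below works under that assumption, with an invented stand-in matrix and hypothetical sizes, and shows how to map result columns back to the original variables and mask the self-pairs.

```julia
# Hypothetical setting: 6 variables, of which 3 are used as A-genes.
cols = [1, 3, 6]
nvars = 6

# Stand-in for the nvars × length(cols) output; values are invented,
# with 1.0 wherever a variable meets itself.
P = fill(0.1, nvars, length(cols))
for (k, c) in enumerate(cols)
    P[c, k] = 1.0
end

# Assumption: column k of the output belongs to original variable cols[k].
k = findfirst(==(6), cols)      # position of variable 6 among the A-genes
p_from_6 = P[:, k]              # probabilities P(X_6 -> X_i) for all i

# The self-pairs sit at (cols[k], k) rather than on a diagonal; mask them.
self_pairs = [CartesianIndex(c, k) for (k, c) in enumerate(cols)]
P_masked = copy(P)
P_masked[self_pairs] .= NaN
```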
Analyse coexpression from the variables in Xm to the variables in Xt:
P = findr(Xt,Xm)
23722×674 Matrix{Float64}:
0.0234184 0.00291162 0.732244 … 0.156543 0.0361393 0.0764542
0.0221085 0.00435185 0.472921 0.306938 0.0698138 0.125194
0.0241246 0.00209359 0.351981 0.0358197 0.0646466 0.0804391
0.532158 0.00245132 0.616879 0.00249436 0.0958314 0.0656744
0.0222376 0.00256429 0.0337209 0.0317953 0.0874541 0.691511
0.0311296 0.00405144 0.282852 … 0.0512983 0.0662781 0.841356
0.0212987 0.00689882 0.300301 0.226043 0.0351949 0.1855
0.0590008 0.00292483 0.146968 0.0127612 0.0549558 0.0626004
0.0217252 0.00209742 0.276152 0.237079 0.0438237 0.361715
0.0226942 0.00344102 0.00199485 0.0140767 0.0350283 0.267969
0.0213827 0.00532704 0.455821 … 0.197865 0.0368173 0.0158116
0.0343095 0.00236632 0.534035 0.429556 0.0311012 0.0787474
0.0214705 0.00207517 0.520825 0.239688 0.609959 0.99285
⋮ ⋱
0.063287 0.0026124 0.955061 … 0.983501 0.103916 0.353626
0.0230399 0.0200878 0.14564 0.00081434 0.0299894 0.152551
0.0244435 0.00362406 0.230516 0.679986 0.0488017 0.452377
0.0439679 0.00265316 0.0162124 0.113298 0.0386101 0.0763466
0.0481255 0.0152828 0.10255 0.227967 0.069969 0.323395
0.0219638 0.0025475 0.129958 … 0.797988 0.0437969 0.235947
0.0359421 0.00217451 0.0695795 0.0309512 0.0536083 0.0518225
0.0228804 0.00371468 0.122758 0.404031 0.0544299 0.175212
0.0214806 0.00207504 0.000637106 0.266088 0.0540143 0.153354
0.0225533 0.00213282 0.0144905 0.0383339 0.0452756 0.0636567
0.0285356 0.00291061 0.0484913 … 0.044657 0.0463433 0.235425
0.0628126 0.0021249 0.0326125 0.00878032 0.117111 0.42914
Association analysis
Testing associations between eQTL genotypes in Gm and microRNA expression levels in Xm:
P = findr(Xm,Gm)
674×55 Matrix{Float64}:
0.99709 0.336726 0.0 0.0 … 0.000455842 0.0 0.00194948
0.076501 0.999976 0.0 0.0 0.000743437 0.0 0.00511882
0.0534877 0.199234 1.0 0.0 0.000448411 0.0 0.00836276
0.0231871 0.441697 0.0 1.0 0.000418771 0.0 0.00229785
0.0359331 0.418019 0.0 0.0 0.000657638 0.0 0.00169117
0.0434282 0.35389 0.0 0.0 … 0.00166157 0.0 0.00197825
0.0243615 0.370613 0.0 0.0 0.000723086 0.0 1.0
0.0244173 0.314934 0.0 0.0 0.00219063 0.0 0.0108873
0.0357465 0.209157 0.0 0.0 0.00064661 0.0 0.00470863
0.110227 0.376232 0.999825 0.0 0.00167592 0.0 0.00316303
0.0281279 0.415582 0.0 0.0 … 0.000939568 0.0 0.00779912
0.027045 0.274934 0.0 0.0 0.00101715 0.0 0.00163995
0.0603348 0.366114 0.0 0.0 0.000360567 0.0 0.00454491
⋮ ⋱
0.0374556 0.322883 0.0 0.0 0.000363842 0.0 0.00862389
0.02492 0.227238 0.0 0.0 0.000602233 0.0 0.00375107
0.0311861 0.219238 0.0 0.0 0.00039574 0.0 0.00327624
0.0795592 0.231186 0.0 0.0 … 0.00140163 0.0 0.00415341
0.046263 0.215004 0.0 0.0 0.000416495 0.0 0.00505554
0.0303016 0.539201 0.0 0.0 0.008878 0.0 0.00257078
0.0548847 0.546104 0.0 0.0 0.000603634 0.0 0.00463545
0.0267455 0.511972 0.0 0.0 0.000423386 0.0 0.00158495
0.0463963 0.557115 0.0 0.0 … 0.00110603 0.0 0.00157338
0.105566 0.235344 0.0 0.0 0.00102074 0.0 0.00651383
0.0220181 0.219283 0.0 0.0 0.000538151 0.0 0.00476893
0.0307281 0.270128 0.0 0.0 0.000459762 0.0 0.0735222
In the output, columns correspond to eQTLs and rows to genes, that is,
\[ P_{i,j} = P(E_j \to X_i) \]
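One possible way to turn such a matrix into a list of eQTL-gene associations is to threshold it, as sketched below. The small matrix and the cutoff of 0.95 are invented for illustration and are not a recommendation from the tutorial.

```julia
# Stand-in for a genes × eQTLs association matrix; values are invented.
P = [0.99 0.30 0.00;
     0.08 1.00 0.00;
     0.05 0.20 1.00;
     0.02 0.97 0.00]

threshold = 0.95   # arbitrary cutoff chosen for illustration

# findall with a predicate returns the CartesianIndex(i, j) of matching entries.
hits = findall(>=(threshold), P)

# Report each hit as eQTL j -> gene i with its probability.
for idx in hits
    i, j = Tuple(idx)
    println("eQTL ", j, " -> gene ", i, ": P = ", P[i, j])
end

# Number of associated genes per eQTL (one count per column).
genes_per_eqtl = vec(sum(P .>= threshold; dims = 1))
```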
Causal inference
Subset-to-all
When you run causal inference with findr using matrix-based inputs, the default is to return posterior probabilities for each test separately:
P = findr(Xm,Gm,mirpairs);
Note the dimensions of P:
size(P)
(674, 4, 55)
The third dimension indexes the A-genes (causes), the second dimension the tests (tests 2 to 5, see the causal inference tutorial), and the first the B-genes (targets). If you are interested only in a specific combination of tests, use the optional combination argument, as explained in the causal inference tutorial:
P = findr(Xm,Gm,mirpairs; combination="IV");
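When working with the per-test output, slicing the three-dimensional array is often all that is needed. The sketch below uses a small invented array with the same dimension order (B-genes, tests, A-genes). The helper `test_index` is hypothetical and assumes that positions 1 to 4 along the second dimension hold tests 2 to 5 in that order.

```julia
# Stand-in for the 3-D output: B-genes × tests × A-genes.
# Sizes are shrunk and values invented for illustration.
nB, ntests, nA = 5, 4, 2
P = reshape(collect(1:(nB * ntests * nA)) ./ (nB * ntests * nA), nB, ntests, nA)

# Assumption: positions 1:4 along dimension 2 hold tests 2:5 in order.
test_index(t) = t - 1

# All B-gene × A-gene probabilities for test 2.
P2 = P[:, test_index(2), :]
@show size(P2)         # (5, 2)

# All four test results for one A-gene/B-gene pair.
a, b = 2, 3
per_test = P[b, :, a]
@show length(per_test) # 4
```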