6 Calculation of the TF activity
Once we define the target genes of TFs by constructing the GRN, we can calculate the per cell TF activity by summing up the expression of the individual target genes. While it is reasonable to assume equal contribution from each target genes, epiregulon tries to estimate the weights of each target gene. Biologically, this can be interpreted as the magnitude of gene expression changes induced by transcription factor activity.
6.1 Add weights
Epiregulon estimates the regulatory potential using one of the three measures:
- Wilcoxon test statistics of target gene expression in cells jointly expressing all 3 elements vs. cells that do not.
- Correlation between TG and TF, or between TG and the product of TF and RE.
- Mutual information between TG and TF expression, or between TG and the product of TF and RE.
Wilcoxon and correlation methods give both the magnitude and directionality of changes whereas mutual information is always positive. The correlation and mutual information statistics are computed on grouped pseudobulks following user-supplied cluster labels and yield a single weight across all clusters per each TF-RE-target triplet. In contrast, the Wilcoxon method can give a single weight across all cells or give cluster-specific weights if the users provide cluster labels. In general, we recommend computing a single weight across all cells because the data in totality provides the variability of target gene expression necessary for weights estimation. For example, we want to include both untreated and treated cells to gauge the effects of the treatment. Cluster-specific weights can be useful for inferring differential network connectivity (See section on Differential Network Analysis).
The choice of weights method can have a profound effect on the estimate of TF activity. When the TF activity is directly correlated with TF expression, all methods tend to work quite well. However, there are scenarios such as drug treatment or CRISPR knockout when TF continues to be expressed despite post-translational inhibition of TF activity. Epiregulon was designed to handle scenarios where TF expression is decoupled from TF activity. In such cases, wilcox is the preferred choice. For a specific use case, see an example on comparing "wilcox" and "corr" methods. Refer to our manuscript for a more in-depth discussion of the biological scenarios compatible with each weight method.
regulon.w <- addWeights(regulon = pruned.regulon.filtered,
expMatrix = GeneExpressionMatrix,
exp_assay = "normalizedCounts",
peakMatrix = PeakMatrix,
peak_assay = "counts",
method = "wilcox",
clusters = GeneExpressionMatrix$cell_type,
useDim = "LSI_RNA")
regulon.w## DataFrame with 1374834 rows and 49 columns
## idxATAC chr start end idxRNA target corr p_val_peak_gene FDR_peak_gene distance idxTF
## <integer> <character> <integer> <integer> <integer> <array> <matrix> <matrix> <matrix> <integer> <integer>
## 1 266 chr1 1686177 1686677 86 NADK 0.789720 0.01973630 0.502977 91791 1030
## 2 991 chr1 8947075 8947575 222 CA6 0.863810 0.00859085 0.470017 1207 1030
## 3 1204 chr1 10208525 10209025 248 KIF1B 0.752412 0.02785332 0.525093 1778 1030
## 4 1607 chr1 15296927 15297427 351 EFHD2 0.724069 0.03562011 0.550436 112459 1030
## 5 2158 chr1 21022229 21022729 476 EIF4G3 0.755471 0.02721467 0.524680 28263 1030
## ... ... ... ... ... ... ... ... ... ... ... ...
## 1374830 155544 chr22 50524937 50525437 23714 TYMP -0.632002 0.0165565 0.496846 4557 1558
## 1374831 155577 chr22 50582163 50582663 23713 SCO2 -0.548795 0.0391178 0.561465 56559 1558
## 1374832 155577 chr22 50582163 50582663 23714 TYMP -0.560573 0.0348815 0.547340 52167 1558
## 1374833 155577 chr22 50582163 50582663 23715 ODF3B -0.549052 0.0390983 0.561465 50086 1558
## 1374834 157674 chrX 71532678 71533178 35821 OGT 0.694312 0.0457767 0.579826 0 1558
## tf pval stats qval CD14..Mono.vs.rest.FDR CD14..Mono.vs.rest.p.value CD14..Mono.vs.rest.logFC
## <character> <matrix> <matrix> <matrix> <numeric> <numeric> <numeric>
## 1 ADNP 2.14101e-07 26.90142 1.00000e+00 1.12227e-273 3.58197e-275 0.453974
## 2 ADNP 2.09927e-03 9.46065 1.00000e+00 1.07642e-91 1.13527e-92 -0.381033
## 3 ADNP 3.18769e-21 89.42324 2.88367e-14 7.65618e-310 2.01501e-311 0.575687
## 4 ADNP 4.50461e-03 8.06831 1.00000e+00 0.00000e+00 0.00000e+00 0.579157
## 5 ADNP 8.62498e-05 15.41611 1.00000e+00 0.00000e+00 0.00000e+00 1.127554
## ... ... ... ... ... ... ... ...
## 1374830 ZXDC 0.019343447 5.47020 1 0.00000e+00 0.00000e+00 2.369751
## 1374831 ZXDC 0.030612624 4.67456 1 0.00000e+00 0.00000e+00 0.703607
## 1374832 ZXDC 0.000448926 12.31669 1 0.00000e+00 0.00000e+00 2.369751
## 1374833 ZXDC 0.006005115 7.54877 1 2.70505e-207 1.26648e-208 0.355839
## 1374834 ZXDC 0.010115322 6.61447 1 1.37044e-267 4.55459e-269 0.443879
## Memory.CD4..T.vs.rest.FDR Memory.CD4..T.vs.rest.p.value Memory.CD4..T.vs.rest.logFC B.vs.rest.FDR B.vs.rest.p.value
## <numeric> <numeric> <numeric> <numeric> <numeric>
## 1 2.05315e-154 6.44038e-156 -0.375422 1.86039e-71 7.26021e-73
## 2 4.84933e-83 3.37369e-84 -0.366194 2.38300e-91 7.00421e-93
## 3 5.24986e-87 3.43190e-88 -0.374737 2.15099e-133 3.81343e-135
## 4 2.68684e-150 8.80424e-152 -0.431623 4.46147e-126 8.60754e-128
## 5 1.58185e-203 3.28630e-205 -0.769835 9.85829e-110 2.29967e-111
## ... ... ... ... ... ...
## 1374830 0.00000e+00 0.00000e+00 -2.246853 0.00000e+00 0.00000e+00
## 1374831 0.00000e+00 0.00000e+00 -0.669777 1.26531e-315 4.89624e-318
## 1374832 0.00000e+00 0.00000e+00 -2.246853 0.00000e+00 0.00000e+00
## 1374833 7.68600e-155 2.39621e-156 -0.323648 2.16867e-129 4.01738e-131
## 1374834 5.86017e-169 1.62112e-170 0.553519 3.41241e-56 1.79433e-57
## B.vs.rest.logFC Monocytes.vs.rest.FDR Monocytes.vs.rest.p.value Monocytes.vs.rest.logFC Naive.CD8..T.vs.rest.FDR
## <numeric> <numeric> <numeric> <numeric> <numeric>
## 1 -0.336361 1.61672e-15 1.49878e-16 0.310028 1.69311e-155
## 2 -0.381502 7.44419e-91 4.10638e-93 -0.381965 1.88762e-92
## 3 -0.484800 1.52032e-18 1.15574e-19 0.396732 2.67629e-197
## 4 -0.461085 1.51226e-45 3.07532e-47 0.961398 4.65608e-199
## 5 0.950493 6.72774e-52 1.06534e-53 1.283433 1.15679e-224
## ... ... ... ... ... ...
## 1374830 -2.137413 5.62567e-83 3.56642e-85 1.673328 0.00000e+00
## 1374831 -0.691577 5.05831e-18 3.97441e-19 0.434858 0.00000e+00
## 1374832 -2.137413 5.62567e-83 3.56642e-85 1.673328 0.00000e+00
## 1374833 -0.324067 2.75961e-13 3.00060e-14 0.252888 2.29027e-184
## 1374834 0.407450 6.94934e-24 3.81243e-25 0.458222 3.58531e-155
## Naive.CD8..T.vs.rest.p.value Naive.CD8..T.vs.rest.logFC FCGR3A..Mono.vs.rest.FDR FCGR3A..Mono.vs.rest.p.value
## <numeric> <numeric> <numeric> <numeric>
## 1 4.84635e-157 -0.381930 1.28738e-75 3.94290e-77
## 2 1.02675e-93 0.381965 1.85410e-91 4.10631e-93
## 3 5.40575e-199 -0.502247 4.51660e-48 2.62284e-49
## 4 9.32800e-201 -0.488863 3.23963e-108 5.18334e-110
## 5 1.98101e-226 -0.822841 2.28494e-101 4.15125e-103
## ... ... ... ... ...
## 1374830 0.00000e+00 -2.369751 4.02540e-319 4.74303e-322
## 1374831 0.00000e+00 -0.697245 1.10843e-54 5.33866e-56
## 1374832 0.00000e+00 -2.369751 4.02540e-319 4.74303e-322
## 1374833 5.13517e-186 -0.343186 2.39081e-31 2.25381e-32
## 1374834 1.03118e-156 0.600549 4.86603e-46 2.96732e-47
## FCGR3A..Mono.vs.rest.logFC DC.vs.rest.FDR DC.vs.rest.p.value DC.vs.rest.logFC Naive.CD4..T.vs.rest.FDR
## <numeric> <numeric> <numeric> <numeric> <numeric>
## 1 0.613397 1.48721e-10 1.38321e-11 0.467425 4.93457e-177
## 2 -0.381965 5.36314e-91 4.10647e-93 -0.381965 1.74410e-35
## 3 0.464689 1.15367e-05 1.85408e-06 0.325063 2.32559e-185
## 4 0.912985 6.92276e-55 1.00693e-56 -1.224596 5.94247e-203
## 5 -0.752391 3.09698e-24 1.18820e-25 1.224873 2.27925e-259
## ... ... ... ... ... ...
## 1374830 2.407016 9.94254e-52 1.54440e-53 -2.022347 0.00000e+00
## 1374831 0.529519 3.60704e-33 9.35467e-35 -0.635970 0.00000e+00
## 1374832 2.407016 9.94254e-52 1.54440e-53 -2.022347 0.00000e+00
## 1374833 0.272149 9.20085e-16 5.72181e-17 -0.281375 9.44166e-179
## 1374834 0.405742 1.55880e-26 5.31323e-28 1.288327 3.85455e-118
## Naive.CD4..T.vs.rest.p.value Naive.CD4..T.vs.rest.logFC Memory.CD8..T.vs.rest.FDR Memory.CD8..T.vs.rest.p.value
## <numeric> <numeric> <numeric> <numeric>
## 1 9.94010e-179 -0.402145 4.83824e-93 1.51369e-94
## 2 2.12424e-36 -0.265716 2.07278e-90 6.76934e-92
## 3 4.42934e-187 -0.499501 5.15609e-58 2.82016e-59
## 4 1.00949e-204 -0.498699 2.89359e-93 9.00523e-95
## 5 2.86486e-261 -0.883351 3.34172e-94 1.02531e-95
## ... ... ... ... ...
## 1374830 0.00000e+00 -2.346248 0.00000e+00 0.00000e+00
## 1374831 0.00000e+00 -0.703607 1.78609e-274 1.09308e-276
## 1374832 0.00000e+00 -2.346248 0.00000e+00 0.00000e+00
## 1374833 1.87859e-180 -0.341559 1.03392e-165 1.46415e-167
## 1374834 1.36250e-119 0.518676 6.48018e-81 2.40619e-82
## Memory.CD8..T.vs.rest.logFC NK.vs.rest.FDR NK.vs.rest.p.value NK.vs.rest.logFC NA.vs.rest.FDR NA.vs.rest.p.value
## <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
## 1 -0.358226 1.19826e-65 3.53184e-67 -0.355368 1.34418e-273 3.58197e-275
## 2 -0.380283 4.31779e-89 8.81618e-91 -0.379363 4.29660e-92 4.10700e-93
## 3 -0.379189 7.20212e-67 2.07735e-68 -0.450352 9.36515e-310 2.01501e-311
## 4 0.678123 2.95876e-118 4.05187e-120 1.421655 0.00000e+00 0.00000e+00
## 5 0.766010 6.52472e-70 1.79959e-71 0.967395 0.00000e+00 0.00000e+00
## ... ... ... ... ... ... ...
## 1374830 -2.315715 2.18585e-267 7.37856e-270 -2.218151 4.85186e-06 1.65084e-06
## 1374831 -0.669459 9.48323e-192 5.85577e-194 -0.665203 3.69027e-01 1.89547e-01
## 1374832 -2.315715 2.18585e-267 7.37856e-270 -2.218151 4.85186e-06 1.65084e-06
## 1374833 -0.338003 1.38920e-89 2.80981e-91 -0.321178 3.12869e-207 1.26648e-208
## 1374834 0.569832 2.21855e-40 1.17083e-41 0.527130 1.63992e-267 4.55459e-269
## NA.vs.rest.logFC weight
## <numeric> <matrix>
## 1 -0.453974 0.0583992: 0.00000000:-0.00307422:...
## 2 -0.381965 0.0311224:-0.00137646:-0.00249145:...
## 3 -0.575687 0.1032210: 0.01156464: 0.03634468:...
## 4 -0.579157 0.0333821: 0.00000000: 0.00137167:...
## 5 -1.127554 0.0550255:-0.00297161: 0.03414523:...
## ... ... ...
## 1374830 -2.121123 0.0258094: 0.00679141:-0.00776715:...
## 1374831 -0.531398 0.0262938:-0.01485703: 0.00423260:...
## 1374832 -2.121123 0.0532873:-0.00953433: 0.00931223:...
## 1374833 -0.355839 0.0311489:-0.05049477: 0.00802140:...
## 1374834 -0.443879 0.0256298: 0.04065795: 0.01652375:...
6.1.1 Aggregate cells
As with the pruneRegulon function, cells can be aggregated into metacells using the aggregateCells argument in addWeights. Use this option only with the Wilcoxon method, since the other two methods already incorporate cell aggregation. By default, the function generates aggregates containing an average of 10 cells. However, the exact number may vary across metacells, depending on how single cells are distributed in the reduced dimensionality space. Refer to the cell aggregation section in Regulation Refinement for additional information.
regulon.w.aggr <- addWeights(regulon = pruned.regulon,
expMatrix = GeneExpressionMatrix,
exp_assay = "normalizedCounts",
peakMatrix = PeakMatrix,
peak_assay = "counts",
method = "wilcox",
clusters = GeneExpressionMatrix$cell_type,
aggregateCells = TRUE,
useDim = "LSI_RNA",
exp_cutoff=NULL,
peak_cutoff=NULL)6.1.2 Using chromatin accessibility data
The corr and MI methods for calculating weights by default ignore information about chromatin accessibility. However, it is possible to incorporate this modality into the calculations by setting the tf_re.merge argument to TRUE. This adds an additional step to the procedure, in which transcription factor expression is multiplied by chromatin accessibility. Therefore, expression is reduced to zero if the corresponding element is not accessible. The TF-RE product is calculated at the level of aggregated cells, mitigating the effect of sparsity, which affects ATAC-seq data more than gene expression. See an example using the tf_re.merge argument.
6.2 Calculate TF activity
Finally, the activity for a specific TF in each cell is computed by averaging the expression of target genes weighted by the test statistics of choice as explained in the add weights section. \[y=\frac{1}{n}\sum_{i=1}^{n} x_i * weights_i\] where \(y\) is the activity of a TF for a cell, \(n\) is the total number of targets for a TF, \(x_i\) is the log count expression of target \(i\) where \(i\) in {1,2,…,n} and \(weights_i\) is the weight of TF - target \(i\)
score.combine <- calculateActivity(expMatrix = GeneExpressionMatrix,
regulon = regulon.w,
mode = "weight",
exp_assay = "normalizedCounts")
score.combine[1:5,1:5]## 5 x 5 sparse Matrix of class "dgCMatrix"
## PBMC_10k#GGTTGCATCCTGGCTT-1 PBMC_10k#GGTTGCGGTAAACAAG-1 PBMC_10k#TGTTCCTCATAAGTTC-1 PBMC_10k#CGACTAAGTAACGGGA-1
## ADNP 0.19163136 0.1479988 0.1897086 0.11159201
## AEBP2 0.07737805 0.1577963 0.1498887 0.18522299
## AFF1 0.16854888 0.1645562 0.2024166 0.11041418
## AFF4 0.20493204 0.1909851 0.2443615 0.12750381
## AGO1 0.15670300 0.1384629 0.1603721 0.09831584
## PBMC_10k#CTGCTCCCAAGGTCCT-1
## ADNP 0.2122728
## AEBP2 0.1830738
## AFF1 0.2081108
## AFF4 0.2504034
## AGO1 0.1595813