6 Calculation of the TF activity

Once we define the target genes of TFs by constructing the GRN, we can calculate the per cell TF activity by summing up the expression of the individual target genes. While it is reasonable to assume equal contribution from each target genes, epiregulon tries to estimate the weights of each target gene. Biologically, this can be interpreted as the magnitude of gene expression changes induced by transcription factor activity.

6.1 Add weights

Epiregulon estimates the regulatory potential using one of the three measures:

  1. Wilcoxon test statistics of target gene expression in cells jointly expressing all 3 elements vs. cells that do not.
  2. Correlation between TG and TF, or between TG and the product of TF and RE.
  3. Mutual information between TG and TF expression, or between TG and the product of TF and RE.

Wilcoxon and correlation methods give both the magnitude and directionality of changes whereas mutual information is always positive. The correlation and mutual information statistics are computed on grouped pseudobulks following user-supplied cluster labels and yield a single weight across all clusters per each TF-RE-target triplet. In contrast, the Wilcoxon method can give a single weight across all cells or give cluster-specific weights if the users provide cluster labels. In general, we recommend computing a single weight across all cells because the data in totality provides the variability of target gene expression necessary for weights estimation. For example, we want to include both untreated and treated cells to gauge the effects of the treatment. Cluster-specific weights can be useful for inferring differential network connectivity (See section on Differential Network Analysis).

The choice of weights method can have a profound effect on the estimate of TF activity. When the TF activity is directly correlated with TF expression, all methods tend to work quite well. However, there are scenarios such as drug treatment or CRISPR knockout when TF continues to be expressed despite post-translational inhibition of TF activity. Epiregulon was designed to handle scenarios where TF expression is decoupled from TF activity. In such cases, wilcox is the preferred choice. For a specific use case, see an example on comparing "wilcox" and "corr" methods. Refer to our manuscript for a more in-depth discussion of the biological scenarios compatible with each weight method.

regulon.w <- addWeights(regulon = pruned.regulon.filtered,
                        expMatrix  = GeneExpressionMatrix,
                        exp_assay  = "normalizedCounts",
                        peakMatrix = PeakMatrix,
                        peak_assay = "counts",
                        method = "wilcox",
                        clusters = GeneExpressionMatrix$cell_type,
                        useDim = "LSI_RNA")
regulon.w
## DataFrame with 1374834 rows and 49 columns
##           idxATAC         chr     start       end    idxRNA  target      corr p_val_peak_gene FDR_peak_gene  distance     idxTF
##         <integer> <character> <integer> <integer> <integer> <array>  <matrix>        <matrix>      <matrix> <integer> <integer>
## 1             266        chr1   1686177   1686677        86    NADK  0.789720      0.01973630      0.502977     91791      1030
## 2             991        chr1   8947075   8947575       222     CA6  0.863810      0.00859085      0.470017      1207      1030
## 3            1204        chr1  10208525  10209025       248   KIF1B  0.752412      0.02785332      0.525093      1778      1030
## 4            1607        chr1  15296927  15297427       351   EFHD2  0.724069      0.03562011      0.550436    112459      1030
## 5            2158        chr1  21022229  21022729       476  EIF4G3  0.755471      0.02721467      0.524680     28263      1030
## ...           ...         ...       ...       ...       ...     ...       ...             ...           ...       ...       ...
## 1374830    155544       chr22  50524937  50525437     23714    TYMP -0.632002       0.0165565      0.496846      4557      1558
## 1374831    155577       chr22  50582163  50582663     23713    SCO2 -0.548795       0.0391178      0.561465     56559      1558
## 1374832    155577       chr22  50582163  50582663     23714    TYMP -0.560573       0.0348815      0.547340     52167      1558
## 1374833    155577       chr22  50582163  50582663     23715   ODF3B -0.549052       0.0390983      0.561465     50086      1558
## 1374834    157674        chrX  71532678  71533178     35821     OGT  0.694312       0.0457767      0.579826         0      1558
##                  tf        pval    stats        qval CD14..Mono.vs.rest.FDR CD14..Mono.vs.rest.p.value CD14..Mono.vs.rest.logFC
##         <character>    <matrix> <matrix>    <matrix>              <numeric>                  <numeric>                <numeric>
## 1              ADNP 2.14101e-07 26.90142 1.00000e+00           1.12227e-273               3.58197e-275                 0.453974
## 2              ADNP 2.09927e-03  9.46065 1.00000e+00            1.07642e-91                1.13527e-92                -0.381033
## 3              ADNP 3.18769e-21 89.42324 2.88367e-14           7.65618e-310               2.01501e-311                 0.575687
## 4              ADNP 4.50461e-03  8.06831 1.00000e+00            0.00000e+00                0.00000e+00                 0.579157
## 5              ADNP 8.62498e-05 15.41611 1.00000e+00            0.00000e+00                0.00000e+00                 1.127554
## ...             ...         ...      ...         ...                    ...                        ...                      ...
## 1374830        ZXDC 0.019343447  5.47020           1            0.00000e+00                0.00000e+00                 2.369751
## 1374831        ZXDC 0.030612624  4.67456           1            0.00000e+00                0.00000e+00                 0.703607
## 1374832        ZXDC 0.000448926 12.31669           1            0.00000e+00                0.00000e+00                 2.369751
## 1374833        ZXDC 0.006005115  7.54877           1           2.70505e-207               1.26648e-208                 0.355839
## 1374834        ZXDC 0.010115322  6.61447           1           1.37044e-267               4.55459e-269                 0.443879
##         Memory.CD4..T.vs.rest.FDR Memory.CD4..T.vs.rest.p.value Memory.CD4..T.vs.rest.logFC B.vs.rest.FDR B.vs.rest.p.value
##                         <numeric>                     <numeric>                   <numeric>     <numeric>         <numeric>
## 1                    2.05315e-154                  6.44038e-156                   -0.375422   1.86039e-71       7.26021e-73
## 2                     4.84933e-83                   3.37369e-84                   -0.366194   2.38300e-91       7.00421e-93
## 3                     5.24986e-87                   3.43190e-88                   -0.374737  2.15099e-133      3.81343e-135
## 4                    2.68684e-150                  8.80424e-152                   -0.431623  4.46147e-126      8.60754e-128
## 5                    1.58185e-203                  3.28630e-205                   -0.769835  9.85829e-110      2.29967e-111
## ...                           ...                           ...                         ...           ...               ...
## 1374830               0.00000e+00                   0.00000e+00                   -2.246853   0.00000e+00       0.00000e+00
## 1374831               0.00000e+00                   0.00000e+00                   -0.669777  1.26531e-315      4.89624e-318
## 1374832               0.00000e+00                   0.00000e+00                   -2.246853   0.00000e+00       0.00000e+00
## 1374833              7.68600e-155                  2.39621e-156                   -0.323648  2.16867e-129      4.01738e-131
## 1374834              5.86017e-169                  1.62112e-170                    0.553519   3.41241e-56       1.79433e-57
##         B.vs.rest.logFC Monocytes.vs.rest.FDR Monocytes.vs.rest.p.value Monocytes.vs.rest.logFC Naive.CD8..T.vs.rest.FDR
##               <numeric>             <numeric>                 <numeric>               <numeric>                <numeric>
## 1             -0.336361           1.61672e-15               1.49878e-16                0.310028             1.69311e-155
## 2             -0.381502           7.44419e-91               4.10638e-93               -0.381965              1.88762e-92
## 3             -0.484800           1.52032e-18               1.15574e-19                0.396732             2.67629e-197
## 4             -0.461085           1.51226e-45               3.07532e-47                0.961398             4.65608e-199
## 5              0.950493           6.72774e-52               1.06534e-53                1.283433             1.15679e-224
## ...                 ...                   ...                       ...                     ...                      ...
## 1374830       -2.137413           5.62567e-83               3.56642e-85                1.673328              0.00000e+00
## 1374831       -0.691577           5.05831e-18               3.97441e-19                0.434858              0.00000e+00
## 1374832       -2.137413           5.62567e-83               3.56642e-85                1.673328              0.00000e+00
## 1374833       -0.324067           2.75961e-13               3.00060e-14                0.252888             2.29027e-184
## 1374834        0.407450           6.94934e-24               3.81243e-25                0.458222             3.58531e-155
##         Naive.CD8..T.vs.rest.p.value Naive.CD8..T.vs.rest.logFC FCGR3A..Mono.vs.rest.FDR FCGR3A..Mono.vs.rest.p.value
##                            <numeric>                  <numeric>                <numeric>                    <numeric>
## 1                       4.84635e-157                  -0.381930              1.28738e-75                  3.94290e-77
## 2                        1.02675e-93                   0.381965              1.85410e-91                  4.10631e-93
## 3                       5.40575e-199                  -0.502247              4.51660e-48                  2.62284e-49
## 4                       9.32800e-201                  -0.488863             3.23963e-108                 5.18334e-110
## 5                       1.98101e-226                  -0.822841             2.28494e-101                 4.15125e-103
## ...                              ...                        ...                      ...                          ...
## 1374830                  0.00000e+00                  -2.369751             4.02540e-319                 4.74303e-322
## 1374831                  0.00000e+00                  -0.697245              1.10843e-54                  5.33866e-56
## 1374832                  0.00000e+00                  -2.369751             4.02540e-319                 4.74303e-322
## 1374833                 5.13517e-186                  -0.343186              2.39081e-31                  2.25381e-32
## 1374834                 1.03118e-156                   0.600549              4.86603e-46                  2.96732e-47
##         FCGR3A..Mono.vs.rest.logFC DC.vs.rest.FDR DC.vs.rest.p.value DC.vs.rest.logFC Naive.CD4..T.vs.rest.FDR
##                          <numeric>      <numeric>          <numeric>        <numeric>                <numeric>
## 1                         0.613397    1.48721e-10        1.38321e-11         0.467425             4.93457e-177
## 2                        -0.381965    5.36314e-91        4.10647e-93        -0.381965              1.74410e-35
## 3                         0.464689    1.15367e-05        1.85408e-06         0.325063             2.32559e-185
## 4                         0.912985    6.92276e-55        1.00693e-56        -1.224596             5.94247e-203
## 5                        -0.752391    3.09698e-24        1.18820e-25         1.224873             2.27925e-259
## ...                            ...            ...                ...              ...                      ...
## 1374830                   2.407016    9.94254e-52        1.54440e-53        -2.022347              0.00000e+00
## 1374831                   0.529519    3.60704e-33        9.35467e-35        -0.635970              0.00000e+00
## 1374832                   2.407016    9.94254e-52        1.54440e-53        -2.022347              0.00000e+00
## 1374833                   0.272149    9.20085e-16        5.72181e-17        -0.281375             9.44166e-179
## 1374834                   0.405742    1.55880e-26        5.31323e-28         1.288327             3.85455e-118
##         Naive.CD4..T.vs.rest.p.value Naive.CD4..T.vs.rest.logFC Memory.CD8..T.vs.rest.FDR Memory.CD8..T.vs.rest.p.value
##                            <numeric>                  <numeric>                 <numeric>                     <numeric>
## 1                       9.94010e-179                  -0.402145               4.83824e-93                   1.51369e-94
## 2                        2.12424e-36                  -0.265716               2.07278e-90                   6.76934e-92
## 3                       4.42934e-187                  -0.499501               5.15609e-58                   2.82016e-59
## 4                       1.00949e-204                  -0.498699               2.89359e-93                   9.00523e-95
## 5                       2.86486e-261                  -0.883351               3.34172e-94                   1.02531e-95
## ...                              ...                        ...                       ...                           ...
## 1374830                  0.00000e+00                  -2.346248               0.00000e+00                   0.00000e+00
## 1374831                  0.00000e+00                  -0.703607              1.78609e-274                  1.09308e-276
## 1374832                  0.00000e+00                  -2.346248               0.00000e+00                   0.00000e+00
## 1374833                 1.87859e-180                  -0.341559              1.03392e-165                  1.46415e-167
## 1374834                 1.36250e-119                   0.518676               6.48018e-81                   2.40619e-82
##         Memory.CD8..T.vs.rest.logFC NK.vs.rest.FDR NK.vs.rest.p.value NK.vs.rest.logFC NA.vs.rest.FDR NA.vs.rest.p.value
##                           <numeric>      <numeric>          <numeric>        <numeric>      <numeric>          <numeric>
## 1                         -0.358226    1.19826e-65        3.53184e-67        -0.355368   1.34418e-273       3.58197e-275
## 2                         -0.380283    4.31779e-89        8.81618e-91        -0.379363    4.29660e-92        4.10700e-93
## 3                         -0.379189    7.20212e-67        2.07735e-68        -0.450352   9.36515e-310       2.01501e-311
## 4                          0.678123   2.95876e-118       4.05187e-120         1.421655    0.00000e+00        0.00000e+00
## 5                          0.766010    6.52472e-70        1.79959e-71         0.967395    0.00000e+00        0.00000e+00
## ...                             ...            ...                ...              ...            ...                ...
## 1374830                   -2.315715   2.18585e-267       7.37856e-270        -2.218151    4.85186e-06        1.65084e-06
## 1374831                   -0.669459   9.48323e-192       5.85577e-194        -0.665203    3.69027e-01        1.89547e-01
## 1374832                   -2.315715   2.18585e-267       7.37856e-270        -2.218151    4.85186e-06        1.65084e-06
## 1374833                   -0.338003    1.38920e-89        2.80981e-91        -0.321178   3.12869e-207       1.26648e-208
## 1374834                    0.569832    2.21855e-40        1.17083e-41         0.527130   1.63992e-267       4.55459e-269
##         NA.vs.rest.logFC                                weight
##                <numeric>                              <matrix>
## 1              -0.453974 0.0583992: 0.00000000:-0.00307422:...
## 2              -0.381965 0.0311224:-0.00137646:-0.00249145:...
## 3              -0.575687 0.1032210: 0.01156464: 0.03634468:...
## 4              -0.579157 0.0333821: 0.00000000: 0.00137167:...
## 5              -1.127554 0.0550255:-0.00297161: 0.03414523:...
## ...                  ...                                   ...
## 1374830        -2.121123 0.0258094: 0.00679141:-0.00776715:...
## 1374831        -0.531398 0.0262938:-0.01485703: 0.00423260:...
## 1374832        -2.121123 0.0532873:-0.00953433: 0.00931223:...
## 1374833        -0.355839 0.0311489:-0.05049477: 0.00802140:...
## 1374834        -0.443879 0.0256298: 0.04065795: 0.01652375:...

6.1.1 Aggregate cells

As with the pruneRegulon function, cells can be aggregated into metacells using the aggregateCells argument in addWeights. Use this option only with the Wilcoxon method, since the other two methods already incorporate cell aggregation. By default, the function generates aggregates containing an average of 10 cells. However, the exact number may vary across metacells, depending on how single cells are distributed in the reduced dimensionality space. Refer to the cell aggregation section in Regulation Refinement for additional information.

regulon.w.aggr <- addWeights(regulon = pruned.regulon,
                             expMatrix  = GeneExpressionMatrix,
                             exp_assay  = "normalizedCounts",
                             peakMatrix = PeakMatrix,
                             peak_assay = "counts",
                             method = "wilcox",
                             clusters = GeneExpressionMatrix$cell_type,
                             aggregateCells = TRUE,
                             useDim = "LSI_RNA",
                             exp_cutoff=NULL,
                             peak_cutoff=NULL)

6.1.2 Using chromatin accessibility data

The corr and MI methods for calculating weights by default ignore information about chromatin accessibility. However, it is possible to incorporate this modality into the calculations by setting the tf_re.merge argument to TRUE. This adds an additional step to the procedure, in which transcription factor expression is multiplied by chromatin accessibility. Therefore, expression is reduced to zero if the corresponding element is not accessible. The TF-RE product is calculated at the level of aggregated cells, mitigating the effect of sparsity, which affects ATAC-seq data more than gene expression. See an example using the tf_re.merge argument.

6.2 Calculate TF activity

Finally, the activity for a specific TF in each cell is computed by averaging the expression of target genes weighted by the test statistics of choice as explained in the add weights section. \[y=\frac{1}{n}\sum_{i=1}^{n} x_i * weights_i\] where \(y\) is the activity of a TF for a cell, \(n\) is the total number of targets for a TF, \(x_i\) is the log count expression of target \(i\) where \(i\) in {1,2,…,n} and \(weights_i\) is the weight of TF - target \(i\)

score.combine <- calculateActivity(expMatrix = GeneExpressionMatrix,
                                   regulon = regulon.w, 
                                   mode = "weight", 
                                   exp_assay = "normalizedCounts")

score.combine[1:5,1:5]
## 5 x 5 sparse Matrix of class "dgCMatrix"
##       PBMC_10k#GGTTGCATCCTGGCTT-1 PBMC_10k#GGTTGCGGTAAACAAG-1 PBMC_10k#TGTTCCTCATAAGTTC-1 PBMC_10k#CGACTAAGTAACGGGA-1
## ADNP                   0.19163136                   0.1479988                   0.1897086                  0.11159201
## AEBP2                  0.07737805                   0.1577963                   0.1498887                  0.18522299
## AFF1                   0.16854888                   0.1645562                   0.2024166                  0.11041418
## AFF4                   0.20493204                   0.1909851                   0.2443615                  0.12750381
## AGO1                   0.15670300                   0.1384629                   0.1603721                  0.09831584
##       PBMC_10k#CTGCTCCCAAGGTCCT-1
## ADNP                    0.2122728
## AEBP2                   0.1830738
## AFF1                    0.2081108
## AFF4                    0.2504034
## AGO1                    0.1595813