5 Single modality: scRNA-seq only
Epiregulon also supports transcription factor activity inference when users only have scRNA-seq. After all, multiome or scATAC-seq data is still relatively rare. To enable TF activity inference on scRNA-seq, users can supply a pre-constructed gene regulatory network. Dorothea provides both human and mouse pre-constructed gene regulatory networks based on curated experimental and computational data. In this vignette, we bypass the regulon construction step and go straight to calculate TF activity from a Dorothea GRN.
5.1 Load regulon
Dorothea assigns confidence level to its regulons with A being the most confident (i.e. supported by multiple lines of evidence) and E being the least confident.
5.2 Load scRNA-seq data
We download the raw counts of a mouse pancreas data set from scRNAseq. We add normalized logcounts, perform dimension reduction and visualize the embeddings using scater.
5.3 Calculate activity
Even though Dorothea provides weights under the mor column, we can achieve superior performance if we recompute the weights based on the correlation between tf and target gene expression based on our own data. We performed 2 steps, the first step is to add weights to the Dorothea regulons and the second step is to estimate the TF activity by taking the weighted average of the target gene expression.
library(epiregulon)
#Add weights to regulon. Default method (wilcoxon) cannot be used
regulon.ms <- addWeights(regulon = regulon,
expMatrix = sce,
clusters = sce$label,
BPPARAM = BiocParallel::MulticoreParam(),
method="corr")
#Calculate activity
score.combine <- calculateActivity(sce,
regulon = regulon.ms,
mode = "weight",
method = "weightedMean")
5.4 Perform differential activity
library(epiregulon.extra)
markers <- findDifferentialActivity(activity_matrix = score.combine,
clusters = sce$label,
pval.type = "some",
direction = "up",
test.type = "t")
Take the top TFs
5.5 Visualize activity
Finally we visualize the TF activity by either UMAP, violin plots or bubble plots. We confirm the activity of known lineage factors Pdx1 and Neurod1 in beta cells, Arx in alpha cells and Foxa1 in ductal cells.
# plot umap
plotActivityDim(sce = sce,
activity_matrix = score.combine,
tf = genes_to_plot,
legend.label = "score",
point_size = 0.1,
dimtype = "UMAP",
label = "label",
combine = TRUE,
text_size = 2)
Plot bubble plot of differential TFs
We can adapt the epiregulon package to plot gene expression. When compared against TF activity, gene expression of Foxa1 and Arx has noisy signals and high dropout rates. Epiregulon enhances the signal to noise ratio of TF activity and better resolves lineage differences.
# plot umap
plotActivityDim(sce = sce,
activity_matrix = logcounts(sce),
tf = genes_to_plot,
legend.label = "gex",
point_size = 0.1,
dimtype = "UMAP",
label = "label",
combine = TRUE,
text_size = 2,
colors = c("gray","blue"),
limit = c(0,2))
# plot violin plot
plotActivityViolin(logcounts(sce),
tf = genes_to_plot,
clusters = sce$label,
legend.label = "gex")
# plot Bubble plot
plotBubble(logcounts(sce),
tf = markers.sig$tf,
clusters = sce$label,
legend.label = "gex")
We can visualize the target genes for transcription factors of interest
5.6 Pathway enrichment
Sometimes it is useful to understand what pathways are enriched in the regulons. We take the highly correlated target genes of a regulon and perform geneset enrichment using the enricher function from clusterProfiler.
#retrieve genesets
H <- EnrichmentBrowser::getGenesets(org = "mmu",
db = "msigdb",
cat = "H",
gene.id.type = "SYMBOL",
cache = FALSE)
C6 <- EnrichmentBrowser::getGenesets(org = "mmu",
db = "msigdb",
cat = "C6",
gene.id.type = "SYMBOL",
cache = FALSE)
#combine genesets and convert genesets to be compatible with enricher
gs <- c(H,C6)
gs.list <- do.call(rbind,lapply(names(gs), function(x)
{data.frame(gs = x, genes = gs[[x]])}))
enrichresults <- regulonEnrich(genes_to_plot,
regulon = regulon.ms,
weight = "weight",
weight_cutoff = 0.5,
genesets = gs.list)
## Foxa1
## Neurod1
## Pdx1
## Arx
5.7 Session Info
## R Under development (unstable) (2023-12-04 r85659)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 18.04.6 LTS
##
## Matrix products: default
## BLAS: /usr/local/lib/R/lib/libRblas.so
## LAPACK: /usr/local/lib/R/lib/libRlapack.so; LAPACK version 3.11.0
##
## Random number generation:
## RNG: L'Ecuyer-CMRG
## Normal: Inversion
## Sample: Rejection
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] parallel stats4 grid stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] org.Mm.eg.db_3.18.0 scater_1.31.2 scuttle_1.13.1
## [4] scRNAseq_2.17.5 dorothea_1.15.1 org.Hs.eg.db_3.18.0
## [7] AnnotationDbi_1.65.2 nabor_0.5.0 epiregulon.archr_0.99.2
## [10] igraph_2.0.3 msigdbr_7.5.1 epiregulon.extra_0.99.5
## [13] BSgenome.Hsapiens.UCSC.hg38_1.4.5 BSgenome_1.71.2 rtracklayer_1.63.1
## [16] BiocIO_1.13.0 Biostrings_2.71.4 XVector_0.43.1
## [19] epiregulon_0.99.5 scMultiome_1.3.0 SingleCellExperiment_1.25.0
## [22] MultiAssayExperiment_1.29.1 ExperimentHub_2.11.1 AnnotationHub_3.11.1
## [25] BiocFileCache_2.11.1 dbplyr_2.4.0 BiocStyle_2.31.0
## [28] rhdf5_2.47.6 SummarizedExperiment_1.33.3 Biobase_2.63.0
## [31] RcppArmadillo_0.12.8.1.0 Rcpp_1.0.12 Matrix_1.6-4
## [34] GenomicRanges_1.55.3 GenomeInfoDb_1.39.9 IRanges_2.37.1
## [37] S4Vectors_0.41.4 BiocGenerics_0.49.1 sparseMatrixStats_1.15.0
## [40] MatrixGenerics_1.15.0 matrixStats_1.2.0 data.table_1.15.2
## [43] stringr_1.5.1 plyr_1.8.9 magrittr_2.0.3
## [46] ggplot2_3.5.0 gtable_0.3.4 gtools_3.9.5
## [49] gridExtra_2.3 devtools_2.4.5 usethis_2.2.3
## [52] ArchR_1.0.3
##
## loaded via a namespace (and not attached):
## [1] R.methodsS3_1.8.2 GSEABase_1.65.1 progress_1.2.3 urlchecker_1.0.1
## [5] poweRlaw_0.80.0 HDF5Array_1.31.6 vctrs_0.6.5 digest_0.6.35
## [9] png_0.1-8 shape_1.4.6.1 aws.signature_0.6.0 gypsum_0.99.15
## [13] ggrepel_0.9.5 alabaster.sce_1.3.3 bcellViper_1.39.0 magick_2.8.3
## [17] MASS_7.3-60.1 reshape2_1.4.4 httpuv_1.6.14 foreach_1.5.2
## [21] qvalue_2.35.0 withr_3.0.0 ggrastr_1.0.2 aws.s3_0.3.21
## [25] xfun_0.42 ggfun_0.1.4 ellipsis_0.3.2 memoise_2.0.1
## [29] ggbeeswarm_0.7.2 clusterProfiler_4.11.0 gson_0.1.0 profvis_0.3.8
## [33] tidytree_0.4.6 GlobalOptions_0.1.2 KEGGgraph_1.63.0 R.oo_1.26.0
## [37] prettyunits_1.2.0 KEGGREST_1.43.0 promises_1.2.1 httr_1.4.7
## [41] aws.ec2metadata_0.2.0 restfulr_0.0.15 rhdf5filters_1.15.4 rstudioapi_0.15.0
## [45] miniUI_0.1.1.1 generics_0.1.3 DOSE_3.29.2 base64enc_0.1-3
## [49] babelgene_22.9 curl_5.2.1 zlibbioc_1.49.3 ScaledMatrix_1.11.1
## [53] ggraph_2.2.1 polyclip_1.10-6 GenomeInfoDbData_1.2.11 SparseArray_1.3.4
## [57] xtable_1.8-4 pracma_2.4.4 doParallel_1.0.17 evaluate_0.23
## [61] S4Arrays_1.3.6 hms_1.1.3 bookdown_0.38 irlba_2.3.5.1
## [65] colorspace_2.1-0 filelock_1.0.3 readr_2.1.5 Rgraphviz_2.47.0
## [69] later_1.3.2 viridis_0.6.5 ggtree_3.11.1 lattice_0.22-5
## [73] XML_3.99-0.16.1 shadowtext_0.1.3 cowplot_1.1.3 pillar_1.9.0
## [77] nlme_3.1-164 iterators_1.0.14 caTools_1.18.2 compiler_4.4.0
## [81] beachmat_2.19.1 stringi_1.8.3 GenomicAlignments_1.39.4 crayon_1.5.2
## [85] abind_1.4-5 gridGraphics_0.5-1 locfit_1.5-9.9 graphlayouts_1.1.1
## [89] bit_4.0.5 dplyr_1.1.4 fastmatch_1.1-4 codetools_0.2-19
## [93] BiocSingular_1.19.0 bslib_0.6.1 alabaster.ranges_1.3.3 GetoptLong_1.0.5
## [97] mime_0.12 splines_4.4.0 circlize_0.4.16 HDO.db_0.99.1
## [101] EnrichmentBrowser_2.33.1 knitr_1.45 blob_1.2.4 utf8_1.2.4
## [105] clue_0.3-65 BiocVersion_3.19.1 seqLogo_1.69.0 AnnotationFilter_1.27.0
## [109] fs_1.6.3 checkmate_2.3.1 DelayedMatrixStats_1.25.1 pkgbuild_1.4.4
## [113] ggplotify_0.1.2 tibble_3.2.1 statmod_1.5.0 tzdb_0.4.0
## [117] tweenr_2.0.3 pkgconfig_2.0.3 tools_4.4.0 cachem_1.0.8
## [121] RSQLite_2.3.5 viridisLite_0.4.2 DBI_1.2.2 fastmap_1.1.1
## [125] rmarkdown_2.26 scales_1.3.0 Rsamtools_2.19.3 sass_0.4.9
## [129] FNN_1.1.4 patchwork_1.2.0 BiocManager_1.30.22 graph_1.81.0
## [133] alabaster.schemas_1.3.1 farver_2.1.1 tidygraph_1.3.1 scatterpie_0.2.1
## [137] yaml_2.3.8 cli_3.6.2 purrr_1.0.2 motifmatchr_1.25.0
## [141] lifecycle_1.0.4 uwot_0.1.16 bluster_1.13.0 sessioninfo_1.2.2
## [145] backports_1.4.1 BiocParallel_1.37.1 annotate_1.81.2 rjson_0.2.21
## [149] ape_5.7-1 limma_3.59.6 jsonlite_1.8.8 edgeR_4.1.18
## [153] TFBSTools_1.41.0 bitops_1.0-7 bit64_4.0.5 yulab.utils_0.1.4
## [157] alabaster.matrix_1.3.13 BiocNeighbors_1.21.2 CNEr_1.39.0 alabaster.se_1.3.4
## [161] jquerylib_0.1.4 highr_0.10 metapod_1.11.1 GOSemSim_2.29.1
## [165] dqrng_0.3.2 R.utils_2.12.3 alabaster.base_1.3.23 lazyeval_0.2.2
## [169] shiny_1.8.0 htmltools_0.5.7 enrichplot_1.23.1 GO.db_3.18.0
## [173] rappdirs_0.3.3 ensembldb_2.27.1 glue_1.7.0 TFMPvalue_0.0.9
## [177] httr2_1.0.0 RCurl_1.98-1.14 treeio_1.27.0 scran_1.31.3
## [181] R6_2.5.1 tidyr_1.3.1 labeling_0.4.3 GenomicFeatures_1.55.4
## [185] cluster_2.1.6 pkgload_1.3.4 Rhdf5lib_1.25.1 aplot_0.2.2
## [189] DirichletMultinomial_1.45.0 ProtGenerics_1.35.4 DelayedArray_0.29.9 tidyselect_1.2.1
## [193] vipor_0.4.7 xml2_1.3.6 ggforce_0.4.2 rsvd_1.0.5
## [197] munsell_0.5.0 htmlwidgets_1.6.4 fgsea_1.29.0 ComplexHeatmap_2.19.0
## [201] RColorBrewer_1.1-3 biomaRt_2.59.1 rlang_1.1.3 remotes_2.5.0
## [205] fansi_1.0.6 Cairo_1.6-2 beeswarm_0.4.0