clusterProfiler: Bioconductor, Paper, Documentation
enrichplot: Bioconductor
Demo Dataset: E-MTAB-8411 from The clock gene Bmal1 inhibits macrophage motility, phagocytosis, and impairs defense against pneumonia. PNAS. 2020;117(3):1543-1551.
License: GPL-3.0
R
cd /ngs/GO-Enrichment-Analysis-Demo
R
library("clusterProfiler")
library("enrichplot")
library("org.Mm.eg.db")
library("ggplot2")
If you have downloaded the DESeq2_DEG.txt
file with wget
:
data.table::fread("DESeq2_DEG.txt")
data <-$GeneID <- substr(data$GeneID, 1, 18) data
If you like to donwload the file in R
now:
data.table::fread("https://raw.githubusercontent.com/ycl6/GO-Enrichment-Analysis-Demo/master/DESeq2_DEG.txt")
data <-$GeneID <- substr(data$GeneID, 1, 18) data
data
## GeneID GeneSymbol log2fc pvalue padj
## 1: ENSMUSG00000000001 Gnai3 0.08804493 0.1925609 0.6146732
## 2: ENSMUSG00000000003 Pbsn NA NA NA
## 3: ENSMUSG00000000028 Cdc45 -0.32106635 0.1401127 0.5331437
## 4: ENSMUSG00000000031 H19 -1.20339889 0.7161464 NA
## 5: ENSMUSG00000000037 Scml2 -0.57746426 0.2979159 NA
## ---
## 55381: ENSMUSG00000118636 AC117663.3 NA NA NA
## 55382: ENSMUSG00000118637 AL772212.1 NA NA NA
## 55383: ENSMUSG00000118638 AL805980.1 NA NA NA
## 55384: ENSMUSG00000118639 AL590997.4 NA NA NA
## 55385: ENSMUSG00000118640 AC167036.2 0.06415390 0.9208414 NA
which(data$padj < 0.05 & data$log2fc > 0) # FDR < 0.05 and logFC > 0
up.idx <- which(data$padj < 0.05 & data$log2fc < 0) # FDR < 0.05 and logFC < 0 dn.idx <-
dim(data)
## [1] 55385 5
length(up.idx)
## [1] 383
length(dn.idx)
## [1] 429
data$GeneSymbol
all.genes <- data[up.idx,]$GeneSymbol
up.genes <- data[dn.idx,]$GeneSymbol dn.genes <-
head(up.genes, 10)
## [1] "Axin2" "Hnrnpd" "Kcnn3" "Mapk7" "Agpat3" "Sema6b" "Efnb2" "Il16" "Ltbp1"
## [10] "Rgs19"
head(dn.genes, 10)
## [1] "Cox5a" "Pdgfb" "Itga5" "Cd52" "Dnmt3l" "Tubb6" "Ell2" "Ifrd1" "Stk38l"
## [10] "Ubl3"
Alternatively, if you only have Ensembl gene ID
data$GeneID
all.genes <- data[up.idx,]$GeneID
up.genes <- data[dn.idx,]$GeneID dn.genes <-
"BP" ontology <-
paste0("clusterProfiler_GO-", ontology, "_ORA_simplify")
outTitle <- outTitle
## [1] "clusterProfiler_GO-BP_ORA_simplify"
We use all the annotated genes (all.genes
) as the “gene universe” and the differentially expressed genes (up.genes
and dn.genes
) as our genes of interest.
We would need to convert any other identifier format to ENTREZID
which is the required input identifier format. This can be done by using the select
function from AnnotationDbi
that we saw in Part 1 of this demo, or by using the “Biological Id TRanslator” bitr
function from clusterProfiler
which is a wrapper function of AnnotationDbi::select
.
Here, we will use bitr
here to see how this can be done.
# Use fromType = "ENSEMBL" if your input identifier is Ensembl gene ID
bitr(all.genes, fromType = "SYMBOL", toType = "ENTREZID", OrgDb = "org.Mm.eg.db") all.genes.df =
## 'select()' returned 1:1 mapping between keys and columns
head(all.genes.df, 10)
## SYMBOL ENTREZID
## 1 Gnai3 14679
## 2 Pbsn 54192
## 3 Cdc45 12544
## 4 H19 14955
## 5 Scml2 107815
## 6 Apoh 11818
## 7 Narf 67608
## 8 Cav2 12390
## 9 Klf6 23849
## 10 Scmh1 29871
bitr(up.genes, fromType = "SYMBOL", toType = "ENTREZID", OrgDb = "org.Mm.eg.db") up.genes.df =
## 'select()' returned 1:1 mapping between keys and columns
head(up.genes.df, 10)
## SYMBOL ENTREZID
## 1 Axin2 12006
## 2 Hnrnpd 11991
## 3 Kcnn3 140493
## 4 Mapk7 23939
## 5 Agpat3 28169
## 6 Sema6b 20359
## 7 Efnb2 13642
## 8 Il16 16170
## 9 Ltbp1 268977
## 10 Rgs19 56470
bitr(dn.genes, fromType = "SYMBOL", toType = "ENTREZID", OrgDb = "org.Mm.eg.db") dn.genes.df =
## 'select()' returned 1:1 mapping between keys and columns
head(dn.genes.df, 10)
## SYMBOL ENTREZID
## 1 Cox5a 12858
## 2 Pdgfb 18591
## 3 Itga5 16402
## 4 Cd52 23833
## 5 Dnmt3l 54427
## 6 Tubb6 67951
## 7 Ell2 192657
## 8 Ifrd1 15982
## 9 Stk38l 232533
## 10 Ubl3 24109
pvalueCutoff
on unadjusted p-values, (2) pvalueCutoff
on adjusted p-values and (3) qvalueCutoff
on q-values. enrichGO(gene = up.genes.df$ENTREZID, universe = all.genes.df$ENTREZID,
upEGO =OrgDb = "org.Mm.eg.db", ont = ontology, pvalueCutoff = 0.05,
pAdjustMethod = "BH", qvalueCutoff = 0.05, readable = TRUE)
upEGO
## #
## # over-representation test
## #
## #...@organism Mus musculus
## #...@ontology BP
## #...@keytype ENTREZID
## #...@gene chr [1:370] "12006" "11991" "140493" "23939" "28169" "20359" "13642" "16170" ...
## #...pvalues adjusted by 'BH' with cutoff <0.05
## #...80 enriched terms found
## 'data.frame': 80 obs. of 9 variables:
## $ ID : chr "GO:0050900" "GO:0060326" "GO:0050764" "GO:0046394" ...
## $ Description: chr "leukocyte migration" "cell chemotaxis" "regulation of phagocytosis" "carboxylic acid biosynthetic process" ...
## $ GeneRatio : chr "25/342" "18/342" "11/342" "17/342" ...
## $ BgRatio : chr "358/22036" "301/22036" "111/22036" "298/22036" ...
## $ pvalue : num 0.000000000445 0.000001256598 0.000001273094 0.000004674059 0.000004886976 ...
## $ p.adjust : num 0.00000168 0.0016007 0.0016007 0.00296886 0.00296886 ...
## $ qvalue : num 0.0000014 0.0013365 0.0013365 0.0024789 0.0024789 ...
## $ geneID : chr "Il16/Lbp/Mmp9/Lgmn/Cd200r1/Alox5/Itgb1/Selp/Itga4/Itga6/Bst1/Coro1a/Itgam/Cadm1/Vav3/Ripor2/Gcnt1/Itga9/Ptger4/"| __truncated__ "Il16/Lbp/Elmo2/Lgmn/Alox5/Bst1/Coro1a/Itgam/Fgfr1/Bcar1/Vav3/Dock4/Ripor2/Itga9/Mtus1/Nod2/Ano6/C5ar2" "Lbp/Alox15/Pot1b/Mfge8/Rab27a/Cd300lf/Nod2/Prtn3/Ano6/Sirpb1c/Gm5150" "Fcer1a/Tha1/Ptgis/Alox15/Acaca/Acly/Mtr/Alox5/Hacd4/Sdsl/Ilvbl/Ccnd3/Dse/Ptger4/Abcd2/Per2/Ldha" ...
## $ Count : int 25 18 11 17 17 21 16 14 14 14 ...
## #...Citation
## Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
## clusterProfiler: an R package for comparing biological themes among
## gene clusters. OMICS: A Journal of Integrative Biology
## 2012, 16(5):284-287
enrichGO(gene = dn.genes.df$ENTREZID, universe = all.genes.df$ENTREZID,
dnEGO =OrgDb = "org.Mm.eg.db", ont = ontology, pvalueCutoff = 0.05,
pAdjustMethod = "BH", qvalueCutoff = 0.05, readable = TRUE)
dnEGO
## #
## # over-representation test
## #
## #...@organism Mus musculus
## #...@ontology BP
## #...@keytype ENTREZID
## #...@gene chr [1:422] "12858" "18591" "16402" "23833" "54427" "67951" "192657" "15982" ...
## #...pvalues adjusted by 'BH' with cutoff <0.05
## #...191 enriched terms found
## 'data.frame': 191 obs. of 9 variables:
## $ ID : chr "GO:0002224" "GO:0042116" "GO:0002221" "GO:0072593" ...
## $ Description: chr "toll-like receptor signaling pathway" "macrophage activation" "pattern recognition receptor signaling pathway" "reactive oxygen species metabolic process" ...
## $ GeneRatio : chr "15/389" "13/389" "15/389" "20/389" ...
## $ BgRatio : chr "108/22036" "98/22036" "149/22036" "285/22036" ...
## $ pvalue : num 0.000000000759 0.000000018349 0.000000065767 0.000000193465 0.00000027366 ...
## $ p.adjust : num 0.000003 0.0000363 0.0000867 0.0001913 0.0002065 ...
## $ qvalue : num 0.00000247 0.00002982 0.00007126 0.00015722 0.0001697 ...
## $ geneID : chr "Nr1h3/Cyba/Ptprs/Mapkapk2/Tnip1/Nr1d1/Traf3/Acod1/Tlr2/Lrrfip2/Tnip3/Tlr1/Ticam1/Rab7b/Irak2" "Nr1h3/Nr1d1/Gpr137b/Ifngr2/Trem2/Slc11a1/Tlr2/Slc7a2/Cebpa/Tlr1/Ticam1/Cst7/Syt11" "Nr1h3/Cyba/Ptprs/Mapkapk2/Tnip1/Nr1d1/Traf3/Acod1/Tlr2/Lrrfip2/Tnip3/Tlr1/Ticam1/Rab7b/Irak2" "Pdgfb/Cyba/Nfe2l2/Ncf1/Txnrd1/Acod1/Cdkn1a/Prdx6/Cat/Eif6/Tlr2/Prdx1/Sesn2/Slc25a33/Ndufc2/Slc7a2/Prex1/Shc1/Ticam1/Ass1" ...
## $ Count : int 15 13 15 20 24 27 17 22 14 22 ...
## #...Citation
## Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
## clusterProfiler: an R package for comparing biological themes among
## gene clusters. OMICS: A Journal of Integrative Biology
## 2012, 16(5):284-287
We then use the simplify
function to reduce redundancy of enriched GO terms. We will used the default parameters to run the function.
simplify(upEGO, cutoff = 0.7, by = "p.adjust", select_fun = min, measure = "Wang",
upSimGO =semData = NULL)
nrow(upSimGO)
## [1] 51
simplify(dnEGO, cutoff = 0.7, by = "p.adjust", select_fun = min, measure = "Wang",
dnSimGO =semData = NULL)
nrow(dnSimGO)
## [1] 103
We will use the barplot
function from enrichplot
(previously a function in clusterProfiler
)
png(paste0(outTitle, "_up.png"), width = 9, height = 6, units = "in", res = 300)
barplot(upSimGO, showCategory = 20) +
ggtitle(paste0("GO-", ontology," ORA of up-regulated genes")) +
xlab("Enriched terms") + ylab("Count")
invisible(dev.off())
png(paste0(outTitle, "_dn.png"), width = 9, height = 6, units = "in", res = 300)
barplot(dnSimGO, showCategory = 20) +
ggtitle(paste0("GO-", ontology," ORA of down-regulated genes")) +
xlab("Enriched terms") + ylab("Count")
invisible(dev.off())
Save enrichGO
results as a data.frame
upSimGO@result
up.tab = dnSimGO@result dn.tab =
write.table(up.tab, file = paste0(outTitle, "_up.txt"), sep = "\t", quote = F,
row.names = F, col.names = T)
write.table(dn.tab, file = paste0(outTitle, "_dn.txt"), sep = "\t", quote = F,
row.names = F, col.names = T)
sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-conda-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS/LAPACK: /home/ihsuan/miniconda3/envs/r4/lib/libopenblasp-r0.3.12.so
##
## locale:
## [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8
## [4] LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
## [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets methods
## [9] base
##
## other attached packages:
## [1] ggplot2_3.3.3 org.Mm.eg.db_3.12.0 AnnotationDbi_1.52.0
## [4] IRanges_2.24.1 S4Vectors_0.28.1 Biobase_2.50.0
## [7] BiocGenerics_0.36.0 enrichplot_1.10.2 clusterProfiler_3.18.1
## [10] knitr_1.31
##
## loaded via a namespace (and not attached):
## [1] bit64_4.0.5 RColorBrewer_1.1-2 tools_4.0.3 bslib_0.2.4
## [5] utf8_1.1.4 R6_2.5.0 DBI_1.1.1 colorspace_2.0-0
## [9] withr_2.4.1 tidyselect_1.1.0 gridExtra_2.3 bit_4.0.4
## [13] compiler_4.0.3 scatterpie_0.1.5 labeling_0.4.2 shadowtext_0.0.7
## [17] sass_0.3.1 scales_1.1.1 stringr_1.4.0 digest_0.6.27
## [21] rmarkdown_2.7 DOSE_3.16.0 pkgconfig_2.0.3 htmltools_0.5.1.1
## [25] highr_0.8 fastmap_1.1.0 rlang_0.4.10 RSQLite_2.2.3
## [29] jquerylib_0.1.3 farver_2.1.0 generics_0.1.0 jsonlite_1.7.2
## [33] BiocParallel_1.24.1 GOSemSim_2.16.1 dplyr_1.0.4 magrittr_2.0.1
## [37] GO.db_3.12.1 Matrix_1.3-2 Rcpp_1.0.6 munsell_0.5.0
## [41] fansi_0.4.2 viridis_0.5.1 lifecycle_1.0.0 stringi_1.5.3
## [45] yaml_2.2.1 ggraph_2.0.5 MASS_7.3-53.1 plyr_1.8.6
## [49] qvalue_2.22.0 grid_4.0.3 blob_1.2.1 ggrepel_0.9.1
## [53] DO.db_2.9 crayon_1.4.1 lattice_0.20-41 graphlayouts_0.7.1
## [57] cowplot_1.1.1 splines_4.0.3 pillar_1.5.0 fgsea_1.16.0
## [61] igraph_1.2.6 reshape2_1.4.4 fastmatch_1.1-0 glue_1.4.2
## [65] evaluate_0.14 downloader_0.4 data.table_1.14.0 BiocManager_1.30.10
## [69] png_0.1-7 vctrs_0.3.6 tweenr_1.0.1 gtable_0.3.0
## [73] purrr_0.3.4 polyclip_1.10-0 tidyr_1.1.3 assertthat_0.2.1
## [77] cachem_1.0.4 xfun_0.21 ggforce_0.3.2 tidygraph_1.2.0
## [81] viridisLite_0.3.0 tibble_3.1.0 rvcheck_0.1.8 memoise_2.0.0
## [85] ellipsis_0.3.1