clusterProfiler: Bioconductor, Paper, Documentation
enrichplot: Bioconductor
Demo Dataset: E-MTAB-8411 from The clock gene Bmal1 inhibits macrophage motility, phagocytosis, and impairs defense against pneumonia. PNAS. 2020;117(3):1543-1551.
License: GPL-3.0
Rcd /ngs/GO-Enrichment-Analysis-Demo
R
library("clusterProfiler")
library("enrichplot")
library("org.Mm.eg.db")
library("ggplot2")If you have downloaded the DESeq2_DEG.txt file with wget:
data <- data.table::fread("DESeq2_DEG.txt")
data$GeneID <- substr(data$GeneID, 1, 18)If you like to donwload the file in R now:
data <- data.table::fread("https://raw.githubusercontent.com/ycl6/GO-Enrichment-Analysis-Demo/master/DESeq2_DEG.txt")
data$GeneID <- substr(data$GeneID, 1, 18)data## GeneID GeneSymbol log2fc pvalue padj
## 1: ENSMUSG00000000001 Gnai3 0.08804493 0.1925609 0.6146732
## 2: ENSMUSG00000000003 Pbsn NA NA NA
## 3: ENSMUSG00000000028 Cdc45 -0.32106635 0.1401127 0.5331437
## 4: ENSMUSG00000000031 H19 -1.20339889 0.7161464 NA
## 5: ENSMUSG00000000037 Scml2 -0.57746426 0.2979159 NA
## ---
## 55381: ENSMUSG00000118636 AC117663.3 NA NA NA
## 55382: ENSMUSG00000118637 AL772212.1 NA NA NA
## 55383: ENSMUSG00000118638 AL805980.1 NA NA NA
## 55384: ENSMUSG00000118639 AL590997.4 NA NA NA
## 55385: ENSMUSG00000118640 AC167036.2 0.06415390 0.9208414 NA
up.idx <- which(data$padj < 0.05 & data$log2fc > 0) # FDR < 0.05 and logFC > 0
dn.idx <- which(data$padj < 0.05 & data$log2fc < 0) # FDR < 0.05 and logFC < 0dim(data)## [1] 55385 5
length(up.idx)## [1] 383
length(dn.idx)## [1] 429
all.genes <- data$GeneSymbol
up.genes <- data[up.idx,]$GeneSymbol
dn.genes <- data[dn.idx,]$GeneSymbolhead(up.genes, 10)## [1] "Axin2" "Hnrnpd" "Kcnn3" "Mapk7" "Agpat3" "Sema6b" "Efnb2" "Il16" "Ltbp1"
## [10] "Rgs19"
head(dn.genes, 10)## [1] "Cox5a" "Pdgfb" "Itga5" "Cd52" "Dnmt3l" "Tubb6" "Ell2" "Ifrd1" "Stk38l"
## [10] "Ubl3"
Alternatively, if you only have Ensembl gene ID
all.genes <- data$GeneID
up.genes <- data[up.idx,]$GeneID
dn.genes <- data[dn.idx,]$GeneIDontology <- "BP"outTitle <- paste0("clusterProfiler_GO-", ontology, "_ORA_simplify")
outTitle## [1] "clusterProfiler_GO-BP_ORA_simplify"
We use all the annotated genes (all.genes) as the “gene universe” and the differentially expressed genes (up.genes and dn.genes) as our genes of interest.
We would need to convert any other identifier format to ENTREZID which is the required input identifier format. This can be done by using the select function from AnnotationDbi that we saw in Part 1 of this demo, or by using the “Biological Id TRanslator” bitr function from clusterProfiler which is a wrapper function of AnnotationDbi::select.
Here, we will use bitr here to see how this can be done.
# Use fromType = "ENSEMBL" if your input identifier is Ensembl gene ID
all.genes.df = bitr(all.genes, fromType = "SYMBOL", toType = "ENTREZID", OrgDb = "org.Mm.eg.db")## 'select()' returned 1:1 mapping between keys and columns
head(all.genes.df, 10)## SYMBOL ENTREZID
## 1 Gnai3 14679
## 2 Pbsn 54192
## 3 Cdc45 12544
## 4 H19 14955
## 5 Scml2 107815
## 6 Apoh 11818
## 7 Narf 67608
## 8 Cav2 12390
## 9 Klf6 23849
## 10 Scmh1 29871
up.genes.df = bitr(up.genes, fromType = "SYMBOL", toType = "ENTREZID", OrgDb = "org.Mm.eg.db")## 'select()' returned 1:1 mapping between keys and columns
head(up.genes.df, 10)## SYMBOL ENTREZID
## 1 Axin2 12006
## 2 Hnrnpd 11991
## 3 Kcnn3 140493
## 4 Mapk7 23939
## 5 Agpat3 28169
## 6 Sema6b 20359
## 7 Efnb2 13642
## 8 Il16 16170
## 9 Ltbp1 268977
## 10 Rgs19 56470
dn.genes.df = bitr(dn.genes, fromType = "SYMBOL", toType = "ENTREZID", OrgDb = "org.Mm.eg.db")## 'select()' returned 1:1 mapping between keys and columns
head(dn.genes.df, 10)## SYMBOL ENTREZID
## 1 Cox5a 12858
## 2 Pdgfb 18591
## 3 Itga5 16402
## 4 Cd52 23833
## 5 Dnmt3l 54427
## 6 Tubb6 67951
## 7 Ell2 192657
## 8 Ifrd1 15982
## 9 Stk38l 232533
## 10 Ubl3 24109
pvalueCutoff on unadjusted p-values, (2) pvalueCutoff on adjusted p-values and (3) qvalueCutoff on q-values.upEGO = enrichGO(gene = up.genes.df$ENTREZID, universe = all.genes.df$ENTREZID,
OrgDb = "org.Mm.eg.db", ont = ontology, pvalueCutoff = 0.05,
pAdjustMethod = "BH", qvalueCutoff = 0.05, readable = TRUE)
upEGO## #
## # over-representation test
## #
## #...@organism Mus musculus
## #...@ontology BP
## #...@keytype ENTREZID
## #...@gene chr [1:370] "12006" "11991" "140493" "23939" "28169" "20359" "13642" "16170" ...
## #...pvalues adjusted by 'BH' with cutoff <0.05
## #...80 enriched terms found
## 'data.frame': 80 obs. of 9 variables:
## $ ID : chr "GO:0050900" "GO:0060326" "GO:0050764" "GO:0046394" ...
## $ Description: chr "leukocyte migration" "cell chemotaxis" "regulation of phagocytosis" "carboxylic acid biosynthetic process" ...
## $ GeneRatio : chr "25/342" "18/342" "11/342" "17/342" ...
## $ BgRatio : chr "358/22036" "301/22036" "111/22036" "298/22036" ...
## $ pvalue : num 0.000000000445 0.000001256598 0.000001273094 0.000004674059 0.000004886976 ...
## $ p.adjust : num 0.00000168 0.0016007 0.0016007 0.00296886 0.00296886 ...
## $ qvalue : num 0.0000014 0.0013365 0.0013365 0.0024789 0.0024789 ...
## $ geneID : chr "Il16/Lbp/Mmp9/Lgmn/Cd200r1/Alox5/Itgb1/Selp/Itga4/Itga6/Bst1/Coro1a/Itgam/Cadm1/Vav3/Ripor2/Gcnt1/Itga9/Ptger4/"| __truncated__ "Il16/Lbp/Elmo2/Lgmn/Alox5/Bst1/Coro1a/Itgam/Fgfr1/Bcar1/Vav3/Dock4/Ripor2/Itga9/Mtus1/Nod2/Ano6/C5ar2" "Lbp/Alox15/Pot1b/Mfge8/Rab27a/Cd300lf/Nod2/Prtn3/Ano6/Sirpb1c/Gm5150" "Fcer1a/Tha1/Ptgis/Alox15/Acaca/Acly/Mtr/Alox5/Hacd4/Sdsl/Ilvbl/Ccnd3/Dse/Ptger4/Abcd2/Per2/Ldha" ...
## $ Count : int 25 18 11 17 17 21 16 14 14 14 ...
## #...Citation
## Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
## clusterProfiler: an R package for comparing biological themes among
## gene clusters. OMICS: A Journal of Integrative Biology
## 2012, 16(5):284-287
dnEGO = enrichGO(gene = dn.genes.df$ENTREZID, universe = all.genes.df$ENTREZID,
OrgDb = "org.Mm.eg.db", ont = ontology, pvalueCutoff = 0.05,
pAdjustMethod = "BH", qvalueCutoff = 0.05, readable = TRUE)
dnEGO## #
## # over-representation test
## #
## #...@organism Mus musculus
## #...@ontology BP
## #...@keytype ENTREZID
## #...@gene chr [1:422] "12858" "18591" "16402" "23833" "54427" "67951" "192657" "15982" ...
## #...pvalues adjusted by 'BH' with cutoff <0.05
## #...191 enriched terms found
## 'data.frame': 191 obs. of 9 variables:
## $ ID : chr "GO:0002224" "GO:0042116" "GO:0002221" "GO:0072593" ...
## $ Description: chr "toll-like receptor signaling pathway" "macrophage activation" "pattern recognition receptor signaling pathway" "reactive oxygen species metabolic process" ...
## $ GeneRatio : chr "15/389" "13/389" "15/389" "20/389" ...
## $ BgRatio : chr "108/22036" "98/22036" "149/22036" "285/22036" ...
## $ pvalue : num 0.000000000759 0.000000018349 0.000000065767 0.000000193465 0.00000027366 ...
## $ p.adjust : num 0.000003 0.0000363 0.0000867 0.0001913 0.0002065 ...
## $ qvalue : num 0.00000247 0.00002982 0.00007126 0.00015722 0.0001697 ...
## $ geneID : chr "Nr1h3/Cyba/Ptprs/Mapkapk2/Tnip1/Nr1d1/Traf3/Acod1/Tlr2/Lrrfip2/Tnip3/Tlr1/Ticam1/Rab7b/Irak2" "Nr1h3/Nr1d1/Gpr137b/Ifngr2/Trem2/Slc11a1/Tlr2/Slc7a2/Cebpa/Tlr1/Ticam1/Cst7/Syt11" "Nr1h3/Cyba/Ptprs/Mapkapk2/Tnip1/Nr1d1/Traf3/Acod1/Tlr2/Lrrfip2/Tnip3/Tlr1/Ticam1/Rab7b/Irak2" "Pdgfb/Cyba/Nfe2l2/Ncf1/Txnrd1/Acod1/Cdkn1a/Prdx6/Cat/Eif6/Tlr2/Prdx1/Sesn2/Slc25a33/Ndufc2/Slc7a2/Prex1/Shc1/Ticam1/Ass1" ...
## $ Count : int 15 13 15 20 24 27 17 22 14 22 ...
## #...Citation
## Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
## clusterProfiler: an R package for comparing biological themes among
## gene clusters. OMICS: A Journal of Integrative Biology
## 2012, 16(5):284-287
We then use the simplify function to reduce redundancy of enriched GO terms. We will used the default parameters to run the function.
upSimGO = simplify(upEGO, cutoff = 0.7, by = "p.adjust", select_fun = min, measure = "Wang",
semData = NULL)
nrow(upSimGO)## [1] 51
dnSimGO = simplify(dnEGO, cutoff = 0.7, by = "p.adjust", select_fun = min, measure = "Wang",
semData = NULL)
nrow(dnSimGO)## [1] 103
We will use the barplot function from enrichplot (previously a function in clusterProfiler)
png(paste0(outTitle, "_up.png"), width = 9, height = 6, units = "in", res = 300)
barplot(upSimGO, showCategory = 20) +
ggtitle(paste0("GO-", ontology," ORA of up-regulated genes")) +
xlab("Enriched terms") + ylab("Count")
invisible(dev.off())“clusterProfiler_GO-BP_ORA_simplify_up.png”
png(paste0(outTitle, "_dn.png"), width = 9, height = 6, units = "in", res = 300)
barplot(dnSimGO, showCategory = 20) +
ggtitle(paste0("GO-", ontology," ORA of down-regulated genes")) +
xlab("Enriched terms") + ylab("Count")
invisible(dev.off())“clusterProfiler_GO-BP_ORA_simplify_dn.png”
Save enrichGO results as a data.frame
up.tab = upSimGO@result
dn.tab = dnSimGO@resultwrite.table(up.tab, file = paste0(outTitle, "_up.txt"), sep = "\t", quote = F,
row.names = F, col.names = T)
write.table(dn.tab, file = paste0(outTitle, "_dn.txt"), sep = "\t", quote = F,
row.names = F, col.names = T)sessionInfo()## R version 4.0.3 (2020-10-10)
## Platform: x86_64-conda-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS/LAPACK: /home/ihsuan/miniconda3/envs/r4/lib/libopenblasp-r0.3.12.so
##
## locale:
## [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8
## [4] LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
## [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets methods
## [9] base
##
## other attached packages:
## [1] ggplot2_3.3.3 org.Mm.eg.db_3.12.0 AnnotationDbi_1.52.0
## [4] IRanges_2.24.1 S4Vectors_0.28.1 Biobase_2.50.0
## [7] BiocGenerics_0.36.0 enrichplot_1.10.2 clusterProfiler_3.18.1
## [10] knitr_1.31
##
## loaded via a namespace (and not attached):
## [1] bit64_4.0.5 RColorBrewer_1.1-2 tools_4.0.3 bslib_0.2.4
## [5] utf8_1.1.4 R6_2.5.0 DBI_1.1.1 colorspace_2.0-0
## [9] withr_2.4.1 tidyselect_1.1.0 gridExtra_2.3 bit_4.0.4
## [13] compiler_4.0.3 scatterpie_0.1.5 labeling_0.4.2 shadowtext_0.0.7
## [17] sass_0.3.1 scales_1.1.1 stringr_1.4.0 digest_0.6.27
## [21] rmarkdown_2.7 DOSE_3.16.0 pkgconfig_2.0.3 htmltools_0.5.1.1
## [25] highr_0.8 fastmap_1.1.0 rlang_0.4.10 RSQLite_2.2.3
## [29] jquerylib_0.1.3 farver_2.1.0 generics_0.1.0 jsonlite_1.7.2
## [33] BiocParallel_1.24.1 GOSemSim_2.16.1 dplyr_1.0.4 magrittr_2.0.1
## [37] GO.db_3.12.1 Matrix_1.3-2 Rcpp_1.0.6 munsell_0.5.0
## [41] fansi_0.4.2 viridis_0.5.1 lifecycle_1.0.0 stringi_1.5.3
## [45] yaml_2.2.1 ggraph_2.0.5 MASS_7.3-53.1 plyr_1.8.6
## [49] qvalue_2.22.0 grid_4.0.3 blob_1.2.1 ggrepel_0.9.1
## [53] DO.db_2.9 crayon_1.4.1 lattice_0.20-41 graphlayouts_0.7.1
## [57] cowplot_1.1.1 splines_4.0.3 pillar_1.5.0 fgsea_1.16.0
## [61] igraph_1.2.6 reshape2_1.4.4 fastmatch_1.1-0 glue_1.4.2
## [65] evaluate_0.14 downloader_0.4 data.table_1.14.0 BiocManager_1.30.10
## [69] png_0.1-7 vctrs_0.3.6 tweenr_1.0.1 gtable_0.3.0
## [73] purrr_0.3.4 polyclip_1.10-0 tidyr_1.1.3 assertthat_0.2.1
## [77] cachem_1.0.4 xfun_0.21 ggforce_0.3.2 tidygraph_1.2.0
## [81] viridisLite_0.3.0 tibble_3.1.0 rvcheck_0.1.8 memoise_2.0.0
## [85] ellipsis_0.3.1