16S-rDNA-V3-V4 repository: https://github.com/ycl6/16S-rDNA-V3-V4

picrust2: GitHub, Documentation, Paper

ALDEx2: GitHub, Documentation, Paper

Demo Dataset: PRJEB27564 from Gut Microbiota in Parkinson’s Disease: Temporal Stability and Relations to Disease Progression. EBioMedicine. 2019;44:691-707

License: GPL-3.0

Workflow (Continues from Part 3)

Run `picrust2`

We will execute the picrust2 pipeline script in the terminal/console. If you have used conda to create an environment for picrust2, for example an environment called picrust2, you can activate it now to run both programs.

Note: Use conda deactivate to deactivate from a currently active conda environment if required.

Requires fasta file expr.asv.fasta and biom file expr.biom generated in Part 2: phyloseq
The --output argument to specify the output folder for final files
The --processes N argument to specify the number of CPUs to run picrust2 in parallel

conda activate picrust2

picrust2_pipeline.py --study_fasta outfiles/expr.asv.fasta --input outfiles/expr.biom \
    --output picrust2_out_stratified --processes 6 --stratified --remove_intermediate --verbose

Completed PICRUSt2 pipeline in 2301.59 seconds.

Note: Use conda deactivate to deactivate from a currently active conda environment if required.

Locate `picrust2` mapfiles

Use the Linux locate commandto locate the PATH that keeps the mapfiles.

locate description_mapfiles

Example output:

/home/user/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/description_mapfiles
/home/user/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/description_mapfiles/KEGG_modules_info.tsv.gz
/home/user/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/description_mapfiles/KEGG_pathways_info.tsv.gz
/home/user/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/description_mapfiles/cog_info.tsv.gz
/home/user/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/description_mapfiles/ec_level4_info.tsv.gz
/home/user/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/description_mapfiles/ko_info.tsv.gz
/home/user/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/description_mapfiles/metacyc_pathways_info.txt.gz
/home/user/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/description_mapfiles/pfam_info.tsv.gz
/home/user/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/description_mapfiles/tigrfam_info.tsv.gz

Start `R`

cd /ngs/16S-Demo/PRJEB27564

R

Load package and set path

library("data.table")   # Also requires R.utils to read gz and bz2 files
library("phyloseq")
library("ALDEx2")
library("tidyverse")

Set `picrust2` output folder

Full path is not necessary if R is executed in the folder one-level above picrust2_out_stratified.

picrust2 = "picrust2_out_stratified"

Use list.files to show files stored in the folder.

list.files(picrust2, recursive = TRUE)

##  [1] "EC_metagenome_out/pred_metagenome_contrib.tsv.gz"
##  [2] "EC_metagenome_out/pred_metagenome_unstrat.tsv.gz"
##  [3] "EC_metagenome_out/seqtab_norm.tsv.gz"            
##  [4] "EC_metagenome_out/weighted_nsti.tsv.gz"          
##  [5] "EC_predicted.tsv.gz"                             
##  [6] "KO_metagenome_out/pred_metagenome_contrib.tsv.gz"
##  [7] "KO_metagenome_out/pred_metagenome_unstrat.tsv.gz"
##  [8] "KO_metagenome_out/seqtab_norm.tsv.gz"            
##  [9] "KO_metagenome_out/weighted_nsti.tsv.gz"          
## [10] "KO_predicted.tsv.gz"                             
## [11] "marker_predicted_and_nsti.tsv.gz"                
## [12] "out.tre"                                         
## [13] "pathways_out/path_abun_contrib.tsv.gz"           
## [14] "pathways_out/path_abun_unstrat.tsv.gz"

Build `picrust2` output file paths

p2_EC = paste0(picrust2, "/EC_metagenome_out/pred_metagenome_unstrat.tsv.gz")
p2_KO = paste0(picrust2, "/KO_metagenome_out/pred_metagenome_unstrat.tsv.gz")
p2_PW = paste0(picrust2, "/pathways_out/path_abun_unstrat.tsv.gz")

Set `picrust2` mapfile folder

Use the “description_mapfiles” PATH you located with the locate command above.

mapfile = "/home/user/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/description_mapfiles"

Use list.files to show files stored in the folder.

list.files(mapfile, recursive = TRUE)

## [1] "cog_info.tsv.gz"              "ec_level4_info.tsv.gz"       
## [3] "KEGG_modules_info.tsv.gz"     "KEGG_pathways_info.tsv.gz"   
## [5] "ko_info.tsv.gz"               "metacyc_pathways_info.txt.gz"
## [7] "pfam_info.tsv.gz"             "tigrfam_info.tsv.gz"

Build `picrust2` map file paths

mapfile_EC = paste0(mapfile, "/ec_level4_info.tsv.gz")
mapfile_KO = paste0(mapfile, "/ko_info.tsv.gz")
mapfile_PW = paste0(mapfile, "/metacyc_pathways_info.txt.gz")

Load saved workspace from Part 3

load("3_lefse_tutorial.RData")

Prepare input data

Load map files

mapEC = as.data.frame(fread(mapfile_EC, header = FALSE))
colnames(mapEC) = c("function","description")
mapKO = as.data.frame(fread(mapfile_KO, header = FALSE, sep = "\t"))
colnames(mapKO) = c("function","description")
mapPW = as.data.frame(fread(mapfile_PW, header = FALSE))
colnames(mapPW) = c("pathway","description")

EC map file:

KO map file:

Pathway map file:

Load `picrust2` output files

p2EC = as.data.frame(fread(p2_EC))
rownames(p2EC) = p2EC$"function"
p2EC = as.matrix(p2EC[,-1])
p2EC = round(p2EC)

p2KO = as.data.frame(fread(p2_KO))
rownames(p2KO) = p2KO$"function"
p2KO = as.matrix(p2KO[,-1])
p2KO = round(p2KO)

p2PW = as.data.frame(fread(p2_PW))
rownames(p2PW) = p2PW$"pathway"
p2PW = as.matrix(p2PW[,-1])
p2PW = round(p2PW)

picrust2 EC output:

picrust2 KO output:

picrust2 Pathway output:

Subset results

Subset-1: Patient vs. control at baseline
Subset-2: Patient followup vs. patient baseline

# Subset-1
p2EC1 = p2EC[,sample_names(ps1a)]
p2KO1 = p2KO[,sample_names(ps1a)]
p2PW1 = p2PW[,sample_names(ps1a)]

# Subset-2
p2EC2 = p2EC[,sample_names(ps1b)]
p2KO2 = p2KO[,sample_names(ps1b)]
p2PW2 = p2PW[,sample_names(ps1b)]

Perform statistical analysis

We use the ANOVA-like differential expression (ALDEx2) compositional data analysis (CoDA) method to perform differential abundance testing between 2 groups/conditions

If the test is ‘glm’, then effect should be FALSE. The ‘glm’ option evaluates the data as a one-way ANOVA using the glm and Kruskal-Wallace test
If the test is ‘t’, then effect should be set to TRUE. The ‘t’ option evaluates the data as a two-factor experiment using both the Welch’s t and the Wilcoxon rank tests
All tests include a Benjamini-Hochberg correction of the raw P values

The mc.samples argument defines the number of Monte Carlo samples to use to estimate the underlying distributions. The default is 128

On Subset-1: Patient vs. control at baseline

# EC
set.seed(12345)
system.time({
        aldex2_EC1 = aldex(p2EC1, sample_data(ps1a)$Group, mc.samples = 500, test = "t", 
               effect = TRUE, denom = "iqlr", verbose = TRUE)
})

## aldex.clr: generating Monte-Carlo instances and clr values

## operating in serial mode

## removed rows with sums equal to zero

## computing iqlr centering

## data format is OK

## dirichlet samples complete

## transformation complete

## aldex.ttest: doing t-test

## running tests for each MC instance:

## |------------(25%)----------(50%)----------(75%)----------|

## aldex.effect: calculating effect sizes

## operating in serial mode

## sanity check complete

## rab.all  complete

## rab.win  complete

## rab of samples complete

## within sample difference calculated

## between group difference calculated

## group summaries calculated

## effect size calculated

## summarizing output

##    user  system elapsed 
## 103.368   5.815 109.191

# KO
set.seed(12345)
system.time({
        aldex2_KO1 = aldex(p2KO1, sample_data(ps1a)$Group, mc.samples = 500, test = "t", 
               effect = TRUE, denom = "iqlr", verbose = TRUE)
})

## aldex.clr: generating Monte-Carlo instances and clr values

## operating in serial mode

## removed rows with sums equal to zero

## computing iqlr centering

## data format is OK

## dirichlet samples complete

## transformation complete

## aldex.ttest: doing t-test

## running tests for each MC instance:

## |------------(25%)----------(50%)----------(75%)----------|

## aldex.effect: calculating effect sizes

## operating in serial mode

## sanity check complete

## rab.all  complete

## rab.win  complete

## rab of samples complete

## within sample difference calculated

## between group difference calculated

## group summaries calculated

## effect size calculated

## summarizing output

##    user  system elapsed 
## 316.211  17.531 333.766

# Pathway
set.seed(12345)
system.time({
        aldex2_PW1 = aldex(p2PW1, sample_data(ps1a)$Group, mc.samples = 500, test = "t", 
               effect = TRUE, denom = "iqlr", verbose = TRUE)
})

## aldex.clr: generating Monte-Carlo instances and clr values

## operating in serial mode

## removed rows with sums equal to zero

## computing iqlr centering

## data format is OK

## dirichlet samples complete

## transformation complete

## aldex.ttest: doing t-test

## running tests for each MC instance:

## |------------(25%)----------(50%)----------(75%)----------|

## aldex.effect: calculating effect sizes

## operating in serial mode

## sanity check complete

## rab.all  complete

## rab.win  complete

## rab of samples complete

## within sample difference calculated

## between group difference calculated

## group summaries calculated

## effect size calculated

## summarizing output

##    user  system elapsed 
##  19.054   0.184  19.239

On Subset-2: Patient followup vs. patient baseline

# EC
set.seed(12345)
system.time({
        aldex2_EC2 = aldex(p2EC2, sample_data(ps1b)$Time, mc.samples = 500, test = "t", 
               effect = TRUE, denom = "iqlr", verbose = TRUE)
})

## aldex.clr: generating Monte-Carlo instances and clr values

## operating in serial mode

## removed rows with sums equal to zero

## computing iqlr centering

## data format is OK

## dirichlet samples complete

## transformation complete

## aldex.ttest: doing t-test

## running tests for each MC instance:

## |------------(25%)----------(50%)----------(75%)----------|

## aldex.effect: calculating effect sizes

## operating in serial mode

## sanity check complete

## rab.all  complete

## rab.win  complete

## rab of samples complete

## within sample difference calculated

## between group difference calculated

## group summaries calculated

## effect size calculated

## summarizing output

##    user  system elapsed 
##  95.867   3.967  99.841

# KO
set.seed(12345)
system.time({
        aldex2_KO2 = aldex(p2KO2, sample_data(ps1b)$Time, mc.samples = 500, test = "t", 
               effect = TRUE, denom = "iqlr", verbose = TRUE)
})

## aldex.clr: generating Monte-Carlo instances and clr values

## operating in serial mode

## removed rows with sums equal to zero

## computing iqlr centering

## data format is OK

## dirichlet samples complete

## transformation complete

## aldex.ttest: doing t-test

## running tests for each MC instance:

## |------------(25%)----------(50%)----------(75%)----------|

## aldex.effect: calculating effect sizes

## operating in serial mode

## sanity check complete

## rab.all  complete

## rab.win  complete

## rab of samples complete

## within sample difference calculated

## between group difference calculated

## group summaries calculated

## effect size calculated

## summarizing output

##    user  system elapsed 
## 313.440   7.736 321.204

# Pathway
set.seed(12345)
system.time({
        aldex2_PW2 = aldex(p2PW2, sample_data(ps1b)$Time, mc.samples = 500, test = "t", 
               effect = TRUE, denom = "iqlr", verbose = TRUE)
})

## aldex.clr: generating Monte-Carlo instances and clr values

## operating in serial mode

## removed rows with sums equal to zero

## computing iqlr centering

## data format is OK

## dirichlet samples complete

## transformation complete

## aldex.ttest: doing t-test

## running tests for each MC instance:

## |------------(25%)----------(50%)----------(75%)----------|

## aldex.effect: calculating effect sizes

## operating in serial mode

## sanity check complete

## rab.all  complete

## rab.win  complete

## rab of samples complete

## within sample difference calculated

## between group difference calculated

## group summaries calculated

## effect size calculated

## summarizing output

##    user  system elapsed 
##  19.085   0.000  19.086

See a ALDEx2 output

We can use head to view the top N rows of the data stored in the data.frame. Below we view the top 10 rows in aldex2_EC1:

head(aldex2_EC1, 10)

Check estimated effect size

ALDEx2 authors suggest that an effect size of 1 or greater can be used as significance cutoff

On Subset-1: Patient vs. control at baseline

None of the metabolic predictions show differential abundances between Parkinson’s patients and control subjects at baseline

png("images/ALDEx2_picrust2_effect_1.png", width = 6, height = 6, units = "in", res = 300)
par(mfrow = c(2,2))
hist(aldex2_EC1$effect, breaks = 20, xlab = "effect size", col = "yellow", main = "EC")
hist(aldex2_KO1$effect, breaks = 20, xlab = "effect size", col = "yellow", main = "KO")
hist(aldex2_PW1$effect, breaks = 20, xlab = "effect size", col = "yellow", main = "Pathway")
invisible(dev.off())

“images/ALDEx2_picrust2_effect_1.png”

On Subset-2: Patient followup vs. patient baseline

None of the metabolic predictions show differential abundances between Parkinson’s patients at baseline and at followup

png("images/ALDEx2_picrust2_effect_2.png", width = 6, height = 6, units = "in", res = 300)
par(mfrow = c(2,2))
hist(aldex2_EC2$effect, breaks = 20, xlab = "effect size", col = "yellow", main = "EC")
hist(aldex2_KO2$effect, breaks = 20, xlab = "effect size", col = "yellow", main = "KO")
hist(aldex2_PW2$effect, breaks = 20, xlab = "effect size", col = "yellow", main = "Pathway")
invisible(dev.off())

“images/ALDEx2_picrust2_effect_2.png”

Plotting of outputs

Create MW and MA plots

Use aldex.plot function to create MW (fold-change to variance/effect) and MA (Bland-Altman) plots

The plot below shows the relationship between effect size and P values and BH-adjusted P values in the test dataset.

On Subset-1: Patient vs. control at baseline

png("images/ALDEx2_picrust2_MW_MA_1.png", width = 6, height = 8, units = "in", res = 300)
par(mfrow = c(3,2))
aldex.plot(aldex2_EC1, type = "MW", test = "wilcox", all.cex = 0.4, rare.cex = 0.4, 
       called.cex = 0.6, cutoff = 0.05, xlab = "Dispersion", ylab = "Difference")
title(main = "(EC) MW Plot")

aldex.plot(aldex2_EC1, type = "MA", test = "wilcox", all.cex = 0.4, rare.cex = 0.4, 
       called.cex = 0.6, cutoff = 0.05, xlab = "Log-ratio abundance", ylab = "Difference")
title(main = "(EC) MA Plot")

aldex.plot(aldex2_KO1, type = "MW", test = "wilcox", all.cex = 0.4, rare.cex = 0.4, 
       called.cex = 0.6, cutoff = 0.05, xlab = "Dispersion", ylab = "Difference")
title(main = "(KO) MW Plot")

aldex.plot(aldex2_KO1, type = "MA", test = "wilcox", all.cex = 0.4, rare.cex = 0.4, 
       called.cex = 0.6, cutoff = 0.05, xlab = "Relative abundance", ylab = "Difference")
title(main = "(KO) MA Plot")

aldex.plot(aldex2_PW1, type = "MW", test = "wilcox", all.cex = 0.4, rare.cex = 0.4, 
       called.cex = 0.6, cutoff = 0.05, xlab = "Dispersion", ylab = "Difference")
title(main = "(PW) MW Plot")

aldex.plot(aldex2_PW1, type = "MA", test = "wilcox", all.cex = 0.4, rare.cex = 0.4, 
       called.cex = 0.6, cutoff = 0.05, xlab = "Relative abundance", ylab = "Difference")
title(main = "(PW) MA Plot")
invisible(dev.off())

“images/ALDEx2_picrust2_MW_MA_1.png”

On Subset-2: Patient followup vs. patient baseline

png("images/ALDEx2_picrust2_MW_MA_2.png", width = 6, height = 8, units = "in", res = 300)
par(mfrow = c(3,2))
aldex.plot(aldex2_EC2, type = "MW", test = "wilcox", all.cex = 0.4, rare.cex = 0.4, 
       called.cex = 0.6, cutoff = 0.05, xlab = "Dispersion", ylab = "Difference")
title(main = "(EC) MW Plot")

aldex.plot(aldex2_EC2, type = "MA", test = "wilcox", all.cex = 0.4, rare.cex = 0.4, 
       called.cex = 0.6, cutoff = 0.05, xlab = "Relative abundance", ylab = "Difference")
title(main = "(EC) MA Plot")

aldex.plot(aldex2_KO2, type = "MW", test = "wilcox", all.cex = 0.4, rare.cex = 0.4, 
       called.cex = 0.6, cutoff = 0.05, xlab = "Dispersion", ylab = "Difference")
title(main = "(KO) MW Plot")

aldex.plot(aldex2_KO2, type = "MA", test = "wilcox", all.cex = 0.4, rare.cex = 0.4, 
       called.cex = 0.6, cutoff = 0.05, xlab = "Relative abundance", ylab = "Difference")
title(main = "(KO) MA Plot")

aldex.plot(aldex2_PW2, type = "MW", test = "wilcox", all.cex = 0.4, rare.cex = 0.4, 
       called.cex = 0.6, cutoff = 0.05, xlab = "Dispersion", ylab = "Difference")
title(main = "(PW) MW Plot")

aldex.plot(aldex2_PW2, type = "MA", test = "wilcox", all.cex = 0.4, rare.cex = 0.4, 
       called.cex = 0.6, cutoff = 0.05, xlab = "Relative abundance", ylab = "Difference")
title(main = "(PW) MA Plot")
invisible(dev.off())

“images/ALDEx2_picrust2_MW_MA_2.png”

Relationship between effect, difference, and P values

On Subset-1: Patient vs. control at baseline

png("images/ALDEx2_picrust2_P_adjP_1.png", width = 6, height = 8, units = "in", res = 300)
par(mfrow = c(3,2))
plot(aldex2_EC1$effect, aldex2_EC1$we.ep, log = "y", cex = 0.4, col = "blue", pch = 19, 
     xlab = "Effect size", ylab = "P value", main = "(EC) Effect size plot")
points(aldex2_EC1$effect, aldex2_EC1$we.eBH, cex = 0.5, col = "red", pch = 19)
abline(h = 0.05, lty = 2, col = "grey")
legend("bottom", legend = c("P value", "BH-adjusted"), pch = 19, col = c("blue", "red"))

plot(aldex2_EC1$diff.btw, aldex2_EC1$we.ep, log = "y", cex = 0.4, col = "blue", pch = 19, 
     xlab = "Difference", ylab = "P value", main = "(EC) Volcano plot")
points(aldex2_EC1$diff.btw, aldex2_EC1$we.eBH, cex = 0.5, col = "red", pch = 19)
abline(h = 0.05, lty = 2, col = "grey")

plot(aldex2_KO1$effect, aldex2_KO1$we.ep, log = "y", cex = 0.4, col = "blue", pch = 19, 
     xlab = "Effect size", ylab = "P value", main = "(KO) Effect size plot")
points(aldex2_KO1$effect, aldex2_KO1$we.eBH, cex = 0.5, col = "red", pch = 19)
abline(h = 0.05, lty = 2, col = "grey")
legend("bottom", legend = c("P value", "BH-adjusted"), pch = 19, col = c("blue", "red"))

plot(aldex2_KO1$diff.btw, aldex2_KO1$we.ep, log = "y", cex = 0.4, col = "blue", pch = 19, 
     xlab = "Difference", ylab = "P value", main = "(KO) Volcano plot")
points(aldex2_KO1$diff.btw, aldex2_KO1$we.eBH, cex = 0.5, col = "red", pch = 19)
abline(h = 0.05, lty = 2, col = "grey")

plot(aldex2_PW1$effect, aldex2_PW1$we.ep, log = "y", cex = 0.4, col = "blue", pch = 19, 
     xlab = "Effect size", ylab = "P value", main = "(PW) Effect size plot")
points(aldex2_PW1$effect, aldex2_PW1$we.eBH, cex = 0.5, col = "red", pch = 19)
abline(h = 0.05, lty = 2, col = "grey")
legend("bottom", legend = c("P value", "BH-adjusted"), pch = 19, col = c("blue", "red"))

plot(aldex2_PW1$diff.btw, aldex2_PW1$we.ep, log = "y", cex = 0.4, col = "blue", pch = 19, 
     xlab = "Difference", ylab = "P value", main = "(PW) Volcano plot")
points(aldex2_PW1$diff.btw, aldex2_PW1$we.eBH, cex = 0.5, col = "red", pch = 19)
abline(h = 0.05, lty = 2, col = "grey")
invisible(dev.off())

“images/ALDEx2_picrust2_P_adjP_1.png”

On Subset-2: Patient followup vs. patient baseline

png("images/ALDEx2_picrust2_P_adjP_2.png", width = 6, height = 8, units = "in", res = 300)
par(mfrow = c(3,2))
plot(aldex2_EC2$effect, aldex2_EC2$we.ep, log = "y", cex = 0.4, col = "blue", pch = 19, 
     xlab = "Effect size", ylab = "P value", main = "(EC) Effect size plot")
points(aldex2_EC2$effect, aldex2_EC2$we.eBH, cex = 0.5, col = "red", pch = 19)
abline(h = 0.05, lty = 2, col = "grey")
legend("bottom", legend = c("P value", "BH-adjusted"), pch = 19, col = c("blue", "red"))

plot(aldex2_EC2$diff.btw, aldex2_EC2$we.ep, log = "y", cex = 0.4, col = "blue", pch = 19, 
     xlab = "Difference", ylab = "P value", main = "(EC) Volcano plot")
points(aldex2_EC2$diff.btw, aldex2_EC2$we.eBH, cex = 0.5, col = "red", pch = 19)
abline(h = 0.05, lty = 2, col = "grey")

plot(aldex2_KO2$effect, aldex2_KO2$we.ep, log = "y", cex = 0.4, col = "blue", pch = 19, 
     xlab = "Effect size", ylab = "P value", main = "(KO) Effect size plot")
points(aldex2_KO2$effect, aldex2_KO2$we.eBH, cex = 0.5, col = "red", pch = 19)
abline(h = 0.05, lty = 2, col = "grey")
legend("bottom", legend = c("P value", "BH-adjusted"), pch = 19, col = c("blue", "red"))

plot(aldex2_KO2$diff.btw, aldex2_KO2$we.ep, log = "y", cex = 0.4, col = "blue", pch = 19, 
     xlab = "Difference", ylab = "P value", main = "(KO) Volcano plot")
points(aldex2_KO2$diff.btw, aldex2_KO2$we.eBH, cex = 0.5, col = "red", pch = 19)
abline(h = 0.05, lty = 2, col = "grey")

plot(aldex2_PW2$effect, aldex2_PW2$we.ep, log = "y", cex = 0.4, col = "blue", pch = 19, 
     xlab = "Effect size", ylab = "P value", main = "(PW) Effect size plot")
points(aldex2_PW2$effect, aldex2_PW2$we.eBH, cex = 0.5, col = "red", pch = 19)
abline(h = 0.05, lty = 2, col = "grey")
legend("bottom", legend = c("P value", "BH-adjusted"), pch = 19, col = c("blue", "red"))

plot(aldex2_PW2$diff.btw, aldex2_PW2$we.ep, log = "y", cex = 0.4, col = "blue", pch = 19, 
     xlab = "Difference", ylab = "P value", main = "(PW) Volcano plot")
points(aldex2_PW2$diff.btw, aldex2_PW2$we.eBH, cex = 0.5, col = "red", pch = 19)
abline(h = 0.05, lty = 2, col = "grey")
invisible(dev.off())

“images/ALDEx2_picrust2_P_adjP_2.png”

Create output files

Merge with map file data

# Subset-1
df_EC1 = aldex2_EC1 %>% tibble::rownames_to_column(var = "EC") %>% 
    inner_join(mapEC, by = c("EC" = "function")) %>% arrange(EC)

df_KO1 = aldex2_KO1 %>% tibble::rownames_to_column(var = "KO") %>% 
    inner_join(mapKO, by = c("KO" = "function")) %>% arrange(KO)

df_PW1 = aldex2_PW1 %>% tibble::rownames_to_column(var = "Pathway") %>% 
    inner_join(mapPW, by = c("Pathway" = "pathway")) %>% arrange(Pathway)

# Subset-2
df_EC2 = aldex2_EC2 %>% tibble::rownames_to_column(var = "EC") %>%
        inner_join(mapEC, by = c("EC" = "function")) %>% arrange(EC)

df_KO2 = aldex2_KO2 %>% tibble::rownames_to_column(var = "KO") %>%
        inner_join(mapKO, by = c("KO" = "function")) %>% arrange(KO)

df_PW2 = aldex2_PW2 %>% tibble::rownames_to_column(var = "Pathway") %>%
        inner_join(mapPW, by = c("Pathway" = "pathway")) %>% arrange(Pathway)

Output to file

# Subset-1
write.table(df_EC1, file = "outfiles/ALDEx2_picrust2_EC_results_1.txt", sep = "\t", quote = F, 
        row.names = F, col.names = T)
write.table(df_KO1, file = "outfiles/ALDEx2_picrust2_KO_results_1.txt", sep = "\t", quote = F, 
        row.names = F, col.names = T)
write.table(df_PW1, file = "outfiles/ALDEx2_picrust2_Pathway_results_1.txt", sep = "\t", quote = F, 
        row.names = F, col.names = T)

# Subset-2
write.table(df_EC2, file = "outfiles/ALDEx2_picrust2_EC_results_2.txt", sep = "\t", quote = F, 
        row.names = F, col.names = T)
write.table(df_KO2, file = "outfiles/ALDEx2_picrust2_KO_results_2.txt", sep = "\t", quote = F, 
        row.names = F, col.names = T)
write.table(df_PW2, file = "outfiles/ALDEx2_picrust2_Pathway_results_2.txt", sep = "\t", quote = F, 
        row.names = F, col.names = T)

Save current workspace

save.image(file = "4_picrust2_tutorial.RData")

Session information

sessionInfo()

## R version 4.1.0 (2021-05-18)
## Platform: x86_64-conda-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
## 
## Matrix products: default
## BLAS/LAPACK: /home/ihsuan/miniconda3_4.10.0/envs/r4-16S/lib/libopenblasp-r0.3.15.so
## 
## locale:
##  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8       
##  [4] LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
##  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
## [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] forcats_0.5.1       stringr_1.4.0       dplyr_1.0.7         purrr_0.3.4        
##  [5] readr_1.4.0         tidyr_1.1.3         tibble_3.1.2        ggplot2_3.3.4      
##  [9] tidyverse_1.3.1     ALDEx2_1.24.0       Rfast_2.0.3         RcppZiggurat_0.1.6 
## [13] Rcpp_1.0.6          zCompositions_1.3.4 truncnorm_1.0-8     NADA_1.6-1.1       
## [17] survival_3.2-11     MASS_7.3-54         phyloseq_1.36.0     data.table_1.14.0  
## [21] knitr_1.33         
## 
## loaded via a namespace (and not attached):
##   [1] readxl_1.3.1                backports_1.2.1             plyr_1.8.6                 
##   [4] igraph_1.2.6                splines_4.1.0               BiocParallel_1.26.0        
##   [7] GenomeInfoDb_1.28.0         digest_0.6.27               foreach_1.5.1              
##  [10] htmltools_0.5.1.1           fansi_0.4.2                 magrittr_2.0.1             
##  [13] memoise_2.0.0               cluster_2.1.2               Biostrings_2.60.0          
##  [16] annotate_1.70.0             modelr_0.1.8                RcppParallel_5.1.4         
##  [19] matrixStats_0.59.0          R.utils_2.10.1              jpeg_0.1-8.1               
##  [22] colorspace_2.0-2            blob_1.2.1                  rvest_1.0.0                
##  [25] haven_2.4.1                 xfun_0.24                   crayon_1.4.1               
##  [28] RCurl_1.98-1.3              jsonlite_1.7.2              genefilter_1.74.0          
##  [31] dada2_1.20.0                iterators_1.0.13            ape_5.5                    
##  [34] glue_1.4.2                  gtable_0.3.0                zlibbioc_1.38.0            
##  [37] XVector_0.32.0              DelayedArray_0.18.0         Rhdf5lib_1.14.0            
##  [40] BiocGenerics_0.38.0         scales_1.1.1                DBI_1.1.1                  
##  [43] xtable_1.8-4                bit_4.0.4                   stats4_4.1.0               
##  [46] httr_1.4.2                  RColorBrewer_1.1-2          ellipsis_0.3.2             
##  [49] pkgconfig_2.0.3             XML_3.99-0.6                R.methodsS3_1.8.1          
##  [52] dbplyr_2.1.1                locfit_1.5-9.4              utf8_1.2.1                 
##  [55] tidyselect_1.1.1            rlang_0.4.11                reshape2_1.4.4             
##  [58] AnnotationDbi_1.54.0        munsell_0.5.0               cellranger_1.1.0           
##  [61] tools_4.1.0                 cachem_1.0.5                cli_2.5.0                  
##  [64] generics_0.1.0              RSQLite_2.2.5               ade4_1.7-17                
##  [67] broom_0.7.8                 evaluate_0.14               biomformat_1.20.0          
##  [70] fastmap_1.1.0               yaml_2.2.1                  bit64_4.0.5                
##  [73] fs_1.5.0                    KEGGREST_1.32.0             nlme_3.1-152               
##  [76] R.oo_1.24.0                 xml2_1.3.2                  compiler_4.1.0             
##  [79] rstudioapi_0.13             png_0.1-7                   reprex_2.0.0               
##  [82] geneplotter_1.70.0          stringi_1.6.2               highr_0.9                  
##  [85] ps_1.6.0                    lattice_0.20-44             Matrix_1.3-4               
##  [88] vegan_2.5-7                 permute_0.9-5               multtest_2.48.0            
##  [91] vctrs_0.3.8                 pillar_1.6.1                lifecycle_1.0.0            
##  [94] rhdf5filters_1.4.0          bitops_1.0-7                GenomicRanges_1.44.0       
##  [97] R6_2.5.0                    latticeExtra_0.6-29         hwriter_1.3.2              
## [100] ShortRead_1.50.0            IRanges_2.26.0              codetools_0.2-18           
## [103] assertthat_0.2.1            rhdf5_2.36.0                SummarizedExperiment_1.22.0
## [106] DESeq2_1.32.0               withr_2.4.2                 GenomicAlignments_1.28.0   
## [109] Rsamtools_2.8.0             S4Vectors_0.30.0            GenomeInfoDbData_1.2.6     
## [112] mgcv_1.8-36                 parallel_4.1.0              hms_1.1.0                  
## [115] grid_4.1.0                  rmarkdown_2.9               MatrixGenerics_1.4.0       
## [118] Biobase_2.52.0              lubridate_1.7.10

16S rDNA amplicon sequencing analysis using R (Part 4: picrust2 & ALDEx2)

I-Hsuan Lin

July 19, 2021

Workflow (Continues from Part 3)

Run `picrust2`

Locate `picrust2` mapfiles

Start `R`

Load package and set path

Set `picrust2` output folder

Build `picrust2` output file paths

Set `picrust2` mapfile folder

Build `picrust2` map file paths

Load saved workspace from Part 3

Prepare input data

Load map files

Load `picrust2` output files

Subset results

Perform statistical analysis

See a ALDEx2 output

Check estimated effect size

Plotting of outputs

Create MW and MA plots

Relationship between effect, difference, and P values

Create output files

Merge with map file data

Output to file

Save current workspace

Session information

16S rDNA amplicon sequencing analysis using R (Part 4: picrust2 & ALDEx2)

I-Hsuan Lin

July 19, 2021

Workflow (Continues from Part 3)

Run picrust2

Locate picrust2 mapfiles

Start R

Load package and set path

Set picrust2 output folder

Build picrust2 output file paths

Set picrust2 mapfile folder

Build picrust2 map file paths

Load saved workspace from Part 3

Prepare input data

Load map files

Load picrust2 output files

Subset results

Perform statistical analysis

See a ALDEx2 output

Check estimated effect size

Plotting of outputs

Create MW and MA plots

Relationship between effect, difference, and P values

Create output files

Merge with map file data

Output to file

Save current workspace

Session information

Run `picrust2`

Locate `picrust2` mapfiles

Start `R`

Set `picrust2` output folder

Build `picrust2` output file paths

Set `picrust2` mapfile folder

Build `picrust2` map file paths

Load `picrust2` output files