scRUtils 0.2.0
scRUtils
provides various utilities for visualising and functional analysis of RNA-seq data,
particularly single-cell dataset. It evolved from a collection of helper functions that were
used in our in-house scRNA-seq processing workflow.
The documentation of this package is divided into 5 sections:
This vignette (#3) will demonstrate functions specific for single-cell datasets.
To use scRUtils
and relevant packages in a R session, we load them using the library()
command.
library(scRUtils)
library(cowplot)
library(ggplot2)
library(scater)
The plotBox()
function produces a boxplot to show the expression levels of selected genes
in groups of cells. It requires a SingleCellExperiment
object (argument sce
) and
features
argument denoting the genes to plot. The group_by
argument specify how the
cells are to be grouped. If none were provided, all cells will be grouped into 1 group.
data(sce)
# All cells in 1 group
plotBox(sce, features = rownames(sce)[10:13])
## Loading required namespace: gtools
By default, the boxes are coloured by the fraction of cells in a group expressing the given
gene. We can use color_by
to colour by cell groups.
# Group cells by "label", colour by cell groups, and change theme base size
plotBox(sce, features = rownames(sce)[10:13], group_by = "label",
color_by = "Group", theme_size = 14)
It is also possible to show the results in a subset of cell using the columns
argument.
# Show cells of labels A and B only and change colour palette
keep <- sce$label %in% c("A","B")
plotBox(sce, features = rownames(sce)[10:13], group_by = "label",
box_colors = rainbow(50), columns = keep)
A maximum of 2 grouping terms can be provided in the group_by
argument, and cells will
be grouped accordingly. Given the number of possible cell groups increases when combining
2 terms, we can adjust the number of columns to show in the plot, the size and angle of
the x-axis labels, and theme base size.
table(sce$label, sce$CellType)
##
## Type 1 Type 2 Type 3 Type 4 Type 5
## A 49 0 0 0 1
## B 0 55 0 0 0
## C 0 0 38 1 0
## D 0 0 0 46 0
## E 0 0 1 0 58
# Group cells by "label" and "CellType", showing 2 features per row
plotBox(sce, features = rownames(sce)[10:15],
group_by = c("label","CellType"), box_colors = rainbow(50),
facet_ncol = 2, x.text_angle = 90, x.text_size = 14)
cyclone
resultsThe plotCyclone()
function produces a scatter plot to show the predicted G1 and G2M scores
and cell cycle phases of single cells after performing classification using cyclone()
from
the scran package.
data(phases_assignments)
plotCyclone(phases_assignments)
The plotExprsFreqVsMean()
function uses plotRowData()
from the scater
package to produce a scatter plot of expression frequency against mean counts by using per-
feature quality control metrics produced by addPerCellQC()
or perFeatureQCMetrics()
from
the scuttle package and stored in rowData
of the SingleCellExperiment
input.
The original function of the same name can be found in the legacy scater package (archive-scater) and is now deprecated.
plotExprsFreqVsMean(sce, title = "Expression frequency vs. mean expression")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
The plotVarianceVsMean()
function calculates the per-feature mean and variance using
normalised counts (accessible via logcounts()
) from a SingleCellExperiment
object, and
returns a logcounts mean-variance scatter plot.
When the rowData
slot of the SingleCellExperiment
input contains QC metrics for features
that were returned by addPerCellQC()
or perFeatureQCMetrics()
, this function will use
detected
(percentage of expressed features above the detection limit) to calculate
pct_dropout
(percentage of dropouts) and colour the points accordingly.
plotVarianceVsMean(sce, title = "logcounts mean-variance plot")
The plotVariableFeature()
function requires a SingleCellExperiment
object (argument sce
)
and the resulting DataFrame (argument var
) after performing variance modelling, such as
modelGeneVar()
from the scran package, and produces a mean-variance scatter plot.
In addition to the two required inputs, if rowData(sce)
has is_pass
and/or is_ambient
columns containing logical values (i.e. TRUE
and FALSE
), denoting if a gene passed QC or if
its expression is attributable to ambient contamination, the resulting figure will show the genes
in different shapes.
var <- metadata(sce)[["modelGeneVar"]]
plotVariableFeature(sce, var, top_n = 5, title = "modelGeneVar")
One can supply the hvg
argument a character vector containing highly variable genes to be coloured
in red.
hvg <- metadata(sce)[["HVG"]]
plotVariableFeature(sce, var, hvg, title = "modelGeneVar (highlight HVGs)")
The plotSilhouette()
function produces a violin scatter plot to show the distribution of the
approximate silhouette width across cells in each cluster. It uses approxSilhouette()
from the
bluster package’ to calculate the silhouette widths using an approximate approach.
Using the information stored in the returned DataFrame, it:
printDiff = TRUE
.plot = TRUE
.These information can allow users to identify poorly separate clusters quickly.
mat <- reducedDim(sce, "PCA")
plotSilhouette(mat, sce$label)
## Silhouette width summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.03618 0.07854 0.09546 0.09978 0.11718 0.18600
## # A tibble: 5 × 2
## cluster `% Different`
## <chr> <dbl>
## 1 A 0
## 2 B 0
## 3 C 0
## 4 D 0
## 5 E 0
findDoubletClusters
resultsThe plotqcDoubletClusters
function produces QC plots to aid deciding potential doublet clusters
after running findDoubletClusters()
from the scDblFinder package. By setting the
qc_plot
argument, it returns:
median.de
) against number of significant genes (num.de
).prop
).lib.size2
) against first source cluster (lib.size1
).The example below produces a compound figure.
data(dbl_results)
plotqcDoubletClusters(dbl_results, text_size = 5, theme_size = 16)
Or set qc_plot
to 1 to produce just the median.de
against num.de
scatter plot.
plotqcDoubletClusters(dbl_results, qc_plot = 1)
The add_label()
function is designed to be used with plotReducedDim()
from the
scater package without specifying text label (i.e. via text_by
). Then use
add_label()
to place labels centrally. The repel-away-from-center behaviour in
plotReducedDim()
should be fixed in scater v1.23.5.
p1 <- plotReducedDim(sce, "TSNE", colour_by = "label", text_by = "label",
text_size = 8, theme_size = 16) + ggtitle("Original")
p2 <- plotReducedDim(sce, "TSNE", colour_by = "label", theme_size = 16) +
add_label(sce, "TSNE") + ggtitle("Use `add_label`")
plot_grid(p1, p2)
The plotProjection()
function uses plotReducedDim()
from the scater package
to show cells on a pre-calculated low-dimensional projection (such as UMAP or t-SNE) and colour
by a choosen cell-specific feature or level of gene expression.
The function adds aesthetic, such as point colours, legend controls, title and subtitles, to the
final figure. It uses add_label()
to label cells when text_by
is used.
plotProjection(sce, "label", dimname = "TSNE", text_by = "label",
feat_desc = "Cluster")
plotProjection(sce, "DKD62", dimname = "UMAP", text_by = "label",
feat_desc = "DKD62 Expression", guides_barheight = 15)
The plotProjections()
function make use of plotProjection()
to show cells on two pre-
calculated low-dimensional projection (such as UMAP and t-SNE) in a compound figure using
plot_grid()
from the cowplot package.
It also uses get_legend()
from cowplot to produce a shared legend and places
it at the desired position as specified by legend_pos
.
plotProjections(sce, "label", dimname = c("TSNE", "UMAP"),
text_by = "label", feat_desc = "Cluster", point_size = 2)
plotProjections(sce, "DKD62", dimname = c("TSNE", "UMAP"),
text_by = "label", feat_desc = "DKD62 Expression",
point_size = 2, legend_pos = "bottom", guides_barwidth = 15,
rel_heights = c(10, 1))
The plotReducedDimLR()
function produces a figure to show the levels of gene expression of a
ligand-receptor pair on a pre-calculated low-dimensional projection (such as UMAP or t-SNE).
The function is based on plotReducedDim()
from the scater package, and uses
new_scale_colour()
from the ggnewscale package to add an additional layer
where
a second geom
uses another colour scale to show the expression intensity of a second gene.
plotReducedDimLR(sce, "TSNE", c("OOX46", "BFP78"), text_by = "label")
Even though plotReducedDimLR()
was designed initially to show ligand-receptor expression, by
changing the lr_desc
and lr_sep
arguments, one can also use this function to show the
expression of two genes, that could be expressed in a mutually exclusive fashion, or specific
to certain cell clusters or cell types.
plotReducedDimLR(sce, "TSNE", c("OOX46", "BFP78"), lr_desc = c("B", "D"),
lr_sep = " and ", text_by = "label")
If it is too difficult to visualise the gene expression with two colour scales on the same figure,
one can use oneplot = FALSE
to create a compound figure with 2 sub-plots, each showing their
respective colours.
plotReducedDimLR(sce, "TSNE", c("OOX46", "BFP78"),
lr_desc = c("Grp B", "Grp D"), lr_sep = " and ",
text_by = "label", guides_barheight = 10, theme_size = 16,
oneplot = FALSE)
sessionInfo()
## R version 4.1.3 (2022-03-10)
## Platform: x86_64-conda-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.6 LTS
##
## Matrix products: default
## BLAS/LAPACK: /home/ihsuan/miniconda3/envs/test/lib/libopenblasp-r0.3.21.so
##
## locale:
## [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
## [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
## [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] scater_1.22.0 scuttle_1.4.0
## [3] SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
## [5] Biobase_2.54.0 GenomicRanges_1.46.1
## [7] GenomeInfoDb_1.30.1 IRanges_2.28.0
## [9] S4Vectors_0.32.4 BiocGenerics_0.40.0
## [11] MatrixGenerics_1.6.0 matrixStats_0.63.0
## [13] ggplot2_3.4.2 cowplot_1.1.1
## [15] scRUtils_0.2.0 BiocStyle_2.22.0
##
## loaded via a namespace (and not attached):
## [1] nlme_3.1-162 bitops_1.0-7
## [3] httr_1.4.5 tools_4.1.3
## [5] bslib_0.4.2 utf8_1.2.3
## [7] R6_2.5.1 irlba_2.3.5.1
## [9] vipor_0.4.5 mgcv_1.8-42
## [11] colorspace_2.1-0 withr_2.5.0
## [13] tidyselect_1.2.0 gridExtra_2.3
## [15] curl_4.3.3 compiler_4.1.3
## [17] cli_3.6.1 BiocNeighbors_1.12.0
## [19] enrichR_3.2 DelayedArray_0.20.0
## [21] labeling_0.4.2 bookdown_0.33
## [23] sass_0.4.5 scales_1.2.1
## [25] digest_0.6.31 rmarkdown_2.21
## [27] XVector_0.34.0 pkgconfig_2.0.3
## [29] htmltools_0.5.5 WriteXLS_6.4.0
## [31] sparseMatrixStats_1.6.0 highr_0.10
## [33] limma_3.50.3 fastmap_1.1.1
## [35] rlang_1.1.0 DelayedMatrixStats_1.16.0
## [37] jquerylib_0.1.4 farver_2.1.1
## [39] generics_0.1.3 jsonlite_1.8.4
## [41] gtools_3.9.4 BiocParallel_1.28.3
## [43] dplyr_1.1.1 RCurl_1.98-1.12
## [45] magrittr_2.0.3 BiocSingular_1.10.0
## [47] GenomeInfoDbData_1.2.7 Matrix_1.5-4
## [49] Rcpp_1.0.10 ggbeeswarm_0.7.1
## [51] munsell_0.5.0 fansi_1.0.4
## [53] ggnewscale_0.4.8 viridis_0.6.2
## [55] lifecycle_1.0.3 edgeR_3.36.0
## [57] yaml_2.3.7 MASS_7.3-58.3
## [59] zlibbioc_1.40.0 grid_4.1.3
## [61] dqrng_0.3.0 parallel_4.1.3
## [63] ggrepel_0.9.3 lattice_0.21-8
## [65] splines_4.1.3 beachmat_2.10.0
## [67] magick_2.7.4 locfit_1.5-9.7
## [69] metapod_1.2.0 knitr_1.42
## [71] pillar_1.9.0 igraph_1.4.2
## [73] rjson_0.2.21 ScaledMatrix_1.2.0
## [75] glue_1.6.2 evaluate_0.20
## [77] scran_1.22.1 BiocManager_1.30.20
## [79] vctrs_0.6.1 tweenr_2.0.2
## [81] gtable_0.3.3 purrr_1.0.1
## [83] polyclip_1.10-4 tidyr_1.3.0
## [85] cachem_1.0.7 xfun_0.38
## [87] ggforce_0.4.1 rsvd_1.0.5
## [89] viridisLite_0.4.1 tibble_3.2.1
## [91] beeswarm_0.4.0 cluster_2.1.4
## [93] statmod_1.5.0 bluster_1.4.0