1 Overview

scRUtils provides various utilities for visualising and functional analysis of RNA-seq data, particularly single-cell dataset. It evolved from a collection of helper functions that were used in our in-house scRNA-seq processing workflow.

The documentation of this package is divided into 5 sections:

  1. Introduction to scRUtils
  2. General data visualisations
  3. Single-cell visualisations
  4. Markers and DEGs
  5. Demo datasets

This vignette (#3) will demonstrate functions specific for single-cell datasets.

2 Load packages

To use scRUtils and relevant packages in a R session, we load them using the library() command.

library(scRUtils)

library(cowplot)
library(ggplot2)
library(scater)

3 General visualisations

3.1 Plot expression of groups of cells in boxplot

The plotBox() function produces a boxplot to show the expression levels of selected genes in groups of cells. It requires a SingleCellExperiment object (argument sce) and features argument denoting the genes to plot. The group_by argument specify how the cells are to be grouped. If none were provided, all cells will be grouped into 1 group.

data(sce)

# All cells in 1 group
plotBox(sce, features = rownames(sce)[10:13])
## Loading required namespace: gtools

By default, the boxes are coloured by the fraction of cells in a group expressing the given gene. We can use color_by to colour by cell groups.

# Group cells by "label", colour by cell groups, and change theme base size
plotBox(sce, features = rownames(sce)[10:13], group_by = "label",
        color_by = "Group", theme_size = 14)

It is also possible to show the results in a subset of cell using the columns argument.

# Show cells of labels A and B only and change colour palette
keep <- sce$label %in% c("A","B")
plotBox(sce, features = rownames(sce)[10:13], group_by = "label", 
    box_colors = rainbow(50), columns = keep)

A maximum of 2 grouping terms can be provided in the group_by argument, and cells will be grouped accordingly. Given the number of possible cell groups increases when combining 2 terms, we can adjust the number of columns to show in the plot, the size and angle of the x-axis labels, and theme base size.

table(sce$label, sce$CellType)
##    
##     Type 1 Type 2 Type 3 Type 4 Type 5
##   A     49      0      0      0      1
##   B      0     55      0      0      0
##   C      0      0     38      1      0
##   D      0      0      0     46      0
##   E      0      0      1      0     58
# Group cells by "label" and "CellType", showing 2 features per row
plotBox(sce, features = rownames(sce)[10:15], 
    group_by = c("label","CellType"), box_colors = rainbow(50), 
    facet_ncol = 2, x.text_angle = 90, x.text_size = 14)

3.2 Plot G2M against G1 scores from cyclone results

The plotCyclone() function produces a scatter plot to show the predicted G1 and G2M scores and cell cycle phases of single cells after performing classification using cyclone() from the scran package.

data(phases_assignments)

plotCyclone(phases_assignments)

3.3 Plot expression frequency against mean counts

The plotExprsFreqVsMean() function uses plotRowData() from the scater package to produce a scatter plot of expression frequency against mean counts by using per- feature quality control metrics produced by addPerCellQC() or perFeatureQCMetrics() from the scuttle package and stored in rowData of the SingleCellExperiment input.

The original function of the same name can be found in the legacy scater package (archive-scater) and is now deprecated.

plotExprsFreqVsMean(sce, title = "Expression frequency vs. mean expression")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

3.4 Plot logcounts variance against logcounts mean

The plotVarianceVsMean() function calculates the per-feature mean and variance using normalised counts (accessible via logcounts()) from a SingleCellExperiment object, and returns a logcounts mean-variance scatter plot.

When the rowData slot of the SingleCellExperiment input contains QC metrics for features that were returned by addPerCellQC() or perFeatureQCMetrics(), this function will use detected (percentage of expressed features above the detection limit) to calculate pct_dropout (percentage of dropouts) and colour the points accordingly.

plotVarianceVsMean(sce, title = "logcounts mean-variance plot")

3.5 Plot variance modelling results

The plotVariableFeature() function requires a SingleCellExperiment object (argument sce) and the resulting DataFrame (argument var) after performing variance modelling, such as modelGeneVar() from the scran package, and produces a mean-variance scatter plot.

In addition to the two required inputs, if rowData(sce) has is_pass and/or is_ambient columns containing logical values (i.e. TRUE and FALSE), denoting if a gene passed QC or if its expression is attributable to ambient contamination, the resulting figure will show the genes in different shapes.

var <- metadata(sce)[["modelGeneVar"]]

plotVariableFeature(sce, var, top_n = 5, title = "modelGeneVar")

One can supply the hvg argument a character vector containing highly variable genes to be coloured in red.

hvg <- metadata(sce)[["HVG"]]

plotVariableFeature(sce, var, hvg, title = "modelGeneVar (highlight HVGs)")

3.6 Plot distribution of approximate silhouette widths

The plotSilhouette() function produces a violin scatter plot to show the distribution of the approximate silhouette width across cells in each cluster. It uses approxSilhouette() from the bluster package’ to calculate the silhouette widths using an approximate approach.

Using the information stored in the returned DataFrame, it:

  • prints the descriptive statistics of approximate silhouette widths.
  • calculates the percentage of cells assigned to the closest cluster and prints the result when printDiff = TRUE.
  • creates a violin scatter plot to show the distribution of the approximate silhouette width when plot = TRUE.

These information can allow users to identify poorly separate clusters quickly.

mat <- reducedDim(sce, "PCA")
plotSilhouette(mat, sce$label)
## Silhouette width summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.03618 0.07854 0.09546 0.09978 0.11718 0.18600 
## # A tibble: 5 × 2
##   cluster `% Different`
##   <chr>           <dbl>
## 1 A                   0
## 2 B                   0
## 3 C                   0
## 4 D                   0
## 5 E                   0

3.7 Create QC plots from findDoubletClusters results

The plotqcDoubletClusters function produces QC plots to aid deciding potential doublet clusters after running findDoubletClusters() from the scDblFinder package. By setting the qc_plot argument, it returns:

  1. The default, a compound figure.
  2. A scatter plot of median number of significant genes (median.de) against number of significant genes (num.de).
  3. A barplot showing the proportion of cells in each of the query cluster (prop).
  4. A scatter plot of ratio of the median library sizes for the second source cluster (lib.size2) against first source cluster (lib.size1).

The example below produces a compound figure.

data(dbl_results)

plotqcDoubletClusters(dbl_results, text_size = 5, theme_size = 16)

Or set qc_plot to 1 to produce just the median.de against num.de scatter plot.

plotqcDoubletClusters(dbl_results, qc_plot = 1)

4 UMAP/TSNE projection

4.1 Add labels to reduced dimension plots

The add_label() function is designed to be used with plotReducedDim() from the scater package without specifying text label (i.e. via text_by). Then use add_label() to place labels centrally. The repel-away-from-center behaviour in plotReducedDim() should be fixed in scater v1.23.5.

p1 <- plotReducedDim(sce, "TSNE", colour_by = "label", text_by = "label",
                     text_size = 8, theme_size = 16) + ggtitle("Original")
p2 <- plotReducedDim(sce, "TSNE", colour_by = "label", theme_size = 16) +
        add_label(sce, "TSNE") + ggtitle("Use `add_label`")

plot_grid(p1, p2)

4.2 Colour cells in a TSNE or UMAP

The plotProjection() function uses plotReducedDim() from the scater package to show cells on a pre-calculated low-dimensional projection (such as UMAP or t-SNE) and colour by a choosen cell-specific feature or level of gene expression.

The function adds aesthetic, such as point colours, legend controls, title and subtitles, to the final figure. It uses add_label() to label cells when text_by is used.

plotProjection(sce, "label", dimname = "TSNE", text_by = "label",
               feat_desc = "Cluster")

plotProjection(sce, "DKD62", dimname = "UMAP", text_by = "label",
               feat_desc = "DKD62 Expression", guides_barheight = 15)

4.3 Coloure cells in TSNE or UMAP side-by-side

The plotProjections() function make use of plotProjection() to show cells on two pre- calculated low-dimensional projection (such as UMAP and t-SNE) in a compound figure using plot_grid() from the cowplot package.

It also uses get_legend() from cowplot to produce a shared legend and places it at the desired position as specified by legend_pos.

plotProjections(sce, "label", dimname = c("TSNE", "UMAP"),
                text_by = "label", feat_desc = "Cluster", point_size = 2)

plotProjections(sce, "DKD62", dimname = c("TSNE", "UMAP"),
                text_by = "label", feat_desc = "DKD62 Expression",
                point_size = 2, legend_pos = "bottom", guides_barwidth = 15,
                rel_heights = c(10, 1))

4.4 Show ligand-receptor gene expression levels in a TSNE or UMAP

The plotReducedDimLR() function produces a figure to show the levels of gene expression of a ligand-receptor pair on a pre-calculated low-dimensional projection (such as UMAP or t-SNE).

The function is based on plotReducedDim() from the scater package, and uses new_scale_colour() from the ggnewscale package to add an additional layer where a second geom uses another colour scale to show the expression intensity of a second gene.

plotReducedDimLR(sce, "TSNE", c("OOX46", "BFP78"), text_by = "label")

Even though plotReducedDimLR() was designed initially to show ligand-receptor expression, by changing the lr_desc and lr_sep arguments, one can also use this function to show the expression of two genes, that could be expressed in a mutually exclusive fashion, or specific to certain cell clusters or cell types.

plotReducedDimLR(sce, "TSNE", c("OOX46", "BFP78"), lr_desc = c("B", "D"),
                 lr_sep = " and ", text_by = "label")

If it is too difficult to visualise the gene expression with two colour scales on the same figure, one can use oneplot = FALSE to create a compound figure with 2 sub-plots, each showing their respective colours.

plotReducedDimLR(sce, "TSNE", c("OOX46", "BFP78"),
                 lr_desc = c("Grp B", "Grp D"), lr_sep = " and ", 
         text_by = "label", guides_barheight = 10, theme_size = 16, 
         oneplot = FALSE)

Session information

sessionInfo()
## R version 4.1.3 (2022-03-10)
## Platform: x86_64-conda-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.6 LTS
## 
## Matrix products: default
## BLAS/LAPACK: /home/ihsuan/miniconda3/envs/test/lib/libopenblasp-r0.3.21.so
## 
## locale:
##  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
##  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
##  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] scater_1.22.0               scuttle_1.4.0              
##  [3] SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
##  [5] Biobase_2.54.0              GenomicRanges_1.46.1       
##  [7] GenomeInfoDb_1.30.1         IRanges_2.28.0             
##  [9] S4Vectors_0.32.4            BiocGenerics_0.40.0        
## [11] MatrixGenerics_1.6.0        matrixStats_0.63.0         
## [13] ggplot2_3.4.2               cowplot_1.1.1              
## [15] scRUtils_0.2.0              BiocStyle_2.22.0           
## 
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-162              bitops_1.0-7             
##  [3] httr_1.4.5                tools_4.1.3              
##  [5] bslib_0.4.2               utf8_1.2.3               
##  [7] R6_2.5.1                  irlba_2.3.5.1            
##  [9] vipor_0.4.5               mgcv_1.8-42              
## [11] colorspace_2.1-0          withr_2.5.0              
## [13] tidyselect_1.2.0          gridExtra_2.3            
## [15] curl_4.3.3                compiler_4.1.3           
## [17] cli_3.6.1                 BiocNeighbors_1.12.0     
## [19] enrichR_3.2               DelayedArray_0.20.0      
## [21] labeling_0.4.2            bookdown_0.33            
## [23] sass_0.4.5                scales_1.2.1             
## [25] digest_0.6.31             rmarkdown_2.21           
## [27] XVector_0.34.0            pkgconfig_2.0.3          
## [29] htmltools_0.5.5           WriteXLS_6.4.0           
## [31] sparseMatrixStats_1.6.0   highr_0.10               
## [33] limma_3.50.3              fastmap_1.1.1            
## [35] rlang_1.1.0               DelayedMatrixStats_1.16.0
## [37] jquerylib_0.1.4           farver_2.1.1             
## [39] generics_0.1.3            jsonlite_1.8.4           
## [41] gtools_3.9.4              BiocParallel_1.28.3      
## [43] dplyr_1.1.1               RCurl_1.98-1.12          
## [45] magrittr_2.0.3            BiocSingular_1.10.0      
## [47] GenomeInfoDbData_1.2.7    Matrix_1.5-4             
## [49] Rcpp_1.0.10               ggbeeswarm_0.7.1         
## [51] munsell_0.5.0             fansi_1.0.4              
## [53] ggnewscale_0.4.8          viridis_0.6.2            
## [55] lifecycle_1.0.3           edgeR_3.36.0             
## [57] yaml_2.3.7                MASS_7.3-58.3            
## [59] zlibbioc_1.40.0           grid_4.1.3               
## [61] dqrng_0.3.0               parallel_4.1.3           
## [63] ggrepel_0.9.3             lattice_0.21-8           
## [65] splines_4.1.3             beachmat_2.10.0          
## [67] magick_2.7.4              locfit_1.5-9.7           
## [69] metapod_1.2.0             knitr_1.42               
## [71] pillar_1.9.0              igraph_1.4.2             
## [73] rjson_0.2.21              ScaledMatrix_1.2.0       
## [75] glue_1.6.2                evaluate_0.20            
## [77] scran_1.22.1              BiocManager_1.30.20      
## [79] vctrs_0.6.1               tweenr_2.0.2             
## [81] gtable_0.3.3              purrr_1.0.1              
## [83] polyclip_1.10-4           tidyr_1.3.0              
## [85] cachem_1.0.7              xfun_0.38                
## [87] ggforce_0.4.1             rsvd_1.0.5               
## [89] viridisLite_0.4.1         tibble_3.2.1             
## [91] beeswarm_0.4.0            cluster_2.1.4            
## [93] statmod_1.5.0             bluster_1.4.0