1 Overview

scRUtils provides various utilities for visualisation and functional analysis of RNA-seq data, particularly single-cell datasets. It evolved from a collection of helper functions used in our in-house scRNA-seq processing workflow.

The documentation of this package is divided into 5 sections:

  1. Introduction to scRUtils
  2. General data visualisations
  3. Single-cell visualisations
  4. Markers and DEGs
  5. Demo datasets

This vignette (#2) demonstrates the functions for general-purpose visualisation.

2 Load packages

To use scRUtils and related packages in an R session, we load them with the library() command.

library(scRUtils)

library(ggforce)
library(ggplot2)
library(scater)

3 Usage

3.1 Jupyter Notebook-specific

3.1.1 Change repr.plot.* behaviour

The fig() function uses options() to change the repr.plot.* settings. It provides a quick and easy way to change the plot size and other repr.plot.* behaviours when running R in a Jupyter Notebook. If called without any arguments, the plot settings are reset to their defaults. reset.fig() is an alias of fig().

The example below is not evaluated as the function has no effect in R Markdown.

library(ggplot2)

# Change plot area width to 8 inches and height to 5 inches
fig(width = 8, height = 5)
ggplot(mpg, aes(class)) + geom_bar()

# Reset to default settings
fig()

# Change plot wider and taller
fig(width = 14, height = 10)
ggplot(mpg, aes(class)) + geom_bar()

# Alias of fig()
reset.fig()
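
As a rough illustration of the mechanism, the sketch below sets two repr.plot.* options directly with options(). The helper name fig_sketch() and its defaults of 7 x 7 inches are assumptions made for this example; it is not the scRUtils implementation of fig().

```r
# Hypothetical stand-in for fig(); NOT the scRUtils implementation.
# Assumption: the defaults to restore are 7 x 7 inches.
fig_sketch <- function(width = 7, height = 7) {
  # The repr package reads these options when it renders a plot in Jupyter
  options(repr.plot.width = width, repr.plot.height = height)
  invisible(c(width = width, height = height))
}

fig_sketch(width = 8, height = 5)   # wider, shorter plot area
getOption("repr.plot.width")

fig_sketch()                        # back to the assumed defaults
getOption("repr.plot.height")
```

Because the settings live in options(), they persist for every later plot in the session until they are changed again, which is why a quick reset helper is convenient.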

3.2 Discrete palette

3.2.1 The c30() and c40() palettes

The c30() palette has 30 unique colours and the c40() palette has 40 unique colours. The c40() colour palette is taken from plotScoreHeatmap() of the SingleR package (which is itself based on the Okabe-Ito colours).

# Show colours as pie charts
pie(rep(1,30), col = c30(), radius = 1.05)

pie(rep(1,40), col = c40(), radius = 1.05)

3.2.2 Choose discrete colours

The choosePalette() function takes a character vector of features and, optionally, a vector of colour codes, and checks whether the supplied vector contains enough colours. It returns a named vector of colour codes, with one entry per unique feature.

By default, it uses the c30() palette when no more than 30 colours are required, then the c40() palette, and lastly the rainbow() colour palette when requiring more than 40 colours.

The example below uses a character vector of 10 letters (5 of them unique) as input; choosePalette() returns 5 colours.

feat <- rep(LETTERS[1:5], 2)
feat
##  [1] "A" "B" "C" "D" "E" "A" "B" "C" "D" "E"
choosePalette(feat) # use c30()
## Loading required namespace: gtools
##         A         B         C         D         E 
## "#006400" "#ff0000" "#0000ff" "#ff8c00" "#800080"

The next example uses a factor of length 15 with 3 levels as input; choosePalette() returns the first 3 of the 10 colours in the rainbow(10) colour palette.

feat <- factor(rep(LETTERS[1:3], 5))
feat
##  [1] A B C A B C A B C A B C A B C
## Levels: A B C
choosePalette(feat, rainbow(10))
##         A         B         C 
## "#FF0000" "#FF9900" "#CCFF00"
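
The selection rule described above can be sketched in base R. This is only an illustration under stated assumptions: choose_palette_sketch() is a hypothetical name, pal30 and pal40 are stand-ins for c30() and c40() built with hcl.colors(), unique features are ordered with sort(), and falling back to the default palettes when the supplied vector is too short is a guess at the behaviour rather than something confirmed by the package.

```r
# Hypothetical sketch of the selection rule; NOT the scRUtils implementation.
# pal30 and pal40 are stand-ins for c30() and c40().
choose_palette_sketch <- function(features, colours = NULL,
                                  pal30 = hcl.colors(30, "Dark 3"),
                                  pal40 = hcl.colors(40, "Dark 3")) {
  # One colour per unique feature; factors keep their level order
  lv <- if (is.factor(features)) {
    levels(droplevels(features))
  } else {
    sort(unique(as.character(features)))
  }
  n <- length(lv)

  if (!is.null(colours) && length(colours) >= n) {
    pal <- colours        # supplied palette is large enough
  } else if (n <= 30) {
    pal <- pal30          # assumed fallback: 30-colour palette
  } else if (n <= 40) {
    pal <- pal40          # assumed fallback: 40-colour palette
  } else {
    pal <- rainbow(n)     # more than 40 colours required
  }
  setNames(pal[seq_len(n)], lv)
}

choose_palette_sketch(rep(LETTERS[1:5], 2))
choose_palette_sketch(factor(rep(LETTERS[1:3], 5)), rainbow(10))
length(choose_palette_sketch(paste0("f", 1:45)))
```

Returning a named vector means the result can be passed straight to a manual colour scale such as scale_fill_manual(values = ...), where colours are matched to features by name.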

3.3 Parallel sets diagram

3.3.1 Add nudged labels in a parallel sets diagram

The geom_parallel_sets_labs() function in this package is the same as geom_parallel_sets_labels() from the ggforce package, but with the added ability to nudge labels by a fixed distance. It is especially useful when the labels are too long to fit inside the bars depicting the discrete categories. A pull request for the nudge enhancement has been submitted to the ggforce GitHub repository and is awaiting approval.

library(ggforce)
data <- as.data.frame(Titanic)
data <- gather_set_data(data, 1:4)

# Use nudge_x to offset and hjust = 0 to left-justify label
ggplot(data, aes(x, id = id, split = y, value = Freq)) +
  geom_parallel_sets(aes(fill = Sex), alpha = 0.3, axis.width = 0.1) +
  geom_parallel_sets_axes(axis.width = 0.1) +
  geom_parallel_sets_labs(colour = "red", size = 6, angle = 0,
                          nudge_x = 0.1, hjust = 0) +
  theme_bw(20)
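
The aesthetics used in the plot (x, id and split = y) refer to columns added by gather_set_data(). The short check below, which assumes ggforce is installed, inspects the reshaped data; the object names wide and long are chosen for this example only.

```r
library(ggforce)

# 32 rows: one per Class/Sex/Age/Survived combination, plus a Freq column
wide <- as.data.frame(Titanic)

# Stack the 4 categorical columns: each input row is repeated once per
# column, with x holding the column name, y the category value and id
# the index of the original row
long <- gather_set_data(wide, 1:4)

dim(wide)
dim(long)
head(long[, c("id", "x", "y", "Freq")])
```

Each axis in the diagram corresponds to one value of x, and the bars on an axis correspond to the values of y, which is why long category names on y benefit from nudged labels.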

3.3.2 Create a 2-variable parallel sets diagram

The plotParallel() function uses the ggforce package to produce a parallel sets diagram for visualising interaction between 2 variables. The inputs are two character vectors containing membership information.

The example below uses the Titanic dataset to show the class and age of the passengers.

data <- as.data.frame(Titanic)
plotParallel(data$Class, data$Age, labels = c("class", "age"))

We can also use plotParallel() to show cell-specific features of a single-cell dataset, such as clustering and cell type assignment.

data(sce)

plotParallel(sce$label, sce$CellType, labels = c("Cluster", "Cell Type"),
             add_counts = TRUE, text_size = 4)
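
For readers who want to see how a two-variable diagram can be assembled from ggforce directly, here is a minimal sketch. It assumes ggplot2 and ggforce are installed; parallel_sketch() and the toy cluster and celltype vectors are hypothetical and do not reproduce the options or appearance of plotParallel().

```r
library(ggplot2)
library(ggforce)

# Hypothetical helper; NOT the scRUtils implementation of plotParallel().
parallel_sketch <- function(a, b) {
  # Count how many observations fall into each pair of categories
  counts <- as.data.frame(table(a = a, b = b))
  counts <- counts[counts$Freq > 0, ]

  # Reshape to the long format expected by the parallel sets geoms
  long <- gather_set_data(counts, 1:2)

  ggplot(long, aes(x, id = id, split = y, value = Freq)) +
    geom_parallel_sets(aes(fill = a), alpha = 0.3, axis.width = 0.1) +
    geom_parallel_sets_axes(axis.width = 0.1) +
    geom_parallel_sets_labels(colour = "white", angle = 0) +
    theme_bw()
}

# Toy membership vectors, one element per cell
cluster  <- c("1", "1", "2", "2", "3", "3", "3")
celltype <- c("T", "T", "B", "T", "B", "B", "NK")

p <- parallel_sketch(cluster, celltype)
p
```

The cross-tabulation step is what turns two per-cell membership vectors into ribbon widths: each ribbon's width is the number of cells sharing that pair of labels.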

Session information

sessionInfo()
## R version 4.1.3 (2022-03-10)
## Platform: x86_64-conda-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
## 
## Matrix products: default
## BLAS/LAPACK: /home/ihsuan/miniconda3/envs/jupyterlab/lib/libopenblasp-r0.3.20.so
## 
## locale:
##  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
##  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
##  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] scater_1.22.0               scuttle_1.4.0              
##  [3] SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
##  [5] Biobase_2.54.0              GenomicRanges_1.46.1       
##  [7] GenomeInfoDb_1.30.1         IRanges_2.28.0             
##  [9] S4Vectors_0.32.3            BiocGenerics_0.40.0        
## [11] MatrixGenerics_1.6.0        matrixStats_0.61.0         
## [13] ggforce_0.3.3               ggplot2_3.3.5              
## [15] scRUtils_0.1.0              BiocStyle_2.22.0           
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-7              httr_1.4.2               
##  [3] tools_4.1.3               bslib_0.3.1              
##  [5] utf8_1.2.2                R6_2.5.1                 
##  [7] irlba_2.3.5               vipor_0.4.5              
##  [9] DBI_1.1.2                 colorspace_2.0-3         
## [11] withr_2.5.0               gridExtra_2.3            
## [13] tidyselect_1.1.2          compiler_4.1.3           
## [15] cli_3.2.0                 BiocNeighbors_1.12.0     
## [17] enrichR_3.0               DelayedArray_0.20.0      
## [19] labeling_0.4.2            bookdown_0.26            
## [21] sass_0.4.1                scales_1.2.0             
## [23] stringr_1.4.0             digest_0.6.29            
## [25] rmarkdown_2.13            XVector_0.34.0           
## [27] pkgconfig_2.0.3           htmltools_0.5.2          
## [29] sparseMatrixStats_1.6.0   limma_3.50.1             
## [31] highr_0.9                 fastmap_1.1.0            
## [33] rlang_1.0.2               DelayedMatrixStats_1.16.0
## [35] jquerylib_0.1.4           farver_2.1.0             
## [37] generics_0.1.2            jsonlite_1.8.0           
## [39] gtools_3.9.2              BiocParallel_1.28.3      
## [41] dplyr_1.0.8               RCurl_1.98-1.6           
## [43] magrittr_2.0.3            BiocSingular_1.10.0      
## [45] GenomeInfoDbData_1.2.7    Matrix_1.4-1             
## [47] Rcpp_1.0.8.3              ggbeeswarm_0.6.0         
## [49] munsell_0.5.0             fansi_1.0.3              
## [51] viridis_0.6.2             ggnewscale_0.4.7         
## [53] lifecycle_1.0.1           edgeR_3.36.0             
## [55] stringi_1.7.6             yaml_2.3.5               
## [57] MASS_7.3-56               zlibbioc_1.40.0          
## [59] grid_4.1.3                dqrng_0.3.0              
## [61] parallel_4.1.3            ggrepel_0.9.1            
## [63] crayon_1.5.1              lattice_0.20-45          
## [65] cowplot_1.1.1             beachmat_2.10.0          
## [67] locfit_1.5-9.5            magick_2.7.3             
## [69] metapod_1.2.0             knitr_1.38               
## [71] pillar_1.7.0              igraph_1.3.0             
## [73] rjson_0.2.21              ScaledMatrix_1.2.0       
## [75] glue_1.6.2                evaluate_0.15            
## [77] scran_1.22.1              BiocManager_1.30.16      
## [79] vctrs_0.4.1               tweenr_1.0.2             
## [81] tidyr_1.2.0               gtable_0.3.0             
## [83] purrr_0.3.4               polyclip_1.10-0          
## [85] assertthat_0.2.1          xfun_0.30                
## [87] rsvd_1.0.5                viridisLite_0.4.0        
## [89] tibble_3.1.6              beeswarm_0.4.0           
## [91] cluster_2.1.3             statmod_1.4.36           
## [93] bluster_1.4.0             ellipsis_0.3.2