1 Overview

scRUtils provides various utilities for visualisation and functional analysis of RNA-seq data, particularly single-cell datasets. It evolved from a collection of helper functions used in our in-house scRNA-seq processing workflow.

The documentation of this package is divided into 5 sections. This vignette (#1) introduces the package and the included demo datasets.

2 Installation

The package is currently available only on GitHub.

devtools::install_github("ycl6/scRUtils")

3 Load packages

To use scRUtils in an R session, load it with the library() command.

library(scRUtils)

4 Use demo datasets

Demo datasets are included to demonstrate the functions available in this package. They can be loaded with data(). Details on how the datasets were produced can be found in the following vignette: Creating demo datasets.
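To see which demo datasets are shipped before loading any of them, the base R data() function can list the datasets of an installed package. The snippet below is a minimal sketch that assumes scRUtils is already installed; the variable names pkg_available and available are illustrative only.

```r
# Check whether scRUtils is installed before querying it
pkg_available <- requireNamespace("scRUtils", quietly = TRUE)

if (pkg_available) {
  # data(package = ...) returns an object whose $results matrix has the
  # columns Package, LibPath, Item and Title
  available <- data(package = "scRUtils")$results[, c("Item", "Title")]
  print(available)
} else {
  message("scRUtils is not installed; see the Installation section.")
}
```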

4.1 `cyclone` result

phases_assignments is a list of 3 elements containing the results of cyclone run on a simulated scRNA-seq dataset.

data(phases_assignments)
str(phases_assignments)

## List of 3
##  $ phases           : chr [1:200] "S" "S" "G1" "S" ...
##  $ scores           :'data.frame':   200 obs. of  3 variables:
##   ..$ G1 : num [1:200] 0.442 0.391 0.52 0.261 0.396 0.563 0.4 0.59 0.757 0.705 ...
##   ..$ S  : num [1:200] 0.627 0.509 0.805 0.849 0.655 0.697 0.361 0.527 0.524 0.207 ...
##   ..$ G2M: num [1:200] 0.22 0.275 0.143 0.392 0.065 0.116 0.512 0.162 0.064 0.295 ...
##  $ normalized.scores:'data.frame':   200 obs. of  3 variables:
##   ..$ G1 : num [1:200] 0.343 0.333 0.354 0.174 0.355 ...
##   ..$ S  : num [1:200] 0.486 0.433 0.548 0.565 0.587 ...
##   ..$ G2M: num [1:200] 0.1707 0.234 0.0974 0.261 0.0582 ...

table(phases_assignments[["phases"]])

## 
##  G1 G2M   S 
## 152   4  44
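The relationship between the three list elements can be checked by hand. The sketch below copies the first four cells from the str() output above into a plain data frame and works under two assumptions that are not stated in this vignette: that normalized.scores are the raw scores divided by their row sums, and that phases follow the rule described in scran's cyclone documentation (G1 or G2M when that score is at least 0.5 and larger than the other, otherwise S).

```r
# First four cells, copied from the str() output above
scores <- data.frame(
  G1  = c(0.442, 0.391, 0.520, 0.261),
  S   = c(0.627, 0.509, 0.805, 0.849),
  G2M = c(0.220, 0.275, 0.143, 0.392)
)

# Assumption: normalized scores are row-wise proportions of the raw scores
normalized <- scores / rowSums(scores)
round(normalized, 3)

# Assumption: phase rule as described in scran's cyclone documentation
phases <- ifelse(scores$G1 >= 0.5 & scores$G1 > scores$G2M, "G1",
          ifelse(scores$G2M >= 0.5 & scores$G2M > scores$G1, "G2M", "S"))
phases
```

For these four cells the recomputed values agree with the printed normalized.scores and phases, which supports both assumptions for this dataset.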

4.2 `findDoubletClusters` result

dbl_results is a DFrame object containing the results of findDoubletClusters run on a simulated scRNA-seq dataset.

data(dbl_results)
dbl_results

## DataFrame with 3 rows and 9 columns
##              source1     source2    num.de median.de        best     p.value
##          <character> <character> <integer> <integer> <character>   <numeric>
## cluster1    cluster3    cluster2        69        69     gene105 3.55299e-64
## cluster3    cluster2    cluster1       106       106      gene58 1.52454e-68
## cluster2    cluster3    cluster1       110       110      gene20 1.47261e-76
##          lib.size1 lib.size2      prop
##          <numeric> <numeric> <numeric>
## cluster1  0.931818   1.02652  0.164756
## cluster3  1.101626   1.07317  0.358166
## cluster2  0.907749   0.97417  0.477077
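One way to read this table is to rank clusters by num.de and inspect the library size ratios. The sketch below rebuilds the relevant columns as a plain data frame from the printout above. It assumes, following the scDblFinder documentation, that lib.size1 and lib.size2 are ratios of source to query library sizes (so values below 1 are expected for a doublet cluster) and that prop is the fraction of cells in each cluster. The column both_sources_smaller is a hypothetical helper, not part of the package.

```r
# Values copied from the dbl_results printout above
dbl <- data.frame(
  row.names = c("cluster1", "cluster3", "cluster2"),
  num.de    = c(69, 106, 110),
  lib.size1 = c(0.931818, 1.101626, 0.907749),
  lib.size2 = c(1.02652, 1.07317, 0.97417),
  prop      = c(0.164756, 0.358166, 0.477077)
)

# Rank clusters by the number of DE genes against their putative sources
ranked <- dbl[order(dbl$num.de), ]

# Assumption: a doublet cluster has a larger library size than both sources,
# so both ratios would be below 1
ranked$both_sources_smaller <- ranked$lib.size1 < 1 & ranked$lib.size2 < 1
ranked

# Assumption: prop is the fraction of cells per cluster, so it totals about 1
sum(dbl$prop)
```

Any cut-off on num.de remains a judgement call; this sketch only orders and annotates the rows.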

4.3 Processed simulated single-cell RNA-seq dataset

sce is a processed SingleCellExperiment object built from a simulated dataset, with additional information (such as reduced dimensions and analysis results in its metadata) stored.

data(sce)
sce

## class: SingleCellExperiment 
## dim: 1000 250 
## metadata(7): modelGeneVar HVG ... findMarkers3 findMarkers4
## assays(2): counts logcounts
## rownames(1000): YBH78 WHB35 ... OZD76 NGJ12
## rowData names(2): mean detected
## colnames(250): CELL_1 CELL_2 ... CELL_249 CELL_250
## colData names(6): label sum ... sizeFactor CellType
## reducedDimNames(3): PCA TSNE UMAP
## mainExpName: NULL
## altExpNames(0):

# rowData
rowData(sce)

## DataFrame with 1000 rows and 2 columns
##            mean  detected
##       <numeric> <numeric>
## YBH78     0.868      57.6
## WHB35     0.988      64.4
## QFY33     0.968      59.6
## IOR82     0.936      60.4
## NSS93     0.996      63.6
## ...         ...       ...
## MWV41     0.976      60.0
## LJG52     0.992      63.6
## LPY61     1.008      60.4
## OZD76     1.100      64.4
## NGJ12     1.028      64.4

# colData
colData(sce)

## DataFrame with 250 rows and 6 columns
##                label       sum  detected     total sizeFactor    CellType
##          <character> <numeric> <numeric> <numeric>  <numeric> <character>
## CELL_1             B      1063       637      1063   1.032420      Type 2
## CELL_2             A      1033       640      1033   1.003283      Type 1
## CELL_3             D      1035       621      1035   1.005225      Type 4
## CELL_4             D      1041       643      1041   1.011053      Type 4
## CELL_5             C      1015       629      1015   0.985801      Type 3
## ...              ...       ...       ...       ...        ...         ...
## CELL_246           E      1104       651      1104   1.072240      Type 5
## CELL_247           E       973       612       973   0.945009      Type 5
## CELL_248           A       946       587       946   0.918786      Type 1
## CELL_249           C      1037       617      1037   1.007168      Type 3
## CELL_250           C      1003       616      1003   0.974146      Type 3

# metadata
names(metadata(sce))

## [1] "modelGeneVar" "HVG"          "SingleR"      "findMarkers1" "findMarkers2"
## [6] "findMarkers3" "findMarkers4"
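As a small worked check of the colData columns, the sketch below takes the first five cells from the printout above and compares sum with sizeFactor. It assumes, without confirmation from this vignette, that the size factors are library size factors, in which case sum divided by sizeFactor should be the same constant for every cell.

```r
# First five cells, copied from the colData(sce) printout above
cd <- data.frame(
  row.names  = paste0("CELL_", 1:5),
  sum        = c(1063, 1033, 1035, 1041, 1015),
  sizeFactor = c(1.032420, 1.003283, 1.005225, 1.011053, 0.985801)
)

# Assumption: sizeFactor is proportional to the library size (sum)
ratio <- cd$sum / cd$sizeFactor
ratio

# The spread of the ratio across cells should be negligible
diff(range(ratio))
```

For these cells the ratio is essentially constant, which is consistent with size factors scaled from the library sizes.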

4.4 edgeR’s `TopTags` object

res_edger is a TopTags object containing differential expression analysis results for the pasilla dataset.

data(res_edger)
DataFrame(res_edger)

## DataFrame with 13441 rows and 15 columns
##             table.chromosome_name table.start_position table.end_position
##                       <character>            <integer>          <integer>
## FBgn0039155                    3R             24141395           24147490
## FBgn0025111                     X             10778954           10786907
## FBgn0003360                     X             10780893           10786958
## FBgn0039827                    3R             31196916           31203722
## FBgn0035085                    2R             24945139           24946636
## ...                           ...                  ...                ...
## FBgn0261536                    3L              7465509            7468808
## FBgn0261537                    3L              7465509            7468808
## FBgn0261538                    2R             15698726           15701696
## FBgn0261566                    2L             14587920           14588234
## FBgn0261567                    2L             14744421           14744817
##             table.ensembl_gene_id table.external_gene_name table.strand
##                       <character>              <character>    <integer>
## FBgn0039155           FBgn0039155                     Kal1            1
## FBgn0025111           FBgn0025111                     Ant2           -1
## FBgn0003360           FBgn0003360                     sesB           -1
## FBgn0039827           FBgn0039827                   CG1544            1
## FBgn0035085           FBgn0035085                   CG3770            1
## ...                           ...                      ...          ...
## FBgn0261536           FBgn0261536                  CG42660            1
## FBgn0261537           FBgn0261537                  CG42661            1
## FBgn0261538           FBgn0261538                  CG42662           -1
## FBgn0261566           FBgn0261566                  CG42680           -1
## FBgn0261567           FBgn0261567                  CG42681            1
##             table.gene_biotype table.logFC table.logCPM   table.F table.PValue
##                    <character>   <numeric>    <numeric> <numeric>    <numeric>
## FBgn0039155     protein_coding    -4.60901      6.11920  1114.997 5.49933e-242
## FBgn0025111     protein_coding     2.86251      7.15700   801.094 6.03620e-175
## FBgn0003360     protein_coding    -3.11641      8.68954   765.415 2.67121e-167
## FBgn0039827     protein_coding    -4.16723      4.62807   599.888 9.37511e-132
## FBgn0035085     protein_coding    -2.57208      5.91926   473.057 1.95797e-104
## ...                        ...         ...          ...       ...          ...
## FBgn0261536     protein_coding 4.56278e-17     -2.50256         0            1
## FBgn0261537     protein_coding 4.56278e-17     -2.50256         0            1
## FBgn0261538     protein_coding 4.56278e-17     -2.50256         0            1
## FBgn0261566     protein_coding 4.56278e-17     -2.50256         0            1
## FBgn0261567     protein_coding 4.56278e-17     -2.50256         0            1
##                table.FDR adjust.method             comparison        test
##                <numeric>   <character>            <character> <character>
## FBgn0039155 7.39165e-238            BH coldata$conditiontre..         glm
## FBgn0025111 4.05663e-171            BH coldata$conditiontre..         glm
## FBgn0003360 1.19679e-163            BH coldata$conditiontre..         glm
## FBgn0039827 3.15027e-128            BH coldata$conditiontre..         glm
## FBgn0035085 5.26341e-101            BH coldata$conditiontre..         glm
## ...                  ...           ...                    ...         ...
## FBgn0261536            1            BH coldata$conditiontre..         glm
## FBgn0261537            1            BH coldata$conditiontre..         glm
## FBgn0261538            1            BH coldata$conditiontre..         glm
## FBgn0261566            1            BH coldata$conditiontre..         glm
## FBgn0261567            1            BH coldata$conditiontre..         glm
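A common next step with a table like this is to keep genes that pass an FDR and fold-change threshold. The sketch below uses a plain data frame holding six rows copied from the printout above, with the table. prefix dropped from the column names. The cut-offs fdr_cutoff and lfc_cutoff are hypothetical values chosen for illustration, not settings used by the package.

```r
# Six rows copied from the res_edger printout above (top five, one from the tail)
tt <- data.frame(
  row.names = c("FBgn0039155", "FBgn0025111", "FBgn0003360",
                "FBgn0039827", "FBgn0035085", "FBgn0261536"),
  logFC = c(-4.60901, 2.86251, -3.11641, -4.16723, -2.57208, 4.56278e-17),
  FDR   = c(7.39165e-238, 4.05663e-171, 1.19679e-163,
            3.15027e-128, 5.26341e-101, 1)
)

# Hypothetical thresholds, chosen only for this illustration
fdr_cutoff <- 0.05
lfc_cutoff <- 1

# Keep genes passing both thresholds and label the direction of change
sig <- tt[tt$FDR < fdr_cutoff & abs(tt$logFC) > lfc_cutoff, ]
sig$direction <- ifelse(sig$logFC > 0, "up", "down")
table(sig$direction)
```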

4.5 DESeq2’s `DESeqResults` object

res_deseq2 is a DESeqResults object containing differential expression analysis results for the pasilla dataset.

data(res_deseq2)
DataFrame(res_deseq2)

## DataFrame with 13441 rows and 13 columns
##             chromosome_name start_position end_position ensembl_gene_id
##                 <character>      <integer>    <integer>     <character>
## FBgn0000003              3R        6822500      6822798     FBgn0000003
## FBgn0000008              2R       22136968     22172834     FBgn0000008
## FBgn0000014              3R       16807214     16830049     FBgn0000014
## FBgn0000015              3R       16927212     16972236     FBgn0000015
## FBgn0000017              3L       16615866     16647882     FBgn0000017
## ...                     ...            ...          ...             ...
## FBgn0261570               X       17710144     17752007     FBgn0261570
## FBgn0261571              2L       14724617     14730177     FBgn0261571
## FBgn0261573               X       19521929     19530896     FBgn0261573
## FBgn0261574              3L       20005301     20024607     FBgn0261574
## FBgn0261575              3R       25344761     25346921     FBgn0261575
##             external_gene_name    strand   gene_biotype    baseMean
##                    <character> <integer>    <character>   <numeric>
## FBgn0000003     7SLRNA:CR32864         1          ncRNA    0.171327
## FBgn0000008                  a         1 protein_coding   95.135481
## FBgn0000014              abd-A        -1 protein_coding    1.054743
## FBgn0000015              Abd-B        -1 protein_coding    0.846865
## FBgn0000017                Abl        -1 protein_coding 4352.440416
## ...                        ...       ...            ...         ...
## FBgn0261570             raskol         1 protein_coding 3207.514533
## FBgn0261571            CG42685        -1 protein_coding    0.087164
## FBgn0261573             CoRest         1 protein_coding 2240.405552
## FBgn0261574                kug         1 protein_coding 4856.287332
## FBgn0261575               tobi        -1 protein_coding   10.681333
##             log2FoldChange     lfcSE       stat    pvalue      padj
##                  <numeric> <numeric>  <numeric> <numeric> <numeric>
## FBgn0000003      0.6719550  3.871083  0.1735832 0.8621930        NA
## FBgn0000008     -0.0434937  0.221896 -0.1960092 0.8446030  0.949053
## FBgn0000014     -0.0864298  2.107004 -0.0410202 0.9672798        NA
## FBgn0000015     -1.8636397  2.259885 -0.8246614 0.4095638        NA
## FBgn0000017     -0.2591748  0.111781 -2.3186016 0.0204166  0.125406
## ...                    ...       ...        ...       ...       ...
## FBgn0261570      0.2561197  0.104286   2.455947 0.0140514 0.0946952
## FBgn0261571      0.9002107  3.859776   0.233229 0.8155838        NA
## FBgn0261573     -0.0134559  0.100680  -0.133651 0.8936789 0.9684613
## FBgn0261574      0.0690720  0.120163   0.574818 0.5654142 0.8446946
## FBgn0261575      0.5703889  0.755241   0.755241 0.4501046 0.7713092
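The statistical columns can be related to each other with a short calculation. The sketch below assumes the results come from DESeq2's default Wald test, in which stat is log2FoldChange divided by lfcSE and pvalue is the two-sided normal tail probability of stat. It uses three rows copied from the printout above.

```r
# Three rows copied from the res_deseq2 printout above
res <- data.frame(
  row.names      = c("FBgn0000008", "FBgn0000017", "FBgn0261570"),
  log2FoldChange = c(-0.0434937, -0.2591748, 0.2561197),
  lfcSE          = c(0.221896, 0.111781, 0.104286),
  stat           = c(-0.1960092, -2.3186016, 2.455947),
  pvalue         = c(0.8446030, 0.0204166, 0.0140514)
)

# Assumption: Wald statistic = estimate / standard error
wald <- res$log2FoldChange / res$lfcSE

# Assumption: two-sided p-value from the standard normal distribution
p <- 2 * pnorm(-abs(res$stat))

data.frame(row.names = rownames(res), wald = wald, p = p)
```

The NA entries in padj are consistent with DESeq2's independent filtering, which sets the adjusted p-value to NA for genes with a low mean normalised count; in the printout these are the rows with the smallest baseMean values.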

Session information

sessionInfo()

## R version 4.1.3 (2022-03-10)
## Platform: x86_64-conda-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
## 
## Matrix products: default
## BLAS/LAPACK: /home/ihsuan/miniconda3/envs/jupyterlab/lib/libopenblasp-r0.3.20.so
## 
## locale:
##  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
##  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
##  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] scRUtils_0.1.0              DESeq2_1.34.0              
##  [3] SummarizedExperiment_1.24.0 Biobase_2.54.0             
##  [5] MatrixGenerics_1.6.0        matrixStats_0.61.0         
##  [7] GenomicRanges_1.46.1        GenomeInfoDb_1.30.1        
##  [9] IRanges_2.28.0              S4Vectors_0.32.3           
## [11] BiocGenerics_0.40.0         BiocStyle_2.22.0           
## 
## loaded via a namespace (and not attached):
##   [1] ggnewscale_0.4.7            ggbeeswarm_0.6.0           
##   [3] colorspace_2.0-3            rjson_0.2.21               
##   [5] ellipsis_0.3.2              scuttle_1.4.0              
##   [7] bluster_1.4.0               XVector_0.34.0             
##   [9] BiocNeighbors_1.12.0        farver_2.1.0               
##  [11] enrichR_3.0                 ggrepel_0.9.1              
##  [13] bit64_4.0.5                 AnnotationDbi_1.56.2       
##  [15] fansi_1.0.3                 splines_4.1.3              
##  [17] sparseMatrixStats_1.6.0     cachem_1.0.6               
##  [19] geneplotter_1.72.0          knitr_1.38                 
##  [21] scater_1.22.0               polyclip_1.10-0            
##  [23] jsonlite_1.8.0              annotate_1.72.0            
##  [25] cluster_2.1.3               png_0.1-7                  
##  [27] ggforce_0.3.3               BiocManager_1.30.16        
##  [29] compiler_4.1.3              httr_1.4.2                 
##  [31] dqrng_0.3.0                 assertthat_0.2.1           
##  [33] Matrix_1.4-1                fastmap_1.1.0              
##  [35] limma_3.50.1                cli_3.2.0                  
##  [37] tweenr_1.0.2                BiocSingular_1.10.0        
##  [39] htmltools_0.5.2             tools_4.1.3                
##  [41] rsvd_1.0.5                  igraph_1.3.0               
##  [43] gtable_0.3.0                glue_1.6.2                 
##  [45] GenomeInfoDbData_1.2.7      dplyr_1.0.8                
##  [47] Rcpp_1.0.8.3                jquerylib_0.1.4            
##  [49] vctrs_0.4.1                 Biostrings_2.62.0          
##  [51] DelayedMatrixStats_1.16.0   xfun_0.30                  
##  [53] stringr_1.4.0               beachmat_2.10.0            
##  [55] lifecycle_1.0.1             irlba_2.3.5                
##  [57] statmod_1.4.36              XML_3.99-0.9               
##  [59] edgeR_3.36.0                zlibbioc_1.40.0            
##  [61] MASS_7.3-56                 scales_1.2.0               
##  [63] parallel_4.1.3              RColorBrewer_1.1-3         
##  [65] SingleCellExperiment_1.16.0 yaml_2.3.5                 
##  [67] gridExtra_2.3               memoise_2.0.1              
##  [69] ggplot2_3.3.5               sass_0.4.1                 
##  [71] stringi_1.7.6               RSQLite_2.2.10             
##  [73] genefilter_1.76.0           ScaledMatrix_1.2.0         
##  [75] scran_1.22.1                BiocParallel_1.28.3        
##  [77] rlang_1.0.2                 pkgconfig_2.0.3            
##  [79] bitops_1.0-7                evaluate_0.15              
##  [81] lattice_0.20-45             purrr_0.3.4                
##  [83] cowplot_1.1.1               bit_4.0.4                  
##  [85] tidyselect_1.1.2            magrittr_2.0.3             
##  [87] bookdown_0.26               R6_2.5.1                   
##  [89] generics_0.1.2              metapod_1.2.0              
##  [91] DelayedArray_0.20.0         DBI_1.1.2                  
##  [93] pillar_1.7.0                survival_3.3-1             
##  [95] KEGGREST_1.34.0             RCurl_1.98-1.6             
##  [97] tibble_3.1.6                crayon_1.5.1               
##  [99] utf8_1.2.2                  rmarkdown_2.13             
## [101] viridis_0.6.2               locfit_1.5-9.5             
## [103] grid_4.1.3                  blob_1.2.3                 
## [105] digest_0.6.29               xtable_1.8-4               
## [107] tidyr_1.2.0                 munsell_0.5.0              
## [109] viridisLite_0.4.0           beeswarm_0.4.0             
## [111] vipor_0.4.5                 bslib_0.3.1

Introduction to scRUtils

04/19/2022
