scRUtils 0.1.0
scRUtils provides various utilities for visualisation and functional analysis of RNA-seq data,
particularly single-cell datasets. It evolved from a collection of helper functions that were
used in our in-house scRNA-seq processing workflow.
The documentation of this package is divided into 5 sections:
This vignette (#1) will introduce the package and the included demo datasets.
The package is currently available only on GitHub.
devtools::install_github("ycl6/scRUtils")
To use scRUtils in an R session, load it using the library() command.
library(scRUtils)
Demo datasets were created to demonstrate functions available in this package. The datasets can
be loaded by using data(). Details on how the datasets were produced can be found in the
following vignette: Creating demo datasets.
cyclone result
phases_assignments is a list of 3 elements containing the results of running cyclone on a
simulated scRNA-seq dataset.
data(phases_assignments)
str(phases_assignments)
## List of 3
## $ phases : chr [1:200] "S" "S" "G1" "S" ...
## $ scores :'data.frame': 200 obs. of 3 variables:
## ..$ G1 : num [1:200] 0.442 0.391 0.52 0.261 0.396 0.563 0.4 0.59 0.757 0.705 ...
## ..$ S : num [1:200] 0.627 0.509 0.805 0.849 0.655 0.697 0.361 0.527 0.524 0.207 ...
## ..$ G2M: num [1:200] 0.22 0.275 0.143 0.392 0.065 0.116 0.512 0.162 0.064 0.295 ...
## $ normalized.scores:'data.frame': 200 obs. of 3 variables:
## ..$ G1 : num [1:200] 0.343 0.333 0.354 0.174 0.355 ...
## ..$ S : num [1:200] 0.486 0.433 0.548 0.565 0.587 ...
## ..$ G2M: num [1:200] 0.1707 0.234 0.0974 0.261 0.0582 ...
table(phases_assignments[["phases"]])
##
## G1 G2M S
## 152 4 44
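The relationship between the scores and normalized.scores elements can be checked by hand. The sketch below is illustrative only: it does not load the package, and instead copies the first five rows of scores from the str() output above into a stand-in data frame (the object names scores and normalized are assumptions made for this example). It assumes that each normalized score is the raw score divided by the sum of the three scores for that cell, which agrees with the values printed above.

```r
# Stand-in for the first five rows of phases_assignments$scores,
# copied from the str() output above (not the full dataset)
scores <- data.frame(
  G1  = c(0.442, 0.391, 0.520, 0.261, 0.396),
  S   = c(0.627, 0.509, 0.805, 0.849, 0.655),
  G2M = c(0.220, 0.275, 0.143, 0.392, 0.065)
)

# Assumed normalisation: divide each cell's scores by their row total,
# so that the three phase scores of a cell sum to 1
normalized <- scores / rowSums(scores)

# Compare with the normalized.scores values shown in the str() output
round(normalized, 3)
```

With the package loaded, the same check could be run on phases_assignments[["scores"]] and compared against phases_assignments[["normalized.scores"]].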
findDoubletClusters result
dbl_results is a DFrame object containing the results of running findDoubletClusters on a
simulated scRNA-seq dataset.
data(dbl_results)
dbl_results
## DataFrame with 3 rows and 9 columns
## source1 source2 num.de median.de best p.value
## <character> <character> <integer> <integer> <character> <numeric>
## cluster1 cluster3 cluster2 69 69 gene105 3.55299e-64
## cluster3 cluster2 cluster1 106 106 gene58 1.52454e-68
## cluster2 cluster3 cluster1 110 110 gene20 1.47261e-76
## lib.size1 lib.size2 prop
## <numeric> <numeric> <numeric>
## cluster1 0.931818 1.02652 0.164756
## cluster3 1.101626 1.07317 0.358166
## cluster2 0.907749 0.97417 0.477077
sce is a processed SingleCellExperiment object with additional information stored in it, such as
rowData, colData, reduced dimensions and metadata.
data(sce)
sce
## class: SingleCellExperiment
## dim: 1000 250
## metadata(7): modelGeneVar HVG ... findMarkers3 findMarkers4
## assays(2): counts logcounts
## rownames(1000): YBH78 WHB35 ... OZD76 NGJ12
## rowData names(2): mean detected
## colnames(250): CELL_1 CELL_2 ... CELL_249 CELL_250
## colData names(6): label sum ... sizeFactor CellType
## reducedDimNames(3): PCA TSNE UMAP
## mainExpName: NULL
## altExpNames(0):
# rowData
rowData(sce)
## DataFrame with 1000 rows and 2 columns
## mean detected
## <numeric> <numeric>
## YBH78 0.868 57.6
## WHB35 0.988 64.4
## QFY33 0.968 59.6
## IOR82 0.936 60.4
## NSS93 0.996 63.6
## ... ... ...
## MWV41 0.976 60.0
## LJG52 0.992 63.6
## LPY61 1.008 60.4
## OZD76 1.100 64.4
## NGJ12 1.028 64.4
# colData
colData(sce)
## DataFrame with 250 rows and 6 columns
## label sum detected total sizeFactor CellType
## <character> <numeric> <numeric> <numeric> <numeric> <character>
## CELL_1 B 1063 637 1063 1.032420 Type 2
## CELL_2 A 1033 640 1033 1.003283 Type 1
## CELL_3 D 1035 621 1035 1.005225 Type 4
## CELL_4 D 1041 643 1041 1.011053 Type 4
## CELL_5 C 1015 629 1015 0.985801 Type 3
## ... ... ... ... ... ... ...
## CELL_246 E 1104 651 1104 1.072240 Type 5
## CELL_247 E 973 612 973 0.945009 Type 5
## CELL_248 A 946 587 946 0.918786 Type 1
## CELL_249 C 1037 617 1037 1.007168 Type 3
## CELL_250 C 1003 616 1003 0.974146 Type 3
# metadata
names(metadata(sce))
## [1] "modelGeneVar" "HVG" "SingleR" "findMarkers1" "findMarkers2"
## [6] "findMarkers3" "findMarkers4"
DESeqResults object
res_deseq2 is a DESeqResults object containing differential expression analysis results of
the pasilla dataset.
data(res_deseq2)
DataFrame(res_deseq2)
## DataFrame with 13441 rows and 13 columns
## chromosome_name start_position end_position ensembl_gene_id
## <character> <integer> <integer> <character>
## FBgn0000003 3R 6822500 6822798 FBgn0000003
## FBgn0000008 2R 22136968 22172834 FBgn0000008
## FBgn0000014 3R 16807214 16830049 FBgn0000014
## FBgn0000015 3R 16927212 16972236 FBgn0000015
## FBgn0000017 3L 16615866 16647882 FBgn0000017
## ... ... ... ... ...
## FBgn0261570 X 17710144 17752007 FBgn0261570
## FBgn0261571 2L 14724617 14730177 FBgn0261571
## FBgn0261573 X 19521929 19530896 FBgn0261573
## FBgn0261574 3L 20005301 20024607 FBgn0261574
## FBgn0261575 3R 25344761 25346921 FBgn0261575
## external_gene_name strand gene_biotype baseMean
## <character> <integer> <character> <numeric>
## FBgn0000003 7SLRNA:CR32864 1 ncRNA 0.171327
## FBgn0000008 a 1 protein_coding 95.135481
## FBgn0000014 abd-A -1 protein_coding 1.054743
## FBgn0000015 Abd-B -1 protein_coding 0.846865
## FBgn0000017 Abl -1 protein_coding 4352.440416
## ... ... ... ... ...
## FBgn0261570 raskol 1 protein_coding 3207.514533
## FBgn0261571 CG42685 -1 protein_coding 0.087164
## FBgn0261573 CoRest 1 protein_coding 2240.405552
## FBgn0261574 kug 1 protein_coding 4856.287332
## FBgn0261575 tobi -1 protein_coding 10.681333
## log2FoldChange lfcSE stat pvalue padj
## <numeric> <numeric> <numeric> <numeric> <numeric>
## FBgn0000003 0.6719550 3.871083 0.1735832 0.8621930 NA
## FBgn0000008 -0.0434937 0.221896 -0.1960092 0.8446030 0.949053
## FBgn0000014 -0.0864298 2.107004 -0.0410202 0.9672798 NA
## FBgn0000015 -1.8636397 2.259885 -0.8246614 0.4095638 NA
## FBgn0000017 -0.2591748 0.111781 -2.3186016 0.0204166 0.125406
## ... ... ... ... ... ...
## FBgn0261570 0.2561197 0.104286 2.455947 0.0140514 0.0946952
## FBgn0261571 0.9002107 3.859776 0.233229 0.8155838 NA
## FBgn0261573 -0.0134559 0.100680 -0.133651 0.8936789 0.9684613
## FBgn0261574 0.0690720 0.120163 0.574818 0.5654142 0.8446946
## FBgn0261575 0.5703889 0.755241 0.755241 0.4501046 0.7713092
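Several padj values in the output above are NA, which matters when subsetting by significance. The sketch below is a hedged illustration rather than part of the package: it rebuilds only the ten printed rows as a plain data frame (the names res, alpha and sig are assumptions for this example), and uses an illustrative cutoff of 0.1. In DESeq2, NA adjusted p-values typically arise for genes removed by independent filtering, which is consistent with the low baseMean of the affected rows here.

```r
# Stand-in for res_deseq2: only the ten rows printed above, three columns kept
res <- data.frame(
  external_gene_name = c("7SLRNA:CR32864", "a", "abd-A", "Abd-B", "Abl",
                         "raskol", "CG42685", "CoRest", "kug", "tobi"),
  log2FoldChange = c(0.6719550, -0.0434937, -0.0864298, -1.8636397, -0.2591748,
                     0.2561197, 0.9002107, -0.0134559, 0.0690720, 0.5703889),
  padj = c(NA, 0.949053, NA, NA, 0.125406,
           0.0946952, NA, 0.9684613, 0.8446946, 0.7713092),
  row.names = c("FBgn0000003", "FBgn0000008", "FBgn0000014", "FBgn0000015",
                "FBgn0000017", "FBgn0261570", "FBgn0261571", "FBgn0261573",
                "FBgn0261574", "FBgn0261575")
)

alpha <- 0.1  # illustrative significance cutoff, not taken from the vignette

# which() keeps only rows where the comparison is TRUE, so rows with
# padj = NA are dropped instead of producing all-NA rows in the result
sig <- res[which(res$padj < alpha), ]
sig
```

Indexing with the logical vector res$padj < alpha directly would return extra all-NA rows for the genes with missing padj, which is why which() is used in this sketch.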
sessionInfo()
## R version 4.1.3 (2022-03-10)
## Platform: x86_64-conda-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
##
## Matrix products: default
## BLAS/LAPACK: /home/ihsuan/miniconda3/envs/jupyterlab/lib/libopenblasp-r0.3.20.so
##
## locale:
## [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
## [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
## [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] scRUtils_0.1.0 DESeq2_1.34.0
## [3] SummarizedExperiment_1.24.0 Biobase_2.54.0
## [5] MatrixGenerics_1.6.0 matrixStats_0.61.0
## [7] GenomicRanges_1.46.1 GenomeInfoDb_1.30.1
## [9] IRanges_2.28.0 S4Vectors_0.32.3
## [11] BiocGenerics_0.40.0 BiocStyle_2.22.0
##
## loaded via a namespace (and not attached):
## [1] ggnewscale_0.4.7 ggbeeswarm_0.6.0
## [3] colorspace_2.0-3 rjson_0.2.21
## [5] ellipsis_0.3.2 scuttle_1.4.0
## [7] bluster_1.4.0 XVector_0.34.0
## [9] BiocNeighbors_1.12.0 farver_2.1.0
## [11] enrichR_3.0 ggrepel_0.9.1
## [13] bit64_4.0.5 AnnotationDbi_1.56.2
## [15] fansi_1.0.3 splines_4.1.3
## [17] sparseMatrixStats_1.6.0 cachem_1.0.6
## [19] geneplotter_1.72.0 knitr_1.38
## [21] scater_1.22.0 polyclip_1.10-0
## [23] jsonlite_1.8.0 annotate_1.72.0
## [25] cluster_2.1.3 png_0.1-7
## [27] ggforce_0.3.3 BiocManager_1.30.16
## [29] compiler_4.1.3 httr_1.4.2
## [31] dqrng_0.3.0 assertthat_0.2.1
## [33] Matrix_1.4-1 fastmap_1.1.0
## [35] limma_3.50.1 cli_3.2.0
## [37] tweenr_1.0.2 BiocSingular_1.10.0
## [39] htmltools_0.5.2 tools_4.1.3
## [41] rsvd_1.0.5 igraph_1.3.0
## [43] gtable_0.3.0 glue_1.6.2
## [45] GenomeInfoDbData_1.2.7 dplyr_1.0.8
## [47] Rcpp_1.0.8.3 jquerylib_0.1.4
## [49] vctrs_0.4.1 Biostrings_2.62.0
## [51] DelayedMatrixStats_1.16.0 xfun_0.30
## [53] stringr_1.4.0 beachmat_2.10.0
## [55] lifecycle_1.0.1 irlba_2.3.5
## [57] statmod_1.4.36 XML_3.99-0.9
## [59] edgeR_3.36.0 zlibbioc_1.40.0
## [61] MASS_7.3-56 scales_1.2.0
## [63] parallel_4.1.3 RColorBrewer_1.1-3
## [65] SingleCellExperiment_1.16.0 yaml_2.3.5
## [67] gridExtra_2.3 memoise_2.0.1
## [69] ggplot2_3.3.5 sass_0.4.1
## [71] stringi_1.7.6 RSQLite_2.2.10
## [73] genefilter_1.76.0 ScaledMatrix_1.2.0
## [75] scran_1.22.1 BiocParallel_1.28.3
## [77] rlang_1.0.2 pkgconfig_2.0.3
## [79] bitops_1.0-7 evaluate_0.15
## [81] lattice_0.20-45 purrr_0.3.4
## [83] cowplot_1.1.1 bit_4.0.4
## [85] tidyselect_1.1.2 magrittr_2.0.3
## [87] bookdown_0.26 R6_2.5.1
## [89] generics_0.1.2 metapod_1.2.0
## [91] DelayedArray_0.20.0 DBI_1.1.2
## [93] pillar_1.7.0 survival_3.3-1
## [95] KEGGREST_1.34.0 RCurl_1.98-1.6
## [97] tibble_3.1.6 crayon_1.5.1
## [99] utf8_1.2.2 rmarkdown_2.13
## [101] viridis_0.6.2 locfit_1.5-9.5
## [103] grid_4.1.3 blob_1.2.3
## [105] digest_0.6.29 xtable_1.8-4
## [107] tidyr_1.2.0 munsell_0.5.0
## [109] viridisLite_0.4.0 beeswarm_0.4.0
## [111] vipor_0.4.5 bslib_0.3.1