scRUtils 0.1.0
scRUtils provides various utilities for the visualisation and functional analysis of RNA-seq data,
particularly single-cell datasets. It evolved from a collection of helper functions that were
used in our in-house scRNA-seq processing workflow.
The documentation of this package is divided into 5 sections.
This vignette (#1) introduces the package and the included demo datasets.
The package is currently available only on GitHub.
devtools::install_github("ycl6/scRUtils")
To use scRUtils in an R session, load it using the library() command.
library(scRUtils)
Demo datasets were created to demonstrate the functions available in this package. The datasets can
be loaded using data(). Details on how the datasets were produced can be found in the
following vignette: Creating demo datasets.
cyclone result
phases_assignments is a list object of 3 elements, containing cyclone results performed on a
simulated scRNA-seq dataset.
data(phases_assignments)
str(phases_assignments)
## List of 3
## $ phases : chr [1:200] "S" "S" "G1" "S" ...
## $ scores :'data.frame': 200 obs. of 3 variables:
## ..$ G1 : num [1:200] 0.442 0.391 0.52 0.261 0.396 0.563 0.4 0.59 0.757 0.705 ...
## ..$ S : num [1:200] 0.627 0.509 0.805 0.849 0.655 0.697 0.361 0.527 0.524 0.207 ...
## ..$ G2M: num [1:200] 0.22 0.275 0.143 0.392 0.065 0.116 0.512 0.162 0.064 0.295 ...
## $ normalized.scores:'data.frame': 200 obs. of 3 variables:
## ..$ G1 : num [1:200] 0.343 0.333 0.354 0.174 0.355 ...
## ..$ S : num [1:200] 0.486 0.433 0.548 0.565 0.587 ...
## ..$ G2M: num [1:200] 0.1707 0.234 0.0974 0.261 0.0582 ...
table(phases_assignments[["phases"]])
##
## G1 G2M S
## 152 4 44
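The relationship between the three list elements can be sketched in base R. The snippet below copies the first five cells' scores from the str() output above into a stand-in data frame (it does not load the package data). It assumes the assignment rule described in the scran documentation for cyclone: a cell is called S when both its G1 and G2M scores are below 0.5, and otherwise takes the higher of G1 and G2M. It also assumes that normalized.scores are the raw scores divided by their row sums, which is consistent with the printed values.

```r
# First five cells' scores, copied from the str() output above (stand-in data)
scores <- data.frame(
  G1  = c(0.442, 0.391, 0.520, 0.261, 0.396),
  S   = c(0.627, 0.509, 0.805, 0.849, 0.655),
  G2M = c(0.220, 0.275, 0.143, 0.392, 0.065)
)

# Assumed normalisation: rescale each cell's three scores to sum to 1
normalized <- scores / rowSums(scores)

# Assumed assignment rule: S if both G1 and G2M are below 0.5,
# otherwise whichever of G1 and G2M is higher
phases <- ifelse(
  scores$G1 < 0.5 & scores$G2M < 0.5, "S",
  ifelse(scores$G1 >= scores$G2M, "G1", "G2M")
)

round(normalized, 3)
phases
```

For the first four cells, this reproduces the "S" "S" "G1" "S" assignments shown in the phases element.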
findDoubletClusters result
dbl_results is a DFrame object containing findDoubletClusters results performed on a
simulated scRNA-seq dataset.
data(dbl_results)
dbl_results
## DataFrame with 3 rows and 9 columns
## source1 source2 num.de median.de best p.value
## <character> <character> <integer> <integer> <character> <numeric>
## cluster1 cluster3 cluster2 69 69 gene105 3.55299e-64
## cluster3 cluster2 cluster1 106 106 gene58 1.52454e-68
## cluster2 cluster3 cluster1 110 110 gene20 1.47261e-76
## lib.size1 lib.size2 prop
## <numeric> <numeric> <numeric>
## cluster1 0.931818 1.02652 0.164756
## cluster3 1.101626 1.07317 0.358166
## cluster2 0.907749 0.97417 0.477077
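As a rough illustration of how such a table might be inspected, the sketch below uses a base data.frame as a stand-in for dbl_results, with a few columns copied from the output above. It relies on our reading of the scran documentation: a low num.de suggests a cluster is hard to distinguish from a mixture of its two source clusters, and lib.size1 and lib.size2 below 1 indicate that both sources have smaller library sizes than the query cluster. The column both.lib.below.1 is a hypothetical name introduced here, not part of the findDoubletClusters output.

```r
# Stand-in for dbl_results: a base data.frame with columns copied from
# the printed DataFrame above (not the real DFrame object)
dbl <- data.frame(
  num.de    = c(69L, 106L, 110L),
  lib.size1 = c(0.931818, 1.101626, 0.907749),
  lib.size2 = c(1.02652, 1.07317, 0.97417),
  prop      = c(0.164756, 0.358166, 0.477077),
  row.names = c("cluster1", "cluster3", "cluster2")
)

# Rank clusters by the number of DE genes against their putative sources
ranked <- dbl[order(dbl$num.de), ]

# Hypothetical helper column: are both library size ratios below 1?
ranked$both.lib.below.1 <- ranked$lib.size1 < 1 & ranked$lib.size2 < 1
ranked
```

Neither criterion is conclusive on its own, so this kind of summary is best treated as a starting point for closer inspection of individual clusters.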
sce is a SingleCellExperiment object that has been processed, with additional information
stored.
data(sce)
sce
## class: SingleCellExperiment
## dim: 1000 250
## metadata(7): modelGeneVar HVG ... findMarkers3 findMarkers4
## assays(2): counts logcounts
## rownames(1000): YBH78 WHB35 ... OZD76 NGJ12
## rowData names(2): mean detected
## colnames(250): CELL_1 CELL_2 ... CELL_249 CELL_250
## colData names(6): label sum ... sizeFactor CellType
## reducedDimNames(3): PCA TSNE UMAP
## mainExpName: NULL
## altExpNames(0):
# rowData
rowData(sce)
## DataFrame with 1000 rows and 2 columns
## mean detected
## <numeric> <numeric>
## YBH78 0.868 57.6
## WHB35 0.988 64.4
## QFY33 0.968 59.6
## IOR82 0.936 60.4
## NSS93 0.996 63.6
## ... ... ...
## MWV41 0.976 60.0
## LJG52 0.992 63.6
## LPY61 1.008 60.4
## OZD76 1.100 64.4
## NGJ12 1.028 64.4
# colData
colData(sce)
## DataFrame with 250 rows and 6 columns
## label sum detected total sizeFactor CellType
## <character> <numeric> <numeric> <numeric> <numeric> <character>
## CELL_1 B 1063 637 1063 1.032420 Type 2
## CELL_2 A 1033 640 1033 1.003283 Type 1
## CELL_3 D 1035 621 1035 1.005225 Type 4
## CELL_4 D 1041 643 1041 1.011053 Type 4
## CELL_5 C 1015 629 1015 0.985801 Type 3
## ... ... ... ... ... ... ...
## CELL_246 E 1104 651 1104 1.072240 Type 5
## CELL_247 E 973 612 973 0.945009 Type 5
## CELL_248 A 946 587 946 0.918786 Type 1
## CELL_249 C 1037 617 1037 1.007168 Type 3
## CELL_250 C 1003 616 1003 0.974146 Type 3
# metadata
names(metadata(sce))
## [1] "modelGeneVar" "HVG" "SingleR" "findMarkers1" "findMarkers2"
## [6] "findMarkers3" "findMarkers4"
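The sizeFactor column in the colData output above appears to be proportional to sum, as would be expected from library size normalisation. This is an inference from the printed values rather than something stated in this vignette. A minimal base R check, using the first five cells copied from that output:

```r
# Library sizes and size factors of the first five cells, copied from colData above
lib_sum     <- c(1063, 1033, 1035, 1041, 1015)
size_factor <- c(1.032420, 1.003283, 1.005225, 1.011053, 0.985801)

# If size factors are library sizes scaled to a mean of 1, this ratio
# should be the same for every cell (the mean library size of the dataset)
implied_mean <- lib_sum / size_factor
round(implied_mean, 1)
```

The ratio is effectively constant across these cells, at roughly 1029.6.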
DESeqResults object
res_deseq2 is a DESeqResults object containing differential expression analysis results of
the pasilla dataset.
data(res_deseq2)
DataFrame(res_deseq2)
## DataFrame with 13441 rows and 13 columns
## chromosome_name start_position end_position ensembl_gene_id
## <character> <integer> <integer> <character>
## FBgn0000003 3R 6822500 6822798 FBgn0000003
## FBgn0000008 2R 22136968 22172834 FBgn0000008
## FBgn0000014 3R 16807214 16830049 FBgn0000014
## FBgn0000015 3R 16927212 16972236 FBgn0000015
## FBgn0000017 3L 16615866 16647882 FBgn0000017
## ... ... ... ... ...
## FBgn0261570 X 17710144 17752007 FBgn0261570
## FBgn0261571 2L 14724617 14730177 FBgn0261571
## FBgn0261573 X 19521929 19530896 FBgn0261573
## FBgn0261574 3L 20005301 20024607 FBgn0261574
## FBgn0261575 3R 25344761 25346921 FBgn0261575
## external_gene_name strand gene_biotype baseMean
## <character> <integer> <character> <numeric>
## FBgn0000003 7SLRNA:CR32864 1 ncRNA 0.171327
## FBgn0000008 a 1 protein_coding 95.135481
## FBgn0000014 abd-A -1 protein_coding 1.054743
## FBgn0000015 Abd-B -1 protein_coding 0.846865
## FBgn0000017 Abl -1 protein_coding 4352.440416
## ... ... ... ... ...
## FBgn0261570 raskol 1 protein_coding 3207.514533
## FBgn0261571 CG42685 -1 protein_coding 0.087164
## FBgn0261573 CoRest 1 protein_coding 2240.405552
## FBgn0261574 kug 1 protein_coding 4856.287332
## FBgn0261575 tobi -1 protein_coding 10.681333
## log2FoldChange lfcSE stat pvalue padj
## <numeric> <numeric> <numeric> <numeric> <numeric>
## FBgn0000003 0.6719550 3.871083 0.1735832 0.8621930 NA
## FBgn0000008 -0.0434937 0.221896 -0.1960092 0.8446030 0.949053
## FBgn0000014 -0.0864298 2.107004 -0.0410202 0.9672798 NA
## FBgn0000015 -1.8636397 2.259885 -0.8246614 0.4095638 NA
## FBgn0000017 -0.2591748 0.111781 -2.3186016 0.0204166 0.125406
## ... ... ... ... ... ...
## FBgn0261570 0.2561197 0.104286 2.455947 0.0140514 0.0946952
## FBgn0261571 0.9002107 3.859776 0.233229 0.8155838 NA
## FBgn0261573 -0.0134559 0.100680 -0.133651 0.8936789 0.9684613
## FBgn0261574 0.0690720 0.120163 0.574818 0.5654142 0.8446946
## FBgn0261575 0.5703889 0.755241 0.755241 0.4501046 0.7713092
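Several genes in the output above have an NA adjusted p-value, which needs some care when filtering. The sketch below uses a base data.frame stand-in holding five rows copied from that output. The 0.1 cutoff is an arbitrary value chosen for illustration. In DESeq2, an NA padj usually means the gene was excluded by independent filtering because of a low mean count, which agrees with the small baseMean values of those rows.

```r
# Stand-in for res_deseq2: five rows copied from the printed output above
res <- data.frame(
  baseMean       = c(0.171327, 95.135481, 4352.440416, 3207.514533, 0.087164),
  log2FoldChange = c(0.6719550, -0.0434937, -0.2591748, 0.2561197, 0.9002107),
  padj           = c(NA, 0.949053, 0.125406, 0.0946952, NA),
  row.names = c("FBgn0000003", "FBgn0000008", "FBgn0000017",
                "FBgn0261570", "FBgn0261571")
)

# Plain logical indexing keeps NA comparisons as all-NA rows
nrow(res[res$padj < 0.1, ])

# which() drops NA comparisons, leaving only genes that pass the cutoff
sig <- res[which(res$padj < 0.1), ]
rownames(sig)
```

Wrapping the condition in which() is a common idiom here because it discards the NA entries rather than propagating them into the subset.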
sessionInfo()
## R version 4.1.3 (2022-03-10)
## Platform: x86_64-conda-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
##
## Matrix products: default
## BLAS/LAPACK: /home/ihsuan/miniconda3/envs/jupyterlab/lib/libopenblasp-r0.3.20.so
##
## locale:
## [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
## [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
## [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] scRUtils_0.1.0 DESeq2_1.34.0
## [3] SummarizedExperiment_1.24.0 Biobase_2.54.0
## [5] MatrixGenerics_1.6.0 matrixStats_0.61.0
## [7] GenomicRanges_1.46.1 GenomeInfoDb_1.30.1
## [9] IRanges_2.28.0 S4Vectors_0.32.3
## [11] BiocGenerics_0.40.0 BiocStyle_2.22.0
##
## loaded via a namespace (and not attached):
## [1] ggnewscale_0.4.7 ggbeeswarm_0.6.0
## [3] colorspace_2.0-3 rjson_0.2.21
## [5] ellipsis_0.3.2 scuttle_1.4.0
## [7] bluster_1.4.0 XVector_0.34.0
## [9] BiocNeighbors_1.12.0 farver_2.1.0
## [11] enrichR_3.0 ggrepel_0.9.1
## [13] bit64_4.0.5 AnnotationDbi_1.56.2
## [15] fansi_1.0.3 splines_4.1.3
## [17] sparseMatrixStats_1.6.0 cachem_1.0.6
## [19] geneplotter_1.72.0 knitr_1.38
## [21] scater_1.22.0 polyclip_1.10-0
## [23] jsonlite_1.8.0 annotate_1.72.0
## [25] cluster_2.1.3 png_0.1-7
## [27] ggforce_0.3.3 BiocManager_1.30.16
## [29] compiler_4.1.3 httr_1.4.2
## [31] dqrng_0.3.0 assertthat_0.2.1
## [33] Matrix_1.4-1 fastmap_1.1.0
## [35] limma_3.50.1 cli_3.2.0
## [37] tweenr_1.0.2 BiocSingular_1.10.0
## [39] htmltools_0.5.2 tools_4.1.3
## [41] rsvd_1.0.5 igraph_1.3.0
## [43] gtable_0.3.0 glue_1.6.2
## [45] GenomeInfoDbData_1.2.7 dplyr_1.0.8
## [47] Rcpp_1.0.8.3 jquerylib_0.1.4
## [49] vctrs_0.4.1 Biostrings_2.62.0
## [51] DelayedMatrixStats_1.16.0 xfun_0.30
## [53] stringr_1.4.0 beachmat_2.10.0
## [55] lifecycle_1.0.1 irlba_2.3.5
## [57] statmod_1.4.36 XML_3.99-0.9
## [59] edgeR_3.36.0 zlibbioc_1.40.0
## [61] MASS_7.3-56 scales_1.2.0
## [63] parallel_4.1.3 RColorBrewer_1.1-3
## [65] SingleCellExperiment_1.16.0 yaml_2.3.5
## [67] gridExtra_2.3 memoise_2.0.1
## [69] ggplot2_3.3.5 sass_0.4.1
## [71] stringi_1.7.6 RSQLite_2.2.10
## [73] genefilter_1.76.0 ScaledMatrix_1.2.0
## [75] scran_1.22.1 BiocParallel_1.28.3
## [77] rlang_1.0.2 pkgconfig_2.0.3
## [79] bitops_1.0-7 evaluate_0.15
## [81] lattice_0.20-45 purrr_0.3.4
## [83] cowplot_1.1.1 bit_4.0.4
## [85] tidyselect_1.1.2 magrittr_2.0.3
## [87] bookdown_0.26 R6_2.5.1
## [89] generics_0.1.2 metapod_1.2.0
## [91] DelayedArray_0.20.0 DBI_1.1.2
## [93] pillar_1.7.0 survival_3.3-1
## [95] KEGGREST_1.34.0 RCurl_1.98-1.6
## [97] tibble_3.1.6 crayon_1.5.1
## [99] utf8_1.2.2 rmarkdown_2.13
## [101] viridis_0.6.2 locfit_1.5-9.5
## [103] grid_4.1.3 blob_1.2.3
## [105] digest_0.6.29 xtable_1.8-4
## [107] tidyr_1.2.0 munsell_0.5.0
## [109] viridisLite_0.4.0 beeswarm_0.4.0
## [111] vipor_0.4.5 bslib_0.3.1