Title: | Imputing Dropout Events in Single-Cell RNA-Sequencing Data |
---|---|
Description: | R code for imputing dropout events. Many statistical methods for cell type identification, visualization, and lineage reconstruction do not account for dropout events ('PCAreduce', 'SC3', 'PCA', 't-SNE', 'Monocle', 'TSCAN', etc.). 'DrImpute' can improve the performance of such software by imputing dropout events. |
Authors: | Il-Youp Kwak with contributions from Wuming Gong |
Maintainer: | Il-Youp Kwak <[email protected]> |
License: | GPL-3 |
Version: | 1.0 |
Built: | 2024-11-07 04:42:48 UTC |
Source: | https://github.com/ikwak2/drimpute |
Imputing dropout events in single-cell RNA-sequencing data.
DrImpute(X, ks = 10:15, dists = c("spearman", "pearson"), method = "mean", cls = NULL, seed = 1, zerop = 0)
X |
Gene expression matrix (gene by cell). |
ks |
Numbers of clusters to use for the cell clusterings. Default is ks = 10:15. |
dists |
Distance measures to use. Default is c("spearman", "pearson"); "euclidean" can be added as well. |
method |
Use "mean" for mean imputation, "med" for median imputation. |
cls |
Optional user-supplied clustering information, given as a matrix of base clusterings: each row represents one clustering and each column represents one cell. |
seed |
User can provide a seed. |
zerop |
Lower bound on the zero percentage of the imputed matrix: the percentage of zeros remaining after imputation is at least zerop. |
Imputed Gene expression matrix (gene by cell).
Il-Youp Kwak
Il-Youp Kwak, Wuming Gong, Kaoko Koyano-Nakagawa and Daniel J. Garry (2017+) DrImpute: Imputing dropout events in single cell RNA sequencing data
data(exdata)
exdata <- preprocessSC(exdata)
exdata <- exdata[1:3000, 1:80]
logdat <- log(exdata+1)
cls <- getCls(logdat)
logdat_imp <- DrImpute(logdat, cls = cls)
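To make the roles of cls and method concrete, the following base R sketch fills the zeros of a gene-by-cell matrix with the per-cluster mean (or median) of each gene and then averages the results over several base clusterings. It is an illustration under stated assumptions, not the 'DrImpute' source: the helper names impute_by_cluster and impute_consensus are hypothetical, every zero is treated as a dropout, and the cluster summary is taken over all cells of the cluster, zeros included.

```r
# Hypothetical helper, not part of 'DrImpute': replace the zeros of a
# gene-by-cell matrix with the per-cluster summary of each gene.
impute_by_cluster <- function(X, cl, method = c("mean", "med")) {
  method <- match.arg(method)
  center <- if (method == "mean") mean else median
  out <- X
  for (k in unique(cl)) {
    idx <- which(cl == k)
    # Per-gene summary over the cells of cluster k (zeros included; an assumption)
    mu <- apply(X[, idx, drop = FALSE], 1, center)
    for (j in idx) {
      z <- X[, j] == 0
      out[z, j] <- mu[z]
    }
  }
  out
}

# Hypothetical helper: average the imputations obtained from several base
# clusterings (one clustering per row of cls, one column per cell).
impute_consensus <- function(X, cls, method = "mean") {
  imps <- lapply(seq_len(nrow(cls)),
                 function(i) impute_by_cluster(X, cls[i, ], method))
  Reduce(`+`, imps) / length(imps)
}

# Toy data: 2 genes by 4 cells, with two base clusterings
X <- matrix(c(2, 0, 4, 0,
              0, 0, 3, 3),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("g1", "g2"), paste0("c", 1:4)))
cls <- rbind(c(1, 1, 2, 2),
             c(1, 1, 1, 2))
imp <- impute_consensus(X, cls)
```

In this sketch the nonzero entries of X are returned unchanged; only the zero entries receive values, and those values depend on which cells share a cluster with the cell being imputed.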
This data set is a subset of the data from Usoskin et al. (2015). The original data are RNA-seq data on 799 cells dissected from the mouse lumbar dorsal root ganglion, distributed over a total of nine 96-well plates. We randomly selected 150 cells from the data.
Column names indicate four different cell types, NF, NP, TH, and PEP.
data(exdata)
Usoskin D et al. Unbiased classification of sensory neuron types by large-scale single-cell RNA sequencing. Nature Neuroscience. Nature Research, 2015;18:145-53.
data(exdata)
exdata <- preprocessSC(exdata)
A similarity matrix between cells is constructed using "pearson", "spearman" or "euclidean". K-means clustering is then performed on the first few principal components of the similarity matrix.
getCls(X, ks = 10:15, dists = c("spearman", "pearson"), dim.reduc.prop = 0.05)
X |
Log transformed gene expression matrix (Gene by Cell). |
ks |
Numbers of clusters to use for the cell clusterings. Default is ks = 10:15. |
dists |
Distance measures to use. Default is c("spearman", "pearson"); "euclidean" can be added as well. |
dim.reduc.prop |
Proportion of principal components to use for K-means clustering. |
A matrix object in which each row represents a different clustering result.
Il-Youp Kwak
Il-Youp Kwak, Wuming Gong, Kaoko Koyano-Nakagawa and Daniel J. Garry (2017+) DrImpute: Imputing dropout events in single cell RNA sequencing data
data(exdata)
exdata <- preprocessSC(exdata)
exdata <- exdata[1:3000, 1:80]
logdat <- log(exdata+1)
cls <- getCls(logdat)
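The clustering procedure described for getCls can be sketched in base R as follows. This is a rough illustration rather than the package's code: the helper name sketch_cls is hypothetical, the use of 1 - correlation as the matrix fed to PCA, the use of prcomp, and the rule for turning dim.reduc.prop into a number of components are all assumptions, and smaller values of ks are used so that the toy data stay small.

```r
# Hypothetical helper, not the 'getCls' source: cluster cells by k-means on
# the leading principal components of a cell-by-cell distance matrix.
sketch_cls <- function(X, ks = 2:3, dists = c("spearman", "pearson"),
                       dim.reduc.prop = 0.05, seed = 1) {
  set.seed(seed)
  # Assumed rule for "first few" components: a proportion of the cell count
  n_pc <- max(2, ceiling(ncol(X) * dim.reduc.prop))
  res <- list()
  for (d in dists) {
    D <- if (d == "euclidean") {
      as.matrix(dist(t(X)))          # distances between cells (columns)
    } else {
      1 - cor(X, method = d)         # correlation-based dissimilarity
    }
    pcs <- prcomp(D)$x[, seq_len(n_pc), drop = FALSE]
    for (k in ks) {
      res[[paste(d, k)]] <- kmeans(pcs, centers = k, nstart = 10)$cluster
    }
  }
  do.call(rbind, res)                # one clustering per row, one cell per column
}

# Toy data: 200 genes by 30 cells of simulated counts
set.seed(42)
toy <- matrix(rpois(200 * 30, lambda = 5), nrow = 200)
toy_cls <- sketch_cls(log(toy + 1))
dim(toy_cls)
```

Each combination of a distance measure and a value of k contributes one row, so two distance measures and two values of k give four base clusterings in this sketch.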
Preprocess gene expression data
preprocessSC(X, min.expressed.gene = 0, min.expressed.cell = 2, max.expressed.ratio = 1, normalize.by.size.effect = FALSE)
X |
Gene expression matrix (Gene by Cell). |
min.expressed.gene |
Cell-level filtering criterion. A cell is filtered out if its number of expressed genes is less than min.expressed.gene. |
min.expressed.cell |
Gene-level filtering criterion. A gene is filtered out if the number of cells expressing it is less than min.expressed.cell. |
max.expressed.ratio |
Gene-level filtering criterion. A gene is filtered out if the ratio of cells expressing it is larger than max.expressed.ratio. |
normalize.by.size.effect |
Normalize using size factor. |
Filtered gene expression matrix
Wuming Gong
Il-Youp Kwak, Wuming Gong, Kaoko Koyano-Nakagawa and Daniel J. Garry (2017+) DrImpute: Imputing dropout events in single cell RNA sequencing data
data(exdata)
exdata <- preprocessSC(exdata)
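The three filtering criteria documented above can be illustrated with a short base R sketch. It is not the 'preprocessSC' source: the helper name sketch_filter is hypothetical, a gene is assumed to count as expressed in a cell when its value is greater than zero, both filters are computed on the unfiltered matrix, and size-factor normalization is left out.

```r
# Hypothetical helper, not the 'preprocessSC' source: apply the documented
# cell-level and gene-level filters to a gene-by-cell matrix.
sketch_filter <- function(X, min.expressed.gene = 0, min.expressed.cell = 2,
                          max.expressed.ratio = 1) {
  expressed <- X > 0                               # assumed definition of "expressed"
  # Cell-level filter: keep cells with enough expressed genes
  keep_cell <- colSums(expressed) >= min.expressed.gene
  # Gene-level filters: enough expressing cells, but not too large a ratio
  n_cells   <- rowSums(expressed)
  keep_gene <- n_cells >= min.expressed.cell &
    n_cells / ncol(X) <= max.expressed.ratio
  X[keep_gene, keep_cell, drop = FALSE]
}

# Toy data: 4 genes by 4 cells
X <- rbind(g1 = c(1, 0, 0, 0),
           g2 = c(3, 2, 0, 0),
           g3 = c(0, 0, 0, 0),
           g4 = c(5, 1, 2, 7))
colnames(X) <- paste0("c", 1:4)

f_default <- sketch_filter(X)                              # drops g1 and g3
f_ratio   <- sketch_filter(X, max.expressed.ratio = 0.9)   # also drops g4
f_cells   <- sketch_filter(X, min.expressed.gene = 2)      # also drops c3 and c4
```

With the default arguments only the min.expressed.cell criterion removes anything here, since min.expressed.gene = 0 keeps every cell and max.expressed.ratio = 1 keeps every gene.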