Single-cell RNA sequencing (scRNA-seq) has revolutionized cell biology, enabling studies of heterogeneity and transcriptome dynamics of complex tissues at single-cell resolution. However, a major limitation of scRNA-seq data is the low capture and sequencing efficiency affecting each cell, which leaves a large proportion of expressed genes with zero or low read counts. This is known as the “dropout” phenomenon. Such dropout events bias downstream analyses such as clustering, classification, differential expression analysis, and pseudo-time analysis.
To address this challenge, two types of approaches have been developed. The first adopts analysis strategies that take dropout into consideration. For instance, ZINB-WaVE generates weights for genes and cells using a zero-inflated negative binomial model, and these weights are in turn used to detect differential expression. Another method used a pool-and-deconvolute approach to deal with dropout events for accurate normalization of scRNA-seq data. The second approach is direct imputation of scRNA-seq data. Among these methods, MAGIC imputes dropout events by data diffusion based on a Markov transition matrix that defines a kernel distance measure among cells. scImpute first computes the dropout probability using a two-component mixture model and then uses a LASSO model to impute dropout values. SAVER similarly uses a linear regression to impute the missing data, but it differs from scImpute by using a Bayesian model to compute the probability of dropout events. DrImpute first conducts consensus clustering of cells and then imputes each dropout with the average value of similar cells. VIPER uses a non-negative sparse regression model to progressively infer local neighborhood cells for imputation. All of the imputation methods above recover dropout values using scRNA-seq data only.
Here, we describe the SCRABBLE algorithm for imputing scRNA-seq data by using bulk RNA-seq as a constraint. SCRABBLE only requires a consistent cell population between the single-cell and bulk data. The bulk data represent the unfractionated composite mixture of all cell types without sorting them into individual types. For many scRNA-seq datasets, bulk data on the same cells or tissue usually already exist, and it is becoming increasingly common to collect matched bulk data when a new scRNA-seq experiment is performed. Bulk RNA-seq data allow SCRABBLE to achieve a more accurate estimate of the gene expression distributions across cells than single-cell data alone. SCRABBLE is based on the framework of matrix regularization, which does not impose an assumption of specific statistical distributions for gene expression levels and dropout probabilities. It also does not force the imputation of genes that are not affected by dropout events.
We first evaluated the performance of SCRABBLE using simulated data where the ground truth is known. Strategy 1 is based on the Splatter method and generates completely synthetic data (Fig.). Splatter captures many features observed in scRNA-seq data, including zero-inflation, gene-wise dispersion, and differing sequencing depths between cells. Strategy 2 uses a down-sampled real bulk RNA-seq dataset (Fig.). Here, we introduced dropout events using an exponential function to control the dropout rate (parameter λ) and a Bernoulli process to introduce dropout events at the corresponding dropout rate (see the “Methods” section). Using the two strategies, we simulated data with dropout rates corresponding to 60 to 87% zeros in the data. Moreover, to evaluate the robustness of imputation methods, we simulated 100 data sets at each dropout rate. It is well known that real RNA-seq data tend to have a characteristic inverse relationship between mean and variance. We confirmed that our simulated data also have this property using the mean-variance plot (Additional file 1: Figures S1 and S3).
Figure caption: Performance evaluation using down-sampled bulk RNA-seq data. (a) Schematic overview of the simulation strategy. Starting from the bulk RNA-seq data matrix consisting of three types of cells (T1, T2, and T3 cells), the data matrix X1 is obtained by resampling the raw data from each cell type separately.
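The dropout simulation described in the text (an exponential function with parameter λ sets the dropout rate, and a Bernoulli process applies it) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the specific form p = exp(-λ·x²) on log-transformed counts, the function name simulate_dropout, and the negative binomial stand-in for the "true" expression matrix are all assumptions made for the example. The text only states that an exponential function and a Bernoulli process are used.

```python
import numpy as np


def simulate_dropout(true_counts, lam, rng):
    """Zero out entries of a count matrix with an expression-dependent probability.

    Assumed form (hypothetical, for illustration): p_drop = exp(-lam * log1p(x)**2),
    so lowly expressed entries are more likely to drop out.
    """
    log_expr = np.log1p(true_counts)
    p_drop = np.exp(-lam * log_expr ** 2)
    # Bernoulli process: an entry is kept when its uniform draw is at least p_drop.
    keep = rng.random(true_counts.shape) >= p_drop
    return true_counts * keep, p_drop


rng = np.random.default_rng(0)
# Stand-in "ground truth": 200 genes x 50 cells of negative binomial counts (assumed).
true_counts = rng.negative_binomial(n=2, p=0.1, size=(200, 50))
observed, p_drop = simulate_dropout(true_counts, lam=0.1, rng=rng)
zero_fraction = float((observed == 0).mean())
print(f"fraction of zeros after dropout: {zero_fraction:.2f}")
```

Under this assumed form, a smaller λ gives a higher dropout probability at every expression level, so sweeping λ is one way to reach a target fraction of zeros such as the 60 to 87% range mentioned above.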