GreenCells | Smart-Seq Downstream analysis

We provide an Rscript command-line tool for downstream analysis of plant single-cell Smart-Seq data.

Step 1. Load the R packages and prepare input data

  Input files:
  The expression matrix produced by Smart-Seq and a marker gene CSV file (markergene.csv).
  R packages:
  Install Seurat, ggplot2, patchwork, dplyr, DT, ggraph, and getopt with install.packages().
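
A minimal sketch of the installation step, assuming a standard CRAN setup. missing_packages() is a hypothetical helper (not part of the GreenCells script) that reports which of the listed packages cannot be loaded, so install.packages() is only called for those.

```r
# Packages listed in Step 1
pkgs <- c("Seurat", "ggplot2", "patchwork", "dplyr", "DT", "ggraph", "getopt")

# Hypothetical helper: return the packages that cannot be loaded (usually: not installed)
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Install only what is missing (requires network access to a CRAN mirror):
# to_install <- missing_packages(pkgs)
# if (length(to_install) > 0) install.packages(to_install)
```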

Step 2. Data loading and quality control

  Load Data:
  Initialize a Seurat object from the raw count matrix.
  Min Cells Num: keep only genes detected in at least this many cells; Min Genes Num: keep only cells in which at least this many genes are detected.
  Get Mitochondrial Genes:
  Mitochondrial Gene Name: a regular expression that matches mitochondrial gene names; it is used to compute the percentage of each cell's counts that come from mitochondrial genes (percent.mt).
  Quality Control Filtration:
  nFeature RNA Minimal Limit: lower limit on the number of genes detected per cell; nFeature RNA Maximal Limit: upper limit on the number of genes detected per cell; percent.mt Limit: upper limit on the percentage of mitochondrial counts per cell.
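
The filters in Step 2 can be illustrated with a simplified base-R sketch. It is not the script's code and does not call Seurat; the helper names (prefilter(), percent_mt(), qc_filter()), the made-up counts, and the "^ATMG" pattern (Arabidopsis mitochondrial gene IDs begin with ATMG) are assumptions for illustration. Min Cells Num and Min Genes Num most likely correspond to the min.cells and min.features arguments of Seurat's CreateSeuratObject(); the strict inequalities in qc_filter() follow the Seurat tutorial, and the script's exact comparisons may differ.

```r
# Simplified base-R illustration of Step 2 (not the GreenCells script, no Seurat calls).

# Hypothetical: drop cells with < min_genes detected genes,
# then drop genes detected in < min_cells of the remaining cells
prefilter <- function(counts, min_cells, min_genes) {
  counts <- counts[, colSums(counts > 0) >= min_genes, drop = FALSE]
  counts[rowSums(counts > 0) >= min_cells, , drop = FALSE]
}

# Percentage of each cell's total counts that come from genes matching mt_pattern
percent_mt <- function(counts, mt_pattern) {
  is_mt <- grepl(mt_pattern, rownames(counts))
  100 * colSums(counts[is_mt, , drop = FALSE]) / colSums(counts)
}

# Keep cells whose detected-gene count lies strictly between the two limits
# and whose mitochondrial percentage is below mt_limit
qc_filter <- function(counts, nfeature_min, nfeature_max, mt_limit, mt_pattern) {
  n_feature <- colSums(counts > 0)
  pct <- percent_mt(counts, mt_pattern)
  keep <- n_feature > nfeature_min & n_feature < nfeature_max & pct < mt_limit
  counts[, keep, drop = FALSE]
}

# Made-up counts: rows are genes, columns are cells
counts <- matrix(
  c(5, 0, 3, 2,   # cell1
    0, 0, 0, 1,   # cell2
    4, 2, 1, 0,   # cell3
    1, 0, 0, 8),  # cell4
  nrow = 4,
  dimnames = list(c("AT1G01010", "AT1G01020", "AT2G01030", "ATMG00010"),
                  paste0("cell", 1:4))
)

pre <- prefilter(counts, min_cells = 2, min_genes = 2)
qc  <- qc_filter(pre, nfeature_min = 1, nfeature_max = 5, mt_limit = 50,
                 mt_pattern = "^ATMG")
print(colnames(qc))  # [1] "cell1" "cell3"
```

Here cell2 fails Min Genes Num, AT1G01020 then fails Min Cells Num, and cell4 is removed because about 89% of its counts are mitochondrial.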

Step 3. Dimensionality reduction and clustering

  Highly Variable Genes:
  scale.factor: scale factor for cell-level normalization; Highly Variable Gene Number: number of features to select as top variable features; Top Number: number of top-ranked highly variable genes to report.
  Clustering:
  npcs: total number of principal components (PCs) to compute and store; Dims: PCA dimensions to use as input; Resolution: clustering granularity, positively correlated with the number of clusters (higher values give more clusters).
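
To make scale.factor and Highly Variable Gene Number concrete, here is a base-R sketch. log_normalize() follows the LogNormalize formula documented for Seurat's NormalizeData() (counts divided by the cell total, multiplied by scale.factor, then log1p); that the script uses LogNormalize is an assumption. top_variable_genes() ranks genes by plain variance, a deliberate simplification: Seurat's default "vst" selection fits a mean-variance trend instead.

```r
# Base-R sketch of Step 3's normalization and HVG ideas (not the script's code).

# LogNormalize: divide by each cell's total counts, multiply by scale_factor, then log1p
log_normalize <- function(counts, scale_factor = 10000) {
  log1p(sweep(counts, 2, colSums(counts), "/") * scale_factor)
}

# Simplified stand-in for HVG selection: rank genes by variance across cells.
# Seurat's default "vst" method models the mean-variance relationship instead.
top_variable_genes <- function(norm, n_hvg) {
  vars <- apply(norm, 1, var)
  head(names(sort(vars, decreasing = TRUE)), n_hvg)
}

# Made-up counts: rows are genes, columns are cells
counts <- matrix(c(5, 5, 0,   1, 9, 0,   2, 8, 10), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("c1", "c2", "c3")))
norm <- log_normalize(counts, scale_factor = 10000)
top_variable_genes(norm, n_hvg = 2)  # "g3" "g1"
```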

Step 4. Find markers

min.pct: only test genes detected in at least this fraction of cells in either of the two groups being compared; logfc.threshold: only test genes whose average expression differs between the two groups by at least this amount (log scale); Top Number: number of top-ranked marker genes to report for each cluster.
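
The two thresholds act as a pre-filter before any statistical test. The hypothetical marker_prefilter() below shows the idea for one cluster versus all other cells, on log-normalized data. The fold-change formula (log2 of mean expression on the linear scale with a pseudocount of 1) is an approximation; Seurat's exact computation and default values vary between versions, and no statistical test is performed here.

```r
# Hypothetical pre-filter for one cluster vs. all other cells (no statistical test).
# norm: log-normalized gene-by-cell matrix; in_cluster: logical vector over cells.
marker_prefilter <- function(norm, in_cluster, min_pct, logfc_threshold) {
  x1 <- norm[, in_cluster, drop = FALSE]
  x2 <- norm[, !in_cluster, drop = FALSE]
  pct1 <- rowMeans(x1 > 0)  # fraction of cluster cells expressing each gene
  pct2 <- rowMeans(x2 > 0)  # fraction of other cells expressing each gene
  # approximate log2 fold change of mean expression on the linear scale (pseudocount 1)
  lfc <- log2(rowMeans(expm1(x1)) + 1) - log2(rowMeans(expm1(x2)) + 1)
  keep <- pmax(pct1, pct2) >= min_pct & abs(lfc) >= logfc_threshold
  data.frame(gene = rownames(norm), pct.1 = pct1, pct.2 = pct2,
             avg_log2FC = lfc)[keep, ]
}

# Made-up expression values, log-normalized with log1p
norm <- log1p(matrix(c(9, 1, 0,   9, 1, 0,   0, 1, 0,   0, 1, 3), nrow = 3,
                     dimnames = list(c("gA", "gB", "gC"), paste0("c", 1:4))))
res <- marker_prefilter(norm, in_cluster = c(TRUE, TRUE, FALSE, FALSE),
                        min_pct = 0.25, logfc_threshold = 0.25)
res$gene  # "gA" "gC": gB is expressed equally everywhere, so it fails logfc.threshold
```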

Step 5. Cell type annotation

assignded_percent: the minimum percentage of cells in a cluster that must be assigned; clusters whose assigned percentage falls below this value are labeled as unassigned.
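
The annotation method itself is not described here, so the following is only one plausible reading of assignded_percent, written as a hypothetical base-R sketch: each cell carries a predicted cell type, a cluster takes its most frequent type, and the cluster is labeled "Unassigned" when that type covers less than assignded_percent percent of its cells. Treating the value as a percentage (0 to 100) rather than a fraction is also an assumption, and the cell type labels below are made up.

```r
# Hypothetical interpretation of assignded_percent (the script's real rule may differ).
# cluster: cluster ID per cell; cell_type: predicted type per cell; assigned_percent: 0-100
annotate_clusters <- function(cluster, cell_type, assigned_percent) {
  sapply(split(cell_type, cluster), function(types) {
    freq <- table(types) / length(types)
    top <- names(freq)[which.max(freq)]   # most frequent type in the cluster
    if (100 * max(freq) >= assigned_percent) top else "Unassigned"
  })
}

cluster   <- c(0, 0, 0, 0, 1, 1, 1, 1)
cell_type <- c("root hair", "root hair", "root hair", "cortex",
               "xylem", "phloem", "cortex", "xylem")
annotate_clusters(cluster, cell_type, assigned_percent = 60)
# cluster "0": "root hair" (75%); cluster "1": "Unassigned" (top type "xylem" is only 50%)
```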

Example:
Bash: Rscript analysis.r -f Rawdata_file -m markergene.csv
Here, "Rawdata_file" is the path to the raw data (the Smart-Seq expression matrix), and "markergene.csv" is the marker gene table. The table is comma-separated and needs two columns: the first headed "gene" and the second headed "type". Download the example files (RawdataFile, markergene.csv), unzip them, and use them to test the program:
Rscript rcsript.R -f RawdataFile -m markergene.csv
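
Before running the script, it can help to check that the marker table has the expected layout. This sketch uses base R only; read_marker_table() is a hypothetical helper, not part of the GreenCells script.

```r
# Hypothetical check of the marker table layout: columns "gene" and "type", comma-separated
read_marker_table <- function(path) {
  markers <- read.csv(path, stringsAsFactors = FALSE)
  if (ncol(markers) < 2 || !identical(colnames(markers)[1:2], c("gene", "type"))) {
    stop("markergene.csv must start with the columns 'gene' and 'type'")
  }
  markers
}

# Usage (assumes markergene.csv is in the working directory):
# markers <- read_marker_table("markergene.csv")
# genes_by_type <- split(markers$gene, markers$type)  # marker genes grouped by cell type
```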
Help instructions:
After the script finishes, it writes the following output files:
  VlnPlot.png: violin plots of the QC metrics.
  FeatureScatter.png: scatter plots of the relationships between features in the object metadata; the number above each plot is the Pearson correlation coefficient.
  HighlyVariableGenes.png: each dot is a gene, with the mean on the x-axis and the variance on the y-axis; highly variable genes are shown in red.
  Cluster_umap.png: UMAP plot of the clustering result.
  MarkerGeneFeaturePlot_1.png & KnownMarkerDotPlot_2.png: feature plot and dot plot of the known marker genes.
  PredictedMarkerGene.csv: all predicted marker genes.
  Top5_markerDotPlot.png: dot plot of the top 5 predicted marker genes for each cell cluster.
  SubgroupMarkerGenesExpression.png: heatmap of the top 5 predicted marker genes for each cell cluster.
  Annotation.png: UMAP plot showing the clusters together with their cell type annotations.
  Cell_type.csv: the cell type assigned to each cluster.
  basic.Rdata: R data file containing the results of the run.
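
basic.Rdata can be reopened in a later R session with base R's load(). The objects stored in it are not documented here, so this sketch (the helper name load_rdata() is hypothetical) loads them into a separate environment and lets you list them; working with a saved Seurat object additionally requires Seurat to be installed.

```r
# Load an .Rdata file into its own environment so it does not overwrite existing objects
load_rdata <- function(path) {
  env <- new.env()
  load(path, envir = env)
  env
}

# Usage (assumes basic.Rdata is in the working directory):
# results <- load_rdata("basic.Rdata")
# ls(results)  # names of the saved objects
```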