Smart-Seq Upstream analysis


We provide Shell commands for the upstream analysis of plant single-cell Smart-Seq data.

Step1. Download software and prepare input data

  Download Software:
  Software needed: grabseqs (optional, for fetching public data), Trim Galore, STAR, featureCounts.
  Preparation files:
  The project folder contains a RawData folder holding all raw fastq.gz files, and a GenomeAnno folder holding the genome and annotation files of the corresponding species.
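Before starting, it can help to confirm that the tools are installed and that the two input folders exist. The following is a minimal sketch, not part of the original script: the helper names check_tools and check_layout are hypothetical, and the executable names trim_galore, STAR and featureCounts are assumed to be on PATH under those names.

```shell
# Report which of the given executables are missing from PATH.
# grabseqs is optional, so it is left out of the usage line below.
check_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "all tools found"
}

# Check that the RawData and GenomeAnno folders exist under the project folder.
check_layout() {
    project=$1
    for d in RawData GenomeAnno; do
        if [ ! -d "$project/$d" ]; then
            echo "missing folder: $project/$d"
            return 1
        fi
    done
    echo "layout ok"
}
```

Usage, from inside the project folder: check_tools trim_galore STAR featureCounts, followed by check_layout .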

Step2. Batch quality control and adapter trimming with Trim Galore

  Quality control and adapter trimming:
  -o --output_dir : output directory;   --stringency : minimum number of bases that must overlap with the adapter sequence before a read is trimmed;   --phred33 / --phred64 : Phred quality score encoding used by the sequencing platform;   --length : discard reads that are shorter than this length after trimming;   --quality : Phred quality score threshold for trimming low-quality ends;   --paired : for paired-end sequencing data.
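The options above could be combined into a batch loop as in the sketch below. It assumes paired-end files named <sample>_1.fastq.gz and <sample>_2.fastq.gz in RawData (a naming convention not stated in the original). The threshold values are illustrative only, and the function names trim_sample and trim_all are hypothetical.

```shell
# Trim one paired-end sample with Trim Galore.
# Assumption: mates are named <sample>_1.fastq.gz and <sample>_2.fastq.gz.
trim_sample() {
    r1=$1                              # path to the read-1 file
    r2=${r1%_1.fastq.gz}_2.fastq.gz    # matching read-2 file
    outdir=$2
    # Threshold values are illustrative, not taken from the original script.
    trim_galore --paired --phred33 \
        --quality 20 --length 20 --stringency 3 \
        --output_dir "$outdir" "$r1" "$r2"
}

# Run trim_sample on every read-1 file in RawData, writing to CleanData.
trim_all() {
    mkdir -p CleanData
    for r1 in RawData/*_1.fastq.gz; do
        [ -e "$r1" ] || continue       # glob matched nothing
        trim_sample "$r1" CleanData
    done
}
```

With this naming, Trim Galore writes the trimmed pairs as <sample>_1_val_1.fq.gz and <sample>_2_val_2.fq.gz in CleanData.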




Step3. Alignment using STAR

  ERCC (optional, default: none):
  If spike-ins are used, add the spike-in sequences to the reference genome and the corresponding spike-in entries to the gtf file.
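One simple way to add the spike-ins is to concatenate the files, as in the sketch below. It assumes uncompressed FASTA and GTF files that each end with a newline. All file names, and the function name add_spikein, are hypothetical placeholders.

```shell
# Append spike-in sequences and annotation to the reference files.
# Arguments: genome FASTA, genome GTF, spike-in FASTA, spike-in GTF, output dir.
# The output file names are hypothetical placeholders.
add_spikein() {
    genome_fa=$1
    genome_gtf=$2
    ercc_fa=$3
    ercc_gtf=$4
    outdir=$5
    mkdir -p "$outdir"
    cat "$genome_fa" "$ercc_fa" > "$outdir/genome_with_spikein.fa"
    cat "$genome_gtf" "$ercc_gtf" > "$outdir/genome_with_spikein.gtf"
}
```

The sequence names in the spike-in FASTA must match the first column of the spike-in GTF, otherwise the spike-in reads cannot be assigned during quantification.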
  Build index:
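  --runThreadN : number of threads;   --genomeDir : index output path;   --genomeFastaFiles : reference genome path;   --sjdbGTFfile : reference genome annotation file;   --sjdbOverhang : length of the sequence used on each side of annotated junctions, ideally read length minus 1 (for reads of varying lengths, maximum read length minus 1).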
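As a sketch, the index-building options above could be wrapped as follows. The function name build_index and its argument order are hypothetical, and --runMode genomeGenerate is added here because STAR needs it to build an index. STAR expects the FASTA and GTF files to be uncompressed.

```shell
# Build a STAR index.
# Arguments: genome FASTA, GTF, index output dir, read length, threads (default 4).
build_index() {
    genome_fa=$1
    gtf=$2
    index_dir=$3
    read_len=$4          # (maximum) read length after trimming
    threads=${5:-4}
    mkdir -p "$index_dir"
    STAR --runMode genomeGenerate \
        --runThreadN "$threads" \
        --genomeDir "$index_dir" \
        --genomeFastaFiles "$genome_fa" \
        --sjdbGTFfile "$gtf" \
        --sjdbOverhang "$((read_len - 1))"
}
```

For example, build_index GenomeAnno/genome.fa GenomeAnno/genome.gtf index_dir 150 8 would pass --sjdbOverhang 149; the two file names here are placeholders.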
  Alignment:
  --runThreadN : number of threads;   --genomeDir : index path;   --outSAMtype : output file type (for example, BAM SortedByCoordinate);   --readFilesIn : paths of the input fastq files;   --sjdbOverhang : same value as used when building the index;   --outFileNamePrefix : output file prefix.
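A per-sample alignment call might look like the sketch below. It assumes the trimmed pairs in CleanData carry Trim Galore's paired-end output names (<sample>_1_val_1.fq.gz and <sample>_2_val_2.fq.gz). It adds --readFilesCommand zcat, which is not listed above but is needed for STAR to read gzipped input. The function name align_sample is hypothetical.

```shell
# Align one trimmed paired-end sample with STAR.
# Arguments: sample name, index dir, read length, threads (default 4).
align_sample() {
    sample=$1
    index_dir=$2
    read_len=$3          # same read length as used for the index
    threads=${4:-4}
    mkdir -p Alignment_result
    STAR --runThreadN "$threads" \
        --genomeDir "$index_dir" \
        --readFilesIn "CleanData/${sample}_1_val_1.fq.gz" "CleanData/${sample}_2_val_2.fq.gz" \
        --readFilesCommand zcat \
        --outSAMtype BAM SortedByCoordinate \
        --sjdbOverhang "$((read_len - 1))" \
        --outFileNamePrefix "Alignment_result/${sample}_"
}
```

With these settings STAR writes the sorted alignment to Alignment_result/<sample>_Aligned.sortedByCoord.out.bam.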













Step4. Generate the count matrix with featureCounts

  Quantitative analysis:
  -a : name of the annotation file (gzipped files are supported);   -o : name of the output file;   -g : attribute type used to group features into meta-features;   -t : feature type to count;   -T : number of threads.
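The quantification step could be sketched as below. It assumes paired-end data and the STAR output naming <sample>_Aligned.sortedByCoord.out.bam in Alignment_result. The values exon and gene_id are featureCounts' defaults for GTF annotation, and the function name count_features is hypothetical.

```shell
# Count reads per gene over all sorted BAM files from the alignment step.
# Arguments: GTF annotation file, threads (default 4).
count_features() {
    gtf=$1
    threads=${2:-4}
    # -p marks the input as paired-end; recent Subread releases also need
    # --countReadPairs to count fragments instead of individual reads.
    featureCounts -p -T "$threads" -t exon -g gene_id \
        -a "$gtf" -o FeatureCounts_Result.txt \
        Alignment_result/*Aligned.sortedByCoord.out.bam
}
```

Passing all BAM files in one call yields a single table with one count column per sample.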







Example:
nohup bash smartseq.sh &
Help instructions:
After the Shell script has finished, the project folder contains the following:
RawData folder: raw data; GenomeAnno folder: genome and annotation files; CleanData folder: data after adapter removal and quality control; index_dir folder: index files created by STAR; Alignment_result folder: sorted bam files from the STAR alignment; FeatureCounts_Result.txt: featureCounts quantification results.