Single-cell Transcriptome and Regulome Analysis Pipeline
MAESTRO (Model-based AnalysEs of Single-cell Transcriptome and RegulOme) is a comprehensive single-cell RNA-seq and ATAC-seq analysis suite built using Snakemake. MAESTRO combines several dozen tools and packages into an integrative pipeline that takes scRNA-seq and scATAC-seq data from raw sequencing reads (FASTQ files) all the way through alignment, quality control, cell filtering, normalization, unsupervised clustering, differential expression and peak calling, cell-type annotation, and transcriptional regulation analysis. Currently, MAESTRO supports the Smart-seq2, 10x Genomics, Drop-seq, and SPLiT-seq protocols for scRNA-seq, and the microfluidics-based, 10x Genomics, and sci-ATAC-seq protocols for scATAC-seq.
RP-based and peak-based).
There are two ways to install MAESTRO: install the full workflow through Anaconda Cloud, or install only the R package for exploring already-processed data.
MAESTRO uses the conda package manager, via Miniconda3, to harmonize all of its software dependencies. Users can install the full MAESTRO solution into a conda environment.
Use the following commands to install Miniconda3:
```bash
$ wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh
```

And then users can create an isolated environment for MAESTRO and install through the following commands:

```bash
$ conda config --add channels defaults
$ conda config --add channels liulab-dfci
$ conda config --add channels bioconda
$ conda config --add channels conda-forge

$ conda install mamba -c conda-forge
$ mamba create -n MAESTRO maestro=1.3.2 -c liulab-dfci

$ conda activate MAESTRO
```
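After activating the environment, a quick way to confirm the install is to check that the expected executables are on `PATH`. The sketch below defines a hypothetical helper, `check_tools`, built only on the POSIX `command -v` builtin. Which binaries the MAESTRO environment actually ships is an assumption: besides `MAESTRO` itself, the example names `STAR`, `minimap2` and `giggle` are inferred from the dependencies described in this README, so verify them against your install.

```bash
# Hypothetical post-install check (not part of MAESTRO).
# Prints "found" or "missing" for each command name given as an argument
# and returns the number of missing commands (0 means all were found).
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'found\t%s\t%s\n' "$tool" "$(command -v "$tool")"
    else
      printf 'missing\t%s\n' "$tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Example, inside the activated MAESTRO environment (tool names other than
# MAESTRO are assumptions):
#   check_tools MAESTRO STAR minimap2 giggle
```

A non-zero exit status tells you how many of the listed tools could not be found, which makes the helper easy to use in a setup script.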
If users already have processed datasets, such as a cell-by-gene or cell-by-peak matrix generated by Cell Ranger, they can install the MAESTRO R package alone and perform the analysis from the processed data.

```bash
$ R
> library(devtools)
> install_github("liulab-dfci/MAESTRO")
```
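Before loading a Cell Ranger output directory in the R package, or converting it with the `mtx-to-h5` subcommand, it can help to confirm which matrix layout the directory holds. The helper below, `check_10x_mtx_dir`, is hypothetical and not part of MAESTRO. The file names are Cell Ranger conventions as we understand them (the peak-matrix case assumes a `cellranger-atac` style directory containing `peaks.bed`), so check them against the Cell Ranger version you used.

```bash
# Hypothetical helper: succeed only if every listed file exists in DIR.
_all_files_exist() {
  local dir=$1 f
  shift
  for f in "$@"; do
    [ -f "$dir/$f" ] || return 1
  done
}

# Hypothetical helper: print which matrix layout DIR contains.
# Assumed layouts:
#   cellranger-v3 : matrix.mtx.gz, features.tsv.gz, barcodes.tsv.gz
#   cellranger-v2 : matrix.mtx, genes.tsv, barcodes.tsv
#   peak-matrix   : matrix.mtx, peaks.bed, barcodes.tsv
# Returns 1 (with a message on stderr) if no layout matches.
check_10x_mtx_dir() {
  local dir=$1
  if _all_files_exist "$dir" matrix.mtx.gz features.tsv.gz barcodes.tsv.gz; then
    echo "cellranger-v3"
  elif _all_files_exist "$dir" matrix.mtx genes.tsv barcodes.tsv; then
    echo "cellranger-v2"
  elif _all_files_exist "$dir" matrix.mtx peaks.bed barcodes.tsv; then
    echo "peak-matrix"
  else
    echo "unrecognized matrix directory: $dir" >&2
    return 1
  fi
}
```

For example, `check_10x_mtx_dir filtered_feature_bc_matrix` prints `cellranger-v3` for a directory holding the three gzipped files.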
The full MAESTRO workflow requires extra annotation files and tools:
MAESTRO depends on STARsolo and minimap2 for mapping scRNA-seq and scATAC-seq datasets, respectively. Users need to generate the reference files for these aligners and pass the paths to the annotations to MAESTRO through command-line options.
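The aligners' own standard indexing commands are `STAR --runMode genomeGenerate` and `minimap2 -d`. The sketch below wraps them in a hypothetical `build_references` function with a dry-run switch so you can review the commands before running them. The file names (`genome.fa`, `genes.gtf`, `star_index`), the `DRY_RUN` and `THREADS` variables, and the choice to prebuild a `.mmi` minimap2 index are assumptions; this README does not state exactly which reference files MAESTRO expects, so check the relevant `MAESTRO COMMAND -h` output.

```bash
# Hypothetical inputs; override these with your own genome and annotation.
GENOME_FA=${GENOME_FA:-genome.fa}
GENES_GTF=${GENES_GTF:-genes.gtf}
STAR_DIR=${STAR_DIR:-star_index}
THREADS=${THREADS:-8}

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build a STAR genome index (used by STARsolo for scRNA-seq) and a
# minimap2 index (for scATAC-seq reads).
build_references() {
  run mkdir -p "$STAR_DIR"
  run STAR --runMode genomeGenerate --runThreadN "$THREADS" \
    --genomeDir "$STAR_DIR" --genomeFastaFiles "$GENOME_FA" \
    --sjdbGTFfile "$GENES_GTF"
  run minimap2 -d "${GENOME_FA%.fa}.mmi" "$GENOME_FA"
}
```

Running `DRY_RUN=1` followed by `build_references` only prints the three commands; unset `DRY_RUN` to actually build the indexes.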
MAESTRO uses LISA2 to evaluate transcription factor enrichment based on the marker genes of scRNA-seq clusters. To use LISA2, users need to download the human or mouse reference data locally and build it according to the LISA2 documentation. The input gene set can consist of official gene symbols only, RefSeq IDs only, Ensembl IDs only, Entrez IDs only, or a mixture of these identifiers.
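To make the four identifier types concrete, here is a hypothetical awk-based helper, `classify_ids`, that labels each line of a gene list by its likely identifier type. The prefix rules (RefSeq `NM_`/`NR_`/`XM_`/`XR_`, Ensembl gene IDs of the form `ENS...G` plus digits, purely numeric Entrez IDs, everything else treated as a symbol) are common conventions. This is not LISA2's own parsing logic, which the sketch does not reproduce.

```bash
# Hypothetical helper: label each identifier (one per line, from files or
# stdin) as refseq, ensembl, entrez, or symbol using prefix conventions.
classify_ids() {
  awk '
    NF == 0 { next }                                  # skip blank lines
    {
      id = $1
      if (id ~ /^(NM|NR|XM|XR)_[0-9]+(\.[0-9]+)?$/)  type = "refseq"
      else if (id ~ /^ENS[A-Z]*G[0-9]+(\.[0-9]+)?$/) type = "ensembl"
      else if (id ~ /^[0-9]+$/)                      type = "entrez"
      else                                           type = "symbol"
      print id "\t" type
    }
  ' "$@"
}
```

For example, `printf 'TP53\n7157\nNM_000546\nENSG00000141510\n' | classify_ids` labels the four lines `symbol`, `entrez`, `refseq` and `ensembl` in that order, showing how a mixed list looks.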
MAESTRO uses giggle to identify enrichment of transcription factor peaks in scATAC-seq cluster-specific peaks. giggle is installed in the MAESTRO environment by default. The giggle index for the Cistrome database can be downloaded here (Note: before v1.2.0, the giggle index `giggle.tar.gz` could be downloaded from http://cistrome.org/~galib/giggle.tar.gz; since v1.2.0, please download the latest index `giggle.all.tar.gz`). Users need to download the file and provide the location of the giggle annotation to MAESTRO when using the `ATACAnnotateTranscriptionFactor` function.
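Once the index archive is downloaded, it has to be unpacked and its directory passed to `ATACAnnotateTranscriptionFactor`. The sketch below is a hypothetical helper, `unpack_giggle_index`, that extracts the archive and prints the directory it created. It assumes the archive contains a single top-level directory; the layout of `giggle.all.tar.gz` is not described here, and its download link is not reproduced.

```bash
# Legacy index for MAESTRO versions before v1.2.0 (URL from the text above):
#   wget http://cistrome.org/~galib/giggle.tar.gz
# For v1.2.0 and later, download giggle.all.tar.gz from the MAESTRO docs link.

# Hypothetical helper: unpack a giggle index archive into DEST (default: .)
# and print the archive's top-level directory, assuming there is exactly one.
unpack_giggle_index() {
  local archive=$1 dest=${2:-.} top
  mkdir -p "$dest" || return 1
  tar -xzf "$archive" -C "$dest" || return 1
  # First path component of the first entry, with any leading "./" removed.
  top=$(tar -tzf "$archive" | head -n 1 | sed 's#^\./##' | cut -d/ -f1)
  printf '%s/%s\n' "$dest" "$top"
}
```

The printed path is the location to provide as the giggle annotation when calling `ATACAnnotateTranscriptionFactor`.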
`usage: MAESTRO [-h] [-v] {scrna-init,scatac-init,integrate-init,multi-scatac-init,samples-init,mtx-to-h5,count-to-h5,merge-h5,scrna-qc,scatac-qc,scatac-peakcount,scatac-genescore}`
MAESTRO provides twelve subcommands:
Subcommand | Description
---|---
scrna-init| Initialize the MAESTRO scRNA-seq workflow.
scatac-init| Initialize the MAESTRO scATAC-seq workflow.
integrate-init| Initialize the MAESTRO integration workflow.
multi-scatac-init| Initialize the MAESTRO multi-sample scATAC-seq workflow.
samples-init| Initialize samples.json file in the current directory.
mtx-to-h5| Convert 10X mtx format matrix to HDF5 format.
count-to-h5| Convert plain text count table to HDF5 format.
merge-h5| Merge multiple HDF5 files, e.g. different replicates.
scrna-qc| Perform quality control for scRNA-seq gene-cell count matrix.
scatac-qc| Perform quality control for scATAC-seq peak-cell count matrix.
scatac-peakcount| Generate peak-cell binary count matrix.
scatac-genescore| Calculate gene score based on the binarized scATAC peak count.
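The table describes `scatac-peakcount` as producing a binary peak-cell count matrix, and `scatac-genescore` as working from that binarized count. As an illustration of what binarization means (any nonzero count becomes 1), the sketch below applies it to a Matrix Market coordinate file with awk. It is not MAESTRO's implementation, and the function name `binarize_mtx` is hypothetical.

```bash
# Hypothetical helper: binarize a Matrix Market coordinate file.
# Reads the files given as arguments (or stdin) and writes to stdout.
binarize_mtx() {
  awk '
    /^%/ { print; next }                       # banner and comments unchanged
    !size_seen { print; size_seen = 1; next }  # "rows cols entries" size line
    { print $1, $2, ($3 != 0 ? 1 : 0) }        # nonzero count becomes 1
  ' "$@"
}
```

Explicitly stored zeros are kept as 0 rather than dropped, so the entry count on the size line stays valid without a second pass.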
Examples of running MAESTRO can be found in the following galleries. Please use `MAESTRO COMMAND -h` to see a detailed description of each option of each module.
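To keep the option descriptions at hand, the sketch below saves the `-h` output of every subcommand to a directory. It relies only on what this README states: the twelve subcommand names and the `-h` flag. The function name `dump_maestro_help` and the `MAESTRO_BIN` override variable are hypothetical conveniences, not MAESTRO features.

```bash
# The twelve subcommands from the usage line, space-separated.
MAESTRO_SUBCOMMANDS="scrna-init scatac-init integrate-init multi-scatac-init
samples-init mtx-to-h5 count-to-h5 merge-h5 scrna-qc scatac-qc
scatac-peakcount scatac-genescore"

# Hypothetical helper: write "<bin> <subcommand> -h" output to
# OUTDIR/<subcommand>.txt (default OUTDIR: maestro_help).
# MAESTRO_BIN overrides the executable name (default: MAESTRO).
dump_maestro_help() {
  local outdir=${1:-maestro_help} bin=${MAESTRO_BIN:-MAESTRO} cmd
  mkdir -p "$outdir" || return 1
  for cmd in $MAESTRO_SUBCOMMANDS; do
    if ! "$bin" "$cmd" -h >"$outdir/$cmd.txt" 2>&1; then
      echo "warning: '$bin $cmd -h' exited with a non-zero status" >&2
    fi
  done
}
```

Inside the activated environment, `dump_maestro_help` writes one text file per subcommand, and a warning on stderr flags any subcommand whose help call failed.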
Wang C, Sun D, Huang X, Wan C, Li Z, Han Y, Qin Q, Fan J, Qiu X, Xie Y, Meyer CA, Brown M, Tang M, Long H, Liu T, Liu XS. Integrative analyses of single-cell transcriptome and regulome using MAESTRO. Genome Biol. 2020 Aug 7;21(1):198. doi: 10.1186/s13059-020-02116-x. PMID: 32767996; PMCID: PMC7412809.