Usage
MultiSpace includes 3 main commands.
usage: MultiSpace [-h] [-v] {Pipelineinit,Scorematrix,Mappingcell} ...
MultiSpace (Single-cell Multi-Omics Analysis In Space) is a computational framework that combines single-cell
multi-omic data such as scCOOL-seq with spatial transcriptomic information.
positional arguments:
{Pipelineinit,Scorematrix,Mappingcell}
Pipelineinit Initialize the MultiSpace preprocessing workflow in a given directory. This will install
the snakemake rules and a config file in this directory. You can configure the config
file according to your needs, and run the workflow with Snakemake.
Scorematrix Calculate a gene-by-cell score matrix across all cells. WCG: Genebody/Promoter
methylation ratio matrix. GCH: Gene activity score matrix.
Mappingcell Map single cells to spatial locations and get the spatial epigenetic signal.
optional arguments:
-h, --help show this help message and exit
-v, --version Print version info.
Detailed usages are listed as follows:
MultiSpace Pipelineinit
Users should run this function first if they have raw files (fastq files). This function preprocesses the data automatically using snakemake.
Its output can be used as the input of the MultiSpace Scorematrix and MultiSpace Mappingcell functions.
If the samples come from different stages, users should separate them by stage into distinct folders.
This function needs an input folder with the following structure (taking E8.5 mouse embryo data as an example):
E8.5
├── DNA
│   └── 01.Raw
│       ├── Sample1_1.fastq.gz
│       ├── Sample1_2.fastq.gz
│       ├── Sample2_1.fastq.gz
│       └── Sample2_2.fastq.gz
└── RNA
    └── 01.Raw
        ├── Sample1_1.fastq.gz
        ├── Sample1_2.fastq.gz
        ├── Sample2_1.fastq.gz
        └── Sample2_2.fastq.gz
MultiSpace accepts DNA sequencing data generated by scCOOL-seq, scNMT-seq, and similar protocols, and RNA sequencing data generated by Smart-seq2.
DNA/: DNA sequencing data folder; snakemake also writes its outputs to this folder
RNA/: RNA sequencing data folder; snakemake also writes its outputs to this folder
01.Raw: input fastq files folder
Sample1_1/2.fastq.gz: paired-end sequencing fastq files
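Before initializing the workflow, it can help to confirm that every sample has both mates in place. The following is a minimal sketch, not part of MultiSpace: the helper name `find_paired_samples` is hypothetical, and it assumes the `<sample>_1.fastq.gz` / `<sample>_2.fastq.gz` naming shown in the tree above.

```python
from pathlib import Path


def find_paired_samples(raw_dir):
    """Return sorted sample names that have both _1 and _2 fastq files.

    Hypothetical helper (not part of MultiSpace); assumes the
    <sample>_1.fastq.gz / <sample>_2.fastq.gz naming shown above.
    """
    raw_dir = Path(raw_dir)
    suffix = "_1.fastq.gz"
    samples = []
    for mate1 in sorted(raw_dir.glob("*" + suffix)):
        sample = mate1.name[: -len(suffix)]
        mate2 = raw_dir / (sample + "_2.fastq.gz")
        if mate2.exists():
            samples.append(sample)
        else:
            # Report incomplete pairs instead of silently skipping them.
            print(f"warning: {sample} has no _2 mate in {raw_dir}")
    return samples


if __name__ == "__main__":
    # "E8.5" is the example stage folder; replace it with your own.
    for assay in ("DNA", "RNA"):
        raw = Path("E8.5") / assay / "01.Raw"
        if raw.is_dir():
            print(assay, find_paired_samples(raw))
```

Running the check on both `DNA/01.Raw` and `RNA/01.Raw` also makes it easy to spot samples that are present in one assay but missing from the other.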
usage: MultiSpace Pipelineinit [-h] [--species {hg38,mm10}] [--samplesheet SAMPLESHEET] [--directory DIRECTORY]
--fasta FASTA --fasta_fai FASTA_FAI --lambda_fasta LAMBDA_FASTA --rna_annotation
RNA_ANNOTATION [--star_index STAR_INDEX]
optional arguments:
-h, --help show this help message and exit
Input files arguments:
--species {hg38,mm10}
Specify the genome assembly (hg38 for human and mm10 for mouse). DEFAULT: mm10.
--samplesheet SAMPLESHEET
Path to sample names stored in a sheet. Row: sample name.
Running and output files arguments:
--directory DIRECTORY
Path to the directory where the workflow shall be initialized and results shall be
stored. DEFAULT: MultiSpace. Path to where the config.yaml is stored.
Reference genome arguments:
--fasta FASTA Genome fasta file for mapping. Users should provide a fasta file for human or mouse
containing only chrN, where N is the name of a defined chromosome.
--fasta_fai FASTA_FAI
Genome fasta file index. Users can create fasta.fai using samtools faidx.
--lambda_fasta LAMBDA_FASTA
Genome fasta file containing lambda sequence for bsmap mapping. Users can add the lambda
sequence to the fasta file described above.
--rna_annotation RNA_ANNOTATION
Path of the annotation file required for . Users can define or provide annotation files
themselves.
--star_index STAR_INDEX
Path of the reference index file for STAR mapping. Users need to build the index file for
the reference using the command STAR --runThreadN N --runMode genomeGenerate --genomeDir ref
--genomeFastaFiles ref.fa --sjdbGTFfile refGene.gtf
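The `--fasta` help above asks for a genome file containing only chrN records. One way to prepare such a file is sketched below. This is an illustration rather than a MultiSpace utility: the function names are hypothetical, and the pattern for "defined chromosomes" (chr1, chr2, ..., chrX, chrY, chrM) is an assumption that should be adjusted to your assembly.

```python
import re

# Assumption: "chrN" means names such as chr1..chr22, chrX, chrY, chrM.
# Records like chr1_GL456210_random or chrUn_JH584304 do not match.
CANONICAL = re.compile(r"chr(\d+|X|Y|M)")


def filter_canonical(lines):
    """Yield only the FASTA lines that belong to canonical records."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # The record name is the first whitespace-delimited token.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            keep = CANONICAL.fullmatch(name) is not None
        if keep:
            yield line


def filter_fasta_file(src, dst):
    """Stream src to dst, dropping non-canonical records."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(filter_canonical(fin))
```

After filtering, the index can be rebuilt with `samtools faidx` as the `--fasta_fai` help describes.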
Detailed description:
samplesheet: file containing each sample name as a row
directory: input fastq files folder
fasta/fasta_fai: genome fasta file/fasta index without random or unknown chromosomes
lambda_fasta: fasta file containing lambda sequences, used for calculating the methylation conversion rate
star_index: STAR index, built with "STAR --runThreadN N --runMode genomeGenerate --genomeDir ref --genomeFastaFiles ref.fa --sjdbGTFfile refGene.gtf"
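As an illustration of how these arguments fit together, the sketch below writes a samplesheet with one sample name per row and assembles a Pipelineinit command line. Only flags listed in the usage text are used. The helper names and every file path are hypothetical placeholders, and whether the samplesheet may carry a header is not stated here, so none is written.

```python
import shlex


def write_samplesheet(samples, path):
    """Write one sample name per row (no header, by assumption)."""
    with open(path, "w") as handle:
        for name in samples:
            handle.write(name + "\n")


def pipelineinit_command(samplesheet, directory, fasta, fasta_fai,
                         lambda_fasta, rna_annotation, star_index,
                         species="mm10"):
    """Build the argument list using only flags from the usage text."""
    return [
        "MultiSpace", "Pipelineinit",
        "--species", species,
        "--samplesheet", samplesheet,
        "--directory", directory,
        "--fasta", fasta,
        "--fasta_fai", fasta_fai,
        "--lambda_fasta", lambda_fasta,
        "--rna_annotation", rna_annotation,
        "--star_index", star_index,
    ]


if __name__ == "__main__":
    # All paths below are placeholders, not files shipped with MultiSpace.
    cmd = pipelineinit_command(
        samplesheet="samples.txt",
        directory="E8.5",
        fasta="mm10.fa",
        fasta_fai="mm10.fa.fai",
        lambda_fasta="mm10_lambda.fa",
        rna_annotation="refGene.gtf",
        star_index="ref",
    )
    # Print a shell-safe command string for inspection before running it.
    print(shlex.join(cmd))
```

Printing the command rather than executing it keeps the sketch safe to run on a machine where MultiSpace is not installed.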
MultiSpace Scorematrix
Input: the path to a WCG/GCH site-by-cell matrix in H5 format generated by snakemake. Output: a gene-by-cell matrix, which is either a methylation ratio matrix of the genebody or promoter region, or a gene activity score matrix of the genebody.
usage: MultiSpace Scorematrix [-h] [--species {mm10,hg38}] [--cell_barcode CELL_BARCODE] [--file_path FILE_PATH]
[--out_dir OUT_DIR] [--out_prefix OUT_PREFIX] [--matrixtype {WCG,GCH}]
[--region {promoter,genebody}] [--distance DISTANCE]
optional arguments:
-h, --help show this help message and exit
Input arguments:
--species {mm10,hg38}
Species (hg38 for human and mm10 for mouse). DEFAULT: mm10.
--cell_barcode CELL_BARCODE
Location of the cell barcode list (generated by the Preprocess snakemake pipeline), i.e. the
cells which passed quality check.
--file_path FILE_PATH
Path to unipeak file and site_peak.h5 file
Output arguments:
--out_dir OUT_DIR Path to the directory where the result file shall be stored. DEFAULT: current directory.
--out_prefix OUT_PREFIX
Prefix of output files. DEFAULT: MultiSpace.
Part arguments:
--matrixtype {WCG,GCH}
Type of DNA methylation(WCG) or Chromatin accessibility(GCH) ratio gene by cell matrix to
generate.
--region {promoter,genebody}
Type of gene region: promoter or genebody. Users need to specify the region only when
calculating the WCG score matrix. If not specified, MultiSpace will take promoter as default.
The GCH score matrix takes promoter as the region.
--distance DISTANCE GCH: Gene score decay distance, which can range from 1kb (promoter-based regulation) to
10kb (enhancer-based regulation). Recommend: 10000. WCG: Distance of gene promoter region.
GENEBODY NOT REQUIRED! Recommend: 2000.
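To make the role of `--distance` for GCH more concrete, here is an illustrative sketch of a distance-decay weighting. It is an assumption, not MultiSpace's documented formula: it uses an exponential half-decay weight, `2 ** (-d / distance)`, and the function names `decay_weight` and `gene_activity` are hypothetical.

```python
def decay_weight(site_pos, tss_pos, decay_distance):
    """Weight of a site; it halves every decay_distance bp from the TSS."""
    return 2.0 ** (-abs(site_pos - tss_pos) / decay_distance)


def gene_activity(sites, tss_pos, decay_distance=10000):
    """Decay-weighted mean of per-site accessibility ratios for one gene.

    sites: iterable of (position, ratio) pairs for a single cell.
    Illustrative only; the real score may be computed differently.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for pos, ratio in sites:
        w = decay_weight(pos, tss_pos, decay_distance)
        weighted_sum += w * ratio
        weight_total += w
    # Genes with no covered sites get a score of 0.0 in this sketch.
    return weighted_sum / weight_total if weight_total else 0.0
```

Under this assumed form, a site 10 kb from the TSS keeps half its weight when the decay distance is 10000 but only about 0.001 of it when the decay distance is 1000, which is why a short distance emphasizes promoter-proximal signal.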
MultiSpace Mappingcell
This function maps single cells to spatial locations according to the gene expression similarity between each cell and each spot, using a topic modelling algorithm. Here we use STRIDE to decompose cell types from spatial mixtures by leveraging topic profiles trained from single-cell transcriptomics. Users can see detailed usage in STRIDE. After mapping, users can get the epigenetic signal value at each spatial location.
As input, this function takes the single-cell count matrix file and the bin-by-cell matrix produced by the MultiSpace Pipelineinit snakemake workflow.
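The help text below describes the single-cell count matrix as a tab-separated file with genes as rows and cells as columns, and the celltype annotation as a headerless two-column tab-separated file. A quick consistency check between the two can catch naming mismatches early. The sketch below is not part of MultiSpace; the function names are hypothetical, and it assumes the first line of the count file holds the cell names.

```python
import csv


def read_cell_names(count_file):
    """Return cell names from the header of a genes-by-cells TSV."""
    with open(count_file, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        first_row = next(reader, None)
    # Assumption: the header may or may not carry a label above the
    # gene column; if it does, header and data rows have equal length.
    if first_row is not None and len(header) == len(first_row):
        return header[1:]
    return header


def read_celltypes(anno_file):
    """Return {cell name: celltype} from a headerless two-column TSV."""
    with open(anno_file, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        return {row[0]: row[1] for row in reader if len(row) >= 2}


def unannotated_cells(count_file, anno_file):
    """List cells in the count matrix that lack a celltype label."""
    celltypes = read_celltypes(anno_file)
    return [c for c in read_cell_names(count_file) if c not in celltypes]
```

An empty list from `unannotated_cells` means every cell in the count matrix has a label in the annotation file.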
usage: MultiSpace Mappingcell [-h] [--sc_count_file SC_COUNT_FILE] [--sc_celltype_file SC_ANNO_FILE]
[--st_count_file ST_COUNT_FILE] [--gene_use GENE_USE]
[--spatial_location SPATIAL_LOCATION] [--model_dir MODEL_DIR]
[--epi_binfile EPI_BINFILE] [--epi_feature EPI_FEATURE] [--out_dir OUT_DIR]
[--out_prefix {WCG,GCH}] [--sc-scale-factor SC_SCALE_FACTOR]
[--st-scale-factor ST_SCALE_FACTOR] [--normalize]
[--ntopics NTOPICS_LIST [NTOPICS_LIST ...]]
optional arguments:
-h, --help show this help message and exit
Input arguments:
--sc_count_file SC_COUNT_FILE
Location of the single-cell count matrix file. It can be a tab-separated plain-text file
with genes as rows and cells as columns.
--sc_celltype_file SC_ANNO_FILE
Location of the single-cell celltype annotation file. The file should be a tab-separated
plain-text file without header. The first column should be the cell name, and the second
column should be the corresponding celltype labels.
--st_count_file ST_COUNT_FILE
Location of the spatial gene count file. It can be a tab-separated plain-text file with
genes as rows and spots as columns.
--gene_use GENE_USE Location of the gene list file used to train the model. It can also be specified as
'All', but it will take a longer time. If not specified, MultiSpace will find
differential marker genes for each celltype, and use them to run the model.
--spatial_location SPATIAL_LOCATION
Location of tissue spatial coordinates
--model_dir MODEL_DIR
If users have the pre-trained model using the same scRNA-seq dataset, please provide the
path of 'model' directory.
--epi_binfile EPI_BINFILE
Location of WCG/GCH.bin_peak.h5. Used to calculate the DNA methylation or chromatin
accessibility epigenetic signal in space.
--epi_feature EPI_FEATURE
Location of WCG/GCH/bin.merge.peak
Output arguments:
--out_dir OUT_DIR Path to the directory where the result file shall be stored. DEFAULT: current directory.
--out_prefix {WCG,GCH}
Prefix of output files. WCG or GCH. If not specified, MultiSpace will set WCG as default.
Model arguments:
--sc-scale-factor SC_SCALE_FACTOR
The scale factor for cell-level normalization. For example, 10000. If not specified,
MultiSpace will set the 75% quantile of nCount as default.
--st-scale-factor ST_SCALE_FACTOR
The scale factor for spot-level normalization. For example, 10000. If not specified,
MultiSpace will set the 75% quantile of nCount for ST as default.
--normalize Whether or not to normalize the single-cell and the spatial count matrix. If set, the two
matrices will be normalized by the SD for each gene.
--ntopics NTOPICS_LIST [NTOPICS_LIST ...]
Number of topics to train and test the model. MultiSpace will automatically select the
optimal topic number. Multiple numbers should be separated by space. For example,
--ntopics 6 7 8 9 10. If not specified, MultiSpace will run several models with
different topic numbers, and select the optimal one.
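The default described for `--sc-scale-factor` and `--st-scale-factor` (the 75% quantile of nCount) can be illustrated with a small sketch. This is not MultiSpace's code: the function names are hypothetical, the quantile uses linear interpolation between data points, and the scaling step (count / nCount * scale factor) is an assumed, commonly used form of library-size normalization.

```python
from statistics import quantiles


def default_scale_factor(ncounts):
    """75% quantile of per-cell (or per-spot) total counts.

    Uses linear interpolation between data points; the exact quantile
    definition used by MultiSpace is an assumption here.
    Needs at least two values.
    """
    return quantiles(ncounts, n=4, method="inclusive")[2]


def scale_counts(matrix, scale_factor=None):
    """Scale each column of a genes-by-cells matrix to scale_factor counts.

    matrix: list of rows (genes), each a list of counts per cell.
    """
    # nCount is the column sum, i.e. total counts per cell or spot.
    ncounts = [sum(col) for col in zip(*matrix)]
    if scale_factor is None:
        scale_factor = default_scale_factor(ncounts)
    return [
        [value / total * scale_factor if total else 0.0
         for value, total in zip(row, ncounts)]
        for row in matrix
    ]
```

Passing an explicit value such as 10000 corresponds to setting the scale factor on the command line, while leaving it as `None` mimics the quantile-based default.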