Usage

MultiSpace includes three main commands.

usage: MultiSpace [-h] [-v] {Pipelineinit,Scorematrix,Mappingcell} ...

MultiSpace (Single-cell Multi-Omics Analysis In Space) is a computational framework that combines single-cell
multi-omic data such as scCOOL-seq with spatial transcriptomic information.

positional arguments:
  {Pipelineinit,Scorematrix,Mappingcell}
    Pipelineinit        Initialize the MultiSpace preprocessing workflow in a given directory. This will install
                        the snakemake rules and a config file in this directory. You can configure the config
                        file according to your needs, and run the workflow with Snakemake
    Scorematrix         Calculate a gene-by-cell score matrix across all cells. WCG: Genebody/Promoter
                        methylation ratio matrix. GCH: Gene activity score matrix.
    Mappingcell         Map single cells to spatial locations and get the spatial epigenetic signal.

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         Print version info.

Detailed usages are listed as follows:

MultiSpace Pipelineinit

Users should run this function first if they have raw FASTQ files. It preprocesses the data automatically using Snakemake, and its output can serve as the input to the MultiSpace Scorematrix and MultiSpace Mappingcell functions. If the samples come from different stages, users should keep each stage in a separate folder.

This function needs an input folder with the following structure (taking E8.5 mouse embryo data as an example):

E8.5
├── DNA
│  └── 01.Raw
│        ├── Sample1_1.fastq.gz
│        ├── Sample1_2.fastq.gz
│        ├── Sample2_1.fastq.gz
│        └── Sample2_2.fastq.gz
└── RNA
   └── 01.Raw
         ├── Sample1_1.fastq.gz
         ├── Sample1_2.fastq.gz
         ├── Sample2_1.fastq.gz
         └── Sample2_2.fastq.gz

MultiSpace accepts DNA sequencing data generated by scCOOL-seq, scNMT-seq, and similar protocols, and RNA sequencing data generated by Smart-seq2.

DNA/: DNA sequencing data folder; Snakemake also writes its outputs to this folder
RNA/: RNA sequencing data folder; Snakemake also writes its outputs to this folder
01.Raw: folder containing the input FASTQ files
Sample1_1/2.fastq.gz: paired-end sequencing FASTQ files
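The layout above can be turned into a sample sheet with a few lines of shell. This is a minimal sketch, not part of MultiSpace: the function name `make_samplesheet` is hypothetical, and it assumes that the sample name is the FASTQ file name with the `_1.fastq.gz` / `_2.fastq.gz` suffix removed.

```shell
# Sketch: write one sample name per row for every complete read pair in a
# 01.Raw folder. Assumes the <sample>_1.fastq.gz / <sample>_2.fastq.gz naming
# shown above; make_samplesheet is a hypothetical helper, not a MultiSpace command.
make_samplesheet() {
    raw_dir="$1"
    out_file="$2"
    for fq in "$raw_dir"/*_1.fastq.gz; do
        [ -e "$fq" ] || continue                 # glob matched nothing
        name="$(basename "$fq" _1.fastq.gz)"
        if [ -e "$raw_dir/${name}_2.fastq.gz" ]; then
            printf '%s\n' "$name"                # both mates present
        else
            printf 'warning: no _2 mate for %s\n' "$name" >&2
        fi
    done | sort -u > "$out_file"
}
```

For example, `make_samplesheet E8.5/DNA/01.Raw E8.5/samplesheet.txt` would write `Sample1` and `Sample2` for the tree shown above.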

usage: MultiSpace Pipelineinit [-h] [--species {hg38,mm10}] [--samplesheet SAMPLESHEET] [--directory DIRECTORY]
                               --fasta FASTA --fasta_fai FASTA_FAI --lambda_fasta LAMBDA_FASTA --rna_annotation
                               RNA_ANNOTATION [--star_index STAR_INDEX]

optional arguments:
  -h, --help            show this help message and exit

Input files arguments:
  --species {hg38,mm10}
                        Specify the genome assembly (hg38 for human and mm10 for mouse). DEFAULT: mm10.
  --samplesheet SAMPLESHEET
                        Path to sample names stored in a sheet. Row: sample name.

Running and output files arguments:
  --directory DIRECTORY
                        Path to the directory where the workflow shall be initialized and results shall be
                        stored. DEFAULT: MultiSpace. Path to where the config.yaml is stored.

Reference genome arguments:
  --fasta FASTA         Genome fasta file for mapping. Users should provide a fasta file, for human or mouse,
                        containing only chrN, where N is the name of a defined chromosome.
  --fasta_fai FASTA_FAI
                        Genome fasta file index. Users can create fasta.fai using samtools faidx.
  --lambda_fasta LAMBDA_FASTA
                        Genome fasta file containing lambda sequence for bsmap mapping. Users can add the lambda
                        sequence to the fasta file described above.
  --rna_annotation RNA_ANNOTATION
                        Path of the annotation file required for . Users can define or provide annotation files
                        themselves.
  --star_index STAR_INDEX
                        Path of the reference index file for STAR mapping. Users need to build the index file for
                        the reference using command STAR --runThreadN N --runMode genomeGenerate --genomeDir ref
                        --genomeFastaFiles ref.fa --sjdbGTFfile refGene.gtf
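The reference files described above can be prepared with standard tools. The sketch below is one possible way to do it, under stated assumptions: the helper names are hypothetical, and the header pattern (chr followed by a number, X, Y, or M) is a guess at what "defined chromosome" means and should be adjusted for your assembly.

```shell
# Sketch: keep only primary chromosome records from a genome fasta.
# The header pattern is an assumption, not taken from MultiSpace.
filter_primary_chroms() {
    awk '/^>/ { keep = ($1 ~ /^>chr([0-9]+|X|Y|M)$/) } keep' "$1"
}

# Sketch: append the lambda sequence to the filtered genome to build the
# file passed to --lambda_fasta.
add_lambda() {
    genome_fa="$1"
    lambda_fa="$2"
    out_fa="$3"
    cat "$genome_fa" "$lambda_fa" > "$out_fa"
}

# Typical follow-up steps (not executed here; file names are placeholders):
#   filter_primary_chroms mm10.full.fa > mm10.fa
#   samtools faidx mm10.fa                      # writes mm10.fa.fai
#   add_lambda mm10.fa lambda.fa mm10_lambda.fa
```

Filtering before indexing keeps the `.fai` index consistent with the fasta file that is actually used for mapping.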

Detailed description:

samplesheet: file containing each sample name as a row
directory: input fastq files folder
fasta/fasta_fai: genome fasta file/fasta index without random or unknown chromosomes
lambda_fasta: fasta file containing lambda sequences used for calculating methylation conversion rate
star_index: STAR index, built with "STAR --runThreadN N --runMode genomeGenerate --genomeDir ref --genomeFastaFiles ref.fa --sjdbGTFfile refGene.gtf"
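Putting the arguments together, a Pipelineinit call might look like the sketch below. Only the flags come from the usage text above; every path and file name is a placeholder, and the `run` and `init_workflow` helpers are hypothetical. The `run` wrapper prints the command when `DRY_RUN=1` is set, so the call can be inspected before anything is executed.

```shell
# Dry-run helper (hypothetical): print the command when DRY_RUN=1,
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# stage_dir: folder holding DNA/ and RNA/ (for example E8.5).
# ref_dir:   placeholder folder holding the reference files named below.
init_workflow() {
    stage_dir="$1"
    ref_dir="$2"
    run MultiSpace Pipelineinit \
        --species mm10 \
        --samplesheet "$stage_dir/samplesheet.txt" \
        --directory "$stage_dir" \
        --fasta "$ref_dir/mm10.fa" \
        --fasta_fai "$ref_dir/mm10.fa.fai" \
        --lambda_fasta "$ref_dir/mm10_lambda.fa" \
        --rna_annotation "$ref_dir/refGene.gtf" \
        --star_index "$ref_dir/star_index"
}
```

Running `DRY_RUN=1 init_workflow E8.5 /path/to/ref` prints the assembled command on one line; without `DRY_RUN` the same call invokes MultiSpace.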

MultiSpace Scorematrix

Input: the path to a WCG/GCH site-by-cell matrix in H5 format generated by the Snakemake workflow. Output: a gene-by-cell matrix, which is either a methylation ratio matrix of the genebody or promoter region (WCG) or a gene activity score matrix of the genebody (GCH).

usage: MultiSpace Scorematrix [-h] [--species {mm10,hg38}] [--cell_barcode CELL_BARCODE] [--file_path FILE_PATH]
                              [--out_dir OUT_DIR] [--out_prefix OUT_PREFIX] [--matrixtype {WCG,GCH}]
                              [--region {promoter,genebody}] [--distance DISTANCE]

optional arguments:
  -h, --help            show this help message and exit

Input arguments:
  --species {mm10,hg38}
                        Species (hg38 for human and mm10 for mouse). DEFAULT: mm10.
  --cell_barcode CELL_BARCODE
                        Location of the cell barcode list (generated by the preprocessing Snakemake pipeline),
                        i.e. the cells which passed quality check.
  --file_path FILE_PATH
                        Path to unipeak file and site_peak.h5 file

Output arguments:
  --out_dir OUT_DIR     Path to the directory where the result file shall be stored. DEFAULT: current directory.
  --out_prefix OUT_PREFIX
                        Prefix of output files. DEFAULT: MultiSpace.

Part arguments:
  --matrixtype {WCG,GCH}
                        Type of DNA methylation(WCG) or Chromatin accessibility(GCH) ratio gene by cell matrix to
                        generate.
  --region {promoter,genebody}
                        Type of gene region. promoter or genebody. Users need to specify the region only when
                        calculating the WCG score matrix. If not, MultiSpace will take promoter as default. GCH
                        score matrix takes promoter as specified.
  --distance DISTANCE   GCH: Gene score decay distance, which can be set from 1kb (promoter-based regulation) to
                        10kb (enhancer-based regulation). Recommend: 10000. WCG: Distance of gene promoter region.
                        GENEBODY NOT REQUIRED! Recommend: 2000.
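As an illustration of the options above, the sketch below assembles one WCG call (promoter methylation, 2000 bp promoter distance) and one GCH call (gene activity, 10000 bp decay distance). The flags and recommended values come from the help text; the helper names, the output prefixes, the barcode file name, and the assumption that `--file_path` takes a folder are all placeholders.

```shell
# Dry-run helper (hypothetical): print the command when DRY_RUN=1,
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1: placeholder folder with the Snakemake outputs, $2: output folder.
score_wcg_promoter() {
    run MultiSpace Scorematrix \
        --species mm10 \
        --cell_barcode "$1/cell_barcode.txt" \
        --file_path "$1" \
        --out_dir "$2" \
        --out_prefix E8.5_WCG \
        --matrixtype WCG \
        --region promoter \
        --distance 2000
}

# GCH needs no --region; the distance is the gene score decay distance.
score_gch() {
    run MultiSpace Scorematrix \
        --species mm10 \
        --cell_barcode "$1/cell_barcode.txt" \
        --file_path "$1" \
        --out_dir "$2" \
        --out_prefix E8.5_GCH \
        --matrixtype GCH \
        --distance 10000
}
```

Calling `DRY_RUN=1 score_gch snakemake_out scores` prints the GCH command for review.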

MultiSpace Mappingcell

This function maps single cells to spatial locations according to the gene expression similarity between each cell and each spot, using a topic modelling algorithm. Here we use STRIDE to decompose cell types from spatial mixtures by leveraging topic profiles trained from single-cell transcriptomics. Users can find detailed usage in the STRIDE documentation. After mapping, users can obtain the epigenetic signal value at each spatial location.

The inputs to this function are the single-cell count matrix file and the bin-by-cell matrix produced by the MultiSpace Pipelineinit Snakemake workflow.

usage: MultiSpace Mappingcell [-h] [--sc_count_file SC_COUNT_FILE] [--sc_celltype_file SC_ANNO_FILE]
                          [--st_count_file ST_COUNT_FILE] [--gene_use GENE_USE]
                          [--spatial_location SPATIAL_LOCATION] [--model_dir MODEL_DIR]
                          [--epi_binfile EPI_BINFILE] [--epi_feature EPI_FEATURE] [--out_dir OUT_DIR]
                          [--out_prefix {WCG,GCH}] [--sc-scale-factor SC_SCALE_FACTOR]
                          [--st-scale-factor ST_SCALE_FACTOR] [--normalize]
                          [--ntopics NTOPICS_LIST [NTOPICS_LIST ...]]

optional arguments:
  -h, --help            show this help message and exit

Input arguments:
  --sc_count_file SC_COUNT_FILE
                        Location of the single-cell count matrix file. It could be a tab-separated plain-text
                        file with genes as rows and cells as columns.
  --sc_celltype_file SC_ANNO_FILE
                        Location of the single-cell celltype annotation file. The file should be a tab-separated
                        plain-text file without header. The first column should be the cell name, and the second
                        column should be the corresponding celltype labels.
  --st_count_file ST_COUNT_FILE
                        Location of the spatial gene count file. It could be a tab-separated plain-text file with
                        genes as rows and spots as columns.
  --gene_use GENE_USE   Location of the gene list file used to train the model. It can also be specified as
                        'All', but it will take a longer time. If not specified, MultiSpace will find
                        differential marker genes for each celltype, and use them to run the model.
  --spatial_location SPATIAL_LOCATION
                        Location of tissue spatial coordinates
  --model_dir MODEL_DIR
                        If users have the pre-trained model using the same scRNA-seq dataset, please provide the
                        path of 'model' directory.
  --epi_binfile EPI_BINFILE
                        Location of WCG/GCH.bin_peak.h5. Used to calculate the DNA methylation or chromatin
                        accessibility epigenetic signal in space.
  --epi_feature EPI_FEATURE
                        Location of WCG/GCH/bin.merge.peak

Output arguments:
  --out_dir OUT_DIR     Path to the directory where the result file shall be stored. DEFAULT: current directory.
  --out_prefix {WCG,GCH}
                        Prefix of output files. WCG or GCH. If not specified, MultiSpace will set WCG as default.

Model arguments:
  --sc-scale-factor SC_SCALE_FACTOR
                        The scale factor for cell-level normalization. For example, 10000. If not specified,
                        MultiSpace will set the 75% quantile of nCount as default.
  --st-scale-factor ST_SCALE_FACTOR
                        The scale factor for spot-level normalization. For example, 10000. If not specified,
                        MultiSpace will set the 75% quantile of nCount for ST as default.
  --normalize           Whether or not to normalize the single-cell and the spatial count matrix. If set, the two
                        matrices will be normalized by the SD for each gene.
  --ntopics NTOPICS_LIST [NTOPICS_LIST ...]
                        Number of topics to train and test the model. MultiSpace will automatically select the
                        optimal topic number. Multiple numbers should be separated by space. For example,
                        --ntopics 6 7 8 9 10 . If not specified, MultiSpace will run several models with
                        different topic numbers, and select the optimal one.
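The sketch below shows how a Mappingcell call could be assembled, with a small check that the cell type annotation file really has two tab-separated columns as the help text requires. The flags and the `--ntopics 6 7 8 9 10` example come from the help text; the helper names and all file names are placeholders, not MultiSpace outputs.

```shell
# Dry-run helper (hypothetical): print the command when DRY_RUN=1,
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Fail if any row is not "cell name<TAB>celltype label" (no header expected).
check_celltype_file() {
    awk -F '\t' '
        NF != 2 || $1 == "" || $2 == "" {
            printf "line %d: expected 2 tab-separated columns\n", NR
            bad = 1
        }
        END { exit bad }
    ' "$1" >&2
}

# $1: placeholder input folder, $2: output folder.
map_cells() {
    in_dir="$1"
    out_dir="$2"
    check_celltype_file "$in_dir/sc_celltype.txt" || return 1
    run MultiSpace Mappingcell \
        --sc_count_file "$in_dir/sc_count.txt" \
        --sc_celltype_file "$in_dir/sc_celltype.txt" \
        --st_count_file "$in_dir/st_count.txt" \
        --spatial_location "$in_dir/spatial_location.txt" \
        --epi_binfile "$in_dir/WCG.bin_peak.h5" \
        --epi_feature "$in_dir/bin.merge.peak" \
        --out_dir "$out_dir" \
        --out_prefix WCG \
        --ntopics 6 7 8 9 10
}
```

Validating the annotation file first gives a clear error message before the longer model training step starts.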