# Home

Mimick is a simulator for linked-read FASTQ data. It started its life as XENIA, a module in VISOR, and has since added features and perks. Mimick allows you to simulate an arbitrary number of haplotypes, set overall coverage, molecule coverage, and mix-match barcodes with linked-read chemistries. You can also supply one fasta file and a vcf file to simulate haplotype-resolved reads for all the samples in the vcf.

It's simple, really. This software mimics linked-read data, Pavel has an affinity for naming software after fictional monsters and "mimick" (with a "k") is the old-English spelling of the word, leaving mimic available for some other bioinformatician to use for a less farcical reason. Despite the lore of mimics being deadly traps, this software is anything but, we promise.

# Supported Linked-Read Types:

10X
Haplotagging
stLFR
TELLseq

# Get Started

See the installation and usage guides, then call up mimick in the command line to be greeted with:

Usage: mimick [OPTIONS] FASTA...                                                
                                                                                
Simulate linked-read FASTQ data for one or many individuals                     
There are two modes of operation:                                               
                                                                                
 1 Input one or more FASTA files (haplotypes) to simulate linked reads for a    
   single individual.                                                           
 2 Input one FASTA and VCF file to simulate linked reads for all samples in the 
   VCF file with haplotypes reflective of their SNPs and indels.                
                                                                                
With the exception of 10x, all other formats are demultiplexed. Output can be in
standard:chemistry format (e.g. standard:stlfr outputs standard format with     
stLFR-style barcodes), where the barcode is encoded as a BX:Z: tag and a VX:i   
validation tag. Below are the common linked-read chemistries (to be used in     
--format) and their configurations:                                             
                                                                                
                                                                                
  chemistry      --read-lengths   Description            FASTQ format           
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 
  10x               134,150       single barcode on R1   barcode inline on R1   
  haplotagging      150,150       2-barcodes in I1/I2    BX:Z:AxxCxxBxxDxx tag  
  stlfr             150,108       3-barcode on R2        @SEQID#1_2_3           
  tellseq           132,150       single barcode on R1   @SEQID:ATGC            
                                                                                
                                                                                
General Options:                                                                
  --circular       -c  contigs are circular/prokaryotic                         
  --output-prefix  -o  output file prefix                                       
                       [default=simulated/]                                     
  --quiet          -q  toggle to hide progress bar                              
  --threads        -t  number of threads to use for multi-sample simulation     
                       [default=2]                                              
  --seed           -s  random seed for simulation                               
  --vcf            -v  VCF-formatted file containing genotypes from which to    
                       create per-sample haplotypes                             
  --version            Show the version and exit.                               
  --help               Show this message and exit.                              
                                                                                
Linked-Read Simulation:                                                         
  --format             -f  FASTQ output format                                  
                           [default=standard:haplotagging]                      
  --genomic-coverage   -g  mean coverage (depth) target for simulated data      
                           [default=30.0]                                       
  --insert-size        -i  outer distance between the two read ends in bp       
                           [default=500]                                        
  --insert-stdev       -d  standard deviation for --insert-size                 
                           [default=50]                                         
  --molecule-attempts  -A  how many tries to create a molecule with <70% ambiguous bases                                      
                           [default=300]                                        
  --molecule-coverage  -C  mean percent coverage per molecule if <1, else mean number of reads per molecule                         
                           [default=0.2]                                        
  --molecule-length    -L  mean length of molecules in bp                       
                           [default=80000]                                      
  --molecules-per      -N  mean number of unrelated molecules per barcode per chromosome, where a negative
                           number (e.g. -2) will use a fixed number of unrelated molecules and a positive
                           one will draw from a distribution           
                           [default=2]                                          
  --read-lengths       -l  length of R1,R2 sequences in bp                      
                           [default=150,150]                                    
  --singletons         -S  proportion of barcodes that will only have one read pair                                                 
                           [default=0]         

 Documentation: https://pdimens.github.io/mimick/

# Authors

@pdimens (Mimick)

@davidebolo1993 (VISOR)