# Home

Mimick is a simulator for linked-read FASTQ data. It started its life as XENIA, a module in VISOR, and has since added features and perks. Mimick allows you to simulate an arbitrary number of haplotypes, set overall coverage, molecule coverage, and mix-match barcodes with linked-read chemistries.

It's simple, really. This software mimics linked-read data, Pavel has an affinity for naming software after fictional monsters and "mimick" (with a "k") is the old-English spelling of the word, leaving mimic available for some other bioinformatician to use for a less farcical reason. Despite the lore of mimics being deadly traps, this software is anything but, we promise.

# Supported Linked-Read Types:

  • 10X
  • Haplotagging
  • stLFR
  • TELLseq

# Get Started

See the installation and usage guides, then call up mimick in the command line to be greeted with:

 Usage: mimick [OPTIONS] BARCODES FASTA...                                                              

 Simulate linked-read FASTQ using genome haplotypes. Barcodes can be supplied one of two ways:          

  1 let Mimick randomly generate barcodes based on a specification of length,count                      
     • two integers, comma-separated, no space                                                          
     • e.g. 16,400000 would generate 400,000 unique 16bp barcodes                                       
  2 you can provide a file of specific nucleotide barcodes, 1 per line                                  

 You can specify the linked-read barcode chemistry to simulate via --lr-type as well as the output      
 format of FASTQ files (default is the same as barcode type). For example, you can generate 96 barcodes 
 (common haplotagging style), select --lr-type stlfr (combinatorial 3-barcode on R2 read), and have     
 --output-type tellseq (@seqid:barcode header format).                                                  


   --lr-type      Format                                                                                
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                               
   10x/tellseq    single barcode on R1                                                                  
   haplotagging   R1 and R2 each have different combinatorial 2-barcodes                                
   stlfr          combinatorial 3-barcode on R2                                                         


   --output-type   Barcode Location                                      Example                        
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━      
   10x             start of R1 sequence                                  ATAGACCATAGAGGACA...           
   haplotagging    sequence header as BX:Z:ACBD                          @SEQID BX:Z:A0C331B34D87       
   standard        sequence header as BX:Z:BARCODE, no specific format   @SEQID BX:Z:ATACGAGACA         
   stlfr           appended to sequence ID via #1_2_3                    @SEQID#1_354_39                
   tellseq         appended to sequence ID via :ATCG                     @SEQID:TATTAGCAC               


╭─ General Options ────────────────────────────────────────────────────────────────────────────────────╮
│ --help               Show this message and exit.                                                     │
│ --circular       -C  contigs are circular/prokaryotic                                                │
│ --output-prefix  -o  output file prefix                                                              │
│                      [default: simulated/SIM]                                                        │
│ --output-type    -O  output format of FASTQ files                                                    │
│ --quiet          -q  0 all output, 1 no progress bar, 2 no output                                    │
│                      [default: 0]                                                                    │
│ --regions        -r  one or more regions to simulate, in BED format                                  │
│ --seed           -S  random seed for simulation                                                      │
│ --threads        -t  number of threads to use for simulation                                         │
│                      [default: 2; x>=1]                                                              │
│ --version            Show the version and exit.                                                      │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Read Simulation Parameters ─────────────────────────────────────────────────────────────────────────╮
│ --coverage     mean coverage target for simulated data                                               │
│                [default: 30.0; x>=0.05]                                                              │
│ --distance     outer distance between the two ends in bp                                             │
│                [default: 500; x>=0]                                                                  │
│ --error        base error rate                                                                       │
│                [default: 0.02; 0<=x<=1]                                                              │
│ --extindels    indels extension rate                                                                 │
│                [default: 0.25; 0<=x<=1]                                                              │
│ --indels       indels rate                                                                           │
│                [default: 0.15; 0<=x<=1]                                                              │
│ --lengths      length of R1,R2 reads in bp                                                           │
│                [default: 150,150]                                                                    │
│ --mutation     mutation rate                                                                         │
│                [default: 0.001; x>=0]                                                                │
│ --stdev        standard deviation of --distance                                                      │
│                [default: 50; x>=0]                                                                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Linked Read Parameters ─────────────────────────────────────────────────────────────────────────────╮
│ --combinatorial      -b  treat barcodes as combinatorial with this many segments                     │
│                          [default: haplotagging]                                                     │
│ --molecule-attempts  -a  how many tries to create a molecule with <70% ambiguous bases               │
│                          [default: 300; x>=5]                                                        │
│ --molecule-coverage  -c  mean percent coverage per molecule if <1, else mean number of reads per     │
│                          molecule                                                                    │
│                          [default: 0.2; x>=1e-05]                                                    │
│ --molecule-length    -m  mean length of molecules in bp                                              │
│                          [default: 80000; x>=650]                                                    │
│ --molecules-per      -n  mean number of unrelated molecules per barcode per chromosome, where a      │
│                          negative number (e.g. -2) will use a fixed number of unrelated molecules    │
│                          and a positive one will draw from a Normal distribution                     │
│                          [default: 2]                                                                │
│ --singletons         -s  proportion of barcodes will only have a single read pair                    │
│                          [default: 0; 0<=x<=1]                                                       │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────╯

 Documentation: https://pdimens.github.io/mimick/

# Authors

@pdimens (Mimick)

@davidebolo1993 (VISOR)