# Demultiplex Raw Sequences

You will need:
- at least 2 cores/threads available
- paired-end reads from an Illumina sequencer in FASTQ format (gzipped recommended)

When pooling samples and sequencing them in parallel on an Illumina sequencer, you will be given large multiplexed FASTQ
files in return. These files contain sequences for all of your samples and need to be demultiplexed using barcodes to
separate the sequences for each sample into their own files (a forward and reverse file for each sample). These barcodes
should have been added during the sample DNA preparation in a laboratory. The demultiplexing strategy will vary based on the
haplotagging technology you are using (see Haplotagging Types).

Usage:
harpy demultiplex METHOD OPTIONS... R1_FQ R2_FQ I1_FQ I2_FQ

Example:
harpy demultiplex meier2021 --threads 20 --schema demux.schema Plate_1_S001_R*.fastq.gz Plate_1_S001_I*.fastq.gz

# Running Options

In addition to the common runtime options, the demultiplex meier2021 module is configured using these command-line arguments:

# Keeping Unknown Samples

It's not uncommon that some sequences cannot be demultiplexed due to sequencing errors at the ID location. Use --keep-unknown-samples/-u to
have Harpy still separate those reads from the original multiplex. Those reads will be labelled _unknown_sample.R*.fq.gz.

# Keeping Unknown Barcodes

Likewise, sequencing errors can leave reads with barcodes that don't match the list of known barcode segments. Use
--keep-unknown-barcodes/-b to have Harpy separate those reads out from the original multiplex as _unknown_barcodes.R*.fq.gz.

# Keep QX and RX Tags

Using --qx-rx, you can opt in to retaining the QX:Z (barcode PHRED scores) and RX:Z (nucleotide barcode)
tags in the sequence headers. These tags aren't used by any subsequent analyses, but may be useful for your own diagnostics.
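
For diagnostics, the retained tags could be pulled out of a header line with a few lines of Python. This is a minimal sketch, assuming the tags appear in the header as whitespace-separated SAM-style `TAG:Z:VALUE` fields after the read name. The example header and the `parse_header_tags` helper are hypothetical and not part of Harpy.

```python
def parse_header_tags(header: str) -> dict[str, str]:
    """Collect SAM-style TAG:Z:VALUE fields from a FASTQ header line."""
    tags = {}
    # The first field is the read name; any later field may be a tag.
    for field in header.strip().split()[1:]:
        # Split at most twice, because a PHRED string may itself contain ':'.
        parts = field.split(":", 2)
        if len(parts) == 3 and len(parts[0]) == 2 and parts[1] == "Z":
            tags[parts[0]] = parts[2]
    return tags


# Hypothetical header: the barcode and quality strings are made up.
example = "@read_1 RX:Z:ACGTACGTACGTA QX:Z:FFFFFFFFFFFFF"
tags = parse_header_tags(example)
print(tags.get("RX"))
print(tags.get("QX"))
```

Using `tags.get(...)` rather than indexing keeps the sketch usable on files that were demultiplexed without --qx-rx, where the tags are absent.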
# Haplotagging Types

meier2021
This method was formerly known as gen1, a name that will be deprecated in the next minor release. Explicit is always better than implicit!
- Barcode configuration: 13 + 13
- Sequencing mask: 151+13+13+151
- Sample identifier: Cxx barcode
- Facility should not demultiplex
These are the original 13 + 13 barcodes described in Meier et al. 2021. You should request that the sequencing facility
does not demultiplex the sequences. This requires running bcl2fastq without a sample sheet and with the settings
--use-bases-mask=Y151,I13,I13,Y151 and --create-fastq-for-index-reads. With Generation I beadtags, the C barcode is sample-specific,
meaning a single sample should have the same C barcode for all of its sequences.
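
The link between the --use-bases-mask value and the 151+13+13+151 sequencing mask can be illustrated with a short sketch. It assumes each comma-separated element is a single letter (`Y` for a sequence read, `I` for an index read) followed by a cycle count, which covers the mask above but not the full bcl2fastq mask syntax. `parse_bases_mask` is a hypothetical helper, not part of Harpy or bcl2fastq.

```python
import re


def parse_bases_mask(mask: str) -> list[tuple[str, int]]:
    """Split a mask such as 'Y151,I13,I13,Y151' into (read type, cycles) pairs."""
    reads = []
    for element in mask.split(","):
        # Assumption: only the simple <letter><count> form is handled here.
        match = re.fullmatch(r"([YI])(\d+)", element)
        if match is None:
            raise ValueError(f"unsupported mask element: {element!r}")
        kind = "sequence" if match.group(1) == "Y" else "index"
        reads.append((kind, int(match.group(2))))
    return reads


reads = parse_bases_mask("Y151,I13,I13,Y151")
print("+".join(str(cycles) for _, cycles in reads))  # 151+13+13+151
index_lengths = [cycles for kind, cycles in reads if kind == "index"]
print(index_lengths)  # [13, 13]
```

The two index reads of 13 cycles each correspond to the 13 + 13 barcode configuration listed above.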
# Demultiplexing Schema

Generation I haplotags typically use a unique Cxx barcode per sample, and that is the barcode segment
used to identify sequences by sample. However, any of the 4 segments (A, B, C, D) is valid, so long as the schema features only a single segment.
You will need to provide a simple text file to --schema (-s) with two columns: the first is the sample name and the second is
the identifying segment barcode (e.g., C19). This file must be tab- or space-delimited and must have no column names.

Sample01    C01
Sample02    C02
Sample03    C03
Sample04    C04

This will result in splitting the multiplexed reads into individual file pairs Sample01.F.fq.gz, Sample01.R.fq.gz, Sample02.F.fq.gz, etc.
A sample can have multiple barcodes, but a barcode cannot have multiple samples:

Valid (one sample with multiple barcodes):
Sample01    D01
Sample02    D02
Sample03    D03
Sample03    D21

Invalid (one barcode assigned to multiple samples):
Sample01    C01
Sample02    C02
Sample03    C02

Invalid (more than one segment in the schema):
Sample01    C01
Sample02    D02
Sample03    C03
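
The schema rules above can be checked before launching a run. The following is a minimal sketch that mirrors those rules; it is not Harpy's own validation. It assumes a barcode is a segment letter (A-D) followed by two digits, as in C19, and `validate_schema` is a hypothetical helper.

```python
import re


def validate_schema(text: str) -> dict[str, str]:
    """Return a barcode -> sample mapping, or raise ValueError on a rule violation."""
    barcode_to_sample = {}
    segments = set()
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # this sketch ignores blank lines
        fields = line.split()  # splits on tabs or spaces
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected 2 columns, found {len(fields)}")
        sample, barcode = fields
        # Assumption: a barcode is a segment letter (A-D) plus two digits, e.g. C19.
        if not re.fullmatch(r"[ABCD]\d{2}", barcode):
            raise ValueError(f"line {number}: unrecognized barcode {barcode!r}")
        owner = barcode_to_sample.get(barcode)
        if owner is not None and owner != sample:
            raise ValueError(f"line {number}: {barcode} is assigned to both {owner} and {sample}")
        barcode_to_sample[barcode] = sample
        segments.add(barcode[0])
    if len(segments) > 1:
        raise ValueError(f"schema mixes segments: {sorted(segments)}")
    return barcode_to_sample


# A sample with two barcodes is accepted.
valid = "Sample01\tD01\nSample02\tD02\nSample03\tD03\nSample03\tD21\n"
print(validate_schema(valid))
```

Keying the mapping by barcode rather than by sample makes the "one barcode, one sample" rule a simple dictionary lookup, while still letting a sample appear on several lines.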
# Meier2021 Demultiplex Workflow

graph LR
    subgraph Inputs
        direction TB
        A[multiplexed FASTQ]:::clean---BX
        BX[index reads FASTQ]:::clean---SCH
        SCH[Sample Schema]:::clean
    end
    Inputs-->B([demultiplex samples]):::clean
    B-->D([quality metrics]):::clean
    style Inputs fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px,rx:10px,ry:10px
    classDef clean fill:#f5f6f9,stroke:#b7c9ef,stroke-width:2px

The default output directory is Demultiplex with the folder structure below. Sample1 and Sample2 are
generic sample names for demonstration purposes. The resulting folder also includes a workflow directory
(not shown) with workflow-relevant runtime files and information.
Demultiplex/
├── Sample1.F.fq.gz
├── Sample1.R.fq.gz
├── Sample2.F.fq.gz
├── Sample2.R.fq.gz
└── reports
    └── demultiplex.QC.html

The reports/demultiplex.QC.html file is the summary report Harpy generates for this workflow.
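
After a run, the folder layout above can be used to confirm that every sample produced both of its files. This is a small sketch under the naming shown in the tree (`<sample>.F.fq.gz` and `<sample>.R.fq.gz` in the output directory); `missing_outputs` is a hypothetical helper and the sample names are placeholders.

```python
from pathlib import Path


def missing_outputs(samples: list[str], outdir: str = "Demultiplex") -> list[str]:
    """List expected per-sample FASTQ files that are absent from outdir."""
    expected = [f"{sample}.{direction}.fq.gz" for sample in samples for direction in ("F", "R")]
    return [name for name in expected if not (Path(outdir) / name).is_file()]


# Placeholder sample names; an empty list means every expected file was found.
print(missing_outputs(["Sample1", "Sample2"]))
```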
 
    
# Now using dmox!

Harpy v2 introduced a new demultiplexer under the hood called dmox, which is significantly faster, lighter on memory, and better maintained than the previous solution. Iago Bonnici of Montpellier Bioinformatics Biodiversity (MBB) saw the need for better demultiplexing performance and took it upon themselves to donate their time to write a brand-new, purpose-built demultiplexer for the Meier/Chan haplotagging bead design. Beyond being far more performant, this new demultiplexer has more features, offers more output options, and is flexible enough for haplotagging bead designs where the sample ID is not the C segment. If you're happy with the performance of the new demultiplexing workflow, please let Iago/MBB know!
