# Resolve barcodes shared by different molecules

Input: paired-end reads from an Illumina sequencer in FASTQ format (gzip recommended), named as follows:
- sample name: `a-z` `0-9` `.` `_` `-` (case insensitive)
- forward: `_F` `.F` `.1` `_1` `_R1_001` `.R1_001` `_R1` `.R1`
- reverse: `_R` `.R` `.2` `_2` `_R2_001` `.R2_001` `_R2` `.R2`
- fastq extension: `.fq` `.fastq` (case insensitive)
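The naming conventions above can be checked with a regular expression. The following is a minimal sketch, not Harpy's own parser: the names `FASTQ_NAME`, `FORWARD`, and `parse_fastq_name` are hypothetical, and the optional `.gz` suffix is an assumption about how gzipped files are named.

```python
import re

# Direction markers copied from the list above (the part after the "_" or ".").
FORWARD = {"F", "1", "R1_001", "R1"}

# Hypothetical pattern assembled from the documented conventions.
# Only the sample name and the extension are matched case-insensitively.
FASTQ_NAME = re.compile(
    r"^(?P<sample>(?i:[a-z0-9._-]+?))"      # sample name
    r"[._]"                                 # separator before the direction marker
    r"(?P<marker>R[12]_001|R[12]|[FR12])"   # forward/reverse marker
    r"\.(?i:fq|fastq)"                      # FASTQ extension
    r"(?:\.gz)?$"                           # optional gzip suffix (assumed ".gz")
)

def parse_fastq_name(filename):
    """Return (sample, direction), or None if the name does not fit the conventions."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        return None
    direction = "forward" if match.group("marker") in FORWARD else "reverse"
    return match.group("sample"), direction

print(parse_fastq_name("sample1_R1_001.fastq.gz"))
print(parse_fastq_name("pop.A-02.R.FQ"))
```

A name that does not fit the conventions, such as `notes.txt`, returns `None`.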

Running `deconvolve` is optional. In the alignment workflows `align bwa` and `align strobe`, Harpy already uses a distance-based approach to deconvolve barcodes and assign MI (Molecular Identifier) tags, whereas in the `align ema` workflow deconvolution occurs within the EMA aligner itself. This workflow uses a reference-free method, QuickDeconvolution, which uses k-mers to examine "read clouds" (all reads sharing the same linked-read barcode) and decide which reads likely originate from different molecules. Whether or not you run this workflow, `harpy align` will still perform its own deconvolution.

Won't work with EMA

Reads with deconvolved barcodes will not work with `align ema`, since EMA expects barcodes in a specific, unhyphenated format. If deconvolving, use either `align bwa` or `align strobe` for sequence alignment.

Also in harpy qc

This method of deconvolution is also available as an option in the `qc` workflow.

Usage:
`harpy deconvolve OPTIONS... INPUTS...`

# Running Options

| argument | default | description |
|:---|:---:|:---|
| `INPUTS` | | Required. Files or directories containing input FASTQ files |
| `--density` `-d` | `3` | On average, $\frac{1}{2^d}$ k-mers are indexed |
| `--dropout` `-a` | `0` | Minimum cloud size to deconvolve |
| `--kmer-length` `-k` | `21` | Size of k-mers to search for similarities |
| `--window-size` `-w` | `40` | Size of window guaranteed to contain at least one k-mer |
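To make the `--density` description concrete, here is a small worked example of the $\frac{1}{2^d}$ relationship. It only evaluates the formula from the table; the function name `indexed_fraction` is hypothetical and nothing here reflects how QuickDeconvolution selects k-mers internally.

```python
def indexed_fraction(density):
    """Average fraction of k-mers indexed for a given --density value d: 1 / 2^d."""
    return 1 / 2 ** density

# The default of 3 means roughly one k-mer in eight is indexed on average.
for d in (1, 3, 5):
    print(f"d={d}: about 1 in {2 ** d} k-mers indexed ({indexed_fraction(d):.5f})")
```

Higher values of `d` therefore index fewer k-mers.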

# Resulting Barcodes

After deconvolution, some barcodes may have a hyphenated suffix such as `-1` or `-2` (e.g. `A01C33B41D93-1`). This is how deconvolution methods create unique variants of a barcode to denote that reads sharing the same barcode do not come from the same original molecule. QuickDeconvolution adds the `-0` suffix to barcodes it was unable to deconvolve.
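If you need to inspect these suffixes yourself, the sketch below splits a barcode into its original form and its numeric suffix. The function name `split_deconvolved` is hypothetical and not part of Harpy or QuickDeconvolution; the only assumption is the `barcode-N` format described above.

```python
import re

def split_deconvolved(barcode):
    """Split 'A01C33B41D93-1' into ('A01C33B41D93', 1).

    Barcodes without a numeric hyphenated suffix are returned as (barcode, None).
    """
    match = re.fullmatch(r"(?P<base>.+)-(?P<suffix>[0-9]+)", barcode)
    if match is None:
        return barcode, None
    return match.group("base"), int(match.group("suffix"))

for bc in ("A01C33B41D93-1", "A01C33B41D93-0", "A01C33B41D93"):
    base, suffix = split_deconvolved(bc)
    if suffix is None:
        status = "no suffix"
    elif suffix == 0:
        status = "could not be deconvolved"
    else:
        status = f"deconvolved variant {suffix}"
    print(bc, "->", base, f"({status})")
```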

# Harpy Deconvolution Nuances

Some of the downstream linked-read tools Harpy uses expect linked-read barcodes to look like either the 16-base 10X variety or a standard haplotag (`AxxCxxBxxDxx`). Their pattern matching would not recognize deconvolved barcodes with hyphenated suffixes. To remedy this, MI assignment in `align bwa` and `align strobe` assigns the deconvolved (hyphenated) barcode to a `DX:Z` tag and restores the original barcode as the `BX:Z` tag.
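The sketch below illustrates the idea rather than Harpy's actual implementation. The `HAPLOTAG` pattern is an assumed strict reading of `AxxCxxBxxDxx` with two digits per segment, `barcode_tags` is a hypothetical helper, and emitting `DX` only for hyphenated barcodes is likewise an assumption.

```python
import re

# Assumed strict haplotag pattern: A, C, B, D segments with two digits each.
HAPLOTAG = re.compile(r"A\d{2}C\d{2}B\d{2}D\d{2}")

def barcode_tags(barcode):
    """Return a dict of tag name -> value for one read's barcode.

    BX carries the original (unhyphenated) barcode. DX carries the deconvolved
    barcode and is added only when a numeric hyphenated suffix was present.
    """
    original = re.sub(r"-[0-9]+$", "", barcode)
    tags = {"BX": original}
    if original != barcode:
        tags["DX"] = barcode
    return tags

deconvolved = "A01C33B41D93-1"
# A strict pattern match rejects the hyphenated form...
print(HAPLOTAG.fullmatch(deconvolved) is None)
# ...but accepts the restored BX value.
tags = barcode_tags(deconvolved)
print(HAPLOTAG.fullmatch(tags["BX"]) is not None)
print(" ".join(f"{name}:Z:{value}" for name, value in tags.items()))
```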
