# Resolve barcodes shared by different molecules
- paired-end reads from an Illumina sequencer in FASTQ format (gzipped recommended)
- sample names: `a-z`, `0-9`, `.`, `_`, `-` (case insensitive)
- forward-read suffixes: `_F` `.F` `.1` `_1` `_R1_001` `.R1_001` `_R1` `.R1`
- reverse-read suffixes: `_R` `.R` `.2` `_2` `_R2_001` `.R2_001` `_R2` `.R2`
- FASTQ extensions: `.fq`, `.fastq` (case insensitive)
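The naming rules above can be checked mechanically. The sketch below splits a file name into a sample name and a read direction. The helper `parse_fastq_name` is hypothetical, and two details are assumptions rather than documented Harpy behavior: the forward/reverse suffixes are compared case-insensitively, and a trailing `.gz` is accepted after the FASTQ extension. Harpy's own validation may differ.

```python
import re

# Read-direction suffixes from the list above. Longer variants are listed
# first so the most specific suffix is tried before shorter ones.
FORWARD = ("_R1_001", ".R1_001", "_R1", ".R1", "_F", ".F", "_1", ".1")
REVERSE = ("_R2_001", ".R2_001", "_R2", ".R2", "_R", ".R", "_2", ".2")

# .fq or .fastq, optionally followed by .gz (assumption), any letter case.
EXTENSION = re.compile(r"\.(fq|fastq)(\.gz)?$", re.IGNORECASE)
# Allowed sample-name characters: a-z, 0-9, '.', '_', '-' (any case).
SAMPLE = re.compile(r"^[a-z0-9._-]+$", re.IGNORECASE)


def parse_fastq_name(filename: str) -> tuple[str, str]:
    """Return (sample_name, "forward" | "reverse") or raise ValueError.

    Hypothetical helper illustrating the naming rules; not Harpy's own code.
    """
    ext = EXTENSION.search(filename)
    if ext is None:
        raise ValueError(f"not a .fq/.fastq file: {filename}")
    stem = filename[: ext.start()]
    for direction, suffixes in (("forward", FORWARD), ("reverse", REVERSE)):
        for suffix in suffixes:
            # Case-insensitive suffix comparison (an assumption).
            if stem.upper().endswith(suffix.upper()):
                sample = stem[: -len(suffix)]
                if SAMPLE.match(sample):
                    return sample, direction
    raise ValueError(f"cannot determine sample/direction: {filename}")
```

For example, `parse_fastq_name("Sample_01_R1_001.fastq.gz")` yields `("Sample_01", "forward")`, while a name such as `reads.bam` raises `ValueError`.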
Running `deconvolve` is optional. The alignment workflows (`align bwa`, `align strobe`) already use a distance-based approach to deconvolve barcodes and assign MI (Molecular Identifier) tags. This workflow instead uses a reference-free method, QuickDeconvolution, which uses k-mers to examine "read clouds" (all reads sharing the same linked-read barcode) and decide which of them likely originate from different molecules. Whether or not you run this workflow, `harpy align` will still perform its own deconvolution.
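To make the idea of a read cloud concrete, here is a deliberately simplified sketch. It is not QuickDeconvolution's algorithm: it groups reads by barcode into clouds, links any two reads in a cloud that share at least one k-mer, and treats each connected group (found with a small union-find) as one putative molecule. A real tool must also cope with repeats, sequencing errors, and low coverage, which this toy ignores. The function names and the default k-mer size are hypothetical.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i : i + k] for i in range(len(seq) - k + 1)}


def read_clouds(reads: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (barcode, sequence) pairs into read clouds keyed by barcode."""
    clouds: dict[str, list[str]] = defaultdict(list)
    for barcode, seq in reads:
        clouds[barcode].append(seq)
    return dict(clouds)


def split_cloud(seqs: list[str], k: int = 5) -> list[list[int]]:
    """Partition read indices into groups connected by shared k-mers.

    Toy illustration only: reads sharing a k-mer (directly or through other
    reads) end up in the same group, i.e. the same putative molecule.
    """
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen: dict[str, int] = {}
    for idx, seq in enumerate(seqs):
        for kmer in kmers(seq, k):
            if kmer in first_seen:
                # Union this read with the first read that had this k-mer.
                parent[find(idx)] = find(first_seen[kmer])
            else:
                first_seen[kmer] = idx

    groups: dict[int, list[int]] = defaultdict(list)
    for idx in range(len(seqs)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())
```

With three reads under barcode `A01C33B41D93`, where the first two overlap and the third is unrelated, `split_cloud` returns two groups, which is the situation in which a deconvolution tool would give the barcode distinct variants.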
## Running Options
## Resulting Barcodes
After deconvolution, some barcodes may carry a hyphenated suffix such as `-1` or `-2` (e.g. `A01C33B41D93-1`). This is how deconvolution methods create unique variants of a barcode, indicating that reads sharing an identical barcode actually came from different original molecules. QuickDeconvolution adds the `-0` suffix to barcodes it was unable to deconvolve.
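When post-processing deconvolved output, it can help to separate the original barcode from its suffix. The sketch below does that with a regular expression; the helper names are hypothetical, and it assumes the input barcode did not already end in a hyphen followed by digits before deconvolution (some pipelines emit barcodes of that shape, which this parser would misread).

```python
import re
from typing import Optional, Tuple

# A trailing "-<digits>" added by deconvolution, e.g. "A01C33B41D93-1".
# The greedy ".+" makes the match use the last hyphen in the barcode.
DECONVOLVED = re.compile(r"^(?P<barcode>.+)-(?P<molecule>\d+)$")


def split_deconvolved(barcode: str) -> Tuple[str, Optional[int]]:
    """Split a barcode into (original barcode, numeric suffix or None)."""
    match = DECONVOLVED.match(barcode)
    if match is None:
        return barcode, None
    return match.group("barcode"), int(match.group("molecule"))


def describe(barcode: str) -> str:
    """Summarize what the suffix (if any) says about this barcode."""
    base, molecule = split_deconvolved(barcode)
    if molecule is None:
        return f"{base}: no deconvolution suffix"
    if molecule == 0:
        return f"{base}: suffix -0, QuickDeconvolution could not deconvolve it"
    return f"{base}: deconvolved variant -{molecule}"
```

For instance, `split_deconvolved("A01C33B41D93-1")` gives `("A01C33B41D93", 1)`, and a barcode with no suffix comes back unchanged with `None`.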
## Harpy Deconvolution Nuances
Some of the downstream linked-read tools Harpy uses expect linked-read barcodes to look like either the 16-base 10X variety or a standard haplotag (`AxxCxxBxxDxx`). Their pattern matching would not recognize barcodes that deconvolution has given a hyphenated suffix. To remedy this, MI assignment in `align bwa` and `align strobe` stores the deconvolved (hyphenated) barcode in a `DX:Z` tag and restores the original barcode as the `BX:Z` tag.
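The sketch below illustrates both halves of this: why a strict pattern rejects a hyphenated barcode, and how the `BX:Z`/`DX:Z` swap could look on a single SAM text record. The regular expressions and the `restore_bx` helper are hypothetical illustrations, not Harpy's implementation, and the placement of the new `DX:Z` field directly after `BX:Z` is an arbitrary choice.

```python
import re

# Barcode shapes the downstream tools are described as expecting.
HAPLOTAG = re.compile(r"^A\d{2}C\d{2}B\d{2}D\d{2}$")  # e.g. A01C33B41D93
TENX = re.compile(r"^[ACGTN]{16}$")                  # 16-base 10X-style barcode
# Trailing deconvolution suffix such as "-1" or "-0".
SUFFIX = re.compile(r"-\d+$")


def is_recognized(barcode: str) -> bool:
    """True if the barcode matches either expected pattern."""
    return bool(HAPLOTAG.match(barcode) or TENX.match(barcode))


def restore_bx(sam_line: str) -> str:
    """Move a hyphenated BX:Z barcode to DX:Z and put the original back in BX:Z.

    Works on one tab-separated SAM record: the 11 mandatory columns are kept
    as-is and only the optional TAG:TYPE:VALUE fields are rewritten.
    """
    fields = sam_line.rstrip("\n").split("\t")
    core, tags = fields[:11], fields[11:]
    rewritten = []
    for tag in tags:
        if tag.startswith("BX:Z:"):
            barcode = tag[len("BX:Z:"):]
            original = SUFFIX.sub("", barcode)
            if original != barcode:
                rewritten.append(f"BX:Z:{original}")  # pattern-friendly barcode
                rewritten.append(f"DX:Z:{barcode}")   # deconvolved barcode kept
                continue
        rewritten.append(tag)
    return "\t".join(core + rewritten)
```

Here `is_recognized("A01C33B41D93-1")` is `False` while `is_recognized("A01C33B41D93")` is `True`. On BAM files you would more likely work through pysam, where `AlignedSegment.get_tag("BX")` and `AlignedSegment.set_tag("DX", value, value_type="Z")` perform the equivalent reads and writes.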