# Align Sequences to a Genome
After your sequences (in FASTQ format) have been checked for quality, you will need to align them to a reference genome before you can call variants. Harpy offers aligners for this purpose, such as `harpy align bwa` and `harpy align strobe`. Neither of these is a linked-read aware aligner, but Harpy transfers the barcode information from the sequence headers into the alignments and assigns molecule identifiers (`MI:i` SAM tags) based on these barcodes and the molecule distance threshold.
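To make the idea of a molecule distance threshold concrete, here is a minimal sketch of how alignments sharing a barcode could be split into molecules. It is an illustration only and not Harpy's implementation: the function name `assign_mi`, the three-column input (contig, position, barcode), the barcode names, and the 100000 bp threshold are all assumptions made for the example.

```shell
# Hypothetical sketch, not Harpy's code. Input lines: contig position barcode,
# sorted by contig and position. The first argument is the distance threshold.
assign_mi() {
    awk -v max_dist="$1" '
    {
        key = $1 SUBSEP $3      # same contig and same barcode
        # Start a new molecule when this barcode is new on this contig, or
        # when the gap to its previous alignment exceeds the threshold.
        if (!(key in last_pos) || $2 - last_pos[key] > max_dist) {
            mi[key] = ++next_mi
        }
        last_pos[key] = $2
        print $0, "MI:i:" mi[key]
    }'
}

# Made-up alignments: BX_A appears twice close together, then far away.
printf '%s\n' \
    'chr1 1000 BX_A' \
    'chr1 5000 BX_A' \
    'chr1 6000 BX_B' \
    'chr1 900000 BX_A' | assign_mi 100000
```

In this toy input, the first two `BX_A` alignments receive the same identifier, `BX_B` receives its own, and the distant `BX_A` alignment is treated as a separate molecule.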
## Non linked-read WGS data
Starting with Harpy v2.x, you can skip the workflow routines that do things specific to linked reads, meaning you can comfortably use `harpy align bwa` and `harpy align strobe` to align your WGS sequence data.
- version 2.0-2.7: `--ignore-bx`
- version >2.7: `--lr-type none`
- version >=3.0: autodetected or forced with `--unlinked`
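If you script alignments across several Harpy installations, the version list above can be encoded as a small lookup. The helper below is a hypothetical sketch: `unlinked_flag` is not part of Harpy, and it only echoes the option named in the list for a given `major.minor` version string.

```shell
# Hypothetical helper, not part of Harpy: print the option that turns off
# linked-read handling for a given Harpy version, following the list above.
unlinked_flag() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # drop any patch component
    if [ "$major" -ge 3 ]; then
        echo "--unlinked"        # 3.0 and later can also autodetect
    elif [ "$major" -eq 2 ] && [ "$minor" -gt 7 ]; then
        echo "--lr-type none"
    elif [ "$major" -eq 2 ]; then
        echo "--ignore-bx"
    else
        echo "no option listed for Harpy $1" >&2
        return 1
    fi
}

unlinked_flag 2.5   # prints --ignore-bx
unlinked_flag 2.8   # prints --lr-type none
unlinked_flag 3.1   # prints --unlinked
```

Versions before 2.0 return a non-zero status, since the list does not name an option for them.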
## RADseq data
RADseq data will probably work fine too. However, you may need to post-process the BAM files to unset the duplicate flag, because marking duplicates in RADseq data (without UMIs) may cause issues with SNP calling:
samtools view -b -h --remove-flags 1024 -o output.bam input.bam
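The value 1024 is the duplicate bit (0x400) of the SAM FLAG field, so removing it clears that one bit and leaves the other flag bits untouched. The arithmetic can be checked in the shell; the FLAG value 1107 below is a made-up example, not taken from real data.

```shell
# Made-up FLAG: paired (1) + proper pair (2) + reverse strand (16)
# + first in pair (64) + duplicate (1024) = 1107
flag=1107
cleared=$(( flag & ~1024 ))   # clear only the duplicate bit
echo "$cleared"               # prints 83
```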
See also
- Call SNPs and small indels
- Using Harpy to process your linked-read data
- Quality trim haplotagged sequences with Harpy
- Resolve barcodes shared by different molecules