# Align Sequences to a Genome
After your sequences (in FASTQ format) have been checked for quality, you will need to align them to a reference genome before you can call variants. Harpy offers two aligners for this purpose: BWA (`harpy align bwa`) and strobealign (`harpy align strobe`).
Neither of these is a linked-read-aware aligner, but Harpy transfers the barcode information from the sequence headers into the alignments and assigns molecule identifiers (`MI:i` SAM tags) based on these barcodes and the molecule distance threshold.
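To make the idea of a molecule distance threshold concrete, here is a minimal sketch of one way such an assignment could work. It is not Harpy's actual implementation: the `assign_mi` function, the three-column input layout (contig, position, barcode), and the threshold of 100000 are all assumptions made for illustration. Alignments sharing a barcode on the same contig keep the same molecule ID until the gap to the previous alignment exceeds the threshold, at which point a new ID is issued.

```shell
#!/usr/bin/env bash
# Hypothetical illustration only; this is NOT how Harpy is implemented.
# Input (stdin): tab-separated contig, position, barcode, sorted by contig then position.
# Output: the same lines with a molecule ID appended as a fourth column.
assign_mi() {
    local threshold="$1"   # molecule distance threshold in bp (assumed unit)
    awk -v dist="$threshold" 'BEGIN { FS = OFS = "\t" }
    {
        key = $1 SUBSEP $3                     # track each contig + barcode pair
        if (!(key in last) || $2 - last[key] > dist) {
            mi[key] = ++next_id                # first seen, or too far away: new molecule
        }
        last[key] = $2                         # remember the most recent position
        print $1, $2, $3, mi[key]
    }'
}

# Toy input: barcode A01 appears three times on chr1 and once on chr2.
printf 'chr1\t100\tA01\nchr1\t200\tB02\nchr1\t5000\tA01\nchr1\t200000\tA01\nchr2\t50\tA01\n' |
    assign_mi 100000
# Prints (tab-separated):
# chr1  100     A01  1
# chr1  200     B02  2
# chr1  5000    A01  1
# chr1  200000  A01  3
# chr2  50      A01  4
```

In the toy data, the third `A01` alignment on chr1 sits 195000 bp from the previous one, so it is treated as a different molecule even though the barcode is identical.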
## Non linked-read WGS data
Starting with Harpy v2.x, the `--ignore-bx` option lets you skip the workflow routines specific to linked reads, meaning you can comfortably use `harpy align bwa` and `harpy align strobe` to align your WGS sequence data.
## RADseq data
RADseq data will probably work fine too; however, you may need to post-process the BAM files to unset the duplicate flag, because marking duplicates in RADseq data (without UMIs) may cause issues with SNP calling:
samtools view -b -h --remove-flags 1024 -o output.bam input.bam
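
If you have many samples, the same command can be wrapped in a loop. The sketch below is one possible way to do that; the function names `dedup_path` and `unset_duplicates` and the directory names in the usage line are hypothetical, and the indexing step assumes the BAM files are coordinate-sorted.

```shell
#!/usr/bin/env bash
# Sketch: clear the duplicate flag (1024) on every BAM file in a directory.
# Function and directory names are placeholders, not part of Harpy.

# Print the output path for an input BAM: same file name, different directory.
dedup_path() {
    local bam="$1" outdir="$2"
    printf '%s/%s\n' "$outdir" "$(basename "$bam")"
}

# Process every *.bam in indir and write the results to outdir.
unset_duplicates() {
    local indir="$1" outdir="$2" bam out
    mkdir -p "$outdir"
    for bam in "$indir"/*.bam; do
        [ -e "$bam" ] || continue    # the glob matched nothing: skip
        out="$(dedup_path "$bam" "$outdir")"
        # Same command as above, applied to one file at a time
        samtools view -b -h --remove-flags 1024 -o "$out" "$bam"
        # Index the result (assumes coordinate-sorted input)
        samtools index "$out"
    done
}

# Example usage (hypothetical directories):
# unset_duplicates Align/bwa Align/bwa_nodup
```

Writing to a separate directory keeps the original alignments untouched, so you can compare variant calls with and without the duplicate flag if needed.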