# Align Sequences to a Genome
After your sequences (in FASTQ format) have been checked for quality, you need to align them to a reference genome before you can call variants. Harpy offers several aligners for this purpose, including BWA, EMA, and strobealign.
Although EMA is the only barcode-aware aligner offered, Harpy still retains the barcode information from the sequence headers when you use BWA or strobealign. It then assigns molecule identifiers (`MI:i` SAM tags) based on these barcodes and the molecule distance threshold.
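The molecule assignment described above can be illustrated with a simplified sketch in shell and awk. This is not Harpy's implementation. It assumes that the input is coordinate-sorted lines of `contig position barcode`, and that the distance is measured between consecutive alignments that share a barcode. An alignment starts a new molecule when its barcode is new, when it falls on a different contig, or when it lies more than the threshold away. The function name `assign_mi` and the toy records are hypothetical.

```bash
#!/usr/bin/env bash
# Simplified sketch (NOT Harpy's code): assign molecule IDs to alignments sharing a barcode.
# Assumed input on stdin: coordinate-sorted lines of "contig position barcode".
# Output: each input line with a tab and an "MI:i:<n>" field appended.
assign_mi() {
  local threshold="$1"   # hypothetical molecule distance threshold, in bp
  awk -v t="$threshold" '
    {
      contig = $1; pos = $2; bx = $3
      # New molecule if the barcode is unseen, changes contig,
      # or is more than t bp from its previous alignment.
      if (!(bx in last_pos) || last_contig[bx] != contig || pos - last_pos[bx] > t) {
        mi[bx] = ++n
      }
      last_contig[bx] = contig
      last_pos[bx] = pos
      print $0 "\tMI:i:" mi[bx]
    }'
}

# Toy records (made up for illustration) with a 50 kb threshold:
printf '%s\n' \
  'chr1 100 A01C01B01D01' \
  'chr1 150 A02C01B01D01' \
  'chr1 5000 A01C01B01D01' \
  'chr1 200000 A01C01B01D01' \
  | assign_mi 50000
# chr1 100 A01C01B01D01     MI:i:1
# chr1 150 A02C01B01D01     MI:i:2
# chr1 5000 A01C01B01D01    MI:i:1  (4,900 bp from previous: same molecule)
# chr1 200000 A01C01B01D01  MI:i:3  (195,000 bp from previous: new molecule)
```

Raising or lowering the threshold in this sketch changes how readily alignments with the same barcode are merged into one molecule.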
## Works with regular WGS data
Starting with Harpy version 2, the `--ignore-bx` option skips the workflow routines that are specific to linked reads. This means you can use `harpy align bwa` and `harpy align strobe` to align regular (non-linked-read) WGS data.
### RADseq data
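RADseq data will probably work too. However, you may need to post-process the BAM files to unset the duplicate flag. RADseq reads from a locus share the same start position at the restriction site, so duplicate marking without UMIs can flag genuine reads as PCR duplicates, which may cause problems with SNP calling: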
```bash
samtools view -b -h --remove-flags 1024 -o output.bam input.bam
```
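The value `1024` in the command above is a bit in the SAM FLAG field (0x400, "PCR or optical duplicate"). `--remove-flags 1024` clears that bit on each record and leaves the other bits untouched. The sketch below shows the same bitwise operation on a single FLAG value. The function name `clear_dup_flag` is hypothetical and only illustrates the arithmetic.

```bash
#!/usr/bin/env bash
# Clear the duplicate bit (1024 = 0x400) from a SAM FLAG value,
# leaving all other bits unchanged (bitwise AND with the complement).
clear_dup_flag() {
  local flag="$1"
  echo $(( flag & ~1024 ))
}

clear_dup_flag 1107  # 1107 = 1024 + 83 (paired, proper pair, reverse strand, first in pair) -> 83
clear_dup_flag 83    # duplicate bit already unset -> 83
```

To confirm that the flag was removed from a real file, `samtools view -c -f 1024 output.bam` counts the records that still have the duplicate bit set. That count should be 0.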