# Align Sequences to a Genome

After your sequences (in FASTQ format) have been checked for quality, you will need to align them to a reference genome before you can call variants. Harpy offers several aligners for this purpose:

| aligner | speed | repository | publication |
|:--------|:------|:-----------|:------------|
| BWA | fast ⚡ | github | paper |
| strobealign | super fast ⚡⚡ | github | paper |

Neither of these aligners is linked-read aware, but Harpy transfers the barcode information from the sequence headers into the alignments and assigns molecule identifiers (`MI:i` SAM tags) based on these barcodes and the molecule distance threshold.
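To make the idea of barcode-plus-distance molecule assignment concrete, here is a minimal sketch in bash and awk. It is not Harpy's implementation. It assumes a simplified tab-separated input (barcode, contig, start, end) rather than real SAM records, and it assumes that "distance" means the gap between the furthest alignment end seen so far and the start of the next alignment with the same barcode on the same contig. The function name `assign_mi` and the threshold value used below are hypothetical.

```bash
#!/usr/bin/env bash
# Illustrative sketch only, NOT Harpy's actual code.
# Input on stdin (tab-separated, no header): barcode, contig, start, end
# Output: the same rows, sorted, with a hypothetical MI:i:<n> column appended.
assign_mi() {
    local threshold="$1"   # assumed molecule distance threshold, in bp
    # Group rows by barcode and contig, ordered by start position.
    sort -t $'\t' -k1,1 -k2,2 -k3,3n |
    awk -F'\t' -v OFS='\t' -v max_gap="$threshold" '
        {
            # Start a new molecule when the barcode or contig changes, or when
            # the gap to the furthest previous end exceeds the threshold.
            if (NR == 1 || $1 != bc || $2 != contig || ($3 - last_end) > max_gap) {
                mi++
                last_end = $4 + 0
            } else if (($4 + 0) > last_end) {
                last_end = $4 + 0
            }
            bc = $1
            contig = $2
            print $0, "MI:i:" mi
        }'
}
```

For example, piping four rows through `assign_mi 50000`, where three rows share barcode `A01` on `chr1` and the third starts roughly 89 kb after the second ends, gives the first two rows one identifier, the third row a new one, and the row with a different barcode a third identifier.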

# Non linked-read WGS data

Starting with Harpy v2.x, the `--ignore-bx` option skips the workflow routines that are specific to linked reads, so you can use `harpy align bwa` and `harpy align strobe` to align regular WGS sequence data.

## RADseq data

RADseq data will probably work fine too. However, you may need to post-process the BAM files to unset the duplicate flag, because marking duplicates in RADseq data (without UMIs) may cause issues with SNP calling:

```bash
samtools view -b -h --remove-flags 1024 -o output.bam input.bam
```
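If you have many samples, the same command can be wrapped in a loop. The sketch below is one possible way to do that. The function name `unset_duplicates` and the directory arguments are hypothetical, not something Harpy provides. The value 1024 (0x400) is the SAM flag bit that marks a read as a PCR or optical duplicate.

```bash
#!/usr/bin/env bash
# Hypothetical helper: clear the duplicate flag (0x400 = 1024) on every BAM
# file in one directory and write the results to another directory.
# Both directory paths are placeholders supplied by the caller.
unset_duplicates() {
    local in_dir="$1" out_dir="$2" bam
    mkdir -p "$out_dir"
    for bam in "$in_dir"/*.bam; do
        [ -e "$bam" ] || continue   # skip when the glob matches nothing
        samtools view -b -h --remove-flags 1024 \
            -o "$out_dir/$(basename "$bam")" "$bam"
    done
}
```

Calling `unset_duplicates path/to/bams path/to/output` writes one file per input BAM under the same file name, which leaves the originals untouched. If downstream tools expect an index, the new files would need to be indexed again.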