# Harpy for (non-linked-read) WGS data

By Pavel Dimens, published 2025-02-06

As of version 2.0, Harpy can be used for the early stages of a regular whole genome sequencing (WGS) bioinformatics workflow. Specifically, you can quality-check and trim samples, align sequences, and call SNPs and small indels, all with the flick of the `--ignore-bx` switch. RADseq data may also work; however, the SNP-calling workflows probably won't be very computationally efficient on a highly fragmented RAD assembly. There is also another consideration for RADseq regarding marking duplicates (described below).

# Quality Assessment

Using `harpy qc`, you are able to detect and remove adapters and poly-G tails, trim low-quality bases, detect duplicates with UMIs, and more. You cannot use the deconvolution function of this workflow (`--deconvolve`), since it operates on linked-read barcodes.

```bash
harpy qc --ignore-bx --trim-adapters auto --min-length 50 data/WGS/sample_*.gz
```
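To make the effect of `--min-length 50` more concrete, here is a small illustrative sketch, assuming the option discards reads shorter than 50 bases once trimming is done. The `filter_min_length` function is hypothetical: it works on a single uncompressed FASTQ file with exactly four lines per record, and it is not how Harpy implements the filter (real paired-end filtering also has to keep mates in sync).

```bash
#!/usr/bin/env bash
# Illustration only: mimic a minimum-read-length filter (like --min-length 50)
# on an uncompressed FASTQ with 4 lines per record. NOT Harpy's implementation.
set -euo pipefail

# filter_min_length MIN_LEN < in.fastq > out.fastq
filter_min_length() {
  local min_len="$1"
  awk -v min="$min_len" '
    # Buffer the current 4-line record (line 4 lands in rec[0]).
    { rec[NR % 4] = $0 }
    # When a record is complete, keep it only if the sequence (line 2)
    # is at least `min` bases long.
    NR % 4 == 0 {
      if (length(rec[2]) >= min + 0) {
        print rec[1]; print rec[2]; print rec[3]; print rec[0]
      }
    }
  '
}

# Usage (hypothetical file names):
# filter_min_length 50 < reads.fastq > reads.min50.fastq
```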

# Sequence Alignment

Likewise, you can use either `harpy align bwa` or `harpy align strobe` to align your sequences to a reference genome. The `--depth-window` and `--molecule-distance` options are irrelevant and ignored when using `--ignore-bx`. Since EMA is a linked-read-specific aligner, it is not available for WGS/RADseq data, nor would you gain anything from using it on such data.

```bash
harpy align bwa --ignore-bx --genome genome.fasta --min-quality 25 data/WGS/trimmed
```
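As a rough illustration of `--min-quality 25`, assuming it acts as a minimum mapping quality (MAPQ) threshold for the alignments, the hypothetical `filter_mapq` function below keeps only SAM records whose MAPQ (column 5) is at least the given value. It is a sketch on plain-text SAM to show the idea, not Harpy's internal code.

```bash
#!/usr/bin/env bash
# Illustration only: what a minimum mapping-quality filter (MAPQ >= threshold)
# does to plain-text SAM records. NOT Harpy's implementation.
set -euo pipefail

# filter_mapq MIN_MAPQ < in.sam > out.sam
filter_mapq() {
  local min_mapq="$1"
  awk -F '\t' -v min="$min_mapq" '
    /^@/ { print; next }            # keep header lines untouched
    ($5 + 0) >= (min + 0) { print } # column 5 of a SAM record is MAPQ
  '
}

# Usage (hypothetical file names):
# filter_mapq 25 < sample.sam > sample.mapq25.sam
```

On real BAM files, `samtools view -b -q 25 in.bam > out.bam` applies the same kind of MAPQ threshold.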

# Calling SNPs

The SNP-calling workflows in Harpy don't use linked-read information at all, so you can use `harpy snp mpileup` or `harpy snp freebayes` without any modifications.

```bash
harpy snp mpileup --regions 100000 --populations data.groups --genome genome.fasta Align/strobe
```
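The example above passes a populations file, `data.groups`. Assuming that file is tab-delimited with one `sample<TAB>population` pair per line, and that sample names are the BAM file names without `.bam` (with the BAMs sitting directly in the alignment directory), the hypothetical `make_groups` helper below drafts one. The naming rule it uses (population = everything before the first underscore, e.g. `popA_01.bam` becomes `popA`) is purely an assumption; adapt it to your own sample names and double-check the result before running the SNP workflow.

```bash
#!/usr/bin/env bash
# Hypothetical helper: draft a tab-delimited "sample<TAB>population" file from
# the BAM files in an alignment directory. ASSUMES samples are named like
# <population>_<id>.bam (e.g. popA_01.bam); change the rule for your data.
set -euo pipefail

# make_groups ALIGN_DIR > data.groups
make_groups() {
  local align_dir="$1" bam sample
  for bam in "$align_dir"/*.bam; do
    [ -e "$bam" ] || continue                   # no match: glob stays literal
    sample=$(basename "$bam" .bam)              # popA_01.bam -> popA_01
    printf '%s\t%s\n' "$sample" "${sample%%_*}" # popA_01 -> popA
  done
}

# Usage:
# make_groups Align/strobe > data.groups
```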