Home
Harpy is a linked-read and non-linked WGS data processing pipeline for Linux-based systems. It uses all the magic of Snakemake under the hood to handle the workflow decision-making, but as a user, you just interact with it like a normal command-line program. Harpy employs both well-known and niche programs to take raw linked-read sequences and process them into called SNP genotypes (or haplotypes) or large structural variants (inversions, deletions, duplications). Feel free to open an Issue or begin a Discussion on GitHub.
Commands
Harpy is modular, meaning you can use different parts of it independently of each other. Need to only align reads? Great! Only want to call variants? Awesome! All modules are called by harpy <workflow>. For example, use harpy align to align reads.
You can call harpy without any arguments (or with --help) to print the docstring to your terminal. You can likewise call any of the modules without arguments or with --help to see their usage, e.g.:
harpy align --help
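As a minimal sketch of the help pattern described above, the snippet below wraps the two documented calls (harpy with no workflow, and harpy <workflow> --help) in a small function. The show_help name is a hypothetical helper introduced here for illustration and is not part of Harpy; the fallback message is only there so the snippet runs cleanly on a machine where Harpy is not installed.

```shell
#!/bin/sh
# show_help is a hypothetical wrapper for this sketch, not a Harpy command.
# With no argument it runs `harpy --help`; with a workflow name it runs
# `harpy <workflow> --help`.
show_help() {
  if command -v harpy >/dev/null 2>&1; then
    # ${1:+"$1"} expands to the workflow name only when one was given
    harpy ${1:+"$1"} --help
  else
    # Harpy is not on PATH: report the command that would have been run
    echo "harpy not found on PATH; would run: harpy${1:+ $1} --help"
  fi
}

show_help          # top-level help
show_help align    # help for the align workflow
```

Any other workflow name can be passed the same way; running harpy without arguments lists the workflows that are actually available in your installation.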
Utilities
An installation of Harpy also includes a series of scripts/utilities called harpy-utils. These scripts are used within Harpy workflows, but you can also use them on their own, e.g.:
harpy-utils molecule-coverage
Typical Workflows
Depending on your project goals, you may want any combination of SNPs, structural variants (inversions, deletions, duplications), or phased haplotypes. Below are outlines of general workflows for linked-read data: the first covers variant calling, imputation, and phasing, and the second covers assembly.
Sample demultiplexing and linked-read barcode demultiplexing
Remove adapters, low quality sequences, reads that are too short, poly-G tails, etc.
Align sequences to a reference genome
Call Single Nucleotide Polymorphisms and small indels from alignments
Use existing data to heuristically fill missing data
Convert individual SNPs into multi-allele haplotypes reflecting alleles that were inherited together from each parent
Call structural variants (inversions, large deletions, and duplications) from alignments
Sample demultiplexing and linked-read barcode demultiplexing
Remove adapters, low quality sequences, reads that are too short, poly-G tails, etc.
Correct linked-read barcodes for unrelated sequences that share the same barcode by chance ("clashing")
Assemble sequences into a genome or metagenome