#
Home
Harpy is a haplotagging data processing pipeline for Linux-based systems. It uses all the magic of Snakemake under the hood to handle the worklfow decision-making, but as a user, you just interact with it like a normal command-line program. Harpy uses both well known and niche programs to take raw haplotagging sequences and process them to become called SNP genotypes (or haplotypes) or large structural variants (inversions, deletions, duplications). Most of the settings are pre-configured and the settings you can modify are done at the command line.
#
What is haplotagging?
Linked-read sequencing exists to combine the throughput and accuracy of short-read sequencing with the long range haplotype information of long-read sequencing. Haplotagging is an implementation of linked-read sequencing developed by Meier et al. to:
- sequence a large number of samples
- achieve high molecular resolution
- do both within a reasonable budget
If you don't have haplotagged data, then Harpy will likely be of little to no use to you. See the haplotagging site for more information about haplotagging and why you might consider it for your study system.
#
Harpy Modules
Harpy is modular, meaning you can use different parts of it independent from each other. Need to only align reads?
Great! Only want to call variants? Awesome! All modules are called by harpy <module>
. For example, use harpy align
to align reads.
#
Using Harpy
You can call harpy
without any arguments (or with --help
) to print the docstring to your terminal. You can likewise call any of the modules without arguments or with --help
to see their usage (e.g. harpy align --help
).
Usage: harpy COMMAND [ARGS]...
Harpy haplotagging pipeline
An automated workflow to demultiplex sequences, trim and qc
reads, map sequences, call variants, impute genotypes, and
phase haplotypes of Haplotagging data. Batteries included.
demultiplex >> qc >> align >> snp >> impute >> phase >> sv
Documentation: https://pdimens.github.io/harpy/
╭─ Options ──────────────────────────────────────────────────╮
│ --version Show the version and exit. │
│ --help -h Show this message and exit. │
╰────────────────────────────────────────────────────────────╯
╭─ Modules ──────────────────────────────────────────────────╮
│ demultiplex Demultiplex haplotagged FASTQ files │
│ qc Remove adapters and quality trim sequences │
│ align Align sample sequences to a reference genome │
│ snp Call SNPs and small indels │
│ sv Call large structural variants │
│ impute Impute genotypes using variants and sequences │
│ phase Phase SNPs into haplotypes │
╰────────────────────────────────────────────────────────────╯
╭─ Other Commands ───────────────────────────────────────────╮
│ preflight Run file format checks on haplotag data │
│ popgroup Create a sample grouping file │
│ stitchparams Create a template STITCH parameter file │
│ hpc Create a config file to run Harpy on an HPC │
╰────────────────────────────────────────────────────────────╯