# Home
Harpy is a haplotagging data processing pipeline for Linux-based systems. It uses all the magic of Snakemake under the hood to handle the workflow decision-making, but as a user, you just interact with it like a normal command-line program. Harpy uses both well-known and niche programs to take raw haplotagging sequences and process them into called SNP genotypes (or haplotypes) or large structural variants (inversions, deletions, duplications). Most settings are pre-configured, and the ones you can modify are set at the command line.
# Harpy Modules
Harpy is modular, meaning you can use different parts of it independently of each other. Need to only align reads? Great! Only want to call variants? Awesome! All modules are called by `harpy <workflow>`. For example, use `harpy align` to align reads.
# Using Harpy
You can call `harpy` without any arguments (or with `--help`) to print the docstring to your terminal. You can likewise call any of the modules without arguments or with `--help` to see their usage (e.g. `harpy align --help`).
```
Usage: harpy COMMAND [ARGS]...

An automated workflow for haplotagging linked-read data to go from raw
data to genotypes (or phased haplotypes). Batteries included.

demultiplex >> qc >> align >> snp >> impute >> phase >> sv

Documentation: https://pdimens.github.io/harpy/

╭─ Options ─────────────────────────────────────────────────────────────╮
│ --version       Show the version and exit.                            │
│ --help  -h      Show this message and exit.                           │
╰───────────────────────────────────────────────────────────────────────╯
╭─ workflows ───────────────────────────────────────────────────────────╮
│ align         Align sample sequences to a reference genome            │
│ assembly      Create an assembly from linked-reads                    │
│ demultiplex   Demultiplex haplotagged FASTQ files                     │
│ impute        Impute genotypes using variants and alignments          │
│ metassembly   Create a metassembly from linked-reads                  │
│ phase         Phase SNPs into haplotypes                              │
│ qc            Remove adapters and quality-control sequences           │
│ simulate      Simulate variants or linked-reads from a genome         │
│ snp           Call SNPs and small indels on alignments                │
│ sv            Call large structural variants on alignments            │
╰───────────────────────────────────────────────────────────────────────╯
╭─ Other Commands ──────────────────────────────────────────────────────╮
│ deconvolve    Resolve clashing barcodes from different molecules      │
│ downsample    Downsample data by barcode                              │
│ hpc           Profile templates for cluster job submissions           │
│ imputeparams  Create a template imputation parameter file             │
│ popgroup      Create a template grouping file for samples             │
│ preflight     File format checks for haplotag data                    │
│ resume        Resume a workflow from an existing Harpy directory      │
│ view          View a workflow log, config, or snakefile               │
╰───────────────────────────────────────────────────────────────────────╯
```
# Typical Linked-Read Workflows
Depending on your project goals, you may want any combination of SNPs, structural variants (inversions, deletions, duplications), or phased haplotypes. Below is a flow chart outlining a general workflow for linked-read data.
```mermaid
graph LR
    Demux([demultiplex]):::clean--->QC([QC, trim adapters, etc.]):::clean
    QC--->Align([align sequences]):::clean
    Align--->SNP([call SNPs]):::clean
    SNP--->Impute([impute genotypes]):::clean
    SNP--->Phase([phase haplotypes]):::clean
    Align--->SV([call structural variants]):::clean
    classDef clean fill:#f5f6f9,stroke:#b7c9ef,stroke-width:2px
```
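If it helps to reason about which modules sit upstream of a given goal, the arrows in the flow chart can be written down as a small dependency table. The sketch below is only an illustration and is not part of Harpy: the names `DEPENDS_ON` and `modules_needed` are hypothetical, and the table is a transcription of the chart using the module names from the help text. It assumes Python 3.9 or newer for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical table (not a Harpy API): each module maps to the modules
# whose output it builds on, transcribed from the flow chart arrows.
DEPENDS_ON = {
    "demultiplex": set(),
    "qc": {"demultiplex"},
    "align": {"qc"},
    "snp": {"align"},
    "sv": {"align"},
    "impute": {"snp"},
    "phase": {"snp"},
}


def modules_needed(goal: str) -> list[str]:
    """Return the modules leading up to `goal`, in flow-chart order."""
    # Walk backwards from the goal to collect every upstream module.
    needed = set()
    stack = [goal]
    while stack:
        module = stack.pop()
        if module not in needed:
            needed.add(module)
            stack.extend(DEPENDS_ON[module])
    # Order the collected modules so that prerequisites come first.
    subgraph = {module: DEPENDS_ON[module] for module in needed}
    return list(TopologicalSorter(subgraph).static_order())


if __name__ == "__main__":
    for goal in ("sv", "phase"):
        print(f"{goal}: " + " >> ".join(modules_needed(goal)))
```

Because the modules are independent of each other, this order is a guide rather than a requirement: if your data already exists at an intermediate stage, you can start from the corresponding module.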
Alternatively, if you are interested in assembling a genome or metagenome, your workflow might look like:
```mermaid
graph LR
    QC([QC, trim adapters, etc.]):::clean--->DC([barcode deconvolution]):::clean
    DC--->Assembly([assembly/metassembly]):::clean
    classDef clean fill:#f5f6f9,stroke:#b7c9ef,stroke-width:2px
```