# Common Harpy Options

# Input Arguments

Each of the main Harpy modules (e.g. `qc` or `phase`) follows the format

```bash
harpy module options arguments
```

where `module` is something like `impute` or `snp mpileup` and `options` are the runtime parameters, which can include things like an input `--vcf` file, `--molecule-distance`, etc. After the options, provide the input files/directories without flags; these follow standard Bash expansion rules (e.g. wildcards and brace expansion). You can mix and match entire directories, individual files, and wildcard expansions. In most cases, you can provide an unlimited number of input arguments. In practice, that can look like:

```bash
harpy align bwa -t 5 -g genome.fasta data/pop1 data/pop2/trimmed*gz data/pop3/sample{1,2}* data/pop4/sample{2..5}*gz
```
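Because the shell, not Harpy, performs this expansion, it can help to preview which paths a pattern produces before launching a run. The sketch below is a self-contained demonstration, not part of Harpy: it builds a throwaway directory with made-up file names (hypothetical, chosen only to mirror the command above) and uses `printf '%s\n'` to print, one per line, the arguments that command would pass along. It assumes Bash, since brace expansion such as `{1,2}` and `{2..5}` is not part of POSIX `sh`.

```shell
#!/usr/bin/env bash
# Hypothetical demo (not part of Harpy). The function body runs in a
# subshell "( ... )", so the cd and the EXIT trap do not leak out.
show_expanded_inputs() (
  demo_dir=$(mktemp -d)
  trap 'rm -rf "$demo_dir"' EXIT
  cd "$demo_dir" || exit 1

  # Made-up layout mirroring the harpy align bwa example
  mkdir -p data/pop1 data/pop2 data/pop3 data/pop4
  touch data/pop2/trimmed_A.R1.fq.gz data/pop2/trimmed_B.R1.fq.gz
  touch data/pop3/sample{1..3}.R1.fq.gz data/pop4/sample{1..6}.R1.fq.gz

  # Same input patterns as the example; one resulting argument per line
  printf '%s\n' data/pop1 data/pop2/trimmed*gz data/pop3/sample{1,2}* data/pop4/sample{2..5}*gz
)

show_expanded_inputs
```

Here `data/pop1` is passed through as a plain directory name (the shell does not look inside it), `sample{1,2}*` skips `sample3`, and `sample{2..5}*gz` skips `sample1` and `sample6`. If a wildcard matches nothing, Bash by default passes the pattern through literally, which is an easy way to end up with a confusing "file not found" error.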

# Common command-line options

Every Harpy module has a series of configuration parameters. These are arguments you need to input to configure the module to run on your data, such as the directory with the reads/alignments, the genome assembly, etc. All main modules (e.g. `qc`) also share a series of common runtime parameters that don't impact the results of the module, but instead control the speed, verbosity, etc. of calling the module. These runtime parameters are listed in each module's help string and can be configured using these arguments:

| argument | short name | type | default | description |
|:---|:---:|:---|:---:|:---|
| `--output-dir` | `-o` | string | varies | Name of output directory |
| `--threads` | `-t` | integer | 4 | Number of threads to use |
| `--conda` | | toggle | | Use local conda environments instead of the preconfigured Singularity container |
| `--skipreports` | | toggle | | Skip the processing and generation of HTML reports in a workflow |
| `--snakemake` | | string | | Additional Snakemake options, in quotes |
| `--quiet` | `-q` | toggle | | Suppress the progress bars and other status text when running |
| `--help` | `-h` | | | Show the module docstring |
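The `--snakemake` value needs quotes because the shell splits unquoted text on whitespace before Harpy ever sees it. The sketch below uses a hypothetical `show_args` function (not part of Harpy) that prints how many arguments it received and then each one in brackets, so the word boundaries are visible. `--dry-run` and `--printshellcmds` are standard Snakemake options used here only as sample values; whether a particular Snakemake flag is appropriate for a given Harpy module is not something this sketch checks.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Harpy): print the argument count,
# then every argument wrapped in [ ] to show where the shell split words.
show_args() {
  printf '%s\n' "$#"
  printf '[%s]\n' "$@"
}

# Quoted: --snakemake is followed by ONE value holding both Snakemake flags
show_args --snakemake "--dry-run --printshellcmds"

# Unquoted: the shell splits the flags into separate arguments
show_args --snakemake --dry-run --printshellcmds
```

The first call reports 2 arguments and the second reports 3; only the quoted form keeps the extra Snakemake options together as the single value that `--snakemake` expects.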

As an example, you could call `harpy align strobe` and specify 20 threads with no output to the console:

```bash
harpy align strobe --threads 20 --quiet samples/trimmedreads

# identical to #
harpy align strobe -t 20 -q samples/trimmedreads
```

# The workflow folder

When you run one of the main Harpy modules, the output directory will contain a `workflow` folder. This folder is necessary for the module to run, and it is also useful for understanding what the module did, whether for your own reference or when writing the Methods section of a manuscript. The folder and its contents also allow you to rerun the workflow manually. The `workflow` folder may contain the following:

| item | contents | utility |
|:---|:---|:---|
| `*.smk` | Snakefile with the full recipe of the workflow | understanding the entire workflow |
| `config.yml` | Configuration file generated from command-line arguments and consumed by the Snakefile | general bookkeeping, advanced runs |
| `report/*.Rmd` | RMarkdown files used to generate the fancy reports | seeing the math behind plots/tables or borrowing code from them |
| `*.summary` | Plain-text overview of the important parts of the workflow | bookkeeping and writing Methods in manuscripts |
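Since the Snakefile and `config.yml` sit side by side, a manual rerun can be assembled from the standard Snakemake command-line options `--snakefile`, `--configfile`, and `--cores`. The sketch below is a hypothetical helper (`harpy_rerun_cmd` is not part of Harpy) that only prints such a command for a given output directory; it assumes a single `*.smk` file in `workflow/`, and it does not reconstruct any extra flags Harpy itself may pass to Snakemake (for example, conda or Singularity settings).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Harpy): print, but do not run, a
# Snakemake command pointing at <outdir>/workflow/*.smk and config.yml.
# Assumes exactly one *.smk file; Harpy's own extra flags are NOT included.
harpy_rerun_cmd() {
  local outdir=$1
  local cores=${2:-4}          # default mirrors Harpy's default of 4 threads
  local wf="$outdir/workflow"
  local smk
  for smk in "$wf"/*.smk; do
    break                      # keep the first match (or the literal pattern if none)
  done
  if [ ! -f "$smk" ]; then
    echo "no *.smk file in $wf" >&2
    return 1
  fi
  if [ ! -f "$wf/config.yml" ]; then
    echo "no config.yml in $wf" >&2
    return 1
  fi
  printf 'snakemake --cores %s --snakefile %s --configfile %s\n' \
    "$cores" "$smk" "$wf/config.yml"
}

# Usage (output directory name is hypothetical):
#   harpy_rerun_cmd Align/bwa 8
```

Printing instead of executing keeps the sketch side-effect free: review the command, consider adding Snakemake's `--dry-run` to preview the jobs first, and run it from the directory where you originally launched Harpy, since paths recorded in `config.yml` may be relative to it.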

# The Genome folder

You will notice that many of the workflows create a `Genome` folder in the working directory. This folder makes it easier for Harpy to store the genome and its associated indexing/etc. files across workflows without having to redo things unnecessarily. Your input genome is symlinked into that directory (not copied, unless a workflow requires gzipping/decompressing it), but all the other files (`.fai`, `.bwt`, `.bed`, etc.) are created in that directory.
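Because the genome in `Genome/` is normally a symlink rather than a copy, moving or deleting your original genome file leaves a dangling link there, which later workflows that reuse `Genome/` may trip over. The hypothetical helper below (`genome_link_status`, not part of Harpy) reports whether a file in `Genome/` is a symlink, a broken symlink, a regular file (such as an index, or a gzipped/decompressed copy), or missing. It assumes you run it from the working directory that contains `Genome/`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Harpy): classify Genome/<name>.
genome_link_status() {
  local f="Genome/$1"
  if [ -L "$f" ]; then
    # -e follows the link, so it fails when the target no longer exists
    if [ -e "$f" ]; then
      echo "symlink -> $(readlink "$f")"
    else
      echo "broken symlink -> $(readlink "$f")"
    fi
  elif [ -e "$f" ]; then
    echo "regular file"
  else
    echo "missing"
  fi
}

# Usage (file names are hypothetical):
#   genome_link_status genome.fasta
#   genome_link_status genome.fasta.fai
```

A "broken symlink" result usually means the original genome was moved or deleted; restoring it at the printed path repairs the link.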