# Troubleshooting
Lots of stuff can go wrong during an analysis. The intent of this page is to guide you through navigating the inevitable errors associated with doing bioinformatics.
## Troubleshooting Harpy
Harpy has two steps: first it performs checks and validations, then it runs Snakemake.
### checks and validations
First, Harpy takes your command-line inputs and checks/validates the input files and parameters. If your parameters are not the correct type (e.g. a number where there should be a file), the program will error immediately and inform you that something isn't right.
If all the command-line options pass validation, then the workflow inputs will be validated with all kinds of scripts and checks internally. For example, an input genome FASTA will be checked to be a properly formatted FASTA file. We do our best to include an error message that guides you on how to fix things and try again. If there are errors you're receiving that you think could use better wording or more information, please let us know!
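Harpy's own checks run automatically. If you want to sanity-check a FASTA file yourself before launching a workflow, a rough pre-check can catch the most common problems. The sketch below is hypothetical: `fasta_looks_ok` is not part of Harpy and is not how Harpy validates files. It assumes nucleotide sequences and only looks for missing headers, unexpected characters (including Windows `\r` line endings), and headers with no sequence.

```bash
# Hypothetical helper: a rough FASTA pre-check, NOT Harpy's own validator.
# Prints one message per problem found and returns non-zero if any were found.
fasta_looks_ok() {
  awk '
    /^[ \t]*$/ { next }                      # skip blank lines
    /^>/ {                                   # a header line
      if (seen_header && !has_seq) { print "line " NR ": previous header has no sequence"; bad = 1 }
      seen_header = 1; has_seq = 0; next
    }
    {                                        # anything else is treated as sequence
      if (!seen_header) { print "line " NR ": sequence before the first header"; bad = 1 }
      # nucleotide IUPAC codes plus gap/stop symbols; a trailing \r also fails here
      if ($0 !~ /^[ACGTURYSWKMBDHVNacgturyswkmbdhvn*-]+$/) { print "line " NR ": unexpected character"; bad = 1 }
      has_seq = 1
    }
    END {
      if (!seen_header) { print "no header lines found"; bad = 1 }
      else if (!has_seq) { print "last header has no sequence"; bad = 1 }
      exit bad
    }
  ' "$1"
}

# Usage: fasta_looks_ok genome.fasta && echo "no obvious problems"
```

A clean result only means nothing obvious is wrong. Harpy's validation may still reject the file for reasons this sketch does not check.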
### snakemake validations
Once all the file validations pass, Harpy passes the baton over to Snakemake. Snakemake builds a workflow graph of the rules and performs its own checks. If you get an error before the workflow starts processing any data (there won't yet be a Snakemake log file created), then something is wrong with the Snakefile. Harpy may print the error to the terminal, but it's possible you won't see any Snakemake error text (let us know!). If no helpful error text is printed, run the Snakemake command directly and explore the output. You can copy and paste the Snakemake command from the `config.yaml` file created by Harpy, where it is listed under `workflow_call:`. If it seems like something on our end, please open an issue and include the error text Snakemake provides.
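If you'd rather not copy the command by hand, you can extract it from `config.yaml` with a small shell function. This is a minimal sketch that assumes the command sits on a single `workflow_call:` line, optionally wrapped in quotes. `get_workflow_call` is a hypothetical helper, and the location of `config.yaml` depends on your output directory and Harpy version.

```bash
# Hypothetical helper: print the Snakemake command stored in Harpy's config.yaml.
# ASSUMPTION: it is on one line of the form  workflow_call: snakemake ...
# (optionally quoted). Adjust if your Harpy version writes it differently.
get_workflow_call() {
  # keep only the text after "workflow_call:", then strip one pair of quotes
  sed -n 's/^workflow_call:[[:space:]]*//p' "$1" | sed -e "s/^['\"]//" -e "s/['\"]\$//"
}

# Usage (path is illustrative): get_workflow_call path/to/config.yaml
```

Print the command and read it first. Then paste it into the terminal from the directory where you ran Harpy, because the command may contain relative paths.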
### error during a workflow
Sometimes something goes wrong with one of the steps in a workflow. If/when that happens, Harpy will print the offending step and all the information Snakemake has regarding the failure. If the step had a log file, it will print the log information too, hopefully making it easier to figure out what's wrong.
## Common Issues
### installation issue
Conda is an awesome package manager, but it was slow and used a ton of memory as dependencies increased. Recent versions of Conda (23.10+) now use the libmamba solver, the super-fast and super-lightweight solver from Mamba. If you're experiencing a suspiciously slow Harpy installation, either update Conda to at least version 23.10 or use Mamba.
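To see which side of the 23.10 cutoff your installation falls on, compare the output of `conda --version` against it. This is a minimal sketch that assumes a `sort` supporting `-V` (version sort), as GNU coreutils does. `version_at_least` is a hypothetical helper name.

```bash
# Hypothetical helper: succeeds if version $1 is >= version $2.
# `sort -V` (version sort) compares dotted numbers field by field.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# `conda --version` prints e.g. "conda 23.7.4"; keep only the number.
if command -v conda >/dev/null 2>&1; then
  current=$(conda --version | awk '{print $2}')
  if version_at_least "$current" 23.10; then
    echo "conda $current: libmamba should already be the default solver"
  else
    echo "conda $current is older than 23.10: run 'conda update -n base conda' or use mamba"
  fi
fi
```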
### imputation or phasing failure
If you use `bamutils clipOverlap` on alignments that are used for the `impute` or `phase` modules, those alignments will cause both programs to error. We don't know why, but they do.
Solution: Do not clip overlapping alignments for bam files you intend to use for the impute or phase modules. Harpy does not clip overlapping alignments, so alignments produced by Harpy should work just fine.
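There is no foolproof way to tell from a bam file whether its overlapping alignments were clipped. However, the header's `@PG` (program) lines sometimes record the tools that processed the file. The heuristic below relies on that assumption. `header_mentions_clipoverlap` is a hypothetical helper, the loop needs `samtools` on your `PATH`, and a negative result does not prove the file is unclipped.

```bash
# Hypothetical helper: reads SAM header text on stdin and succeeds if any
# @PG (program) line mentions clipOverlap. Only useful if the clipping tool
# recorded itself in the header; "no match" does not prove the file is unclipped.
header_mentions_clipoverlap() {
  grep -qi '^@PG.*clipoverlap'
}

# Check every bam in the current directory (requires samtools on your PATH).
for bam in *.bam; do
  [ -e "$bam" ] || continue          # no bam files: skip the literal "*.bam"
  if samtools view -H "$bam" | header_mentions_clipoverlap; then
    echo "$bam: header mentions clipOverlap; use unclipped alignments for impute/phase"
  fi
done
```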
### SAM name and ID mismatch
Aligning a sample to a genome via Harpy will insert the sample name (based on the file name) into the alignment header (the `@RG ID:name SM:name` tag). Throughout various steps, Harpy likewise expects the sample names in resulting vcf files to match the file names of the associated bam files. This creates problems if you manually rename alignment files after any vcf files have been created: a renamed bam file still has the original sample name hardcoded into its header. Harpy checks for this and will preemptively warn you about a mismatch between the file name and the encoded sample name. Due to certain expectations of the workflow, this mismatch will absolutely cause things to fail, hence the pre-flight check.
Solution: If you need to rename a bam file, do so using the `rename_bam` script bundled with Harpy, which is just a thin veneer over `samtools addreplacerg` with some extra validations.
```bash
rename_bam newname input.bam
```
Call the script with no arguments to see the full usage instructions.
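To check bam files for this mismatch yourself, compare the `SM:` value in each file's `@RG` header line against its file name. This is a rough sketch in the spirit of Harpy's pre-flight check, not its actual code. `sm_matches_filename` is a hypothetical helper. It assumes the expected sample name is the file name minus `.bam`, and the loop requires `samtools`.

```bash
# Hypothetical helper: reads SAM header text on stdin and compares every SM:
# value in its @RG lines with the file name given as $1 (minus ".bam").
sm_matches_filename() {
  local expected sms sm status=0
  expected=$(basename "$1" .bam)
  # @RG fields are tab-separated: split them onto lines and keep the SM: values
  sms=$(grep '^@RG' | tr '\t' '\n' | sed -n 's/^SM://p')
  if [ -z "$sms" ]; then
    echo "$1: no SM: tag found in any @RG line" >&2
    return 1
  fi
  while IFS= read -r sm; do
    if [ "$sm" != "$expected" ]; then
      echo "$1: file name says '$expected' but header says SM:$sm" >&2
      status=1
    fi
  done <<< "$sms"
  return "$status"
}

# Check every bam in the current directory (requires samtools on your PATH).
for bam in *.bam; do
  [ -e "$bam" ] || continue
  samtools view -H "$bam" | sm_matches_filename "$bam" && echo "$bam: OK"
done
```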
More cases will be added here as they become apparent to us.