# Pre-flight checks for input files
- at least 2 cores/threads available
- preflight bam : SAM/BAM alignment files (BAM recommended)
- preflight fastq : paired-end reads from an Illumina sequencer in FASTQ format (gzip recommended)
- sample names: a-z 0-9 . _ - case insensitive
- forward: _F .F .1 _1 _R1_001 .R1_001 _R1 .R1
- reverse: _R .R .2 _2 _R2_001 .R2_001 _R2 .R2
- fastq extension: .fq .fastq case insensitive
Harpy does a lot of stuff with a lot of software, and each of these programs expects the incoming data to follow particular formats (plural, unfortunately). These formatting opinions/specifics are at the mercy of the original developers, and while there are times when Harpy can (and does) modify input/output files for format compatibility, it's not always feasible or practical to handle all possible cases. So, our solution is to perform what we lovingly call "pre-flight checks" to assess whether your input FASTQ or BAM files are formatted correctly for the pipeline. There are separate preflight fastq and preflight bam submodules, and the result of each is a report detailing file format quality checks.
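As an illustration of the FASTQ file naming conventions listed at the top of this page, here is a minimal Python sketch that classifies a file name as forward or reverse and extracts the sample name. It is not Harpy's own implementation: the function name `classify_fastq` and the regular expression are hypothetical, and the sketch assumes that the read-direction suffixes are matched exactly as written (only sample names and the FASTQ extension are treated as case insensitive) and that gzipped files end in `.gz`.

```python
import re

# Read-direction suffixes copied from the conventions above.
# Assumption: these are matched exactly as written (case sensitive).
FORWARD = ["_R1_001", ".R1_001", "_R1", ".R1", "_F", ".F", "_1", ".1"]
REVERSE = ["_R2_001", ".R2_001", "_R2", ".R2", "_R", ".R", "_2", ".2"]


def _alternation(suffixes):
    """Build a regex alternation from literal suffix strings."""
    return "|".join(re.escape(s) for s in suffixes)


FASTQ_NAME = re.compile(
    # sample name: a-z 0-9 . _ - (case insensitive)
    r"^(?P<sample>(?i:[a-z0-9._-]+?))"
    # exactly one forward or reverse suffix
    rf"(?:(?P<fwd>{_alternation(FORWARD)})|(?P<rev>{_alternation(REVERSE)}))"
    # .fq or .fastq (case insensitive), optionally gzipped
    r"(?i:\.(?:fq|fastq)(?:\.gz)?)$"
)


def classify_fastq(filename):
    """Return (sample, direction), or None if the name breaks the conventions."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        return None
    direction = "forward" if match.group("fwd") else "reverse"
    return match.group("sample"), direction


if __name__ == "__main__":
    for name in ["sample1_R1_001.fastq.gz", "pop.A_2.FQ", "sample one_R1.fq"]:
        print(name, classify_fastq(name))
```

A check like this only covers file names. The preflight report itself examines the contents of the files, so it remains the authoritative result.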
# When to run
- preflight fastq : the preflight checks for FASTQ files are best performed after demultiplexing (or trimming/QC) and before sequence alignment
- preflight bam : the preflight checks for BAM files should be run after sequence alignment and before consuming those files for other purposes (e.g. variant calling, phasing, imputation)
harpy preflight fastq OPTIONS... INPUTS...
# example
harpy preflight fastq --threads 20 raw_data
harpy preflight bam OPTIONS... INPUTS...
# example
harpy preflight bam --threads 20 Align/bwa
# Running Options
In addition to the common runtime options, the preflight fastq and preflight bam modules are configured using only command-line input arguments:
# Workflow
Below is a table of the format specifics that preflight fastq checks in FASTQ files. Because 10X data doesn't use the haplotagging data format, you will find little value in running preflight fastq on 10X FASTQ files. Pay attention to the exact wording, particularly where "any" and "all" are used.
Below is a table of the format specifics that preflight bam checks in SAM/BAM files. Pay attention to the exact wording, particularly where "any" and "all" are used.
The default output directory is Preflight/fastq or Preflight/bam, depending on which mode you are using.
The result of preflight is a single HTML report in inputdir/Preflight/filecheck.xxx.html, where xxx is either fastq or bam depending on which filetype you specified. The reports for fastq and bam are very similar and give you the criteria of what type of format checking occurred, the context, relevance, and severity of those checks, and pass/fails for each file (or sample).