# :icon-terminal: Utilities
Harpy is the sum of its parts, and some of those parts are utilities used by the workflows.
They are accessible from within the Harpy conda environment (or however Harpy was installed).
This page documents those utilities, since using them outside of a workflow
might be useful too. Each of these utilities is accessible with the `harpy-utils` prefix, e.g.:
```bash
harpy-utils check-bam
```
You can call up the docstring for any one of these utilities by calling the program without any arguments.
You can view all the utilities by calling the docstring of `harpy-utils` itself.

### bx-stats-fq
```bash
bx-stats-fq input.fq > output.gz
```
Parses a FASTQ file to count: total sequences, total number of linked-read barcodes,
number of valid barcodes, number of invalid BX tags, and a count of positional
barcode invalidations (e.g. `A00`, `_0_`, `N`).
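
As a rough illustration of this kind of tally (not the utility's actual implementation), the sketch below counts `BX:Z` tags in FASTQ headers. It assumes 4-line FASTQ records and haplotag-style barcodes of the form `AxxCxxBxxDxx`, where a `00` segment marks an invalid barcode. The function name `count_bx` is hypothetical.

```bash
# Hypothetical helper, not part of Harpy: tally BX:Z tags in FASTQ headers.
# Assumes 4-line FASTQ records and haplotag-style barcodes (AxxCxxBxxDxx),
# where a "00" segment marks the barcode as invalid.
count_bx() {
  awk '
    NR % 4 == 1 {
      total++
      if (match($0, /BX:Z:[^ \t]+/)) {
        # Strip the 5-character "BX:Z:" prefix to get the barcode itself
        bx = substr($0, RSTART + 5, RLENGTH - 5)
        if (bx ~ /^A[0-9][0-9]C[0-9][0-9]B[0-9][0-9]D[0-9][0-9]$/ && bx !~ /[ABCD]00/) valid++
        else invalid++
      } else {
        missing++
      }
    }
    END {
      printf "total\t%d\nvalid\t%d\ninvalid\t%d\nno_bx\t%d\n", total, valid, invalid, missing
    }
  ' "$1"
}
```

Calling `count_bx input.fq` prints one tab-separated count per line.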

### bx-stats-sam
```bash
bx-stats-sam input.bam > output.gz
```
Calculates various linked-read molecule metrics from the (coordinate-sorted) input alignment file.
Metrics include (per molecule): 
- number of reads
- position start
- position end
- length of molecule inferred from alignments
- total aligned basepairs
- total length of inferred inserts
- molecule coverage (%) based on aligned bases
- molecule coverage (%) based on total inferred insert length
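
To make the per-molecule metrics concrete, here is a minimal sketch that derives a few of them from a simplified table of alignments. The input layout (tab-separated `molecule_id`, `start`, `end`, `aligned_bp`, with half-open coordinates) and the function name `molecule_metrics` are assumptions for illustration, not a Harpy file format or API.

```bash
# Hypothetical sketch: summarise per-molecule metrics from a tab-separated
# table of alignments with columns: molecule_id, start, end, aligned_bp.
# Output columns: id, reads, start, end, inferred length, aligned bp, coverage %.
molecule_metrics() {
  awk -F'\t' '
    {
      id = $1; s = $2 + 0; e = $3 + 0
      if (!(id in n)) { order[++k] = id; first[id] = s; last[id] = e }
      n[id]++
      if (s < first[id]) first[id] = s   # leftmost alignment start
      if (e > last[id])  last[id] = e    # rightmost alignment end
      bp[id] += $4
    }
    END {
      for (i = 1; i <= k; i++) {
        id = order[i]
        span = last[id] - first[id]      # molecule length inferred from alignments
        pct = (span > 0) ? 100 * bp[id] / span : 0
        printf "%s\t%d\t%d\t%d\t%d\t%d\t%.1f\n", id, n[id], first[id], last[id], span, bp[id], pct
      }
    }
  ' "$1"
}
```

A molecule with two reads covering 350 bp of a 500 bp span would be reported with 70.0% coverage.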

### bx-to-end
```bash
bx-to-end input.[fq|bam] > output.[fq.gz|bam]
```
Parses the records of a FASTQ or BAM file and moves the `BX:Z` tag, if present, to
the end of the record, which makes the data play nice with LRez/LEVIATHAN. During
alignment, Harpy automatically moves the `BX:Z` tag to the end of the alignment
record, so alignments produced by Harpy do **not** require manual intervention.
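
The tag move itself is simple to picture on plain-text SAM records. The sketch below is an illustration under stated assumptions, not the utility's implementation: it works on uncompressed, tab-delimited SAM text, and the function name `bx_to_end_sam` is hypothetical.

```bash
# Hypothetical sketch: move the BX:Z tag to the end of each SAM record.
# Header lines (starting with "@") pass through untouched; optional tags
# start at column 12 of a tab-delimited SAM record.
bx_to_end_sam() {
  awk -F'\t' -v OFS='\t' '
    /^@/ { print; next }
    {
      out = $1; bx = ""
      for (i = 2; i <= NF; i++) {
        if (i >= 12 && $i ~ /^BX:Z:/) bx = $i   # hold the barcode tag back
        else out = out OFS $i
      }
      if (bx != "") out = out OFS bx            # re-append it as the last field
      print out
    }
  ' "$1"
}
```

Records without a `BX:Z` tag are printed unchanged.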

### check-bam
```bash
check-bam platform input.bam > output.txt
```
Parses an alignment file to check:
- if the sample name matches the `RG` tag
- whether `BX:Z` is the last tag in the record
- the counts of: 
    - total alignments
    - alignments with an `MI:i` tag
    - alignments without `BX:Z` tag
    - incorrect `BX:Z` tag (specific to `platform`)

### check-fastq
```bash
check-fastq platform input.fq > output.txt
```
Parses a FASTQ file to check if any sequences don't conform to the SAM spec,
whether `BX:Z` is the last tag in the record, and the counts of:
- total reads
- reads without `BX:Z` tag
- reads with incorrect `BX:Z` tag (specific to `platform`)

### haplotag-acbd
```bash
haplotag-acbd output_directory
```
Generates, in the specified `output_directory`, the `BC_{ABCD}.txt` files necessary to demultiplex Gen I haplotag barcodes.

### infer-sv
```bash
infer-sv file.bedpe [-f fail.bedpe] > outfile.bedpe
```
Creates a column in the NAIBR bedpe output that infers the SV type from the orientation. Variants with FAIL flags
are removed, and you can use the optional `-f` (`--fail`) argument to write the FAIL variants to a separate file.
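
For intuition, orientation-based typing can be sketched as a lookup plus a filter. Everything specific here is an assumption to verify against your own files: that a header line is present, that orientation is column 7 and the PASS/FAIL flag is column 10, and the orientation-to-type mapping itself. The function name `infer_sv_sketch` is hypothetical, and this sketch does not write FAIL variants to a separate file.

```bash
# Hypothetical sketch of orientation-based SV typing on a NAIBR-style bedpe.
# ASSUMED layout: header line present, orientation in column 7,
# PASS/FAIL flag in column 10. The mapping below is also an assumption.
infer_sv_sketch() {
  awk -F'\t' -v OFS='\t' '
    BEGIN {
      type["+-"] = "deletion"; type["-+"] = "duplication"
      type["++"] = "inversion"; type["--"] = "inversion"
    }
    NR == 1 { print $0, "SV"; next }     # extend the header with the new column
    $10 == "FAIL" { next }               # drop variants flagged FAIL
    { print $0, (($7 in type) ? type[$7] : "unknown") }
  ' "$1"
}
```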

### molecule-coverage
```bash
molecule-coverage -f genome.fasta.fai statsfile > output.cov
```
Using the statsfile generated by `bx_stats` from Harpy, calculates "molecular coverage" across the genome.
Molecular coverage is the "effective" alignment coverage if you treat a molecule inferred from linked-read data as
one contiguous alignment, even though the reads that make up that molecule don't cover its entire length. Requires a
FASTA fai index (the kind created with `samtools faidx`) to know the actual sizes of the contigs.
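
The idea of treating each molecule as one contiguous alignment can be sketched as a per-contig mean: sum the molecule spans on a contig and divide by the contig length from the `.fai` index. The molecule table layout (tab-separated `contig`, `start`, `end`) and the function name `mean_molecule_coverage` are assumptions for illustration; the real statsfile has a different layout and the utility's output may be windowed rather than per contig.

```bash
# Hypothetical sketch: mean molecular coverage per contig.
# $1 = FASTA .fai index (column 1 = contig name, column 2 = contig length)
# $2 = ASSUMED molecule table: contig, start, end (tab-separated, half-open)
mean_molecule_coverage() {
  awk -F'\t' '
    NR == FNR { size[$1] = $2; order[++k] = $1; next }   # first file: contig sizes
    ($1 in size) { covered[$1] += $3 - $2 }              # whole molecule span counts
    END {
      for (i = 1; i <= k; i++) {
        c = order[i]
        printf "%s\t%.2f\n", c, covered[c] / size[c]
      }
    }
  ' "$1" "$2"
}
```

Two 500 bp molecules on a 1000 bp contig give a mean molecular coverage of 1.00, regardless of how many bases their reads actually cover.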

### optical-dist-fq
```bash
optical-dist-fq input.fq > output.dist
```
Reads the first record of a FASTQ file and prints the optical duplication distance parameter (`100` or `2500`)
based on the instrument code of the sequence name.
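
A minimal sketch of the lookup, assuming Illumina-style headers of the form `@<instrument>:<run>:<flowcell>:...`. The prefix rule below (instrument IDs starting with `A` treated as patterned-flowcell machines) is an illustrative and incomplete assumption, not the table the utility uses, and `optical_distance` is a hypothetical name.

```bash
# Hypothetical sketch: pick an optical duplicate distance from the
# instrument ID in the first FASTQ header.
optical_distance() {
  # Instrument ID = text between "@" and the first ":" of the first header
  instrument=$(head -n 1 "$1" | sed -e 's/^@//' -e 's/:.*$//')
  case "$instrument" in
    A*) echo 2500 ;;  # ASSUMED patterned-flowcell prefix (illustrative, incomplete)
    *)  echo 100 ;;   # everything else treated as unpatterned in this sketch
  esac
}
```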

### optical-dist-sam
```bash
optical-dist-sam input.sam > output.dist
```
Reads the first record of an alignment file and prints the optical duplication distance parameter (`100` or `2500`)
based on the instrument code of the sequence name.

### parse-phaseblocks
```bash
parse-phaseblocks input > output.txt
```
Parses a phase block file from HapCUT2 to pull out summary information.

### rename-bam
```bash
rename-bam [-d] new_name input.bam
```
Renames a SAM/BAM file and modifies the `@RG` tag of the alignment file to reflect the change for both `ID` and `SM`.
This process creates a new file `new_name.bam`, and you may use `-d` to delete the original file. Requires `samtools`.

### plot-depth
```bash
plot-depth [--molcov] [--coverage] [--prefix] contigs
```
Since per-contig plotting has been removed from alignment reports in Harpy >=4, this convenience utility is provided
to restore plotting depth per contig, albeit no longer as a Circos plot. Outputs one HTML file named `{prefix}.{contig}.depth.html` per contig. Provide a Harpy-produced molecule coverage file, a mosdepth-produced coverage file, or both.
- `contigs`: name(s) of contigs to plot, space-separated (default = 30 largest)
