# Harpy for (non linked-read) WGS data
As of Harpy v3, the program will auto-detect that your input FASTQ or BAM files are not linked-read data. This can also be forced with --unlinked / -U. You can safely ignore the linked-read information in some of the reports Harpy produces.
In earlier versions, use the equivalent option instead:
- version 2.0-2.6: use --ignore-bx
- version 2.7+: use --lr-type none
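The version-to-option mapping above can be captured in a small shell helper. This is only a sketch: the function name unlinked_option is hypothetical (it is not part of Harpy), and it assumes you pass a plain numeric version string such as 2.6 or 3.0.1 that you have looked up yourself.

```shell
#!/bin/sh
# Hypothetical helper (not part of Harpy): print the option that turns off
# linked-read handling for a given Harpy version string, e.g. "2.6" or "v3.0.1".
# Assumes the major and minor components are plain integers.
unlinked_option() {
    version=${1#v}                 # tolerate a leading "v"
    major=${version%%.*}           # text before the first dot
    rest=${version#*.}             # text after the first dot (or the whole string)
    if [ "$rest" = "$version" ]; then
        minor=0                    # no dot present, e.g. "3"
    else
        minor=${rest%%.*}
    fi

    if [ "$major" -ge 3 ]; then
        echo "--unlinked"          # v3 auto-detects; this forces it
    elif [ "$major" -eq 2 ] && [ "$minor" -ge 7 ]; then
        echo "--lr-type none"
    elif [ "$major" -eq 2 ]; then
        echo "--ignore-bx"
    else
        echo "Harpy $1 predates support for non linked-read data" >&2
        return 1
    fi
}

unlinked_option 2.6    # prints: --ignore-bx
```

Comparing the major and minor numbers as integers, rather than comparing the strings, keeps a version such as 2.10 on the correct side of the 2.7 boundary.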
As of version 2.0, Harpy can be used to process regular whole genome
sequencing (WGS) data. Specifically, you can quality-check and trim samples,
align sequences, call SNPs and small indels, phase, and impute genotypes. All of that is done by setting
--lr-type none for workflows where --lr-type is available (the --ignore-bx toggle in versions <2.7).
RADseq data may also work; however, the SNP calling workflows
probably won't be very computationally efficient for a highly fragmented RAD assembly.
There is also another consideration for RADseq regarding marking duplicates (described below).
## Quality Assessment
Using harpy qc, you can detect and remove adapters and poly-G tails, trim
low-quality bases, detect duplicates with UMIs, etc. You cannot use --deconvolve when ignoring
linked-read information.
harpy qc --unlinked --trim-adapters auto --min-length 50 data/WGS/sample_*.gz 
## Sequence Alignment
Likewise, you can use either harpy align bwa or harpy align strobe to align
your sequences onto a reference genome. The --molecule-distance option will be ignored when
using --unlinked.
harpy align bwa --unlinked --min-quality 25 genome.fasta data/WGS/trimmed

RADseq data
RADseq data will probably work fine too; however, you may need to post-process the BAM files to unset the duplicate flag, as marking duplicates in RADseq (without UMIs) may cause issues with SNP calling:
samtools view -b -h --remove-flags 1024 -o output.bam input.bam
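To apply the same command to a whole folder of alignments, a loop along these lines may help. It is a sketch under assumptions: the function name unset_duplicates and both directory names are hypothetical placeholders rather than Harpy conventions, and samtools must be available on your PATH. The value 1024 is the SAM flag bit that marks a read as a PCR or optical duplicate.

```shell
#!/bin/sh
# Sketch: clear the duplicate flag (SAM flag bit 1024) on every BAM file in a
# folder, writing the results to a separate folder so the inputs stay untouched.
# unset_duplicates is a hypothetical helper name, not a Harpy or samtools command.
unset_duplicates() {
    indir=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    for bam in "$indir"/*.bam; do
        [ -e "$bam" ] || continue              # the glob matched nothing
        out="$outdir/$(basename "$bam")"
        # Same samtools call as above, applied to one file at a time
        samtools view -b -h --remove-flags 1024 -o "$out" "$bam" || return 1
    done
}

# Example call with placeholder paths:
# unset_duplicates path/to/aligned_bams path/to/bams_without_dup_flag
```

Writing to a new directory instead of overwriting in place means the original BAM files remain available if you later want the duplicate marks back.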
## Calling SNPs
The SNP-calling workflows in Harpy don't use linked-read information at all, so you would use harpy snp mpileup or harpy snp freebayes without any modifications.
harpy snp mpileup --regions 100000 --populations data.groups genome.fasta Align/strobe
## Impute Genotypes
You can use the third (usebx) column of the parameter file to disable the barcode-aware
routines of harpy impute by setting the value to FALSE:
name    model    usebx   bxlimit   k       s       nGen
model1  diploid  FALSE   50000     10      5       50
model2  diploid  FALSE   50000     15      10      100
Naturally, ignoring barcodes will also ignore whatever values are set for bxlimit. Otherwise, invoke the imputation workflow as you would normally:
harpy impute -t 10 stitch.parameters data/variants.bcf data/*.bam
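Because a single misspelled value in the usebx column is easy to miss, it can be worth writing and checking the parameter file from a script. The sketch below assumes the file is tab-delimited and reuses the file name and values from the example above; the check itself is an illustration, not something Harpy provides.

```shell
#!/bin/sh
# Sketch: write a tab-delimited parameter file with the barcode-aware routines
# disabled, then confirm that every row really has FALSE in the usebx column.
# Assumption: tab is the delimiter; values are copied from the example table.
params=stitch.parameters
{
    printf 'name\tmodel\tusebx\tbxlimit\tk\ts\tnGen\n'
    printf 'model1\tdiploid\tFALSE\t50000\t10\t5\t50\n'
    printf 'model2\tdiploid\tFALSE\t50000\t15\t10\t100\n'
} > "$params"

# Collect the names of rows whose third column is anything other than FALSE
bad_rows=$(awk -F'\t' 'NR > 1 && $3 != "FALSE" { print $1 }' "$params")
if [ -n "$bad_rows" ]; then
    echo "usebx is not FALSE for: $bad_rows" >&2
fi
```

The awk check compares the third column exactly, so a typo in that column is reported by model name before the imputation workflow is started.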
## Phase Genotypes
Like most of the other workflows, use --unlinked with harpy phase to perform phasing without
incorporating linked-read barcode information. When using this option, the value for
-d/--molecule-distance will be ignored:
harpy phase -t 10 --unlinked variants.bcf data/*.bam