Category: wgs
Once sequences have been trimmed and passed through other QC filters, they need to be aligned to a reference genome. This module within Harpy expects filtered reads as input, such as those produced by the quality-control step. Harpy provides a module for mapping reads onto a genome assembly.
After variants have been called, you may want to impute missing genotypes to get the most from your data. Harpy imputes genotypes with STITCH, a haplotype-based method that is linked-read aware. Imputation requires a variant call file containing SNPs, such as one produced by the SNP-calling step, preferably filtered in some capacity. Harpy provides a module for imputing genotypes.
You may want to phase your genotypes into haplotypes, because haplotypes tend to be more informative than unphased genotypes: they show higher polymorphism and capture the relationship between genotypes. Phasing requires alignment files, such as those produced by the alignment step, and a variant call file, such as one produced by an earlier variant-calling step, preferably filtered in some capacity. Phasing only works on SNP data and will not work for structural variants. Harpy provides a module for phasing genotypes into haplotypes.
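To make the claim about informativeness concrete, here is a minimal Python sketch (a toy illustration, not part of Harpy or its phasing method). It encodes two hypothetical individuals at two SNP sites and shows that they have identical unphased genotypes even though their underlying haplotypes differ. The function name `unphased_genotypes` and the 0/1 allele coding are assumptions made for this example.

```python
def unphased_genotypes(hap_a, hap_b):
    """Collapse two haplotypes into per-site unordered genotypes."""
    return [tuple(sorted((a, b))) for a, b in zip(hap_a, hap_b)]

# Two hypothetical individuals, two SNP sites each.
# 0 = reference allele, 1 = alternate allele.
ind1 = ((0, 0), (1, 1))  # haplotypes 00 and 11
ind2 = ((0, 1), (1, 0))  # haplotypes 01 and 10

geno1 = unphased_genotypes(*ind1)
geno2 = unphased_genotypes(*ind2)

# Both individuals are heterozygous at both sites, so the unphased
# genotypes cannot tell them apart, while the haplotypes can.
same_genotypes = geno1 == geno2
same_haplotypes = set(ind1) == set(ind2)
print(same_genotypes, same_haplotypes)
```

The unphased view loses which alleles travel together on the same chromosome copy, which is the relationship that phasing recovers.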
Raw sequences are not suitable for downstream analyses: they contain sequencing adapters, index sequences, regions of poor quality, and so on. The first step of any genetic sequence analysis is to remove these adapters and trim poor-quality data. Harpy provides a module to remove adapters, remove duplicates, and quality-trim sequences.
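As a rough illustration of what adapter removal and quality trimming mean for a single read, the following Python sketch cuts a read at an exact adapter match and then drops low-quality bases from the 3' end. It is a simplified toy, not the algorithm Harpy uses. The function names, the threshold of Q20, the Phred+33 encoding, and the example read are all assumptions made for this example.

```python
def trim_adapter(seq, qual, adapter):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]

def trim_low_quality_tail(seq, qual, min_q=20, offset=33):
    """Drop trailing bases whose Phred score is below min_q (Phred+33 assumed)."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]

# Hypothetical read: 10 genomic bases followed by a 13-base adapter prefix.
adapter = "AGATCGGAAGAGC"
read_seq = "ACGTACGTTT" + adapter
# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
read_qual = "IIIIIIII##" + "I" * len(adapter)

seq, qual = trim_adapter(read_seq, read_qual, adapter)
trimmed_seq, trimmed_qual = trim_low_quality_tail(seq, qual)
print(trimmed_seq, trimmed_qual)
```

Real trimmers also tolerate mismatches in the adapter, use sliding windows rather than single bases, and handle paired reads, none of which this sketch attempts.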
Simulate SNPs, indels, inversions, CNVs, translocations
You may want to benchmark haplotag data on different kinds of genomic variants. To do that, you need known variants, which are typically obtained through simulation. This series of modules simulates genomic variants onto a genome, either randomly or from specific variants provided in VCF files. The simulator Harpy uses, simuG, can only simulate one type of variant at a time, and each variant type has its own set of parameters. If you are interested in very fine-grained variation simulation, consider using VISOR/HACk. This page is divided by variant type to help you navigate the process.
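To illustrate the idea of placing known variants onto a genome at random, here is a small Python sketch that introduces SNPs into a sequence and records them as a truth set. It is a conceptual toy only and does not reflect how simuG or Harpy work. The function name `simulate_snps`, its parameters, and the example sequence are assumptions made for this example.

```python
import random

def simulate_snps(genome, n_snps, seed=0):
    """Place n_snps random substitutions on genome (A/C/G/T only).

    Returns the mutated sequence and a list of (1-based position, ref, alt)
    tuples describing the known variants.
    """
    rng = random.Random(seed)
    # Sample distinct positions so each site is mutated at most once.
    positions = sorted(rng.sample(range(len(genome)), n_snps))
    mutated = list(genome)
    variants = []
    for pos in positions:
        ref = genome[pos]
        alt = rng.choice([base for base in "ACGT" if base != ref])
        mutated[pos] = alt
        variants.append((pos + 1, ref, alt))
    return "".join(mutated), variants

genome = "ACGTTGCAACGTAGCTAGCTAGGATCCA"
mutated, truth = simulate_snps(genome, n_snps=3, seed=42)
for pos, ref, alt in truth:
    print(pos, ref, alt)
```

Because the variants are recorded as they are introduced, downstream variant calls can be compared against this truth set, which is the purpose of simulating in a benchmark.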
After reads have been aligned, e.g., with Harpy's alignment module, you can use the resulting alignment files (.bam) to call variants in your data. Harpy can call SNPs and small indels using bcftools mpileup or freebayes. Harpy provides a module for calling SNPs.
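Since a variant call file can mix SNPs and small indels, it can help to see how the two are distinguished in VCF records. The Python sketch below classifies records by comparing the lengths of the REF and ALT alleles. It is a simplified illustration that is independent of Harpy, bcftools, and freebayes. The function names and the example records are hypothetical.

```python
def classify_allele(ref, alt):
    """Classify one REF/ALT pair as 'snp', 'indel', or 'other'."""
    if alt.startswith("<") or alt == "*":
        return "other"  # symbolic or spanning-deletion alleles
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(ref) != len(alt):
        return "indel"
    return "other"      # e.g. multi-nucleotide substitutions

def classify_record(line):
    """Classify a tab-separated VCF data line; 'mixed' if ALT alleles differ in type."""
    fields = line.rstrip("\n").split("\t")
    ref, alts = fields[3], fields[4].split(",")
    kinds = {classify_allele(ref, alt) for alt in alts}
    return kinds.pop() if len(kinds) == 1 else "mixed"

# Hypothetical VCF data lines (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO).
records = [
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tAT\tA\t50\tPASS\t.",
    "chr1\t300\t.\tC\tCGG\t50\tPASS\t.",
    "chr1\t400\t.\tG\tA,GT\t50\tPASS\t.",
]
labels = [classify_record(r) for r in records]
print(labels)
```

For real data, a dedicated VCF library or tool is a better choice than hand-parsing, since it handles headers, normalization, and edge cases this sketch ignores.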