# Downsample data by barcode
- One of either:
  - one alignment file (`.bam`, `.sam`; case insensitive)
  - one set of paired-end reads in FASTQ format (`.fq`, `.fastq`; gzip recommended; case insensitive)
- Barcodes in the `BX:Z` SAM tag for both BAM and FASTQ inputs
  - See Section 1 of the SAM Spec for details
- `BX:Z` tags must be the last tag in the FASTQ/BAM record
  - use `bx_to_end.py` to move the BX tags to the ends, if needed
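
As a rough illustration of the last requirement, the sketch below checks whether the `BX:Z` tag is the final field of a FASTQ header line and, if it is not, moves it there. It assumes the tags in the header are whitespace-separated and rejoins them with tabs. The function names are hypothetical, and this is not the code in `bx_to_end.py`.

```python
def bx_is_last(header: str) -> bool:
    """Return True if the last whitespace-separated field is a BX:Z tag."""
    fields = header.rstrip("\n").split()
    return len(fields) > 1 and fields[-1].startswith("BX:Z:")


def move_bx_to_end(header: str) -> str:
    """Return the header with any BX:Z tag moved after all other tags.

    Assumption: fields are rejoined with tabs, which may differ from the
    separator used in your files.
    """
    fields = header.rstrip("\n").split()
    if not fields:
        return header
    bx = [f for f in fields[1:] if f.startswith("BX:Z:")]
    rest = [f for f in fields[1:] if not f.startswith("BX:Z:")]
    return "\t".join([fields[0]] + rest + bx)


if __name__ == "__main__":
    h = "@read1\tBX:Z:A01C02B03D04\tOX:Z:example"
    print(bx_is_last(h))        # BX:Z is not the final field here
    print(move_bx_to_end(h))    # BX:Z moved to the end
```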
While downsampling (subsampling) FASTQ and BAM files is relatively simple with tools such as `awk`, `samtools`, `seqtk`, `seqkit`, etc., `downsample` allows you to downsample a BAM file (or paired-end FASTQ) by barcodes. That means you can keep all the reads associated with `d` number of barcodes.
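
The idea can be sketched in a few lines of Python. This is a simplified illustration, not Harpy's implementation: reads are assumed to be already parsed into `(read_name, barcode)` pairs, the barcodes are chosen at random, and `downsample_by_barcode` is a hypothetical name.

```python
import random


def downsample_by_barcode(reads, d, seed=None):
    """Keep every read whose barcode is among d randomly chosen barcodes.

    reads: iterable of (read_name, barcode) pairs (assumed input format).
    d:     number of barcodes to retain.
    """
    reads = list(reads)
    # Unique barcodes, sorted so that a given seed gives a reproducible draw
    barcodes = sorted({barcode for _, barcode in reads})
    rng = random.Random(seed)
    keep = set(rng.sample(barcodes, min(d, len(barcodes))))
    # All reads sharing a retained barcode are kept together
    return [read for read in reads if read[1] in keep]


if __name__ == "__main__":
    reads = [("r1", "A"), ("r2", "A"), ("r3", "B"), ("r4", "C"), ("r5", "C")]
    for name, barcode in downsample_by_barcode(reads, d=2, seed=1):
        print(name, barcode)
```

Sampling barcodes rather than individual reads is what keeps the reads that share a barcode together in the output.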
usage

```bash
harpy downsample OPTIONS... INPUT(S)...
```

example

```bash
# BAM file
harpy downsample -d 1000 -i 0.3 -p sample1.sub1000 sample1.bam
# FASTQ file
harpy downsample -d 1000 -i 0 -p sample1.sub1000 sample1.F.fq.gz sample1.R.fq.gz
```
## Running Options
In addition to the common runtime options, the `downsample` module is configured using the command-line arguments below.
## invalid barcodes
The `--invalid` option determines what proportion of invalid barcodes appear in the barcode pool. Bear in mind that the barcode pool still gets subsampled, so the `--invalid` proportion doesn't necessarily reflect how many invalid barcodes end up getting sampled, but rather what proportion will be considered for sampling. The proportions equate to:
- `0`: invalid barcodes are skipped
- `1`: all invalid barcodes appear in the barcode pool that gets subsampled
- `0 < i < 1`: that proportion of barcodes appear in the barcode pool that gets subsampled
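
The distinction between the barcode pool and the final sample can be illustrated with a short Python sketch. It is not Harpy's implementation. The way the proportion is applied (a fixed fraction of the invalid barcodes, drawn at random) is an assumption, as are the function names and the premise that valid and invalid barcodes have already been separated.

```python
import random


def build_barcode_pool(valid, invalid, invalid_proportion, seed=None):
    """Return the pool of barcodes that will be considered for sampling.

    invalid_proportion: 0 skips invalid barcodes, 1 includes all of them,
    and a value in between includes that fraction (assumed behaviour).
    """
    if not 0 <= invalid_proportion <= 1:
        raise ValueError("invalid_proportion must be between 0 and 1")
    rng = random.Random(seed)
    invalid = sorted(invalid)
    n_invalid = round(len(invalid) * invalid_proportion)
    return sorted(valid) + rng.sample(invalid, n_invalid)


def sample_barcodes(pool, d, seed=None):
    """Subsample d barcodes from the pool (fewer if the pool is smaller)."""
    rng = random.Random(seed)
    return rng.sample(pool, min(d, len(pool)))


if __name__ == "__main__":
    valid = ["V1", "V2", "V3", "V4"]
    invalid = ["X1", "X2", "X3", "X4"]
    pool = build_barcode_pool(valid, invalid, 0.5, seed=1)
    print(pool)                               # 4 valid + 2 invalid barcodes
    print(sample_barcodes(pool, 3, seed=1))   # may contain 0, 1, or 2 invalid
```

Because the final draw is taken from the whole pool, the share of invalid barcodes in the sample can differ from the proportion used to build the pool.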
## Downsample Workflow
```mermaid
graph LR
    subgraph fastq
        R1([read 1]):::clean---R2([read 2]):::clean
    end
    subgraph bam
        bamfile([bam]):::clean
    end
    fastq-->|bam conversion|bam
    bam-->sub([extract and\n subsample barcodes]):::clean
    sub-->exreads([extract reads]):::clean
    bam-->exreads
    fastq-->exreads
    style fastq fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
    style bam fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px
    classDef clean fill:#f5f6f9,stroke:#b7c9ef,stroke-width:2px
```