#
Downsample data by barcode
DEPRECATION NOTICE
The downsample module is being replaced with djinn,
which is provided with a conda-based Harpy installation as of version 3.1 (build 2).
It will be officially removed from Harpy starting with version 3.3.
- One of either:
  - one alignment file (.bam or .sam, case insensitive)
  - one set of paired-end reads in FASTQ format (.fq or .fastq, case insensitive; gzipped recommended)
- Barcodes in a SAM tag (e.g. BX:Z:) for both BAM and FASTQ inputs
  - See Section 1 of the SAM specification for details
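To make the barcode requirement concrete, here is a minimal Python sketch of pulling a SAM-style tag out of a FASTQ header or a SAM record line. The helper name `barcode_from_line`, the read names, and the barcode values are hypothetical, and this is not how Harpy itself parses input; it only illustrates the `TAG:TYPE:VALUE` layout, assuming the tag is separated from neighbouring fields by whitespace.

```python
import re


def barcode_from_line(line, tag="BX"):
    """Return the value of a SAM-style string tag (e.g. BX:Z:...) or None.

    Hypothetical helper for illustration only; assumes the tag is
    separated from neighbouring fields by tabs or spaces.
    """
    match = re.search(rf"(?:^|\s){re.escape(tag)}:Z:(\S+)", line)
    return match.group(1) if match else None


# Made-up example lines: a FASTQ header, a SAM record, and a header with no tag
fastq_header = "@read1/1\tBX:Z:A01C02B03D04"
sam_record = "read2\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*\tBX:Z:A05C06B07D08"
no_tag = "@read3/1"

print(barcode_from_line(fastq_header))
print(barcode_from_line(sam_record))
print(barcode_from_line(no_tag))
```

Passing `tag="BC"` would look for a `BC:Z:` tag instead, which mirrors the idea of a configurable barcode tag.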
While downsampling (subsampling) FASTQ and BAM files is relatively simple with tools such as awk, samtools, seqtk, and seqkit, downsample allows you to downsample a BAM file (or paired-end FASTQ files) by barcode. That means you can keep all the reads associated with -d number of barcodes or -d fraction of barcodes (e.g. -d 0.5 will downsample to 50% of all barcodes).
# BAM file
harpy downsample OPTIONS... BAM
# FASTQ files
harpy downsample OPTIONS... FASTQ1 FASTQ2
harpy downsample -d 1000 -b BC -i 0.3 -p sample1.sub1000 sample1.bam
harpy downsample -d 1000 -i 0 -p sample1.sub1000 sample1.F.fq.gz sample1.R.fq.gz
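As a rough illustration of what sampling by barcode (rather than by read) means, the sketch below selects barcodes first and then keeps every read that carries a selected barcode. It is not Harpy's implementation: the function names are hypothetical, reads are simplified to `(name, barcode)` pairs, and the rule "a value below 1 is a fraction, otherwise a count" is an assumption made for this example.

```python
import random


def choose_barcodes(barcodes, d, seed=None):
    """Pick a subset of the unique barcodes.

    Assumption for this sketch: 0 < d < 1 is read as a fraction of the
    unique barcodes, d >= 1 as an absolute number of barcodes.
    """
    if d <= 0:
        raise ValueError("d must be positive")
    unique = sorted(set(barcodes))  # sorted so a fixed seed is reproducible
    if d < 1:
        n = round(len(unique) * d)
    else:
        n = min(int(d), len(unique))
    return set(random.Random(seed).sample(unique, n))


def downsample(reads, d, seed=None):
    """Keep every read whose barcode was selected; reads are (name, barcode) pairs."""
    keep = choose_barcodes([bc for _, bc in reads], d, seed)
    return [(name, bc) for name, bc in reads if bc in keep]


# Toy data: six reads spread over four made-up barcodes
reads = [
    ("r1", "A01C01B01D01"), ("r2", "A01C01B01D01"),
    ("r3", "A02C02B02D02"), ("r4", "A02C02B02D02"),
    ("r5", "A03C03B03D03"), ("r6", "A04C04B04D04"),
]
subset = downsample(reads, 0.5, seed=1)
print(len({bc for _, bc in subset}))  # 2 of the 4 unique barcodes
```

The number of reads retained depends on which barcodes are drawn, since all reads sharing a selected barcode are kept together.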
#
Running Options
In addition to the common runtime options, the downsample module is configured using the command-line arguments below.
#
invalid barcodes
The --invalid option determines what proportion of invalid barcodes appear in the barcode pool that gets subsampled. This proportion doesn't necessarily reflect how many invalid barcodes end up getting sampled, but rather what proportion will be considered for sampling. The proportions equate to:
- 0: invalid barcodes are skipped altogether
- 1: all invalid barcodes appear in the barcode pool that gets subsampled
- 0<i<1: that proportion of invalid barcodes appears in the barcode pool that gets subsampled
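The distinction between "in the pool" and "actually sampled" can be sketched as follows. This is one plausible reading, not Harpy's code: `build_pool` and the `is_invalid` predicate are hypothetical names, the proportion is applied as a rounded count of invalid barcodes, and treating any barcode containing `00` as invalid is an assumption made only for the toy data.

```python
import random


def build_pool(barcodes, is_invalid, invalid_prop, seed=None):
    """Return the barcode pool that would subsequently be subsampled.

    is_invalid: hypothetical predicate deciding whether a barcode is invalid.
    invalid_prop: 0 skips invalid barcodes, 1 admits all of them, and a
    value in between admits that share (assumed here: a rounded count).
    """
    if not 0 <= invalid_prop <= 1:
        raise ValueError("invalid_prop must be between 0 and 1")
    unique = sorted(set(barcodes))
    valid = [bc for bc in unique if not is_invalid(bc)]
    invalid = [bc for bc in unique if is_invalid(bc)]
    n_admit = round(len(invalid) * invalid_prop)
    admitted = random.Random(seed).sample(invalid, n_admit)
    return valid + admitted


# Toy data: two valid and four invalid barcodes (assumed rule: "00" marks invalid)
barcodes = [
    "A01C01B01D01", "A02C02B02D02",
    "A00C01B01D01", "A01C00B01D01", "A01C01B00D01", "A01C01B01D00",
]


def is_invalid(bc):
    return "00" in bc


print(len(build_pool(barcodes, is_invalid, 0)))            # 2
print(len(build_pool(barcodes, is_invalid, 0.5, seed=1)))  # 4
print(len(build_pool(barcodes, is_invalid, 1)))            # 6
```

Subsampling then draws from this pool, so with a proportion of 0.5 the two admitted invalid barcodes may or may not end up in the final selection.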
#
Downsample Workflow
graph LR
subgraph fastq
R1([read 1]):::clean---R2([read 2]):::clean
end
subgraph bam
bamfile([bam]):::clean
end
fastq-->|bam conversion|bam
bam-->sub([extract and\n subsample barcodes]):::clean
sub-->exreads([extract reads]):::clean
bam-->exreads
fastq-->exreads
style fastq fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px,rx:10px,ry:10px
style bam fill:#f0f0f0,stroke:#e8e8e8,stroke-width:2px,rx:10px,ry:10px
classDef clean fill:#f5f6f9,stroke:#b7c9ef,stroke-width:2px
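The final "extract reads" step of the workflow above might look roughly like this for paired-end FASTQ input. It is a hedged sketch rather than Harpy's code: the function names are hypothetical, the barcode is assumed to sit in the R1 header as a `BX:Z:` tag, and R1 and R2 are assumed to list read pairs in the same order.

```python
import io


def fastq_records(handle):
    """Yield FASTQ records as lists of 4 lines from a text handle."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:  # empty string means end of file
            return
        yield record


def barcode_of(header, tag="BX:Z:"):
    """Return the tag value from a header line, or None if it is absent."""
    for field in header.split():
        if field.startswith(tag):
            return field[len(tag):]
    return None


def extract_pairs(r1, r2, keep):
    """Keep read pairs whose R1 barcode is in `keep` (R1/R2 assumed in the same order)."""
    out1, out2 = [], []
    for rec1, rec2 in zip(fastq_records(r1), fastq_records(r2)):
        if barcode_of(rec1[0]) in keep:
            out1.append("".join(rec1))
            out2.append("".join(rec2))
    return "".join(out1), "".join(out2)


# Toy in-memory FASTQ files with made-up reads and barcodes
r1 = io.StringIO(
    "@a/1\tBX:Z:A01C01B01D01\nACGT\n+\nIIII\n"
    "@b/1\tBX:Z:A02C02B02D02\nTTGA\n+\nIIII\n"
)
r2 = io.StringIO(
    "@a/2\tBX:Z:A01C01B01D01\nTGCA\n+\nIIII\n"
    "@b/2\tBX:Z:A02C02B02D02\nCCAT\n+\nIIII\n"
)
kept1, kept2 = extract_pairs(r1, r2, keep={"A02C02B02D02"})
print(kept1, end="")
print(kept2, end="")
```

Filtering both files in lockstep keeps mates together, which is why the sketch decides once per pair instead of once per file.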