Downsample data by barcode

Downsampling (subsampling) FASTQ and BAM files is relatively simple with tools such as awk, samtools, seqtk, and seqkit. djinn additionally lets you downsample a BAM file (or FASTQ files) by barcode: all reads associated with the retained barcodes are kept. The -d option sets either the number of barcodes or the fraction of barcodes to keep (e.g. -d 0.5 will downsample to 50% of all barcodes).
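To make the idea of sampling barcodes rather than reads concrete, here is a minimal Python sketch. It is not djinn's implementation: the function names are hypothetical, and it assumes that a -d value strictly between 0 and 1 is a fraction of barcodes while any other value is a barcode count.

```python
import random


def choose_barcodes(barcodes, d, seed=None):
    """Pick the barcodes to keep (hypothetical helper, not part of djinn)."""
    # Sort the unique barcodes so that a given seed always gives the same draw.
    unique = sorted(set(barcodes))
    if 0 < d < 1:
        # Assumption: values between 0 and 1 are a fraction of all barcodes.
        n_keep = round(len(unique) * d)
    else:
        # Assumption: any other value is an absolute number of barcodes.
        n_keep = min(int(d), len(unique))
    rng = random.Random(seed)
    return set(rng.sample(unique, n_keep))


def downsample_reads(reads, d, seed=None):
    """Keep every read whose barcode was selected.

    reads: iterable of (read_name, barcode) tuples.
    """
    reads = list(reads)
    keep = choose_barcodes((bc for _, bc in reads), d, seed)
    return [read for read in reads if read[1] in keep]


# Toy data: six reads spread over four barcodes.
reads = [
    ("read1", "bcA"),
    ("read2", "bcA"),
    ("read3", "bcB"),
    ("read4", "bcC"),
    ("read5", "bcC"),
    ("read6", "bcD"),
]

# Keep half of the barcodes (2 of 4) together with all of their reads.
half = downsample_reads(reads, 0.5, seed=1)
```

The point of the sketch is that the sampling unit is the barcode: a kept barcode retains all of its reads and a dropped barcode loses all of them, so the number of reads in the output depends on which barcodes happen to be drawn.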

FASTQ

usage
djinn fastq sample -d <value> [-i] PREFIX INPUT
example | downsample a FASTQ pair to 1000 barcodes, including invalid barcodes
djinn fastq sample -d 1000 -i sample1.sub1000 sample1.F.fq.gz sample1.R.fq.gz

Running Options

| argument | description |
|:---|:---|
| INPUT (required) | A FASTQ file or file pair |
| -c --cache-size (hidden) | Number of reads to store before writing (bigger is faster, default: 10000) |
| -d --downsample (required) | Number/fraction of barcodes to downsample to |
| -i --invalid | Include this proportion of invalid barcodes in downsampling |
| -r --random-seed (optional) | Random seed for sampling |
| -t --threads | Number of threads to use for writing compressed output FASTQ files (default: 2) |

SAM

usage
djinn sam sample -d <value> [-i] INPUT > output.bam
example | downsample a BAM file to 50% of its barcodes, ignoring invalid barcodes
djinn sam sample -t 8 -d 0.5 -b BC sample1.bam > sample1.half.bam

Running Options

| argument | description |
|:---|:---|
| INPUT (required) | SAM/BAM file |
| -d --downsample (required) | Number/fraction of barcodes to downsample to |
| -i --invalid | Include invalid barcodes in downsampling |
| -r --random-seed (optional) | Random seed for sampling |
| -t --threads | Number of threads to use (default: 4) |
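For alignment data, barcode-aware filtering amounts to reading the barcode from an optional tag on each record. The sketch below works on plain-text SAM lines and is only an illustration, not how djinn is implemented. The helper names are hypothetical, the BX tag name is an assumption, and dropping records that carry no barcode tag is a choice made for this sketch.

```python
def barcode_of(sam_line, tag="BX"):
    """Return the value of the barcode tag on a SAM record, or None if absent.

    The tag name "BX" is an assumption; use whichever tag your data carries.
    """
    fields = sam_line.rstrip("\n").split("\t")
    # SAM records have 11 mandatory fields; optional TAG:TYPE:VALUE fields follow.
    for optional in fields[11:]:
        name, _type, value = optional.split(":", 2)
        if name == tag:
            return value
    return None


def filter_sam(lines, keep, tag="BX"):
    """Yield header lines plus the records whose barcode is in `keep`."""
    for line in lines:
        if line.startswith("@"):
            # Header lines pass through unchanged.
            yield line
        elif barcode_of(line, tag) in keep:
            # Records without the tag return None and are dropped here.
            yield line


# Toy SAM content: one header line and four records, one without a barcode.
sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tBX:Z:bcA",
    "r2\t0\tchr1\t150\t60\t4M\t*\t0\t0\tACGT\tIIII\tBX:Z:bcB",
    "r3\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t0\tchr1\t250\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\tBX:Z:bcA",
]

# Keep only the records that belong to barcode bcA.
kept = list(filter_sam(sam, keep={"bcA"}))
```

In this toy input the header and the two bcA records (r1 and r4) are retained. A real BAM file would first need to be decoded to SAM text, for example with samtools view -h.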