Downsample data by barcode
While downsampling (subsampling) FASTQ and BAM files is relatively simple with tools such as awk, samtools, seqtk, and seqkit, downsample lets you downsample a BAM file (or FASTQ files) by barcode. That means you keep all the reads associated with either -d barcodes or a fraction of the barcodes (e.g. -d 0.5 will downsample to 50% of all barcodes).
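To make the idea concrete, here is a minimal Python sketch of barcode-level downsampling. It is not djinn's implementation. The function names, the `seed` parameter, and the rule that a `d` below 1 is a fraction while a `d` of 1 or more is a count are all assumptions made for illustration.

```python
import random

def choose_barcodes(barcodes, d, seed=None):
    """Pick the barcodes to keep (hypothetical helper, not part of djinn).

    Assumption: 0 < d < 1 means a fraction of the unique barcodes,
    d >= 1 means an absolute number of barcodes.
    """
    if d <= 0:
        raise ValueError("d must be positive")
    unique = sorted(set(barcodes))
    if d < 1:
        n_keep = int(len(unique) * d)
    else:
        n_keep = min(int(d), len(unique))
    rng = random.Random(seed)
    return set(rng.sample(unique, n_keep))

def downsample_reads(reads, d, seed=None):
    """Keep every read whose barcode was selected.

    reads: iterable of (read_name, barcode) tuples.
    """
    reads = list(reads)
    keep = choose_barcodes((bc for _, bc in reads), d, seed)
    return [read for read in reads if read[1] in keep]

# Toy data: six reads spread over four barcodes.
reads = [("r1", "A"), ("r2", "A"), ("r3", "B"),
         ("r4", "C"), ("r5", "C"), ("r6", "D")]
half = downsample_reads(reads, 0.5, seed=1)
```

The sampling unit is the barcode rather than the read, so a selected barcode keeps all of its reads and an unselected barcode loses all of them.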
FASTQ
usage
djinn fastq sample -d <value> [-i] PREFIX INPUT
example | downsample fastq pair, including invalid barcodes
djinn fastq sample -d 1000 -i sample1.sub1000 sample1.F.fq.gz sample1.R.fq.gz
Running Options
SAM
usage
djinn sam sample -d <value> [-i] INPUT > output.bam
example | downsample bam file to 50% of the barcodes, ignoring invalid barcodes
djinn sam sample -t 8 -d 0.5 -b BC sample1.bam > sample1.half.bam
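One way to sanity-check a downsampled file is to count the distinct barcodes before and after. The sketch below is an illustration only: `barcodes_in_sam` is a hypothetical helper, and it assumes SAM text records (for example from `samtools view`) with the barcode stored as a `BC:Z:` tag, matching the `-b BC` in the example above. The records and barcode values are made up.

```python
def barcodes_in_sam(lines, tag="BC"):
    """Collect distinct barcode values from SAM text records.

    Assumption: the barcode is a string-typed optional field, e.g. BC:Z:<value>.
    """
    prefix = tag + ":Z:"
    found = set()
    for line in lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        # Optional tags start at the 12th column of a SAM record.
        for field in fields[11:]:
            if field.startswith(prefix):
                found.add(field[len(prefix):])
                break
    return found

# Made-up records for illustration; r3 carries no barcode tag.
sam_lines = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF\tBC:Z:BARCODE1",
    "r2\t0\tchr1\t120\t60\t4M\t*\t0\t0\tACGT\tFFFF\tBC:Z:BARCODE1",
    "r3\t0\tchr1\t150\t60\t4M\t*\t0\t0\tACGT\tFFFF",
    "r4\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tFFFF\tBC:Z:BARCODE2",
]
n_barcodes = len(barcodes_in_sam(sam_lines))
```

Running this on the input and on the output of a `-d 0.5` run should give roughly a 2:1 ratio of barcode counts, depending on how invalid barcodes are handled.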