Convert data to Standard format
To make it painless to get your data into the preferred standard format, use standardize to quickly standardize FASTQ and BAM files.
By default, standardization just moves the barcode (wherever it may be) into a BX:Z SAM tag as-is, performs a technology-appropriate
validation of the barcode value, and writes the result to the VX:i tag. However, you can use --style to also convert the barcode
style between formats. Keep in mind that each barcode style has a different upper limit on how many unique barcodes it can support,
so a conversion can fail when the input contains more unique barcodes than the target style allows.
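As a rough illustration of why a conversion can fail, the sketch below computes a theoretical barcode capacity for each style. The segment counts and barcode lengths are assumptions about the underlying chemistries, not values taken from this page, and real whitelists may be smaller than these upper bounds. The names CAPACITY and fits are hypothetical.

```python
# Theoretical capacities under ASSUMED barcode designs (not from this page):
#   haplotagging: four segments, each numbered 01-96
#   stlfr:        three segments, each numbered 1-1536
#   tellseq:      18-base nucleotide barcode
#   10X:          16-base nucleotide barcode
CAPACITY = {
    "haplotagging": 96 ** 4,
    "stlfr": 1536 ** 3,
    "tellseq": 4 ** 18,
    "10X": 4 ** 16,
}


def fits(n_unique_barcodes: int, target_style: str) -> bool:
    """Return True if the target style has at least as many distinct
    barcodes as the input data contains."""
    return n_unique_barcodes <= CAPACITY[target_style]
```

Under these assumptions, a dataset with 100 million unique barcodes would fit into the stlfr style but not into haplotagging.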
The styles are given as:
Running Options
BAM
If barcodes are present in the sequence name (stlfr, tellseq), this method moves the barcode to the BX:Z
tag of the alignment, maintaining the same barcode style by default (auto-detected). Once the barcode is in a BX:Z tag
(whether it was moved there or was already present), a complementary VX:i tag is written to record barcode validation: 0 (invalid) or 1 (valid).
Use --style to also convert the barcode to a different style (haplotagging, stlfr, tellseq, 10X),
which also writes a conversion.bc file to the working directory mapping the barcode conversions. Writes to stdout.
djinn standardize [--style] PREFIX INPUTS
djinn standardize-bam --style stlfr yucca.bam > yucca.std.stlfr.bam
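The move-and-validate step described above can be sketched in plain Python. This is not djinn's implementation. It assumes stLFR read names end in #<int>_<int>_<int> (a 0 segment marks an invalid barcode) and TELL-seq read names end in :<nucleotide barcode> (an N marks an invalid barcode). All function names here are hypothetical.

```python
import re

# ASSUMED read-name layouts (hypothetical, for illustration only):
#   stlfr:   "<read_id>#<int>_<int>_<int>"
#   tellseq: "<read_id>:<ACGTN string>"
STLFR = re.compile(r"^(?P<name>.+)#(?P<bc>\d+_\d+_\d+)$")
TELLSEQ = re.compile(r"^(?P<name>.+):(?P<bc>[ACGTN]+)$")


def extract_barcode(read_name):
    """Split a read name into (clean_name, barcode, style).
    barcode and style are None when no barcode is found."""
    for style, pattern in (("stlfr", STLFR), ("tellseq", TELLSEQ)):
        match = pattern.match(read_name)
        if match:
            return match.group("name"), match.group("bc"), style
    return read_name, None, None


def validation_flag(barcode, style):
    """Return 1 for a valid barcode and 0 for an invalid one."""
    if style == "stlfr":
        return 1 if all(int(seg) > 0 for seg in barcode.split("_")) else 0
    if style == "tellseq":
        return 0 if "N" in barcode else 1
    raise ValueError(f"unsupported style: {style}")


def tags_for(read_name):
    """Return the cleaned read name and the BX:Z / VX:i tags to attach."""
    name, barcode, style = extract_barcode(read_name)
    if barcode is None:
        return name, []
    return name, [f"BX:Z:{barcode}", f"VX:i:{validation_flag(barcode, style)}"]
```

For example, tags_for("read1#12_0_7") yields the name read1 with the tags BX:Z:12_0_7 and VX:i:0, because one segment is 0.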
FASTQ
This conversion moves the barcode to the BX:Z tag in FASTQ records, maintaining the same barcode style by default (auto-detected).
See this section of the Harpy documentation for the location and format expectations for different linked-read technologies.
It also writes a VX:i tag to record barcode validation: 0 (invalid) or 1 (valid).
Use --style to also convert the barcode to a different style (haplotagging, stlfr, tellseq, 10X),
which also writes a conversion.bc file to the working directory mapping the barcode conversions.
Incompatible with 10X data
Standardization will not work with the 10X FASTQ format, where the barcodes are the first 16 bases of read 1. You will first need to demultiplex the barcodes from the sequences into the read headers.
djinn standardize [--style] PREFIX R1.fq R2.fq
djinn standardize --style stlfr myotis_stlfr myotis.R1.fq.gz myotis.R2.fq.gz
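For the 10X case noted above, the preprocessing step might look like the following sketch, which lifts the first 16 bases of a read 1 record into its header. The helper name and the choice to store the barcode as a BX:Z comment are assumptions for illustration only. Check the Harpy documentation for the header layout your workflow expects. Any spacer bases that follow the barcode are not handled here.

```python
def move_10x_barcode(header, seq, qual, barcode_len=16):
    """Move the leading barcode of a read 1 record into its header.

    Hypothetical helper: returns (new_header, trimmed_seq, trimmed_qual).
    Any existing header comment is dropped, and the barcode is appended
    as a BX:Z comment (an ASSUMED layout, not confirmed by this page).
    """
    barcode = seq[:barcode_len]
    read_id = header.split()[0]  # keep only the "@name" part
    new_header = f"{read_id} BX:Z:{barcode}"
    # Trim the barcode bases and their quality scores from the read.
    return new_header, seq[barcode_len:], qual[barcode_len:]
```

The matching read 2 record would need the same barcode added to its header, without trimming, so that both mates carry identical barcode information.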