(from variant-simulated genomes)
Now that each individual for each treatment has SNPs simulated onto them, we can simulate the linked reads from these genomes. The most effective way of doing this would be to run this as one big loop and have ProcessPoolExecutor manage when tasks get submitted so we don’t swamp the system.
The depth treatments will also be created here. By keeping a consistent random seed within an individual, we can make sure that all depths less than 20 are actually a subsample of depth=20 and not a different set of reads. Doing it this way maintains the number of reads per molecule and % singletons. The depth treatments are:
0.5X
2X
5X
10X
20X
import os
from concurrent.futures import ProcessPoolExecutor
sizes = ["small", "medium", "large", "xl"]
depths = [0.5, 2, 5, 10, 20]os.makedirs("simulated_data/simuG/linked_reads", exist_ok = True)
with ProcessPoolExecutor(max_workers=4) as executor:
for size in sizes:
for depth in depths:
DIR=f"../simulated_variants/inversions_snps_genomes/{size}"
for i in range(1,11):
SAMPLE = f"sample_{i:02d}"
PREFIX = f"simuG/linked_reads/{size}/depth_{depth}/{SAMPLE}/{SAMPLE}"
mimick_args = f"--coverage {depth} --seed {i} --error 0.0001 --indels 0 --mutation 0 --lr-type haplotagging -O haplotagging --molecule-coverage 4 --molecules-per -1 --singletons 0.35"
barcodes = "6,96"
CMD = f"/home/pd348/.pixi/bin/pixi run mimick -t 6 -o {PREFIX} {mimick_args} {barcodes} {DIR}/{SAMPLE}.snp_inv.hap1.fasta.gz {DIR}/{SAMPLE}.snp_inv.hap2.fasta.gz"
executor.submit(os.system, CMD)VISOR-HACks instead of simuG¶
The code below is mostly the same, except it only tries to simulate reads using the new VISOR-created genomes
os.makedirs("simulated_data/visor/linkedreads", exist_ok = True)
with ProcessPoolExecutor(max_workers=10) as executor:
for size in sizes:
for depth in depths:
DIR = "simulated_genomes"
for i in range(1,11):
SAMPLE = f"sample_{i:02d}"
PREFIX = f"simulated_data/visor/linkedreads/{size}/depth_{depth}/{SAMPLE}/{SAMPLE}"
mimick_args = f"--coverage {depth} --seed {i} --error 0.0001 --indels 0 --mutation 0 --output-type haplotagging -O haplotagging --molecule-coverage 4 --molecules-per -1 --singletons 0.35"
barcodes = "6,96"
CMD = f"mimick -q 1 -t 6 -o {PREFIX} {mimick_args} {barcodes} {DIR}/{size}/{SAMPLE}.{size}.hap1.fasta.gz {DIR}/{size}/{SAMPLE}.{size}.hap2.fasta.gz && mv {PREFIX}.R*.fq.gz simulated_data/visor/linkedreads/{size}/depth_{depth}"
executor.submit(os.system, CMD)