
04 Visualize Inversions

We need to visualize essentially all of the called inversions, because a visual assessment makes it much easier to see what is going on. The next several notebooks are nearly identical, differing only in which size treatment (small, medium, large, xl) they explore. To keep things manageable, it helps to write a few generic functions: one that reads in the groundtruth, pooled, and per-sample inversions, and one that plots them.

Reading data

This function takes the size treatment (e.g. small) and depth treatment (e.g. 10X), reads in the groundtruth VCF along with the per-sample and pooled LEVIATHAN calls, harmonizes their columns, and returns a single tidy table ready for plotting or other data exploration.

read_data <- function(size_treatment, depth){
    # paths to the groundtruth VCF and the LEVIATHAN per-sample and pooled calls
    truthfile <- paste0("simulated_data/inversions/", size_treatment, "/inv.", size_treatment, ".vcf")
    samplesfile <- paste("simulated_data/called_sv/leviathan", size_treatment, depth, "by_sample/inversions.bedpe", sep = "/")
    poolfile <- paste("simulated_data/called_sv/leviathan", size_treatment, depth, "by_pop/inversions.bedpe", sep = "/")

    # groundtruth: keep CHROM, POS, and INFO ("#" header lines are skipped as comments)
    true_inversions <- read.table(truthfile, header = FALSE)[, c(1, 2, 8)]
    # pull the END=<int> value out of INFO, whether END= is the first field or not
    true_inversions$V8 <- as.numeric(sub(".*(^|;)END=([0-9]+).*", "\\2", true_inversions$V8, perl = TRUE))
    names(true_inversions) <- c("contig", "position_start", "position_end")
    true_inversions$sample <- "simulated_inversions"

    # per-sample calls: the first four columns must be named sample, contig,
    # position_start, and position_end so rbind() can match them by name
    sample_inversions <- read.table(samplesfile, header = TRUE)[, 1:4]

    # pooled calls: rename the population column and give the pool a single row label
    pooled_inversions <- read.table(poolfile, header = TRUE)
    pooled_inversions <- pooled_inversions[, c("population", "contig", "position_start", "position_end")]
    names(pooled_inversions)[1] <- "sample"
    pooled_inversions$sample <- paste("pooled", depth, size_treatment, sep = "_")

    # stack everything into one long table (rbind matches data frame columns by name)
    return(
        rbind(true_inversions, sample_inversions, pooled_inversions)
    )
}
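Before pointing the reader at real files, it can help to sanity-check the two fiddly steps on toy inputs: pulling END= out of the VCF INFO field, and stacking tables whose columns are in different orders. The sketch below is a minimal illustration under stated assumptions. The INFO strings, contig name, and coordinates are hypothetical (the simulated VCF may order its INFO fields differently), and `get_end()` is a hypothetical helper, not part of the original notebook.

```r
# Hypothetical INFO strings: END= last, END= first, and a CIEND= decoy
info <- c("SVTYPE=INV;END=25000", "END=9000;SVTYPE=INV", "SVTYPE=INV;CIEND=-5,5;END=40000")

# Hypothetical helper: return the integer after END=, requiring END= to start
# a field (beginning of string or after ";") so CIEND= is never matched
get_end <- function(info) {
  as.numeric(sub(".*(^|;)END=([0-9]+).*", "\\2", info, perl = TRUE))
}
ends <- get_end(info)  # c(25000, 9000, 40000)

# Toy tables (hypothetical values) with the same column names as read_data()
# uses, but in different orders; rbind() on data frames matches by name
truth <- data.frame(contig = "2L", position_start = 1000, position_end = 25000,
                    sample = "simulated_inversions")
called <- data.frame(sample = "Sample_01", contig = "2L",
                     position_start = 1010, position_end = 24950)
tidy <- rbind(truth, called)
tidy
```

The combined table keeps the column order of the first argument, so the groundtruth layout (contig, position_start, position_end, sample) carries through to the plot.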

Plotting data

This function takes the inversion table generated by read_data() and returns a formatted ggplot of the inversions, with one row per sample (plus the groundtruth and pooled rows) and one panel per contig.

library(ggplot2)

plot_data <- function(data, size_treatment, depth){
    ggplot(
        data,
        aes(
            x = position_start,
            xend = position_end,
            y = sample,
            yend = sample,
            color = sample
        )
    ) +
        scale_x_continuous(labels = scales::comma) +
        labs(
            title = "Called Inversions",
            subtitle = paste(size_treatment, depth),
            x = "genomic position (bp)",
            caption = "Simulated inversions (groundtruth) shown in top row"
        ) +
        # one horizontal bar per inversion (linewidth needs ggplot2 >= 3.4.0)
        geom_segment(linewidth = 2) +
        # one panel per contig, each with its own x range
        facet_wrap(~contig, ncol = 1, scales = "free_x") +
        theme_light() +
        # rows are already labeled on the y axis, so the color legend is dropped
        theme(panel.grid.major.y = element_blank(), legend.position = "none")
}
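The plot caption says the groundtruth is shown in the top row, but with a plain character `sample` column ggplot2 orders the rows alphabetically, and on a discrete y axis the first level is drawn at the bottom. If the groundtruth ever lands elsewhere (for example with differently named samples), one option is to set the factor levels explicitly. The sketch below assumes the groundtruth label is `"simulated_inversions"`, as in read_data(); the helper `pin_truth_on_top()` and the toy rows are hypothetical.

```r
# Hypothetical toy table with the same columns that read_data() returns
calls <- data.frame(
  contig = c("2L", "2L", "2L"),
  position_start = c(1000, 1200, 1100),
  position_end = c(5000, 5100, 4900),
  sample = c("simulated_inversions", "Sample_01", "pooled_10X_small")
)

# Hypothetical helper: make the groundtruth label the LAST factor level.
# On a discrete y axis ggplot2 draws levels bottom to top, so the last
# level ends up in the top row regardless of how the other names sort.
pin_truth_on_top <- function(data, truth_label = "simulated_inversions") {
  others <- sort(setdiff(unique(data$sample), truth_label))
  data$sample <- factor(data$sample, levels = c(others, truth_label))
  data
}

calls <- pin_truth_on_top(calls)
levels(calls$sample)
```

Because plot_data() maps both `y` and `color` to `sample`, the reordered table can be passed straight through, e.g. `plot_data(pin_truth_on_top(read_data("small", "10X")), "small", "10X")`.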