moimix

R package for estimating multiplicity of infection from high-throughput sequencing data


moimix: an R package for evaluating multiplicity of infection in malaria parasites

Features

  • Estimate multiplicity of infection from massively parallel sequencing data
  • Estimate heterozygosity and within-isolate diversity directly from read counts
  • Call major alleles within isolates from B-allele frequencies
  • Prepare SNP barcode data for use in COIL
  • Simulate single nucleotide variant data with known multiplicity of infection
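The read-count-based quantities in the list above can be illustrated in a few lines of base R. This is a simplified sketch, not moimix's implementation. It assumes a single isolate with a two-column matrix of reference and alternate read counts per SNP, plus hypothetical population-level alternate allele frequencies (`pop_af`). It computes B-allele frequencies (BAF), per-SNP heterozygosity 2p(1 - p), a crude Fws-style ratio of mean within-isolate to mean population heterozygosity, and major-allele calls made by thresholding the BAF at 0.5. All variable names and values here are illustrative.

```r
# Hypothetical read counts for one isolate: column 1 = reference reads,
# column 2 = alternate reads (one row per SNP).
counts <- matrix(c(30,  0,
                   12, 14,
                    2, 25,
                   20,  5),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(paste0("snp", 1:4), c("ref", "alt")))

# B-allele frequency: proportion of reads supporting the alternate allele.
baf <- counts[, "alt"] / rowSums(counts)

# Within-isolate heterozygosity per SNP, treating the BAF as an allele frequency.
h_within <- 2 * baf * (1 - baf)

# Hypothetical population-level alternate allele frequencies for the same SNPs.
pop_af <- c(0.40, 0.50, 0.60, 0.30)
h_pop <- 2 * pop_af * (1 - pop_af)

# Simplified Fws-style index: 1 - mean(within-isolate) / mean(population)
# heterozygosity. Values close to 1 are roughly consistent with a single
# dominant clone; lower values point towards a mixed infection.
fws <- 1 - mean(h_within) / mean(h_pop)

# Major-allele call: alternate if BAF > 0.5, otherwise reference.
major_allele <- ifelse(baf > 0.5, "alt", "ref")

print(round(baf, 3))
print(round(fws, 3))
print(major_allele)
```

Published Fws estimators usually bin SNPs by population allele frequency and fit a regression rather than taking a single ratio of means. The ratio is used here only to keep the idea visible.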

How do I install moimix?

There are plans to put moimix on Bioconductor in the future; however, it is currently only available to install as a development version from GitHub:

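# first install the Bioconductor dependencies (biocLite.R is defunct; use BiocManager)
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("graph", "Rgraphviz", "SeqArray", "SeqVarTools"))
# then install moimix from GitHub using the devtools package
if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")
devtools::install_github("bahlolab/moimix")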

What input data does moimix require?

moimix uses the Genomic Data Structure (GDS) format from the Bioconductor package SeqArray, which provides fast access in R to variant data converted from VCF files.

To convert a VCF file to GDS format:

library(SeqArray)
seqVCF2GDS("isolate_snps.vcf.gz", "isolate_snps.gds")
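After conversion, the GDS file can be opened and inspected with SeqArray before it is passed to moimix. The sketch below uses SeqArray's bundled example VCF (returned by `seqExampleFileName("vcf")`) in place of `isolate_snps.vcf.gz` and writes the GDS output to a temporary file. Substitute your own paths in practice.

```r
library(SeqArray)

# Convert SeqArray's bundled example VCF to GDS in a temporary location.
vcf_fn <- seqExampleFileName("vcf")
gds_fn <- tempfile(fileext = ".gds")
seqVCF2GDS(vcf_fn, gds_fn, verbose = FALSE)

# Open the GDS file, print a summary, and pull out sample and variant IDs.
gds <- seqOpen(gds_fn)
seqSummary(gds)
sample_ids <- seqGetData(gds, "sample.id")
n_variants <- length(seqGetData(gds, "variant.id"))

# Close the file handle when finished.
seqClose(gds)
```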

It is also possible to estimate MOI from a matrix of read counts where the first column is the number of reads supporting the reference allele and the second column is the number of reads supporting the alternate allele.
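A minimal sketch of the read-count matrix layout described above, in base R. It does not call moimix. It only shows the expected two-column shape (reference reads first, alternate reads second, one row per SNP). The counts are simulated for a hypothetical two-strain infection with mixing proportions 0.7 and 0.3, using Poisson read depths and binomial read sampling. That simulation model is an illustrative assumption, not moimix's own simulator.

```r
set.seed(1)

n_snps <- 100
depth  <- rpois(n_snps, lambda = 50)   # hypothetical read depth per SNP

# Hypothetical two-strain infection with strain proportions 0.7 and 0.3.
props <- c(0.7, 0.3)
# Alternate-allele indicator (0/1) for each strain (column) at each SNP (row).
strain_alleles <- matrix(rbinom(n_snps * 2, size = 1, prob = 0.5), ncol = 2)

# Expected B-allele frequency at each SNP: summed proportion of strains
# carrying the alternate allele.
expected_baf <- as.vector(strain_alleles %*% props)

# Sample alternate reads binomially; the remaining reads support the reference.
alt_reads <- rbinom(n_snps, size = depth, prob = expected_baf)
counts <- cbind(ref = depth - alt_reads, alt = alt_reads)

head(counts)
```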

How do I use moimix?

See the introduction vignette for usage examples.

How do I cite moimix?

The manuscript is currently in preparation. If you use moimix, please cite the following:

Lee S, Harrison A, Tessier N, Tavul L, Miotto O, Siba P, Kwiatkowski D, Müller I, Barry AE and Bahlo M, Assessing clonality in malaria parasites using massively parallel sequencing data, 2016, in preparation.