Implements empirical Bayes approaches to genotype polyploids from next-generation sequencing data while accounting for allele bias, overdispersion, and sequencing error. The main functions are flexdog() and multidog(), which allow the specification of many different genotype distributions. Also provided are functions to simulate genotypes, rgeno(), and read-counts, rflexdog(), as well as functions to calculate oracle genotyping error rates, oracle_mis(), and correlation with the true genotypes, oracle_cor(). These latter two functions are useful for read depth calculations. Run browseVignettes(package = "updog") in R for example usage. See Gerard et al. (2018) <doi:10.1534/genetics.118.301468> and Gerard and Ferrao (2020) <doi:10.1093/bioinformatics/btz852> for details on the implemented methods.
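As a rough illustration of the read depth calculation mentioned above, the sketch below evaluates oracle_mis() and oracle_cor() over a few candidate depths. Every parameter value here (a hexaploid, a bias of 0.7, an overdispersion of 0.005, a sequencing error rate of 0.001, and Hardy-Weinberg genotype frequencies at an allele frequency of 0.75) is an assumption chosen for illustration, not a recommendation.

```r
library(updog)

# Illustrative (assumed) settings; substitute values suited to your data.
ploidy      <- 6
bias        <- 0.7    # allele bias
od          <- 0.005  # overdispersion
seq_err     <- 0.001  # sequencing error rate
allele_freq <- 0.75

# Genotype distribution under Hardy-Weinberg equilibrium (binomial).
dist <- stats::dbinom(x = 0:ploidy, size = ploidy, prob = allele_freq)

# Candidate read depths to compare.
depths <- c(25, 50, 100, 200)

# Oracle misclassification rate at each depth.
mis <- sapply(depths, function(n) {
  oracle_mis(n = n, ploidy = ploidy, seq = seq_err,
             bias = bias, od = od, dist = dist)
})

# Oracle correlation with the true genotype at each depth.
cors <- sapply(depths, function(n) {
  oracle_cor(n = n, ploidy = ploidy, seq = seq_err,
             bias = bias, od = od, dist = dist)
})

depth_table <- data.frame(depth = depths,
                          misclassification = mis,
                          correlation = cors)
print(depth_table)
```

One would then pick the smallest depth whose oracle error rate or correlation is acceptable for the study at hand.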
Details
The package is named updog for "Using Parental Data for Offspring Genotyping" because we originally developed the method for full-sib populations, but it now works for more general populations.
Our best competitor is probably the fitPoly package, which you can check out at https://cran.r-project.org/package=fitPoly. However, we think that updog returns better-calibrated measures of uncertainty when you have next-generation sequencing data.
If you find a bug or want an enhancement, please submit an issue at https://github.com/dcgerard/updog/issues.
updog Functions
flexdog()
The main function that fits an empirical Bayes approach to genotype polyploids from next-generation sequencing data.
multidog()
A convenience function for running flexdog() over many SNPs. This function provides support for parallel computing.
format_multidog()
Return arrayicized elements from the output of multidog().
filter_snp()
Filter SNPs based on the output of multidog().
rgeno()
Simulate the genotypes of a sample from one of the models allowed in flexdog().
rflexdog()
Simulate read-counts from the flexdog() model.
plot.flexdog()
Plot the output of flexdog().
plot.multidog()
Plot the output of multidog().
oracle_joint()
The joint distribution of the true genotype and an oracle estimator.
oracle_plot()
Visualize the output of oracle_joint().
oracle_mis()
The oracle misclassification error rate (Bayes rate).
oracle_cor()
Correlation between the true genotype and the oracle estimated genotype.
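The simulation and fitting functions listed above can be combined as in the following minimal sketch, which simulates genotypes with rgeno(), simulates read-counts with rflexdog(), and then fits flexdog() to the simulated counts. The sample size, ploidy, read depths, and the bias, overdispersion, and sequencing error values are all assumed for illustration.

```r
library(updog)
set.seed(1)

# Illustrative (assumed) simulation settings.
nind   <- 100
ploidy <- 6

# Total read depth per individual.
sizevec <- round(stats::runif(n = nind, min = 50, max = 200))

# Simulate true genotypes under Hardy-Weinberg equilibrium.
true_geno <- rgeno(n = nind, ploidy = ploidy,
                   model = "hw", allele_freq = 0.75)

# Simulate reference read-counts given the true genotypes.
refvec <- rflexdog(sizevec = sizevec, geno = true_geno, ploidy = ploidy,
                   seq = 0.001, bias = 0.7, od = 0.005)

# Fit the model to the simulated counts.
fout <- flexdog(refvec = refvec, sizevec = sizevec,
                ploidy = ploidy, model = "hw")

# Proportion of individuals genotyped correctly.
prop_correct <- mean(fout$geno == true_geno)
print(prop_correct)
```

Calling plot(fout) afterwards dispatches to plot.flexdog() to visualize the fit.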
updog Datasets
snpdat
A small example dataset for using flexdog().
uitdewilligen
A small example dataset.
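The following sketch shows one way the uitdewilligen dataset might be passed to multidog(). It assumes that the dataset is a list with elements refmat, sizemat, and ploidy, and that the matrices must be transposed so that rows index SNPs and columns index individuals; check ?uitdewilligen and ?multidog before relying on this layout. Only the first 10 SNPs are used, and the "norm" model is an arbitrary choice, both to keep the example quick.

```r
library(updog)
data("uitdewilligen")

# Assumed layout: transpose so rows are SNPs and columns are individuals.
refmat  <- t(uitdewilligen$refmat)
sizemat <- t(uitdewilligen$sizemat)
ploidy  <- uitdewilligen$ploidy

# Keep only the first 10 SNPs so the example runs quickly.
refmat  <- refmat[1:10, ]
sizemat <- sizemat[1:10, ]

# Fit flexdog() to each SNP; increase nc to use more cores.
mout <- multidog(refmat = refmat, sizemat = sizemat,
                 ploidy = ploidy, model = "norm", nc = 1)

# Extract the estimated genotypes as a SNP-by-individual matrix.
genomat <- format_multidog(mout, varname = "geno")
print(dim(genomat))
```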
References
Gerard, D., Ferrão, L. F. V., Garcia, A. A. F., & Stephens, M. (2018). Genotyping Polyploids from Messy Sequencing Data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468.
Gerard, D., & Ferrão, L. F. V. (2020). Priors for genotyping polyploids. Bioinformatics, 36(6), 1795-1800. doi:10.1093/bioinformatics/btz852.