I put on github an R package for some of my ideas on removing unwanted variation.

Let

Y = XB + ZA + E,

for

• * Y an n by p matrix of gene expression data with n samples and p genes,
• * X an n by q matrix of q covariates,
• * B a q by p matrix of unobserved coefficients for the observed covariates,
• * Z an n by k matrix of hidden confounders,
• * A a k by p matrix of hidden coefficients for the hidden confounders, and
• * E an n by p matrix of independent normal errors with column variances s1,$$\ldots$$,sp.

Not accounting for the hidden covariates, Z, can reduce power and result in poor control of false discovery rate. The vicar package provides a suite of functions to adjust for hidden confounders, both when one has and does not have access to control genes.

The functions mouthwash and backwash can adjust for hidden confounding when one does not have access to control genes. They do so via non-parametric empirical Bayes methods that use the powerful methodology of Adaptive SHrinkage (Stephens 2016) within the factor-augmented regression framework described in Wang et al. (2015). backwash is a slightly more Bayesian version of mouthwash.

When one has control genes, there are many approaches to take. Such methods include RUV2 (J. A. Gagnon-Bartsch and Speed 2012), RUV4 (J. Gagnon-Bartsch, Jacob, and Speed 2013), and CATE (Wang et al. 2015). This package adds to the field of confounder adjustment with control genes by

1. (1) Implementing a version of CATE that is calibrated using control genes similarly to the method in J. Gagnon-Bartsch, Jacob, and Speed (2013). The function is called vruv4.
2. (2) Introduces RUV3, a version of RUV that can be considered both RUV2 and RUV4. The function is called ruv3.
3. (3) Introduces RUV-impute, a more general framework for accounting for hidden confounders in regression. The function is called ruvimpute
4. (4) Introduces RUV-Bayes, a Bayesian version of RUV. The function is called ruvb.

Many of these ideas are described in Gerard and Stephens (2017).

## References

Gagnon-Bartsch, Johann A, and Terence P Speed. 2012. Using Control Genes to Correct for Unwanted Variation in Microarray Data. Biostatistics 13 (3). Biometrika Trust: 539–52. doi:10.1093/biostatistics/kxr034.

Gagnon-Bartsch, Johann, Laurent Jacob, and Terence Speed. 2013. Removing Unwanted Variation from High Dimensional Data with Negative Controls. Technical Report 820, Department of Statistics, University of California, Berkeley. http://statistics.berkeley.edu/tech-reports/820.

Gerard, D., & Stephens, M. (2017). Unifying and Generalizing Methods for Removing Unwanted Variation Based on Negative Controls. arXiv preprint arXiv:1705.08393. [Link to arXiv]

Stephens, Matthew. 2016. False Discovery Rates: A New Deal. Biostatistics. doi:10.1093/biostatistics/kxw041.

Wang, Jingshu, Qingyuan Zhao, Trevor Hastie, and Art B Owen. 2015. Confounder Adjustment in Multiple Hypothesis Testing. ArXiv Preprint ArXiv:1508.04178. https://arxiv.org/abs/1508.04178.