I put on github an R package for some of my ideas on removing unwanted variation.

Let

\(Y = XB + ZA + E\),

for

- * \(Y\) an \(n\) by \(p\) matrix of gene expression data with \(n\) samples and \(p\) genes,
- * \(X\) an \(n\) by \(q\) matrix of \(q\) covariates,
- * \(B\) a \(q\) by \(p\) matrix of unobserved coefficients for the observed covariates,
- * \(Z\) an \(n\) by \(k\) matrix of hidden confounders,
- * \(A\) a \(k\) by \(p\) matrix of hidden coefficients for the hidden confounders, and
- * \(E\) an \(n\) by \(p\) matrix of independent normal errors with column variances \(s_1,\ldots,s_p\).

Not accounting for the hidden covariates, \(Z\), can reduce power and
result in poor control of false discovery rate. The
`vicar`

package provides a suite
of functions to adjust for hidden confounders, both when one has and
does not have access to control genes.

The functions `mouthwash`

and `backwash`

can adjust for hidden
confounding when one does not have access to control genes. They do so
via non-parametric empirical Bayes methods that use the powerful
methodology of Adaptive SHrinkage (Stephens 2016) within the
factor-augmented regression framework described in Wang et
al. (2015). `backwash`

is a slightly more Bayesian version of
`mouthwash`

. You can read about these methods in my preprint.

When one has control genes, there are many approaches to take. Such methods include RUV2 (J. A. Gagnon-Bartsch and Speed 2012), RUV4 (J. Gagnon-Bartsch, Jacob, and Speed 2013), and CATE (Wang et al. 2015). This package adds to the field of confounder adjustment with control genes by

- (1) Implementing a version of CATE that is calibrated using control genes similarly to the method in J. Gagnon-Bartsch, Jacob, and Speed (2013). The function is called
`vruv4`

. - (2) Introduces RUV3, a version of RUV that can be considered both RUV2 and RUV4. The function is called
`ruv3`

. - (3) Introduces RUV-impute, a more general framework for accounting for hidden confounders in regression. The function is called
`ruvimpute`

- (4) Introduces RUV-Bayes, a Bayesian version of RUV. The function is called
`ruvb`

.

Many of these ideas are described in Gerard and Stephens (2017).

Gagnon-Bartsch, Johann A, and Terence P Speed. 2012. Using Control Genes to Correct for Unwanted Variation in Microarray Data. *Biostatistics* 13 (3). Biometrika Trust: 539–52. doi:10.1093/biostatistics/kxr034.

Gagnon-Bartsch, Johann, Laurent Jacob, and Terence Speed. 2013. Removing Unwanted Variation from High Dimensional Data with Negative Controls. Technical Report 820, Department of Statistics, University of California, Berkeley. http://statistics.berkeley.edu/tech-reports/820.

**Gerard, D.**, & Stephens, M. (2017). Empirical Bayes
Shrinkage and False Discovery Rate Estimation, Allowing For Unwanted
Variation. *arXiv preprint arXiv:1709.10066*. [Link to arXiv]

**Gerard, D.**, & Stephens, M. (2017). Unifying and Generalizing Methods for Removing Unwanted Variation Based on Negative Controls. *arXiv preprint arXiv:1705.08393*. [Link to arXiv]

Stephens, Matthew. 2016. False Discovery Rates: A New Deal. *Biostatistics*. doi:10.1093/biostatistics/kxw041.

Wang, Jingshu, Qingyuan Zhao, Trevor Hastie, and Art B Owen. 2015. Confounder Adjustment in Multiple Hypothesis Testing. *ArXiv Preprint ArXiv:1508.04178*. https://arxiv.org/abs/1508.04178.