Abstract

I provide instructions on how to incorporate a user-specified prior into ruvb. I provide an example with a simulated RNA-seq dataset. The RUVB method is described in detail in Gerard and Stephens (2021).

Data

We will use the simulated RNA-seq dataset described in the vignette sample_analysis. To read a description about these data, run the following in R:

utils::vignette("sample_analysis", package = "vicar")

We’ll first load in these data:

library(vicar)
library(ggplot2)
set.seed(2512)
data(sim_gtex)
Y <- sim_gtex$Y
X <- sim_gtex$X
ctl <- sim_gtex$ctl
which_null <- sim_gtex$which_null
beta <- sim_gtex$beta

As in sample_analysis, we’ll estimate the number of hidden confounders using the num.sv function from the sva R package.

num_sv <- sva::num.sv(dat = t(Y), mod = X, method = "be")
num_sv
## [1] 3

Specifying a Prior

In the ruvb function, one can use the prior_fun and prior_args parameters to specify a prior in the second step of RUVB. If \(p\) is the number of genes, \(m\) is the number of control genes, and \(k\) is the number of covariates of interest, then prior_fun takes the following as input

  1. beta_mat: An \(k\) by \(p - m\) matrix.
  2. Anything else the user desires, specified in the list prior_args.

By specifying the return_log argument in ruvb, prior_fun can either return the density (a non-negative numeric scalar), or the log-density (a real numeric scalar). For numerical stability reasons, generally you should have prior_fun return the log-density and then set return_log = TRUE.

Example

Suppose we had slightly stronger prior information that the signals are all relatively weak, so we’ll put a \(N(0, 5)\) prior on the effects. For numerical stability, we’ll return the log-density.

strong_prior <- function(beta_mat, sd_prior) {
  sum(stats::dnorm(beta_mat, mean = 0, sd = sd_prior, log = TRUE))
}

Since sd_prior is an argument we need to specify, we’ll use the prior_args parameter to set sd_prior = 5. Now let us run ruvb. Note that you should run the MCMC for many more iterations than what I’m doing here.

ruvbout <- ruvb(Y = Y, X = X, ctl = ctl, k = num_sv, cov_of_interest = 2,
                prior_fun = strong_prior, prior_args = list(sd_prior = 5),
                return_log = TRUE,
                fa_args = list(display_progress = FALSE, nsamp = 1000, thin = 1))

We’ll compare the results against just using the default uniform prior:

ruvbout_unif <- ruvb(Y = Y, X = X, ctl = ctl, k = num_sv, cov_of_interest = 2,
                     fa_args = list(display_progress = FALSE, nsamp = 1000, thin = 1))

They give about the same AUC’s.

aucnorm <- pROC::roc(response = which_null, predictor = c(ruvbout$lfsr2))$auc
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
aucunif <- pROC::roc(response = which_null, predictor = c(ruvbout_unif$lfsr2))$auc
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
aucdat <- data.frame(Prior = c("Normal", "Uniform"), AUC = c(aucnorm, aucunif))
knitr::kable(x = aucdat)
Prior AUC
Normal 0.8790000
Uniform 0.8788539

References

Gerard, David, and Matthew Stephens. 2021. “Unifying and Generalizing Methods for Removing Unwanted Variation Based on Negative Controls.” Statistica Sinica 31 (3): 1145–66. https://doi.org/10.5705/ss.202018.0345.