General imputation framework.

ruvimpute(
  Y,
  X,
  ctl,
  k = NULL,
  impute_func = em_miss,
  impute_args = list(),
  cov_of_interest = ncol(X),
  include_intercept = TRUE,
  do_variance = FALSE
)

Arguments

Y

A matrix of numerics. These are the response variables where each column has its own variance. In a gene expression study, the rows are the individuals and the columns are the genes.

X

A matrix of numerics. The covariates of interest.

ctl

A vector of logicals of length ncol(Y). If position i is TRUE then position i is considered a negative control.
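A minimal sketch of building ctl, assuming the control genes are known by column name or by column index. The data, gene names, and the object names control_genes and ctl_from_index are made up for illustration.

```r
# Hypothetical data: 10 individuals (rows) by 6 genes (columns)
set.seed(1)
Y <- matrix(rnorm(60), nrow = 10, ncol = 6)
colnames(Y) <- paste0("gene", 1:6)

# Hypothetical names of the genes assumed to be negative controls
control_genes <- c("gene2", "gene5")

# Logical vector of length ncol(Y): TRUE marks a negative control
ctl <- colnames(Y) %in% control_genes

# Equivalent construction from column indices
ctl_from_index <- rep(FALSE, ncol(Y))
ctl_from_index[c(2, 5)] <- TRUE
```

Either construction gives a logical vector with one entry per column of Y, which is the form described above.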

k

The rank of the underlying matrix. Used by hard_impute when that is the value of impute_func. If NULL (the default), the rank is estimated by num.sv.

impute_func

A function that takes as input a matrix named Y that has missing values and returns a matrix called Yhat, of the same dimensions as Y, with the missing values filled in. If do_variance = TRUE, then impute_func should also return sig_diag, a vector of column-specific variance estimates. The default, as shown in the usage, is em_miss. I provide a few other functions in this package. knn_wrapper performs k-nearest-neighbors imputation. missforest_wrapper performs random forest imputation. softimpute_wrapper performs nuclear norm minimization imputation.
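A user-supplied imputation function might look like the sketch below, which fills each missing entry with its column mean. The name colmean_impute is hypothetical. Returning a list with elements Yhat and sig_diag is an assumption about how the two outputs are packaged, so check the wrappers shipped with the package for the exact form.

```r
# Hypothetical imputation function: fill each missing entry with its column mean.
# Assumption: the outputs are returned as a list with elements Yhat and sig_diag.
colmean_impute <- function(Y, ...) {
  Yhat <- Y
  for (j in seq_len(ncol(Y))) {
    missing_j <- is.na(Y[, j])
    Yhat[missing_j, j] <- mean(Y[, j], na.rm = TRUE)
  }
  # Column-specific variance estimates computed from the observed entries only
  sig_diag <- apply(Y, 2, var, na.rm = TRUE)
  list(Yhat = Yhat, sig_diag = sig_diag)
}

# Small made-up matrix with two missing entries
set.seed(1)
Ymiss <- matrix(rnorm(20), nrow = 5, ncol = 4)
Ymiss[1, 3:4] <- NA
out <- colmean_impute(Ymiss)
```

The `...` argument lets the function ignore any extra entries of impute_args that it does not use.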

impute_args

A list of additional parameters to pass to impute_func.

cov_of_interest

A vector of positive integers. The column numbers of the covariates in X whose coefficients you are interested in. The rest are considered nuisance parameters and are regressed out by OLS.
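To illustrate what regressing out nuisance covariates by OLS means, here is a sketch in base R. It is not the package's internal code, and the design, data, and object names are made up. It residualizes Y and the covariate of interest on the nuisance columns, then checks that the resulting coefficient matches the one from the full OLS fit.

```r
set.seed(1)
n <- 20
# Hypothetical design: intercept and batch are nuisance, treatment is of interest
X <- cbind(intercept = 1,
           batch     = rep(0:1, each = 10),
           treatment = rep(0:1, times = 10))
Y <- matrix(rnorm(n * 5), nrow = n, ncol = 5)

cov_of_interest <- 3
X_nuisance <- X[, -cov_of_interest, drop = FALSE]

# OLS residuals of Y and of the covariate of interest on the nuisance covariates
qr_nuisance <- qr(X_nuisance)
Y_resid <- qr.resid(qr_nuisance, Y)
x_resid <- qr.resid(qr_nuisance, X[, cov_of_interest])

# Regressing residuals on residuals recovers the full-model OLS coefficient
beta_partial <- drop(crossprod(x_resid, Y_resid)) / sum(x_resid^2)
beta_full <- lm.fit(X, Y)$coefficients[cov_of_interest, ]
```

The two coefficient vectors agree up to numerical error, which is why the nuisance columns can be removed before the covariates of interest are handled. These are plain OLS estimates with no adjustment for unwanted variation.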

include_intercept

A logical. If TRUE, then it will check X to see if it has an intercept term. If not, then it will add an intercept term. If FALSE, then X will be unchanged.
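The intercept check can be pictured with the sketch below. The helper has_intercept is hypothetical and only looks for a constant, nonzero column. The package's actual test, and where it places the added column, may differ.

```r
# Hypothetical helper: TRUE if some column of X is constant and nonzero
has_intercept <- function(X) {
  any(apply(X, 2, function(x) all(x == x[1]) && x[1] != 0))
}

# Made-up design with a single treatment indicator and no intercept
X <- cbind(treatment = rep(0:1, times = 5))

if (!has_intercept(X)) {
  # Placing the intercept in the first column is an assumption of this sketch
  X <- cbind(intercept = 1, X)
}
```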

do_variance

A logical. Does impute_func also return estimates of the column-specific variances?
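Putting the arguments together, a call might be set up as in the sketch below. The simulated data are made up and contain no real signal, and the call only runs if a function named ruvimpute is available in the session (that is, if the package providing it has been attached). Passing k = 1 is an arbitrary choice that avoids estimating the rank.

```r
set.seed(1)
n <- 20   # individuals
p <- 50   # genes

# Hypothetical design: an intercept plus one treatment indicator
X <- cbind(intercept = 1, treatment = rep(0:1, times = n / 2))

# Simulated responses: rows are individuals, columns are genes
Y <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Treat the first 10 genes as negative controls (an arbitrary choice here)
ctl <- rep(FALSE, p)
ctl[1:10] <- TRUE

# Only attempt the call when ruvimpute is available in the session
if (exists("ruvimpute", mode = "function")) {
  fit <- ruvimpute(Y = Y, X = X, ctl = ctl, k = 1, cov_of_interest = 2)
  str(fit$beta2hat)
}
```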

Value

beta2hat

The estimates of the coefficients of the covariates of interest that do not correspond to control genes.

betahat_long

The estimates of the coefficients. Those corresponding to control genes are set to 0.

sebetahat

If do_variance = TRUE, then these are the "standard errors" of beta2hat (but not really).

tstats

If do_variance = TRUE, then these are the "t-statistics" of beta2hat (but not really).

pvalues

If do_variance = TRUE, then these are the "p-values" of tstats (but not really).

References

  • Gerard, David, and Matthew Stephens. 2021. "Unifying and Generalizing Methods for Removing Unwanted Variation Based on Negative Controls." Statistica Sinica, 31(3), 1145-1166. doi: 10.5705/ss.202018.0345

Author

David Gerard