Estimates surrogate variables using either sva or an SVD of the residuals from regressing log2(mat + 0.5) on design_obs.

est_sv(mat, n_sv, design_obs, use_sva = FALSE)

Arguments

mat

A numeric matrix of RNA-seq counts. The rows index the genes and the columns index the samples.

n_sv

The number of surrogate variables to estimate.

design_obs

A numeric matrix of observed covariates that are NOT part of the signal-generating process. It is used only in estimating the surrogate variables (if target_cor is not NULL). Do not include an intercept column; including one will sometimes produce an error.

use_sva

A logical. If TRUE, the hidden covariates are estimated by surrogate variable analysis (Leek and Storey, 2008) using design_obs. If FALSE, they are estimated by an SVD on log2(mat + 0.5) after regressing out design_obs. TRUE allows the surrogate variables to be correlated with the observed covariates, whereas FALSE assumes that the surrogate variables are orthogonal to the observed covariates. This option matters only if design_obs is not NULL. Defaults to FALSE.

Value

A matrix of estimated surrogate variables. The columns index the surrogate variables and the rows index the samples (the columns of mat). Each surrogate variable is centered and scaled to have mean 0 and variance 1.
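
The use_sva = FALSE path described above can be illustrated with a minimal base-R sketch. This is not the package's implementation: the helper name sv_by_svd is hypothetical, and the sketch assumes that an intercept is added internally before regressing out design_obs and that the leading left singular vectors of the sample-by-gene residual matrix serve as the surrogate variables. The real est_sv may differ in these details.

```r
# Hypothetical helper (NOT the package's est_sv): SVD on regression residuals.
sv_by_svd <- function(mat, n_sv, design_obs = NULL) {
  lmat <- t(log2(mat + 0.5))                 # samples x genes, log2 scale
  n <- nrow(lmat)
  X <- cbind(rep(1, n), design_obs)          # assumed: intercept added here
  resid <- qr.resid(qr(X), lmat)             # regress out X from every gene
  u <- svd(resid, nu = n_sv, nv = 0)$u       # leading left singular vectors
  scale(u)                                   # center; scale to variance 1
}

# Simulated counts: 100 genes (rows) by 20 samples (columns).
set.seed(1)
n <- 20
p <- 100
mat <- matrix(rpois(n * p, lambda = 50), nrow = p, ncol = n)
design_obs <- matrix(rnorm(n), ncol = 1)     # one observed covariate, no intercept

sv <- sv_by_svd(mat, n_sv = 2, design_obs = design_obs)
dim(sv)                                      # samples x surrogate variables
```

Because the residuals lie in the orthogonal complement of the design matrix, the surrogate variables produced this way are uncorrelated with design_obs by construction, which is the orthogonality assumption noted for use_sva = FALSE.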

Author

David Gerard