This will use either sva or an SVD on the residuals of a regression of mat on design_obs to estimate the surrogate variables.
est_sv(mat, n_sv, design_obs, use_sva = FALSE)
mat: A numeric matrix of RNA-seq counts. The rows index the genes and the columns index the samples.
n_sv: The number of surrogate variables to estimate.
design_obs: A numeric matrix of observed covariates that are NOT to be a part of the signal generating process. Only used in estimating the surrogate variables (if target_cor is not NULL). The intercept should not be included (it will sometimes produce an error if it is included).
use_sva: A logical. If TRUE, use surrogate variable analysis (Leek and Storey, 2008) with design_obs to estimate the hidden covariates. If FALSE, just do an SVD on log2(mat + 0.5) after regressing out design_obs. Setting this to TRUE allows the surrogate variables to be correlated with the observed covariates, while setting this to FALSE assumes that the surrogate variables are orthogonal to the observed covariates. This option only matters if design_obs is not NULL. Defaults to FALSE.
A matrix of estimated surrogate variables. The columns index the surrogate variables and the rows index the individuals. The surrogate variables are centered and scaled to have mean 0 and variance 1.
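The use_sva = FALSE path can be sketched in base R as follows. This is only an illustration under stated assumptions, not the function's actual source: sketch_sv is a hypothetical helper, the intercept is assumed to be added internally (which would explain why design_obs must not contain one), and the simulated counts are made up for the example.

```r
# Hypothetical helper (not the package's code): regress log2(mat + 0.5) on
# design_obs, take an SVD of the residuals, then standardize the result.
sketch_sv <- function(mat, n_sv, design_obs = NULL) {
  # Log-transform the counts; transpose so rows are samples, columns are genes.
  y <- t(log2(mat + 0.5))
  # Assumption: an intercept column is added here, so design_obs omits it.
  x <- matrix(1, nrow = nrow(y), ncol = 1)
  if (!is.null(design_obs)) {
    x <- cbind(x, design_obs)
  }
  # Residuals of every gene after regressing out the observed covariates.
  resid <- qr.resid(qr(x), y)
  # Leading left singular vectors: one row per sample, one column per
  # surrogate variable.
  sv <- svd(resid, nu = n_sv, nv = 0)$u
  # Center and scale each column to mean 0 and variance 1.
  scale(sv)
}

# Simulated example data: 200 genes, 10 samples, one observed covariate.
set.seed(1)
mat <- matrix(rpois(200 * 10, lambda = 50), nrow = 200)
design_obs <- matrix(rnorm(10), ncol = 1)
sv <- sketch_sv(mat, n_sv = 2, design_obs = design_obs)
dim(sv)
```

Because the singular vectors come from residuals that are orthogonal to design_obs, the estimated surrogate variables in this sketch are uncorrelated with the observed covariates, which is the assumption the FALSE setting makes.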