Shrinks the target correlation using a uniform scaling factor so that the overall correlation matrix is positive semi-definite. The method is described in detail in Gerard (2020).
fix_cor(design_perm, target_cor, num_steps = 51)
A numeric design matrix whose rows are to be permuted (thus controlling the amount by which they are correlated with the surrogate variables). The rows index the samples and the columns index the variables. The intercept should not be included (though see Section "Unestimable Components").
A numeric matrix of target correlations between the variables in design_perm and the surrogate variables. The rows index the observed covariates and the columns index the surrogate variables. That is, target_cor[i, j] specifies the target correlation between the ith column of design_perm and the jth surrogate variable. The surrogate variables are estimated either using factor analysis or surrogate variable analysis (see the parameter use_sva).
The number of columns in target_cor specifies the number of surrogate variables. Set target_cor to NULL to indicate that design_perm and the surrogate variables are independent.
The number of steps between 0 and 1 to take in the grid search for the shrinkage factor. The step size is 1 / (num_steps - 1).
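As a quick illustration of the grid implied by num_steps, the following sketch builds an evenly spaced grid on [0, 1]. The names grid and step_size are illustrative and not part of the package; how fix_cor traverses its grid internally is not shown here.

```r
num_steps <- 51  # the default in the usage above
# Evenly spaced candidate values on [0, 1], endpoints included.
grid <- seq(0, 1, length.out = num_steps)
# Spacing between neighbouring candidates.
step_size <- 1 / (num_steps - 1)
step_size
#> [1] 0.02
```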
A matrix of correlations with the same dimensions as target_cor. Specifically, the returned matrix is a * target_cor, where the scalar a is chosen so that the overall correlation matrix is positive semi-definite.
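Let \(W\) = cor(design_perm) and \(R\) = target_cor, and let \(I_K\) be the \(K \times K\) identity matrix, where \(K\) is the number of surrogate variables. Then the overall correlation matrix is: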
$$ \left(
\begin{array}{cc}
W & R\\
R' & I_K
\end{array}
\right).
$$
This function applies a multiplicative scaling factor to \(R\) until
the above matrix is positive semi-definite. That is, it finds \(a\)
between 0 and 1 such that
$$ \left(
\begin{array}{cc}
W & aR\\
aR' & I_K
\end{array}
\right)
$$
is positive semi-definite.
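The search described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the seqgendiff implementation: shrink_to_psd, its tol argument, and the choice to walk the grid from 1 down to 0 are all hypothetical, so the values it returns need not match those of fix_cor.

```r
# Hypothetical helper (not part of seqgendiff): scale R by the largest grid
# value a in [0, 1] for which the block matrix [W, aR; aR', I_K] is PSD.
shrink_to_psd <- function(W, R, num_steps = 51, tol = 1e-8) {
  K <- ncol(R)
  # Candidate factors, from no shrinkage (1) to full shrinkage (0).
  a_grid <- seq(1, 0, length.out = num_steps)
  for (a in a_grid) {
    full <- rbind(cbind(W, a * R),
                  cbind(a * t(R), diag(K)))
    min_eig <- min(eigen(full, symmetric = TRUE, only.values = TRUE)$values)
    # Accept the first candidate whose smallest eigenvalue is non-negative
    # up to the numerical tolerance.
    if (min_eig >= -tol) {
      return(a * R)
    }
  }
  0 * R  # fallback; a = 0 already passes whenever W itself is PSD
}

# One observed covariate, so W is the 1 x 1 matrix containing 1.
W <- matrix(1)
R <- matrix(seq(1, 0, length.out = 10), nrow = 1)
R_new <- shrink_to_psd(W, R)
full_new <- rbind(cbind(W, R_new), cbind(t(R_new), diag(ncol(R))))
min(eigen(full_new, symmetric = TRUE, only.values = TRUE)$values)
```

Starting at a = 1 means the target correlations are returned unchanged whenever they are already feasible, and the first accepted value is the largest feasible one on the grid.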
Gerard, D. (2020). "Data-based RNA-seq simulations by binomial thinning." BMC Bioinformatics, 21(1), 206. doi: 10.1186/s12859-020-3450-9.
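## One binary covariate measured on n samples.
n <- 10
design_perm <- matrix(rep(c(0, 1), length.out = n))
## Target correlations with 10 surrogate variables, decreasing from 1 to 0.
target_cor <- matrix(seq(1, 0, length.out = 10), nrow = 1)
new_cor <- seqgendiff:::fix_cor(design_perm = design_perm, target_cor = target_cor)
## The ratio is the common shrinkage factor; the last entry is NaN because it is 0 / 0.
new_cor / target_cor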
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,] 0.5291503 0.5291503 0.5291503 0.5291503 0.5291503 0.5291503 0.5291503
#> [,8] [,9] [,10]
#> [1,] 0.5291503 0.5291503 NaN
## In the case of one observed covariate, the requirement is just that
## the sum of squared correlations is less than or equal to one.
sum(target_cor ^ 2)
#> [1] 3.518519
sum(new_cor ^ 2)
#> [1] 0.9851852
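For the one-covariate case in the example above, the constraint can also be solved by hand: with W = 1, the scaled correlations are feasible exactly when a^2 * sum(target_cor^2) <= 1. The sketch below computes that bound in base R; a_max is an illustrative name and not something the package returns.

```r
# Same target correlations as in the example above.
target_cor <- matrix(seq(1, 0, length.out = 10), nrow = 1)
# Largest feasible scaling factor when there is a single observed covariate.
a_max <- 1 / sqrt(sum(target_cor ^ 2))
a_max
# At the bound, the squared correlations sum to one (up to rounding).
sum((a_max * target_cor) ^ 2)
```

Here a_max is about 0.533, so the factor of 0.5291503 reported above lies just below the bound, which is what one would expect from a search over a finite grid.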