Returns the estimated correlation between the design matrix and the surrogate variables that results from a given target correlation. The method is described in detail in Gerard (2020).
A numeric design matrix whose rows are to be permuted (thus controlling the amount by which they are correlated with the surrogate variables). The rows index the samples and the columns index the variables. The intercept should not be included (though see Section "Unestimable Components").
A matrix of surrogate variables.
A numeric matrix of target correlations between the variables in design_perm and the surrogate variables. The rows index the observed covariates and the columns index the surrogate variables. That is, target_cor[i, j] specifies the target correlation between the ith column of design_perm and the jth surrogate variable. The surrogate variables are estimated either using factor analysis or surrogate variable analysis (see the parameter use_sva). The number of columns in target_cor specifies the number of surrogate variables. Set target_cor to NULL to indicate that design_perm and the surrogate variables are independent.
Should we calculate the correlation of the mean design_perm and sv (calc_first = "mean"), or should we calculate the mean of the correlations between design_perm and sv (calc_first = "cor")? This should only be changed by expert users.
Should we use the Gale-Shapley algorithm for stable marriages ("marriage") (Gale and Shapley, 1962) as implemented in the matchingR package, or the Hungarian algorithm ("hungarian") (Papadimitriou and Steiglitz, 1982) as implemented in the clue package (Hornik, 2005)? The Hungarian method almost always works better, so it is the default.
The total number of simulated correlations to consider.
A matrix of correlations. The rows index the observed covariates and the columns index the surrogate variables. Element (i, j) is the estimated correlation between the ith variable in design_perm and the jth variable in sv.
This function permutes the rows of design_perm many times, each time calculating the Pearson correlation between the columns of design_perm and the columns of sv. It then returns the averages of these Pearson correlations. The permutation is done using permute_design.
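The averaging step described above can be sketched in base R. This is only an illustration under stated assumptions: the helper name average_perm_cor is hypothetical, and uniformly random row permutations stand in for permute_design, which instead chooses permutations to approach target_cor. With uniform permutations the averages should drift toward zero as iternum grows.

```r
## Sketch only: sample.int() gives uniformly random permutations and is a
## stand-in for seqgendiff's permute_design(), not a reimplementation of it.
average_perm_cor <- function(design_perm, sv, iternum = 100) {
  n <- nrow(design_perm)
  stopifnot(nrow(sv) == n)
  ## One correlation matrix per permutation, stacked into a 3-d array
  cor_array <- vapply(
    seq_len(iternum),
    function(i) {
      permuted <- design_perm[sample.int(n), , drop = FALSE]
      stats::cor(permuted, sv)  # Pearson correlation is the default
    },
    FUN.VALUE = matrix(0, nrow = ncol(design_perm), ncol = ncol(sv))
  )
  ## Element-wise average over the permutations
  apply(cor_array, c(1, 2), mean)
}

set.seed(1)
n <- 10
design_perm <- cbind(rep(c(0, 1), each = n / 2),
                     rep(c(0, 1), length.out = n))
sv <- matrix(rnorm(n))
avg_cor <- average_perm_cor(design_perm, sv, iternum = 200)
dim(avg_cor)
```

The result has one row per column of design_perm and one column per column of sv, matching the shape of the matrix that effective_cor is documented to return.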
Gale, David, and Lloyd S. Shapley. "College admissions and the stability of marriage." The American Mathematical Monthly 69, no. 1 (1962): 9-15. doi: 10.1080/00029890.1962.11989827.
Gerard, D (2020). "Data-based RNA-seq simulations by binomial thinning." BMC Bioinformatics. 21(1), 206. doi: 10.1186/s12859-020-3450-9.
Hornik K (2005). "A CLUE for CLUster Ensembles." Journal of Statistical Software, 14(12). doi: 10.18637/jss.v014.i12.
C. Papadimitriou and K. Steiglitz (1982), Combinatorial Optimization: Algorithms and Complexity. Englewood Cliffs: Prentice Hall.
## Generate the design matrices and set target correlation -----------------
n <- 10
design_perm <- cbind(rep(c(0, 1), each = n / 2),
                     rep(c(0, 1), length.out = n))
sv <- matrix(rnorm(n))
target_cor <- matrix(c(0.9, 0.1), ncol = 1)
## Get estimated true correlation ------------------------------------------
## You should use a much larger iternum in practice
effective_cor(design_perm = design_perm,
              sv = sv,
              target_cor = target_cor,
              iternum = 10)
#>            [,1]
#> [1,] 0.65205164
#> [2,] 0.03197828
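Before calling effective_cor, it may help to confirm that the inputs conform to one another as the argument descriptions suggest: target_cor should have one row per column of design_perm and one column per surrogate variable. The helper below, check_target_cor_dims, is hypothetical and not part of the package. It also assumes that design_perm and sv share the same number of rows (samples), which the column-wise Pearson correlation implies but the text does not state outright.

```r
## Hypothetical helper, not part of seqgendiff: reports whether the
## dimensions of the three inputs agree with the documented layout.
check_target_cor_dims <- function(design_perm, sv, target_cor) {
  stopifnot(is.matrix(design_perm), is.matrix(sv), is.matrix(target_cor))
  c(samples_match = nrow(design_perm) == nrow(sv),            # assumed
    rows_match    = nrow(target_cor) == ncol(design_perm),    # covariates
    cols_match    = ncol(target_cor) == ncol(sv),             # surrogates
    in_range      = all(abs(target_cor) <= 1))                # valid correlations
}

n <- 10
design_perm <- cbind(rep(c(0, 1), each = n / 2),
                     rep(c(0, 1), length.out = n))
sv <- matrix(rnorm(n))
target_cor <- matrix(c(0.9, 0.1), ncol = 1)
dim_checks <- check_target_cor_dims(design_perm, sv, target_cor)
dim_checks
```

All four entries should be TRUE for the inputs used in the example above.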