For covariance estimation in the multilinear normal model, others have explored maximum likelihood estimation [Hoff, 2011; Ohlson et al., 2013; Manceur and Dutilleul, 2013], regularized likelihood methods [Allen, 2011; He et al., 2014], and Bayesian methods with semi-conjugate priors [Hoff, 2011]. But little work has been done on the optimality properties of such estimators. The optimality of an estimator is frequently defined in terms of statistical risk. Suppose, for data \(X\), we have an estimator \(t(X)\) of a parameter \(\theta\). Denote the loss of using \(t(X)\) when the truth is \(\theta\) by \(L(\theta,t(X)) \in \mathbb{R}^+\). The risk, then, is the expected loss at \(\theta\) with respect to the distribution of \(X\), \(R(\theta,t) = E[L(\theta,t(X))|\theta]\).
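As a concrete example (not specific to the tensor setting), a standard loss for estimating a \(p \times p\) covariance matrix \(\Sigma\) is Stein's loss,
\[
L(\Sigma, \hat{\Sigma}) = \operatorname{tr}(\Sigma^{-1}\hat{\Sigma}) - \log\det(\Sigma^{-1}\hat{\Sigma}) - p,
\]
which equals zero exactly when \(\hat{\Sigma} = \Sigma\) and is positive otherwise; the risk at \(\Sigma\) is then the average of this penalty over datasets generated under \(\Sigma\). This is the loss whose multiway extensions appear below.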

We can gain some clues about how to find risk-optimal estimators by studying the history of covariance estimation in classical multivariate statistics. It was first shown in James and Stein [1961] that, in the multivariate normal model, the maximum likelihood estimator (MLE) of the covariance matrix is not admissible (there exist estimators that beat the MLE in terms of risk everywhere). Indeed, it is not even minimax, and considerable reductions in risk can be made over a wide range of the parameter space [Lin and Perlman, 1985]. The approach of James and Stein [1961] was to use equivariance: for \(X \in \mathbb{R}^{p \times n}\) such that \(\operatorname{vec}(X) \sim N_{pn}(0, I_n \otimes \Sigma)\), only consider estimators satisfying \(\hat{\Sigma}(AX) = A \hat{\Sigma}(X) A^T\) for all \(A\) lower triangular with positive diagonal elements. If we restrict ourselves to this class of estimators, then it is possible to find a uniformly best estimator, and this best estimator dominates the sample covariance matrix in terms of risk.
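As a rough numerical illustration (a minimal sketch, not code from any of the cited papers), the following Monte Carlo comparison pits the MLE \(S/n\), with \(S = XX^T\), against the James and Stein [1961] equivariant estimator \(T\,\mathrm{diag}(d_1,\ldots,d_p)\,T^T\), where \(S = TT^T\) is the Cholesky decomposition and \(d_i = 1/(n + p - 2i + 1)\), under Stein's loss; the dimensions and replication count are arbitrary choices for the illustration.

    # Monte Carlo sketch: risk of the MLE vs. the James-Stein equivariant
    # covariance estimator under Stein's loss (p < n, mean known to be zero).
    import numpy as np

    rng = np.random.default_rng(0)
    p, n, reps = 5, 20, 5000

    def stein_loss(sigma, sigma_hat):
        m = np.linalg.solve(sigma, sigma_hat)        # Sigma^{-1} Sigma_hat
        return np.trace(m) - np.linalg.slogdet(m)[1] - sigma.shape[0]

    def js_covariance(s, n):
        p = s.shape[0]
        t = np.linalg.cholesky(s)                    # S = T T', T lower triangular
        d = 1.0 / (n + p - 2.0 * np.arange(1, p + 1) + 1.0)
        return t @ np.diag(d) @ t.T

    sigma = np.eye(p)
    risk_mle = risk_js = 0.0
    for _ in range(reps):
        x = rng.standard_normal((p, n))              # columns iid N_p(0, Sigma), Sigma = I
        s = x @ x.T
        risk_mle += stein_loss(sigma, s / n)
        risk_js += stein_loss(sigma, js_covariance(s, n))

    print("MLE risk:", risk_mle / reps, " JS risk:", risk_js / reps)

Because both estimators are equivariant under the lower-triangular group, their risks are constant in \(\Sigma\), so evaluating them at \(\Sigma = I\) suffices; the James-Stein risk should come out below that of the MLE, consistent with the uniform dominance described above.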

Using these notions of equivariance, in Gerard and Hoff [2015] we extended the results of James and Stein [1961] to the multilinear normal model. Unlike in the multivariate normal model, there is no simple characterization of the class of equivariant estimators in the multilinear normal model (beyond the definition of equivariance, of course), so we had to develop a generalized Bayes procedure to find the best equivariant estimator. This also required the derivation of a novel, yet simple, class of distributions over the space of symmetric positive definite matrices, which we called the "mirror-Wishart" distribution. The form of the estimator depends on the loss function we choose, but we developed a class of loss functions that are natural extensions of Stein's loss from the vector case to the tensor case. The best equivariant estimators under these "multiway Stein" losses have simple characterizations in terms of the posterior expectations of the component precision matrices. These estimators are also minimax. Unlike the estimator in James and Stein [1961], where the reductions in risk are modest, we observed risks as low as \(60\%\) of the MLE's risk, depending on the dimensions of the tensor, and this reduction is uniform over the whole parameter space.
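For readers unfamiliar with the model, the multilinear (or array) normal model referred to above places a separable, Kronecker-structured covariance on a tensor-valued observation: up to the ordering convention for the Kronecker product, a \(p_1 \times \cdots \times p_K\) tensor \(X\) satisfies
\[
\operatorname{vec}(X) \sim N_{p_1 \cdots p_K}\!\left(0, \ \Sigma_K \otimes \cdots \otimes \Sigma_1\right),
\]
where each \(\Sigma_k\) is a \(p_k \times p_k\) covariance matrix for the \(k\)th mode of the tensor; the component precision matrices mentioned above are the \(\Sigma_k^{-1}\).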

A colleague once told me that equivariance is "what mathematicians do when they do statistics," intending it as a pejorative statement about the methodology, so I wish to discuss my use of it briefly. People have intuitive arguments as to why equivariance is reasonable (some of which I believe, some of which I certainly do not). However, if our goal is simply to find reasonable estimators with good risk performance, then equivariance is not a bad strategy. To me, there are three reasons why equivariance is appealing. The first is one of convenience: in many models, the statistical risk of any equivariant estimator is constant over the parameter space, so finding a best equivariant estimator reduces to comparing a single number per estimator, and such a best estimator can often be found explicitly. The second is that in any group-invariant estimation problem the MLE is equivariant, so a best equivariant estimator dominates the MLE in terms of risk; this gives an automatic way of finding estimators that beat a very popular estimator. The final reason is that under many groups the best equivariant estimator is minimax, so we also have an automatic way of finding minimax estimators. I believe these are compelling reasons to explore the use of equivariance in estimation.


References

Genevera I. Allen. Comment on article by Hoff. Bayesian Anal., 6(2):197–201, 2011. doi: 10.1214/11-BA606A.

David Gerard and Peter Hoff. Equivariant minimax dominators of the MLE in the array normal model. J. Multivariate Anal., 137:32–49, 2015. doi: 10.1016/j.jmva.2015.01.020.

Shiyuan He, Jianxin Yin, Hongzhe Li, and Xing Wang. Graphical model selection and estimation for high dimensional tensor data. J. Multivariate Anal., 128:165–185, 2014. doi: 10.1016/j.jmva.2014.03.007.

Peter D. Hoff. Separable covariance arrays via the Tucker product, with applications to multivariate relational data. Bayesian Anal., 6(2):179–196, 2011. doi: 10.1214/11-BA606.

W. James and Charles Stein. Estimation with quadratic loss. In Proc. 4th Berkeley Sympos. Math. Statist. and Prob., Vol. I, pages 361–379. Univ. California Press, Berkeley, Calif., 1961.

Shang P. Lin and Michael D. Perlman. A Monte Carlo comparison of four estimators of a covariance matrix. In Multivariate analysis VI (Pittsburgh, Pa., 1983), pages 411–429. North-Holland, Amsterdam, 1985.

Ameur M. Manceur and Pierre Dutilleul. Maximum likelihood estimation for the tensor normal distribution: algorithm, minimum sample size, and empirical bias and dispersion. J. Comput. Appl. Math., 239:37–49, 2013. doi: 10.1016/j.cam.2012.09.017.

Martin Ohlson, M. Rauf Ahmad, and Dietrich von Rosen. The multilinear normal distribution: introduction and some basic properties. J. Multivariate Anal., 113:37–47, 2013. doi: 10.1016/j.jmva.2011.05.015.