For covariance estimation in the multilinear normal model, others have explored maximum likelihood estimation [Hoff, 2011, Ohlson et al., 2013, Manceur and Dutilleul, 2013], regularized likelihood methods [Allen, 2011, He et al., 2014], or Bayesian methods with semi-conjugate priors [Hoff, 2011]. But little work has been done on the optimality properties of such estimators. The optimality of an estimator is frequently defined in terms of statistical risk. Suppose, for data \(X\), we have an estimator \(t(X)\) of a parameter \(\theta\). Denote the loss of using \(t(X)\) when the truth is \(\theta\) as \(L(\theta,t(X)) \in \mathbb{R}^+\). The risk, then, is the expected loss at \(\theta\) with respect to the distribution of \(X\), \(R(\theta,t) = E[L(\theta,t(X))|\theta]\).

We can gain some clues on finding optimal estimators, in terms of statistical risk, from studying the history of covariance estimation in classical multivariate statistics. It was first shown in James and Stein [1961] that in the multivariate normal model the maximum likelihood estimator (MLE) is not admissible (there exist estimators that beat the MLE in terms of risk everywhere). Indeed, it is not even minimax, and considerable reductions in risk can be made over a wide range of the parameter space [Lin and Perlman, 1985]. The approach of James and Stein [1961] was to use equivariance: For \(X \in \mathbb{R}^{p \times n}\), such that \(\text{vec}(X) \sim N_{p\times n}(0,I_n \otimes \Sigma)\), only consider estimators such that \(\hat{\Sigma}(AX) = A \hat{\Sigma}A^T\) for all \(A\) lower triangular with positive diagonal elements. If we restrict ourselves to this class estimators, then it is possible to find a uniformly best estimator, and this best estimator dominates the sample covariance matrix in terms of risk.

Using the notions of equivariance,
in Gerard and Hoff [2015] we
extended the results of James and Stein [1961] to the multilinear
normal model. Unlike for the multivariate normal model, there is no
simple characterization for the class of equivariant estimators in the
multilinear normal model (beyond the definition of equivariance, of
course). So we had to develop a generalized Bayes procedure to find
the best equivariant estimator. This also required the derivation of a
novel, yet simple, class of distributions over the space of symmetric
and positive definite matrices which we called the "mirror-Wishart"
distribution. The form of the estimator depends on the loss function
we choose, but we developed a class of loss functions which are
natural extensions of Stein's loss from the vector case to the tensor
case. The best equivariant estimator using these "multiway Stein"
losses have simple characterizations in terms of the posterior
expectations of the component precision matrices. These estimators are
also minimax. And unlike the estimator in James and Stein [1961],
where the reductions in risk are modest, we observed risks as low as
\(60\%\) that of the MLE, depending on the dimension of the
tensor. This is *uniform* reduction over the whole parameter
space.

Allen, G. I. (2011). Comment on article by Hoff. *Bayesian Anal*. 6(2), p. 197-201 doi:10.1214/11-BA606A

**Gerard, D.**, & Hoff, P. (2015). Equivariant minimax dominators of the MLE in the array normal model. *Journal of multivariate analysis*, 137, p. 32-49. doi:10.1016/j.jmva.2015.01.020 | arXiv:1408.0424

He, S., Yin, J., Li, H., & Wang, X. (2014). Graphical model selection and estimation for high dimensional tensor data. *Journal of Multivariate Analysis*, 128, p. 165-185. doi:10.1016/j.jmva.2014.03.007

Hoff, P. D. (2011). Separable covariance arrays via the Tucker product, with applications to multivariate relational data. *Bayesian Anal.*, 6(2), p. 179–196. doi:10.1214/11-BA606

W. James and Charles Stein. Estimation with quadratic loss. In Proc. 4th Berkeley Sympos. Math. Statist. and Prob., Vol. I, pages 361–379. Univ. California Press, Berkeley, Calif., 1961. [link]

Shang P. Lin and Michael D. Perlman. A Monte Carlo comparison of four estimators of a covariance matrix. In Multivariate analysis VI (Pittsburgh, Pa., 1983), pages 411–429. North-Holland, Amsterdam, 1985. [link]

Manceur, A. M., & Dutilleul, P. (2013). Maximum likelihood estimation for the tensor normal distribution: Algorithm, minimum sample size, and empirical bias and dispersion. *Journal of Computational and Applied Mathematics*, 239, p. 37-49. doi:10.1016/j.cam.2012.09.017

Ohlson, M., Ahmad, M. R., & Von Rosen, D. (2013). The multilinear normal distribution: Introduction and some basic properties. *Journal of Multivariate Analysis*, 113, p. 37-47. doi:10.1016/j.jmva.2011.05.015