
General case

Parameterizing covariances ${{\bf K}}^{-1}$ is often desirable in practice. Examples include adapting the trade-off between data and prior terms (i.e., determining the regularization factor), selecting between different symmetries or smoothness measures, and, in the multidimensional situation, determining directions with low variance. Insofar as the normalization depends on ${{\bf K}}(\theta)$, one has to consider the error functional

\begin{displaymath}
E_{\theta,\phi} =
-(\ln P(\phi),\,N)
+\frac{1}{2} \Big(\phi-t,\,{{\bf K}}(\theta)\,(\phi-t)\Big)
+ (P(\phi),\, \Lambda_X )
+\ln Z_\phi (\theta)
+E_\theta
,
\end{displaymath} (456)

with
\begin{displaymath}
Z_\phi (\theta) =
(2\pi)^\frac{d}{2}(\det {{\bf K}}(\theta))^{-\frac{1}{2}}
,
\end{displaymath} (457)

for a $d$-dimensional Gaussian specific prior, and stationarity equations
\begin{displaymath}
{{\bf K}}(\phi-t)
=
{\bf P}^\prime(\phi) {\bf P}^{-1}(\phi) N
- {\bf P}^\prime (\phi) \Lambda_X
,
\end{displaymath} (458)
\begin{displaymath}
\frac{1}{2}
\Big(\phi-t ,\,\frac{\partial {{\bf K}}(\theta)}{\partial \theta}\,(\phi-t)\Big)
=
\frac{1}{2}
{\rm Tr} \left({\bf K}^{-1}(\theta)
\frac{\partial {{\bf K}}(\theta)}{\partial \theta}\right)
-E_\theta^\prime
.
\end{displaymath} (459)

Here we used
\begin{displaymath}
\frac{\partial}{\partial \theta} \ln \det {{\bf K}}
=\frac{\partial}{\partial \theta} {\rm Tr} \ln {{\bf K}}
={\rm Tr} \left( {{\bf K}}^{-1} \frac{\partial {{\bf K}}}{\partial \theta} \right)
.
\end{displaymath} (460)
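
The step from Eqs. (456) and (457) to Eq. (459) can be spelled out as follows. This is only a sketch, and it assumes that $t$, the data term $-(\ln P(\phi),\,N)$ and the constraint term $(P(\phi),\,\Lambda_X)$ do not depend on $\theta$, as the form of Eq. (459) suggests. Setting the $\theta$-derivative of the error functional to zero gives
\begin{displaymath}
0 = \frac{\partial E_{\theta,\phi}}{\partial \theta}
= \frac{1}{2}
\Big(\phi-t ,\,\frac{\partial {{\bf K}}(\theta)}{\partial \theta}\,(\phi-t)\Big)
+\frac{\partial}{\partial \theta} \ln Z_\phi (\theta)
+E_\theta^\prime
,
\end{displaymath}
and, from Eq. (457) together with Eq. (460),
\begin{displaymath}
\frac{\partial}{\partial \theta} \ln Z_\phi (\theta)
= -\frac{1}{2} \frac{\partial}{\partial \theta} \ln \det {{\bf K}}(\theta)
= -\frac{1}{2} {\rm Tr} \left({\bf K}^{-1}(\theta)
\frac{\partial {{\bf K}}(\theta)}{\partial \theta}\right)
.
\end{displaymath}
Inserting the second relation into the first and moving the trace and hyperprior terms to the right-hand side reproduces Eq. (459).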

In the case of an unrestricted variation of the matrix elements of ${{\bf K}}$, the hyperparameters become $\theta_l = \theta (x,y;x^\prime,y^\prime) = {{\bf K}} (x,y;x^\prime,y^\prime)$. Then, using
\begin{displaymath}
\frac{\partial {{\bf K}}(x,y;x^\prime,y^\prime)}
{\partial \theta (x^{\prime\prime},y^{\prime\prime};x^{\prime\prime\prime},y^{\prime\prime\prime})}
=
\delta (x-x^{\prime\prime})
\delta (y-y^{\prime\prime})
\delta (x^{\prime}-x^{\prime\prime\prime})
\delta (y^{\prime}-y^{\prime\prime\prime})
,
\end{displaymath} (461)

Eq. (459) becomes the inhomogeneous equation
\begin{displaymath}
\frac{1}{2} (\phi-t) \, (\phi-t)^T
=
\frac{1}{2}
{\rm Tr} \left({\bf K}^{-1}(\theta)
\frac{\partial {{\bf K}}(\theta)}{\partial \theta}\right)
-E_\theta^\prime
.
\end{displaymath} (462)
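
As a worked evaluation of the trace term, the following sketch starts from Eq. (459) and assumes that all matrix elements of ${{\bf K}}$ are varied independently, with the symmetry of ${{\bf K}}$ used only in the last step. Inserting Eq. (461) into the trace gives
\begin{displaymath}
{\rm Tr} \left({\bf K}^{-1}
\frac{\partial {{\bf K}}}{\partial \theta (x^{\prime\prime},y^{\prime\prime};x^{\prime\prime\prime},y^{\prime\prime\prime})}\right)
= \int \! dx\, dy\, dx^\prime dy^\prime \,
{\bf K}^{-1}(x^\prime,y^\prime;x,y)\,
\delta (x-x^{\prime\prime})
\delta (y-y^{\prime\prime})
\delta (x^{\prime}-x^{\prime\prime\prime})
\delta (y^{\prime}-y^{\prime\prime\prime})
= {\bf K}^{-1}(x^{\prime\prime\prime},y^{\prime\prime\prime};x^{\prime\prime},y^{\prime\prime})
= {\bf K}^{-1}(x^{\prime\prime},y^{\prime\prime};x^{\prime\prime\prime},y^{\prime\prime\prime})
.
\end{displaymath}
In the same way, the left-hand side of Eq. (459) reduces to $\frac{1}{2}\,(\phi-t)(x^{\prime\prime},y^{\prime\prime})\,(\phi-t)(x^{\prime\prime\prime},y^{\prime\prime\prime})$, so that in operator notation the stationarity condition reads
\begin{displaymath}
\frac{1}{2} (\phi-t) \, (\phi-t)^T
=
\frac{1}{2}\, {\bf K}^{-1}(\theta)
-E_\theta^\prime
,
\end{displaymath}
where $E_\theta^\prime$ now denotes the derivative of $E_\theta$ with respect to the matrix elements of ${{\bf K}}$, and the factor $\frac{1}{2}$ in front of ${\bf K}^{-1}$ is inherited from Eq. (459).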

In the following we consider two special cases: one where the determinant of the covariance is $\theta$-independent, so that the trace term vanishes, and one where $\theta$ is just a multiplicative factor for the specific prior energy, i.e., a so-called regularization parameter.


Joerg_Lemm 2001-01-21