
Regularization parameters

Next we consider the example ${{\bf K}}(\gamma) = \gamma {{\bf K}}_0$, where the hyperparameter $\theta \ge 0$ has been denoted $\gamma$, representing a regularization parameter or an inverse temperature variable of the specific prior. For a $d$-dimensional Gaussian integral the normalization factor becomes $Z_\phi (\gamma) = \left(\frac{2\pi}{\gamma}\right)^{\frac{d}{2}}(\det {{\bf K}}_0)^{-1/2}$, where for positive (semi-)definite ${{\bf K}}$ the dimension $d$ is given by the rank of ${{\bf K}}$ under a chosen discretization. Skipping $\gamma$-independent constants results in a normalization energy $E_N(\gamma) = -\frac{d}{2}\ln \gamma$. With

\begin{displaymath}
\frac{\partial {{\bf K}}}
{\partial \gamma}
=
{{\bf K}}_0
\end{displaymath} (482)

we obtain the stationarity equations

\begin{displaymath}
\gamma\, {{\bf K}}_0\,(\phi-t)
=
{\bf P}^\prime(\phi)\, {\bf P}^{-1}(\phi)\, N
- {\bf P}^\prime (\phi)\, \Lambda_X
,
\end{displaymath} (483)

\begin{displaymath}
\frac{1}{2}\,
(\phi-t ,\,{{\bf K}}_0\,(\phi-t))
=
\frac{d}{2\,\gamma}
-E_\gamma^\prime
.
\end{displaymath} (484)
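
Equation (484) is just the stationarity condition of the energy with respect to $\gamma$; as a short check, assuming the energy decomposition used above (regularization term plus $E_N$ and $E_\gamma$),

\begin{displaymath}
\frac{\partial}{\partial \gamma}
\left[
\frac{\gamma}{2}\,(\phi-t ,\,{{\bf K}}_0\,(\phi-t))
+E_N(\gamma)+E_\gamma(\gamma)
\right]
=
\frac{1}{2}\,(\phi-t ,\,{{\bf K}}_0\,(\phi-t))
-\frac{d}{2\gamma}
+E_\gamma^\prime
= 0 ,
\end{displaymath}

using $E_N^\prime(\gamma) = -\frac{d}{2\gamma}$, which follows directly from $E_N(\gamma) = -\frac{d}{2}\ln \gamma$.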

For a compensating hyperprior the right hand side of Eq. (484) vanishes, thus giving no stationary point for $\gamma$. Using, however, the condition $\gamma\ge 0$, one sees that for positive definite ${{\bf K}}_0$ the energy is minimized at the boundary value $\gamma = 0$, corresponding to the `prior-free' case. For example, in the case of Gaussian regression the solution would then be the data template $\phi = h = t_D$. This is also known as the ``$\delta$-catastrophe''. To get a nontrivial solution for $\gamma$, a noncompensating hyperparameter energy $E_\gamma = E_\theta$ must be used, so that $E_N + E_\gamma$ is nonuniform [16,24].
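
Explicitly, up to $\gamma$-independent constants a compensating hyperprior has $E_\gamma = -E_N = \frac{d}{2}\ln \gamma$, hence

\begin{displaymath}
E_\gamma^\prime = \frac{d}{2\gamma}
\qquad\Rightarrow\qquad
\frac{d}{2\gamma}-E_\gamma^\prime = 0 ,
\end{displaymath}

so the right hand side of Eq. (484) vanishes identically; for $d=1$ this corresponds to the hyperprior $p(\gamma)\propto 1/\sqrt{\gamma}$ used in Fig. 6.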

The other limiting case is a vanishing $E_\gamma^\prime$, for which Eq. (484) becomes

\begin{displaymath}
\gamma = \frac{d}{ (\phi-t ,\,{{\bf K}}_0\,(\phi-t)) }.
\end{displaymath} (485)

For $\phi\rightarrow t$ one sees that $\gamma\rightarrow \infty$. Moreover, if $P[t]$ represents a normalized probability, $\phi=t$ also solves the first stationarity equation (483) in the limit $\gamma\rightarrow \infty$. Thus, for vanishing $E_\gamma^\prime$ the `data-free' solution $\phi=t$ is a self-consistent solution of the stationarity equations (483,484).
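
This limit can be made concrete in the zero-dimensional example of Fig. 6 with uniform hyperprior, i.e., $E_\gamma^\prime=0$ and $d=1$ (a minimal worked check using the posterior given in the caption of Fig. 6): the two stationarity equations reduce to

\begin{displaymath}
h = \frac{\gamma}{1+\gamma},
\qquad
\gamma = \frac{1}{(h-1)^2} .
\end{displaymath}

Substituting the first into the second yields $\gamma = (1+\gamma)^2$, which has no finite solution; the pair is solved only in the limit $\gamma\rightarrow\infty$, $h\rightarrow 1$, i.e., at the prior template.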

Fig. 6 shows the joint posterior surface for a uniform and for a compensating hyperprior in a zero-dimensional example of Gaussian regression. The maximum a posteriori approximation corresponds to the highest point of the joint posterior over $\gamma$ and $h$ in that figure. Alternatively, the $\gamma$-integral can be treated by Monte Carlo methods [236].

Figure 6: Shown is the joint posterior density of $h$ and $\gamma$, i.e., $p({h},\gamma \vert D,D_0) \propto p(y_D\vert{h})\,p({h}\vert\gamma ,D_0)\,p(\gamma )$, for a zero-dimensional example of Gaussian regression with training data $y_D=0$ and prior data $y_{D_0}=1$. L.h.s.: For uniform hyperprior $p(\gamma ) \propto 1$ the joint posterior becomes $p \propto e^{-\frac{1}{2} {h}^2 -\frac{\gamma}{2} ({h}-1)^2 +\frac{1}{2}\ln \gamma}$, having its maximum at $\gamma = \infty$, ${h}=1$. R.h.s.: For compensating hyperprior $p(\gamma ) \propto 1/\sqrt {\gamma }$ it becomes $p \propto e^{-\frac{1}{2} {h}^2 -\frac{\gamma}{2} ({h}-1)^2}$, having its maximum at $\gamma = 0$, ${h}=0$.
\begin{figure}\begin{center}
\epsfig{file=ps/betaPU.eps, width= 65mm}\epsfig{file=ps/betaPC.eps, width= 65mm}\end{center}\end{figure}
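
The locations of the two maxima quoted in the caption are easily verified numerically. The following sketch (plain NumPy; the grid ranges are illustrative choices, not part of the original example) evaluates both log posteriors on a finite grid:

\begin{verbatim}
import numpy as np

# Zero-dimensional Gaussian regression example of Fig. 6:
# training data y_D = 0, prior data y_{D_0} = 1.
h = np.linspace(-0.5, 1.5, 401)        # grid for h
g = np.linspace(1e-3, 50.0, 2001)      # grid for gamma >= 0
H, G = np.meshgrid(h, g)

# log posterior for uniform hyperprior p(gamma) propto 1
logp_uniform = -0.5*H**2 - 0.5*G*(H - 1.0)**2 + 0.5*np.log(G)
# log posterior for compensating hyperprior p(gamma) propto 1/sqrt(gamma)
logp_comp    = -0.5*H**2 - 0.5*G*(H - 1.0)**2

for name, lp in (("uniform", logp_uniform), ("compensating", logp_comp)):
    i, j = np.unravel_index(np.argmax(lp), lp.shape)
    print(name, ": maximum at gamma =", G[i, j], ", h =", H[i, j])
\end{verbatim}

On any finite grid the uniform-hyperprior maximum sits at the largest available $\gamma$ with $h$ close to $1$, reflecting the $\delta$-catastrophe $\gamma\rightarrow\infty$, while the compensating hyperprior puts it at the smallest $\gamma$ with $h$ close to $0$.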

Finally, we remark that in the setting of empirical risk minimization, where the error functional has a different interpretation, regularization parameters are usually determined by cross-validation or related techniques [166,6,230,216,217,81,39,211,228,54,83].
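
As an illustration of this alternative, here is a minimal $k$-fold cross-validation sketch for a quadratic (ridge-type) regularizer; the synthetic data, the grid of candidate $\gamma$ values, and the helper names are purely illustrative assumptions:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, p = 60, 10
X = rng.normal(size=(n, p))                      # synthetic design matrix
y = X @ rng.normal(size=p) + 0.5*rng.normal(size=n)

def ridge_fit(X, y, gamma):
    # minimizer of |y - X w|^2 + gamma |w|^2
    return np.linalg.solve(X.T @ X + gamma*np.eye(X.shape[1]), X.T @ y)

def cv_error(X, y, gamma, k=5):
    # k-fold cross-validated squared test error
    folds = np.array_split(np.arange(len(y)), k)
    err = 0.0
    for f in folds:
        train = np.setdiff1d(np.arange(len(y)), f)
        w = ridge_fit(X[train], y[train], gamma)
        err += np.sum((y[f] - X[f] @ w)**2)
    return err/len(y)

gammas = np.logspace(-3, 3, 25)
best = min(gammas, key=lambda g: cv_error(X, y, g))
print("cross-validated regularization parameter:", best)
\end{verbatim}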

