
Regression

Consider now the case of regression according to the functional (247) with an adaptive template $t_0(\theta)$. The system of stationarity equations for the regression function ${h}(x)$ (corresponding to $\phi(x,y)$) and for $\theta$ becomes

\begin{displaymath}
{{\bf K}}_0({h}-t_0) = {{\bf K}_D}(t_D-{h})
,
\end{displaymath} (444)

\begin{displaymath}
{\bf t}_0^\prime {{\bf K}}_0 ({h} -t_0) = 0
.
\end{displaymath} (445)

It will also be useful to insert Eq. (444) into Eq. (445), yielding
\begin{displaymath}
0 = {\bf t}_0^\prime {{\bf K}_D}({h}-t_D)
.
\end{displaymath} (446)

For fixed $t_0$, Eq. (444) is solved by the template average $t$
\begin{displaymath}
{h} = t = \left({{\bf K}}_0 + {{\bf K}}_D\right)^{-1}
\left({{\bf K}}_0 t_0 + {{\bf K}}_D t_D\right)
,
\end{displaymath} (447)

so that Eqs. (445) and (446), respectively, become
\begin{displaymath}
0={\bf t}_0^\prime {{\bf K}}_0
(t-t_0)
,
\end{displaymath} (448)


\begin{displaymath}
0 = {\bf t}_0^\prime {{\bf K}}_D (t-t_D)
.
\end{displaymath} (449)
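
For fixed $\theta$, Eqs. (447)-(449) are plain linear algebra and can be checked numerically. The following is a minimal sketch (the operators, templates, and the matrix T0p standing in for the derivative matrix ${\bf t}_0^\prime$ are arbitrary illustrative choices, not quantities from the text):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 2   # dimension of h and number of hyperparameters theta

# Illustrative symmetric positive definite operators K_0 and K_D
A = rng.normal(size=(n, n)); K0 = A @ A.T + n * np.eye(n)
B = rng.normal(size=(n, n)); KD = B @ B.T + n * np.eye(n)

t0 = rng.normal(size=n)         # template t_0(theta) at the current theta
tD = rng.normal(size=n)         # data template t_D
T0p = rng.normal(size=(m, n))   # stand-in for the derivative matrix t_0'

# Eq. (447): template average t = (K_0 + K_D)^{-1} (K_0 t_0 + K_D t_D)
t = np.linalg.solve(K0 + KD, K0 @ t0 + KD @ tD)

# h = t solves Eq. (444): K_0 (h - t_0) = K_D (t_D - h)
assert np.allclose(K0 @ (t - t0), KD @ (tD - t))

# Hence the expressions in Eqs. (448) and (449) differ only in sign;
# both vanish at a point that is also stationary in theta.
assert np.allclose(T0p @ (K0 @ (t - t0)), -(T0p @ (KD @ (t - tD))))
\end{verbatim}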

It is now interesting to note that if we replace the full template average $t$ in Eq. (449) by $t_0$, we get
\begin{displaymath}
0 = {\bf t}_0^\prime {{\bf K}}_D (t_0-t_D)
,
\end{displaymath} (450)

which is equivalent to the stationarity equation
\begin{displaymath}
0 = {{\bf H}}^\prime {{\bf K}}_D ({h}-t_D),
\end{displaymath} (451)

(with the derivative matrix ${{\bf H}}^\prime$ being the analogue of $\Phi^\prime$ for ${h}$) of an error functional
\begin{displaymath}
E_{D,{h}(\xi)}
= \frac {1}{2} (\,{h}(\xi) - t_D,\, {{\bf K}}_D ({h}(\xi)-t_D)\,)
\end{displaymath} (452)

without prior terms but with a parameterized ${h}(\xi)$, e.g., a neural network. The approximation ${h} = t = t_0$ can, for example, be interpreted as the limit $\lambda\rightarrow \infty$,
\begin{displaymath}
\lim_{\lambda\rightarrow \infty} {h} =
\lim_{\lambda\rightarrow \infty} t = t_0
,
\end{displaymath} (453)

after replacing ${{\bf K}}_0$ by $\lambda {{\bf K}}_0$ in Eq. (447). The setting ${h}=t_0$ can then be used as an initial guess ${h}^0$ for an iterative solution for ${h}$. If ${{\bf K}}_0^{-1}$ exists, ${h}=t_0$ is also obtained after one step of the iteration scheme ${h}^{i} = t_0 + {{\bf K}}_0^{-1} {{\bf K}}_D (t_D-{h}^{i-1})$, starting with the initial guess ${h}^0=t_D$.
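
A minimal numerical sketch of the one-step property and of the fixed point of this scheme (again with purely illustrative operators and templates):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.normal(size=(n, n)); K0 = A @ A.T + n * np.eye(n)
B = rng.normal(size=(n, n)); KD = B @ B.T + n * np.eye(n)
t0, tD = rng.normal(size=n), rng.normal(size=n)

# One step of h^i = t_0 + K_0^{-1} K_D (t_D - h^{i-1}) from h^0 = t_D:
h1 = t0 + np.linalg.solve(K0, KD @ (tD - tD))
assert np.allclose(h1, t0)   # the correction term vanishes, so h^1 = t_0

# The fixed point of the scheme is the template average t of Eq. (447);
# convergence of the iteration would require the spectral radius of
# K_0^{-1} K_D to be below one, which is not guaranteed in general.
t = np.linalg.solve(K0 + KD, K0 @ t0 + KD @ tD)
assert np.allclose(t, t0 + np.linalg.solve(K0, KD @ (tD - t)))
\end{verbatim}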

For comparison with Eqs. (449)-(451), we give the stationarity equations for the parameters $\xi$ of a parameterized regression functional including an additional prior term with hyperparameters,

\begin{displaymath}
E_{\theta,{h}(\xi)} =
\frac {1}{2} (\,{h}(\xi) - t_D,\, {{\bf K}}_D ({h}(\xi)-t_D)\,)
+ \frac {1}{2} (\,{h}(\xi) - t_0(\theta),\, {{\bf K}}_0(\theta)
({h}(\xi)-t_0(\theta))\,)
,
\end{displaymath} (454)

which are
\begin{displaymath}
0 = {{\bf H}}^\prime {{\bf K}}_D ({h}-t_D)
+ {{\bf H}}^\prime {{\bf K}}_0 ({h}-t_0)
.
\end{displaymath} (455)
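
As a concrete, purely illustrative instance of Eqs. (454) and (455), one may take a linear parameterization ${h}(\xi) = {\bf F}\xi$ at fixed $\theta$, for which the derivative matrix ${{\bf H}}^\prime$ becomes ${\bf F}^T$ (the matrix ${\bf F}$ below is an arbitrary example, not from the text):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 3

A = rng.normal(size=(n, n)); K0 = A @ A.T + n * np.eye(n)
B = rng.normal(size=(n, n)); KD = B @ B.T + n * np.eye(n)
t0, tD = rng.normal(size=n), rng.normal(size=n)
F = rng.normal(size=(n, p))   # h(xi) = F xi, so H' corresponds to F^T

def E(xi):
    # Eq. (454) at fixed theta: data term plus prior term
    h = F @ xi
    return 0.5 * (h - tD) @ KD @ (h - tD) + 0.5 * (h - t0) @ K0 @ (h - t0)

def grad(xi):
    # Left-hand side of Eq. (455): H' K_D (h - t_D) + H' K_0 (h - t_0)
    h = F @ xi
    return F.T @ (KD @ (h - tD)) + F.T @ (K0 @ (h - t0))

# Check Eq. (455) against a finite-difference gradient at a random point
xi = rng.normal(size=p)
eps = 1e-6
fd = np.array([(E(xi + eps * e) - E(xi - eps * e)) / (2 * eps)
               for e in np.eye(p)])
assert np.allclose(grad(xi), fd, atol=1e-4)

# The stationary point solves the normal equations
# F^T (K_0 + K_D) F xi = F^T (K_0 t_0 + K_D t_D)
xi_star = np.linalg.solve(F.T @ (K0 + KD) @ F, F.T @ (K0 @ t0 + KD @ tD))
assert np.allclose(grad(xi_star), 0, atol=1e-6)
\end{verbatim}

For a nonlinear parameterization, e.g., a neural network, ${\bf F}^T$ is replaced by the $\xi$-dependent derivative matrix ${{\bf H}}^\prime$, the normal equations no longer apply, and Eq. (455) must be solved iteratively.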

Let us now compare the various regression functionals we have encountered so far. The non-parameterized and regularized regression functional $E_{{h}}$ (247) implements prior information explicitly by a regularization term.

A parameterized and regularized functional $E_{{h}(\xi)}$ of the form (353) corresponds to a functional of the form (454) with $\theta$ fixed. It imposes restrictions on the regression function ${h}$ in two ways: by choosing a specific parameterization and by including an explicit prior term. If the number of data points is large enough, compared to the flexibility of the parameterization, the data term of $E_{{h}(\xi)}$ alone can have a unique minimum. Then, at least technically, no additional prior term would be required. This corresponds to the classical error minimization methods typically used for parametric approaches. Nevertheless, even in such situations an explicit prior term can be useful if it implements relevant prior knowledge about $h$.
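
A small illustration of this point, assuming a full-rank linear parameterization ${h}(\xi) = {\bf F}\xi$ with fewer parameters than data dimensions (all names are illustrative): the data term of Eq. (452) alone then already has a unique minimum.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
n, p = 8, 3   # more data dimensions than parameters

B = rng.normal(size=(n, n)); KD = B @ B.T + n * np.eye(n)
tD = rng.normal(size=n)
F = rng.normal(size=(n, p))   # full-rank linear parameterization h(xi) = F xi

# The data term alone, E_D = (1/2)(F xi - t_D)' K_D (F xi - t_D),
# has Hessian F' K_D F, positive definite for full-rank F and positive
# definite K_D -- hence a unique minimum without any prior term.
H = F.T @ KD @ F
assert np.all(np.linalg.eigvalsh(H) > 0)
xi_star = np.linalg.solve(H, F.T @ (KD @ tD))
\end{verbatim}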

The regularized functional with prior or hyperparameters, $E_{\theta,{h}}$ (432), implements effectively weaker prior restrictions than $E_{{h}}$. Its prior term corresponds to a soft restriction of ${h}$ to the space spanned by the parameterized $t(\theta)$. In the limit where the parameterization of $t(\theta)$ is rich enough to allow $t(\theta^*) = {h}^*$ at the stationary point, the prior term vanishes completely.

The parameterized and regularized functional $E_{\theta,{h}(\xi)}$ (454), including prior parameters $\theta$, implements prior information explicitly by a regularization term and implicitly by the parameterization of ${h}(\xi)$. The explicit prior term vanishes if $t(\theta^*) = {h}(\xi^*)$ at the stationary point. The functional combines a hard restriction of ${h}$ with respect to the space spanned by the parameterization ${h}(\xi)$ with a soft restriction of ${h}$ with respect to the space spanned by the parameterized $t(\theta)$. Finally, the parameterized and non-regularized functional $E_{D,{h}(\xi)}$ (452) implements prior information only implicitly, by the parameterization of ${h}(\xi)$. In contrast to the functionals $E_{\theta,{h}}$ and $E_{\theta,{h}(\xi)}$, it implements only a hard restriction on ${h}$. The following table summarizes the discussion:

\begin{tabular}{lll}
Functional & Eq. & Prior implemented \\
\hline
$E_{{h}}$ & (247) & explicitly \\
$E_{{h}(\xi)}$ & (353) & explicitly and implicitly \\
$E_{\theta,{h}}$ & (432) & explicitly; no prior for $t(\theta^*) = {h}^*$ \\
$E_{\theta,{h}(\xi)}$ & (454) & explicitly and implicitly; no explicit prior for $t(\theta^*) = {h}(\xi^*)$ \\
$E_{D,{h}(\xi)}$ & (452) & implicitly \\
\end{tabular}

