
Regression

Consider now the case of regression according to the functional (247) with an adaptive template $t_0(\theta)$. The system of stationarity equations for the regression function ${h}(x)$ (corresponding to $\phi(x,y)$) and for $\theta$ becomes

\begin{displaymath}
{{\bf K}}_0({h}-t_0) = {{\bf K}_D}(t_D-{h})
,
\end{displaymath} (444)

\begin{displaymath}
{\bf t}_0^\prime {{\bf K}}_0 ({h} -t_0) = 0
.
\end{displaymath} (445)

It will also be useful to insert Eq. (444) into Eq. (445), yielding
\begin{displaymath}
0 = {\bf t}_0^\prime {{\bf K}_D}({h}-t_D)
.
\end{displaymath} (446)

For fixed $t_0$, Eq. (444) is solved by the template average $t$
\begin{displaymath}
{h} = t = \left({{\bf K}}_0 + {{\bf K}}_D\right)^{-1}
\left({{\bf K}}_0 t_0 + {{\bf K}}_D t_D\right)
,
\end{displaymath} (447)

so that Eqs. (445) and (446), respectively, become
\begin{displaymath}
0={\bf t}_0^\prime {{\bf K}}_0
(t-t_0)
,
\end{displaymath} (448)


\begin{displaymath}
0 = {\bf t}_0^\prime {{\bf K}}_D (t-t_D)
.
\end{displaymath} (449)
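
For fixed $\theta$, Eqs. (447)-(449) are plain linear algebra and can be checked numerically. The following is a minimal sketch (the operators, templates, and the matrix T0p standing in for the derivative matrix ${\bf t}_0^\prime$ are arbitrary illustrative choices, not quantities from the text):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 2   # dimension of h and number of hyperparameters theta

# Illustrative symmetric positive definite operators K_0 and K_D
A = rng.normal(size=(n, n)); K0 = A @ A.T + n * np.eye(n)
B = rng.normal(size=(n, n)); KD = B @ B.T + n * np.eye(n)

t0 = rng.normal(size=n)         # template t_0(theta) at the current theta
tD = rng.normal(size=n)         # data template t_D
T0p = rng.normal(size=(m, n))   # stand-in for the derivative matrix t_0'

# Eq. (447): template average t = (K_0 + K_D)^{-1} (K_0 t_0 + K_D t_D)
t = np.linalg.solve(K0 + KD, K0 @ t0 + KD @ tD)

# h = t solves Eq. (444): K_0 (h - t_0) = K_D (t_D - h)
assert np.allclose(K0 @ (t - t0), KD @ (tD - t))

# Hence the expressions in Eqs. (448) and (449) differ only in sign;
# both vanish at a point that is also stationary in theta.
assert np.allclose(T0p @ (K0 @ (t - t0)), -(T0p @ (KD @ (t - tD))))
\end{verbatim}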

It is now interesting to note that if we replace the full template average $t$ in Eq. (449) by $t_0$, we get
\begin{displaymath}
0 = {\bf t}_0^\prime {{\bf K}}_D (t_0-t_D)
,
\end{displaymath} (450)

which is equivalent to the stationarity equation
\begin{displaymath}
0 = {{\bf H}}^\prime {{\bf K}}_D ({h}-t_D),
\end{displaymath} (451)

(with the derivative matrix ${{\bf H}}^\prime$ being the analogue of $\Phi^\prime$ for ${h}$) of an error functional
\begin{displaymath}
E_{D,{h}(\xi)}
= \frac {1}{2} (\,{h}(\xi) - t_D,\, {{\bf K}}_D ({h}(\xi)-t_D)\,)
\end{displaymath} (452)

without prior terms but with a parameterized ${h}(\xi)$, e.g., a neural network. The approximation ${h} = t = t_0$ can, for example, be interpreted as the limit $\lambda\rightarrow \infty$,
\begin{displaymath}
\lim_{\lambda\rightarrow \infty} {h} =
\lim_{\lambda\rightarrow \infty} t = t_0
,
\end{displaymath} (453)

after replacing ${{\bf K}}_0$ by $\lambda {{\bf K}}_0$ in Eq. (447). The setting ${h}=t_0$ can then be used as an initial guess ${h}^0$ for an iterative solution for ${h}$. If ${{\bf K}}_0^{-1}$ exists, ${h}=t_0$ is also obtained after one step of the iteration scheme ${h}^{i} = t_0 + {{\bf K}}_0^{-1} {{\bf K}}_D (t_D-{h}^{i-1})$, starting with the initial guess ${h}^0=t_D$.
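
A minimal numerical sketch of the one-step property and of the fixed point of this scheme (again with purely illustrative operators and templates):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.normal(size=(n, n)); K0 = A @ A.T + n * np.eye(n)
B = rng.normal(size=(n, n)); KD = B @ B.T + n * np.eye(n)
t0, tD = rng.normal(size=n), rng.normal(size=n)

# One step of h^i = t_0 + K_0^{-1} K_D (t_D - h^{i-1}) from h^0 = t_D:
h1 = t0 + np.linalg.solve(K0, KD @ (tD - tD))
assert np.allclose(h1, t0)   # the correction term vanishes, so h^1 = t_0

# The fixed point of the scheme is the template average t of Eq. (447);
# convergence of the iteration would require the spectral radius of
# K_0^{-1} K_D to be below one, which is not guaranteed in general.
t = np.linalg.solve(K0 + KD, K0 @ t0 + KD @ tD)
assert np.allclose(t, t0 + np.linalg.solve(K0, KD @ (tD - t)))
\end{verbatim}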

For comparison with Eqs. (449)-(451), we give the stationarity equations for the parameters $\xi$ of a parameterized regression functional including an additional prior term with hyperparameters,

\begin{displaymath}
E_{\theta,{h}(\xi)} =
\frac {1}{2} (\,{h}(\xi) - t_D,\, {{\bf K}}_D ({h}(\xi)-t_D)\,)
+ \frac {1}{2} (\,{h}(\xi) - t_0(\theta),\, {{\bf K}}_0(\theta)
({h}(\xi)-t_0(\theta))\,)
,
\end{displaymath} (454)

which are
\begin{displaymath}
0 = {{\bf H}}^\prime {{\bf K}}_D ({h}-t_D)
+ {{\bf H}}^\prime {{\bf K}}_0 ({h}-t_0)
.
\end{displaymath} (455)
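
As a concrete, purely illustrative instance of Eqs. (454) and (455), one may take a linear parameterization ${h}(\xi) = {\bf F}\xi$ at fixed $\theta$, for which the derivative matrix ${{\bf H}}^\prime$ becomes ${\bf F}^T$ (the matrix ${\bf F}$ below is an arbitrary example, not from the text):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 3

A = rng.normal(size=(n, n)); K0 = A @ A.T + n * np.eye(n)
B = rng.normal(size=(n, n)); KD = B @ B.T + n * np.eye(n)
t0, tD = rng.normal(size=n), rng.normal(size=n)
F = rng.normal(size=(n, p))   # h(xi) = F xi, so H' corresponds to F^T

def E(xi):
    # Eq. (454) at fixed theta: data term plus prior term
    h = F @ xi
    return 0.5 * (h - tD) @ KD @ (h - tD) + 0.5 * (h - t0) @ K0 @ (h - t0)

def grad(xi):
    # Left-hand side of Eq. (455): H' K_D (h - t_D) + H' K_0 (h - t_0)
    h = F @ xi
    return F.T @ (KD @ (h - tD)) + F.T @ (K0 @ (h - t0))

# Check Eq. (455) against a finite-difference gradient at a random point
xi = rng.normal(size=p)
eps = 1e-6
fd = np.array([(E(xi + eps * e) - E(xi - eps * e)) / (2 * eps)
               for e in np.eye(p)])
assert np.allclose(grad(xi), fd, atol=1e-4)

# The stationary point solves the normal equations
# F^T (K_0 + K_D) F xi = F^T (K_0 t_0 + K_D t_D)
xi_star = np.linalg.solve(F.T @ (K0 + KD) @ F, F.T @ (K0 @ t0 + KD @ tD))
assert np.allclose(grad(xi_star), 0, atol=1e-6)
\end{verbatim}

For a nonlinear parameterization, e.g., a neural network, ${\bf F}^T$ is replaced by the $\xi$-dependent derivative matrix ${{\bf H}}^\prime$, the normal equations no longer apply, and Eq. (455) must be solved iteratively.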

Let us now compare the various regression functionals we have encountered so far. The non-parameterized and regularized regression functional $E_{{h}}$ (247) implements prior information explicitly by a regularization term.

A parameterized and regularized functional $E_{{h}(\xi)}$ of the form (353) corresponds to a functional of the form (454) with $\theta$ fixed. It imposes restrictions on the regression function ${h}$ in two ways: by choosing a specific parameterization and by including an explicit prior term. If the number of data points is large enough, compared to the flexibility of the parameterization, the data term of $E_{{h}(\xi)}$ alone can have a unique minimum. Then, at least technically, no additional prior term would be required. This corresponds to the classical error minimization methods typically used for parametric approaches. Nevertheless, even in such situations an explicit prior term can be useful if it implements relevant prior knowledge about $h$.
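
A small illustration of this point, assuming a full-rank linear parameterization ${h}(\xi) = {\bf F}\xi$ with fewer parameters than data dimensions (all names are illustrative): the data term of Eq. (452) alone then already has a unique minimum.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
n, p = 8, 3   # more data dimensions than parameters

B = rng.normal(size=(n, n)); KD = B @ B.T + n * np.eye(n)
tD = rng.normal(size=n)
F = rng.normal(size=(n, p))   # full-rank linear parameterization h(xi) = F xi

# The data term alone, E_D = (1/2)(F xi - t_D)' K_D (F xi - t_D),
# has Hessian F' K_D F, positive definite for full-rank F and positive
# definite K_D -- hence a unique minimum without any prior term.
H = F.T @ KD @ F
assert np.all(np.linalg.eigvalsh(H) > 0)
xi_star = np.linalg.solve(H, F.T @ (KD @ tD))
\end{verbatim}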

The regularized functional with prior or hyperparameters, $E_{\theta,{h}}$ (432), implements effectively weaker prior restrictions than $E_{{h}}$. Its prior term corresponds to a soft restriction of ${h}$ to the space spanned by the parameterized $t(\theta)$. In the limit where the parameterization of $t(\theta)$ is rich enough to allow $t(\theta^*) = {h}^*$ at the stationary point, the prior term vanishes completely.

The parameterized and regularized functional $E_{\theta,{h}(\xi)}$ (454), including prior parameters $\theta$, implements prior information explicitly by a regularization term and implicitly by the parameterization of ${h}(\xi)$. The explicit prior term vanishes if $t(\theta^*) = {h}(\xi^*)$ at the stationary point. The functional combines a hard restriction of ${h}$ with respect to the space spanned by the parameterization ${h}(\xi)$ with a soft restriction of ${h}$ with respect to the space spanned by the parameterized $t(\theta)$. Finally, the parameterized and non-regularized functional $E_{D,{h}(\xi)}$ (452) implements prior information only implicitly, by the parameterization of ${h}(\xi)$. In contrast to the functionals $E_{\theta,{h}}$ and $E_{\theta,{h}(\xi)}$, it implements only a hard restriction on ${h}$. The following table summarizes the discussion:

\begin{tabular}{lll}
Functional & Eq. & Prior implemented \\
\hline
$E_{{h}}$ & (247) & explicitly \\
$E_{{h}(\xi)}$ & (353) & explicitly and implicitly \\
$E_{\theta,{h}}$ & (432) & explicitly; no prior for $t(\theta^*) = {h}^*$ \\
$E_{\theta,{h}(\xi)}$ & (454) & explicitly and implicitly; no explicit prior for $t(\theta^*) = {h}(\xi^*)$ \\
$E_{D,{h}(\xi)}$ & (452) & implicitly \\
\end{tabular}

