
4 Iteration procedures = learning algorithms

The stationarity equations of both models are in general inhomogeneous integral equations. Similar equations appear, for example, in quantum mechanical scattering theory, where the inhomogeneities, in analogy to templates or data, represent the measurable asymptotic states (``channels'') of the system [9]. Being nonlinear, they have to be solved by iteration. An iteration procedure (``learning algorithm'') for solving a (linear or nonlinear) equation of the form $O h = t$ is obtained by selecting an operator $A$ and a relaxation factor $\eta$, both of which may depend on the iteration step $k$ and on $h^k$, and using the updating rule $h^{k+1} = h^{k} + \eta A^{-1} (t^k - O^k h^k)$. For any positive definite $A$ the error decreases until a local minimum is reached, provided $\eta$ is chosen small enough. Choosing $A$ equal to the identity, for example, corresponds to the gradient algorithm and requires no inversion. A Gaussian $A^{-1}$ can approximate local correlations induced by differential operators. Choosing $A=O^k$ corresponds, for the error functional $E^M$, to the EM algorithm, and selecting the negative Hessian results in the Newton method.
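The updating rule can be illustrated with a minimal sketch. It assumes a fixed, linear, symmetric positive definite $O$ and a fixed $t$ (the models in the text are nonlinear, with $O^k$ and $t^k$ changing from step to step), and all names (`relax`, `apply_A_inv`, the 2x2 toy matrix) are hypothetical choices made for illustration only.

```python
def matvec(M, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def relax(O, t, h0, eta, apply_A_inv, steps):
    """Iterate h <- h + eta * A^{-1} (t - O h) for a fixed linear O and t."""
    h = list(h0)
    for _ in range(steps):
        residual = [t_i - oh_i for t_i, oh_i in zip(t, matvec(O, h))]
        update = apply_A_inv(residual)  # applies A^{-1} to the residual
        h = [h_i + eta * u_i for h_i, u_i in zip(h, update)]
    return h


def solve_2x2(M, r):
    """Solve M x = r for a 2x2 matrix by Cramer's rule."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [(d * r[0] - b * r[1]) / det, (a * r[1] - c * r[0]) / det]


# Hypothetical toy problem: O is symmetric positive definite and the
# right-hand side is built so that the exact solution is h = [1, 2].
O = [[2.0, 1.0], [1.0, 3.0]]
t = matvec(O, [1.0, 2.0])

# A = identity: gradient algorithm, no inversion needed, many small steps.
h_gradient = relax(O, t, [0.0, 0.0], eta=0.2,
                   apply_A_inv=lambda r: r, steps=200)

# A = O with eta = 1: for a fixed linear O a single step solves the equation.
h_direct = relax(O, t, [0.0, 0.0], eta=1.0,
                 apply_A_inv=lambda r: solve_2x2(O, r), steps=1)
```

In this linear toy case the choice $A = O$ reduces to a direct solve, while $A$ equal to the identity trades the inversion for a larger number of iterations; for the nonlinear models of the text, $O^k$ and $t^k$ would have to be recomputed from $h^k$ inside the loop.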

Figure 1 summarizes the temperature dependence found in numerical studies of a model with error $E^M_{(2)} = e^{-\frac{\beta}{2} (d_D^2+d_1^2)} + e^{-\frac{\beta}{2} (d_D^2+d_2^2)}$. This corresponds to a data template $d_D^2$ AND-ed to a probabilistic OR of two continuous prior templates $d_i^2$. Here the data template $d_D^2$ is a sum of standard mean square error terms and the prior concepts $d^2_i = -\langle h-t_i \vert \Delta \vert h-t_i \rangle$ measure the (``Laplace'') distance between $h(x)$ and prior template $t_i(x)$. The actual data and the two continuous prior templates $t_i$ are displayed on the right-hand side of the figure.

Figure 1: Scheme of the temperature (= $\beta^{-1}$) dependence of the minima $h$ of the error functional $E^M_{(2)}[h]$ in a model with a data template (= training data) and two continuous prior templates combined by OR, with the Laplace operator as smoothness distance. At low temperatures two local minima appear, each corresponding to one of the two continuous prior templates deformed by the training data. At high temperature all templates are effectively AND-ed and the solution is the average of all three templates.
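The qualitative temperature dependence can be reproduced in a strongly simplified toy version of the model. The sketch below rests on several assumptions that are not taken from the text: $h$ is a single number rather than a function, all distances are plain squared differences instead of mean square error and Laplace distances, the template values ($0$ for the data, $\pm 1$ for the priors) are hypothetical, and the quantity minimized is taken to be the negative logarithm of the mixture expression $\sum_i e^{-\frac{\beta}{2}(d_D^2+d_i^2)}$.

```python
import math


def mixture_error(h, beta, t_data=0.0, t_priors=(-1.0, 1.0)):
    """Negative log of sum_i exp(-beta/2 * (d_D^2 + d_i^2)) for scalar h.

    Assumption: squared differences stand in for the distances of the text.
    """
    d_data = (h - t_data) ** 2
    terms = [math.exp(-0.5 * beta * (d_data + (h - t_i) ** 2))
             for t_i in t_priors]
    return -math.log(sum(terms))


def local_minima(beta, lo=-2.0, hi=2.0, n=4001):
    """Return grid points where the error is strictly below both neighbours."""
    grid = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    values = [mixture_error(h, beta) for h in grid]
    return [grid[k] for k in range(1, n - 1)
            if values[k] < values[k - 1] and values[k] < values[k + 1]]


high_temperature = local_minima(beta=0.5)   # small beta = high temperature
low_temperature = local_minima(beta=20.0)   # large beta = low temperature
```

In this toy setting the high-temperature scan finds a single minimum at the average of the three template values, while the low-temperature scan finds two minima, each lying between the data template and one of the prior templates.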



Joerg_Lemm 2000-09-22