
Learning algorithms for density estimation

There exists a variety of well-developed numerical methods for unconstrained as well as constrained optimization [190,57,88,196,89,12,19,80,193]. Popular examples include conjugate gradient, Newton, and quasi-Newton methods, such as the variable metric methods DFP (Davidon-Fletcher-Powell) and BFGS (Broyden-Fletcher-Goldfarb-Shanno).

All of these correspond to specific, often iteration-dependent, choices of the learning matrix ${\bf A}$ that defines the learning algorithm. Simple possible choices are:

\begin{displaymath}
\begin{array}{lcl@{\qquad}r}
{\bf A} = {\bf I} & : & \mbox{Gradient} & (627) \\
{\bf A} = {\bf D} & : & \mbox{Jacobi} & (628) \\
{\bf A} = {\bf L} + {\bf D} & : & \mbox{Gauss--Seidel} & (629) \\
{\bf A} = {{\bf K}} & : & \mbox{prior relaxation} & (630)
\end{array}
\end{displaymath}

where ${\bf I}$ denotes the identity operator, ${\bf D}$ a diagonal matrix, e.g., the diagonal part of ${{\bf K}}$, and ${\bf L}$ a lower triangular matrix, e.g., the lower triangular part of ${{\bf K}}$. In case ${{\bf K}}$ represents the operator of the prior term in an error functional, we call iteration with ${{\bf K}}^{-1}$ (corresponding to the covariance of the prior process) prior relaxation. For $\phi $-independent ${{\bf K}}$ and $T$, invertible ${{\bf K}}$, and $\eta=1$, prior relaxation solves the corresponding linear equation in a single step. Linear equations are, however, also solved by iteration if ${{\bf K}}$ is too large to be inverted directly. Because ${\bf I}^{-1}$ = ${\bf I}$, the gradient algorithm requires no inversion at all.
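
To make the role of the learning matrix concrete, the following small sketch (in Python/NumPy) applies the generic iteration $\phi^{\rm new} = \phi^{\rm old} - \eta\, {\bf A}^{-1}\, \partial E/\partial \phi$ with each of the four choices above; the quadratic test functional, the step widths, and all variable names are illustrative assumptions, not part of the text. Consistent with the remark on prior relaxation, the choice ${\bf A} = {{\bf K}}$ reaches the minimum in a single step for $\eta=1$.

\begin{verbatim}
import numpy as np

# Illustrative quadratic functional E(phi) = phi.K.phi/2 - J.phi (assumed),
# with gradient K phi - J and exact minimizer phi* = K^{-1} J.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
K = M @ M.T + 10.0 * np.eye(5)     # symmetric positive definite test operator
J = rng.standard_normal(5)
phi_star = np.linalg.solve(K, J)

def iterate(A, eta, steps=200):
    """Learning-matrix iteration: phi <- phi - eta A^{-1} (K phi - J)."""
    phi = np.zeros(5)
    for _ in range(steps):
        phi = phi - eta * np.linalg.solve(A, K @ phi - J)
    return phi

choices = {
    "Gradient":         np.eye(5),            # A = I, no inversion required
    "Jacobi":           np.diag(np.diag(K)),  # A = D, diagonal part of K
    "Gauss-Seidel":     np.tril(K),           # A = L + D, lower triangle of K
    "prior relaxation": K,                    # A = K, one exact step at eta = 1
}
for name, A in choices.items():
    # Plain gradient descent needs a step width below 2/lambda_max to converge.
    eta = 1.0 / np.linalg.eigvalsh(K).max() if name == "Gradient" else 1.0
    print(name, np.linalg.norm(iterate(A, eta) - phi_star))
\end{verbatim}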

On the one hand, density estimation is a rather general problem, requiring the solution of constrained, inhomogeneous, nonlinear (integro-)differential equations. On the other hand, density estimation problems are, at least for Gaussian specific priors and parameterizations that impose no restrictions, typically ``nearly'' linear and subject only to relatively simple non-negativity and normalization constraints. Furthermore, the inhomogeneities are commonly restricted to a finite number of discrete training data points. We therefore expect the inversion of ${{\bf K}}$ to be the essential part of the solution of density estimation problems. However, ${{\bf K}}$ is not necessarily invertible, or may be difficult to calculate; moreover, inversion of ${{\bf K}}$ is not exactly optimal, and improved methods exist. In the following, we therefore discuss basic optimization methods adapted specifically to the situation of density estimation.
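
As a minimal sketch of this situation, the following (assumed) model problem applies prior relaxation, $\phi \leftarrow {{\bf K}}^{-1} T(\phi)$, to a ``nearly'' linear stationarity equation ${{\bf K}}\phi = T(\phi)$: ${{\bf K}}$ is a discretized smoothness prior, and the inhomogeneity $T$ is nonzero only at a few discrete training points and depends weakly on $\phi$. The operator, the data term, and all parameters are invented for illustration.

\begin{verbatim}
import numpy as np

# Assumed model problem: nearly linear stationarity equation K phi = T(phi),
# with the inhomogeneity T concentrated on discrete training points.
n = 50
lap = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # discrete -Laplacian
K = np.eye(n) + 10.0 * lap         # smoothness prior operator, SPD

data_idx = [10, 25, 40]            # positions of the training data
targets = [1.0, -0.5, 0.8]

def T(phi):
    """Pointwise data term with a weak (invented) phi-dependence."""
    t = np.zeros(n)
    for i, y in zip(data_idx, targets):
        t[i] = y - 0.1 * np.tanh(phi[i])   # weak nonlinearity
    return t

# Prior relaxation: iterate phi <- K^{-1} T(phi).  For large K one would
# factorize it once, or replace the solve by an inner iterative scheme.
phi = np.zeros(n)
for step in range(50):
    phi_new = np.linalg.solve(K, T(phi))
    if np.linalg.norm(phi_new - phi) < 1e-12:
        break
    phi = phi_new
print("converged after", step + 1, "iterations")
\end{verbatim}

Because the $\phi$-dependence of $T$ is weak here, the fixed-point map contracts and a handful of applications of ${{\bf K}}^{-1}$ suffice; when ${{\bf K}}$ cannot be inverted, or its inversion is not the best choice, the methods discussed in the following sections take over.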

