
Learning algorithms for density estimation

There exists a variety of well-developed numerical methods for unconstrained as well as constrained optimization [190,57,88,196,89,12,19,80,193]. Popular examples include conjugate gradient, Newton, and quasi-Newton methods, such as the variable metric methods DFP (Davidon-Fletcher-Powell) and BFGS (Broyden-Fletcher-Goldfarb-Shanno).

All of these correspond to specific, often iteration-dependent, choices of the learning matrix ${\bf A}$ that defines the learning algorithm. Simple possible choices are:

\begin{displaymath}
\begin{array}{lcl@{\qquad}r}
{\bf A} = {\bf I} & : & \mbox{Gradient} & (627) \\
{\bf A} = {\bf D} & : & \mbox{Jacobi} & (628) \\
{\bf A} = {\bf L} + {\bf D} & : & \mbox{Gauss--Seidel} & (629) \\
{\bf A} = {{\bf K}} & : & \mbox{prior relaxation} & (630)
\end{array}
\end{displaymath}

where ${\bf I}$ denotes the identity operator, ${\bf D}$ a diagonal matrix, e.g., the diagonal part of ${{\bf K}}$, and ${\bf L}$ a lower triangular matrix, e.g., the lower triangular part of ${{\bf K}}$. In case ${{\bf K}}$ represents the operator of the prior term in an error functional, we call iteration with ${{\bf K}}^{-1}$ (corresponding to the covariance of the prior process) prior relaxation. For $\phi $-independent ${{\bf K}}$ and $T$, invertible ${{\bf K}}$, and $\eta=1$, prior relaxation solves the corresponding linear equation in a single step. Linear equations are, however, also solved by iteration if ${{\bf K}}$ is too large to be inverted directly. Because ${\bf I}^{-1}$ = ${\bf I}$, the gradient algorithm requires no inversion at all.
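
To make the role of the learning matrix concrete, the following small sketch (in Python/NumPy) applies the generic iteration $\phi^{\rm new} = \phi^{\rm old} - \eta\, {\bf A}^{-1}\, \partial E/\partial \phi$ with each of the four choices above; the quadratic test functional, the step widths, and all variable names are illustrative assumptions, not part of the text. Consistent with the remark on prior relaxation, the choice ${\bf A} = {{\bf K}}$ reaches the minimum in a single step for $\eta=1$.

\begin{verbatim}
import numpy as np

# Illustrative quadratic functional E(phi) = phi.K.phi/2 - J.phi (assumed),
# with gradient K phi - J and exact minimizer phi* = K^{-1} J.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
K = M @ M.T + 10.0 * np.eye(5)     # symmetric positive definite test operator
J = rng.standard_normal(5)
phi_star = np.linalg.solve(K, J)

def iterate(A, eta, steps=200):
    """Learning-matrix iteration: phi <- phi - eta A^{-1} (K phi - J)."""
    phi = np.zeros(5)
    for _ in range(steps):
        phi = phi - eta * np.linalg.solve(A, K @ phi - J)
    return phi

choices = {
    "Gradient":         np.eye(5),            # A = I, no inversion required
    "Jacobi":           np.diag(np.diag(K)),  # A = D, diagonal part of K
    "Gauss-Seidel":     np.tril(K),           # A = L + D, lower triangle of K
    "prior relaxation": K,                    # A = K, one exact step at eta = 1
}
for name, A in choices.items():
    # Plain gradient descent needs a step width below 2/lambda_max to converge.
    eta = 1.0 / np.linalg.eigvalsh(K).max() if name == "Gradient" else 1.0
    print(name, np.linalg.norm(iterate(A, eta) - phi_star))
\end{verbatim}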

On the one hand, density estimation is a rather general problem, requiring the solution of constrained, inhomogeneous, nonlinear (integro-)differential equations. On the other hand, density estimation problems are, at least for Gaussian specific priors and parameterizations that impose no restrictions, typically ``nearly'' linear and subject only to relatively simple non-negativity and normalization constraints. Furthermore, the inhomogeneities are commonly restricted to a finite number of discrete training data points. We therefore expect the inversion of ${{\bf K}}$ to be the essential part of the solution of density estimation problems. However, ${{\bf K}}$ is not necessarily invertible, or may be difficult to calculate; moreover, inversion of ${{\bf K}}$ is not exactly optimal, and improved methods exist. In the following, we therefore discuss basic optimization methods adapted specifically to the situation of density estimation.
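
As a minimal sketch of this situation, the following (assumed) model problem applies prior relaxation, $\phi \leftarrow {{\bf K}}^{-1} T(\phi)$, to a ``nearly'' linear stationarity equation ${{\bf K}}\phi = T(\phi)$: ${{\bf K}}$ is a discretized smoothness prior, and the inhomogeneity $T$ is nonzero only at a few discrete training points and depends weakly on $\phi$. The operator, the data term, and all parameters are invented for illustration.

\begin{verbatim}
import numpy as np

# Assumed model problem: nearly linear stationarity equation K phi = T(phi),
# with the inhomogeneity T concentrated on discrete training points.
n = 50
lap = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # discrete -Laplacian
K = np.eye(n) + 10.0 * lap         # smoothness prior operator, SPD

data_idx = [10, 25, 40]            # positions of the training data
targets = [1.0, -0.5, 0.8]

def T(phi):
    """Pointwise data term with a weak (invented) phi-dependence."""
    t = np.zeros(n)
    for i, y in zip(data_idx, targets):
        t[i] = y - 0.1 * np.tanh(phi[i])   # weak nonlinearity
    return t

# Prior relaxation: iterate phi <- K^{-1} T(phi).  For large K one would
# factorize it once, or replace the solve by an inner iterative scheme.
phi = np.zeros(n)
for step in range(50):
    phi_new = np.linalg.solve(K, T(phi))
    if np.linalg.norm(phi_new - phi) < 1e-12:
        break
    phi = phi_new
print("converged after", step + 1, "iterations")
\end{verbatim}

Because the $\phi$-dependence of $T$ is weak here, the fixed-point map contracts and a handful of applications of ${{\bf K}}^{-1}$ suffice; when ${{\bf K}}$ cannot be inverted, or its inversion is not the best choice, the methods discussed in the following sections take over.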

