
Truncated equations

To solve the nonlinear Eq. (609) by iteration, one has to begin with an initial configuration $\phi^{(0)}$. In principle, any easy-to-use density estimation technique could be chosen to construct starting guesses $\phi^{(0)}$.

One possibility to obtain initial guesses is to neglect some terms of the full stationarity equation and solve the resulting simpler (ideally linear) equation first. The corresponding solution may be taken as initial guess $\phi^{(0)}$ for solving the full equation.

Typical error functionals for statistical learning problems include a term $( L,\, N)$ consisting of a discrete sum over a finite number $n$ of training data. For diagonal ${\bf P}^\prime$ these contributions result, according to Eq. (355), in $n$ $\delta$-peak contributions to the inhomogeneities $T$ of the stationarity equations, like $\sum_i \delta (x-x_i)\delta (y-y_i)$ in Eq. (143) or $\sum_i \delta (x-x_i)\delta (y-y_i)/P(x,y)$ in Eq. (172). To find an initial guess, one can now keep only those $\delta$-peak contributions $T_\delta$ arising from the training data and ignore the other, typically continuous, parts of $T$. For Eqs. (143) and (172) this means setting $\Lambda_X = 0$ and yields the truncated equation

\begin{displaymath}
{{\bf K}} \phi
= {\bf P}^\prime {\bf P}^{-1} N
= T_{\delta}
.
\end{displaymath} (680)

Hence, for diagonal ${\bf P}^\prime$, $\phi$ can be written as a sum of $n$ terms
\begin{displaymath}
\phi(x,y) = \sum_{i=1}^n {{\bf C}} (x,y;x_i ,y_i)
\frac{P^\prime (x_i,y_i)}{P(x_i,y_i)},
\end{displaymath} (681)

with ${{\bf C}} = {{\bf K}}^{-1}$, provided the inverse ${{\bf K}}^{-1}$ exists. For $E_L$ the resulting truncated equation is linear in $L$. For $E_P$, however, the truncated equation remains nonlinear. Having solved the truncated equation, we restore the necessary constraints for $\phi $, like normalization and non-negativity for $P$ or normalization of the exponential for $L$.
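As a numerical illustration, the sum in Eq. (681) can be sketched as follows; a Gaussian is used here merely as a stand-in for the kernel ${\bf C} = {\bf K}^{-1}$, the problem is taken one-dimensional in $x$, and the training points, the kernel width, and the uniform weights $P^\prime(x_i)/P(x_i) = 1$ are all ad hoc choices:

```python
import numpy as np

def truncated_solution(x_grid, x_train, width=0.5, weights=None):
    """Initial guess phi^(0)(x) as a sum of n kernel terms centred on
    the training points, in the spirit of Eq. (681).  A Gaussian stands
    in for C = K^{-1}; `weights` plays the role of the factors
    P'(x_i)/P(x_i) and defaults to 1."""
    x_train = np.asarray(x_train, dtype=float)
    if weights is None:
        weights = np.ones(len(x_train))
    # C(x, x_i) evaluated for all grid points x and data points x_i
    C = np.exp(-0.5 * ((x_grid[:, None] - x_train[None, :]) / width) ** 2)
    return C @ weights  # phi(x) = sum_i C(x, x_i) P'(x_i)/P(x_i)

x_grid = np.linspace(-3.0, 3.0, 121)
phi0 = truncated_solution(x_grid, x_train=[-1.0, 0.5, 1.2])
```

Constraints such as normalization of the corresponding $P$ would still have to be restored afterwards, as described above.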

In general, a ${{\bf C}}\ne {{\bf K}}^{-1}$ can be chosen. This is necessary if ${{\bf K}}$ is not invertible, and can also be useful if its inverse is difficult to calculate. One possible choice for the kernel is the inverse negative Hessian ${{\bf C}} = - {\bf H}^{-1}$, evaluated at some initial configuration $\phi^{(0)}$ or an approximation of it. A simple possibility to construct an invertible operator from a noninvertible ${{\bf K}}$ is to add a mass term

\begin{displaymath}
{{\bf C}}
=
\left( {{\bf K}} + m_C^2 {\bf I} \right)^{-1}
,
\end{displaymath} (682)

or to impose additional boundary conditions.
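A minimal numerical sketch of the mass term construction of Eq. (682), using a discretized Laplacian with periodic boundary conditions as an example of a noninvertible ${\bf K}$ (the constant function is a zero mode); the grid size and the value of $m_C^2$ are arbitrary illustrative choices:

```python
import numpy as np

n = 50
# Periodic discrete (negative) Laplacian: singular, since the constant
# vector is a zero mode, so K^{-1} does not exist.
K = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
K[0, -1] -= 1.0
K[-1, 0] -= 1.0

m2 = 0.1  # mass term m_C^2, chosen ad hoc
C = np.linalg.inv(K + m2 * np.eye(n))  # Eq. (682): C = (K + m^2 I)^{-1}
```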

Solving a truncated equation with a kernel ${{\bf C}}$, i.e., obtaining a solution of the form (681), means skipping the term $-{{\bf C}}({\bf P}^\prime \Lambda_X+({{\bf K}}-{{\bf C}}^{-1})\phi)$ of the exact relation

\begin{displaymath}
\phi = {{\bf C}} {\bf P}^\prime {\bf P}^{-1} N
-{{\bf C}}({\bf P}^\prime \Lambda_X+({{\bf K}}-{{\bf C}}^{-1})\phi)
.
\end{displaymath} (683)

A kernel used to create an initial guess $\phi^{(0)}$ will be called an initializing kernel.
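The exact relation (683) also suggests a simple fixed-point iteration, $\phi \to {\bf C}T_\delta - {\bf C}({\bf K}-{\bf C}^{-1})\phi$ for $\Lambda_X = 0$, whose first step is just the truncated solution. The following sketch works under strong simplifying assumptions: a linear model problem with a positive definite tridiagonal stand-in for ${\bf K}$, a single $\delta$-peak inhomogeneity, and ${\bf C}$ built as in Eq. (682); all numerical values are ad hoc:

```python
import numpy as np

n = 30
# Model operator K (discrete Laplacian with Dirichlet boundaries,
# positive definite) and a delta-peak inhomogeneity T_delta.
K = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
T_delta = np.zeros(n)
T_delta[n // 2] = 1.0

C = np.linalg.inv(K + 0.1 * np.eye(n))  # approximate inverse, Eq. (682)

phi = C @ T_delta            # truncated solution, cf. Eq. (681)
for _ in range(200):
    # Eq. (683) with Lambda_X = 0:
    # phi = C T_delta - C (K - C^{-1}) phi  =  C T_delta - C K phi + phi
    phi = C @ T_delta - C @ (K @ phi) + phi
```

Because ${\bf C}$ here only approximates ${\bf K}^{-1}$, the first step is inexact, but the iteration converges to the solution of ${\bf K}\phi = T_\delta$.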

A similar possibility is to start with an ``empirical solution''

\begin{displaymath}
\phi^{(0)} = \phi_{\rm emp},
\end{displaymath} (684)

where $\phi_{\rm emp}$ is defined as a $\phi $ which reproduces the conditional empirical density $P_{\rm emp}$ of Eq. (236) obtained from the training data, i.e.,
\begin{displaymath}
P_{\rm emp} = P (\phi_{\rm emp}).
\end{displaymath} (685)

If there are not data points for every $x$-value, a correctly normalized initial solution is given, for example, by $\tilde P_{\rm emp}$ defined in Eq. (238). If zero values of the empirical density correspond to infinite values of $\phi $, as in the case $\phi $ = $L$, one can use $P^\epsilon_{\rm emp}$, as defined in Eq. (239) with small $\epsilon $, to obtain an initial guess.

In analogy to Eq. (681), it is often also useful to choose a kernel ${\bf C}$ (for example, a smoothing kernel) and use as initial guess

\begin{displaymath}
\phi^{(0)} = {\bf C} \phi_{\rm emp}
,
\end{displaymath} (686)

or a properly normalized version thereof. Alternatively, one may let the (smoothing) operator ${\bf C}$ act directly on $P_{\rm emp}$ and use a corresponding $\phi $ as initial guess,
\begin{displaymath}
\phi^{(0)} = (\phi)^{(-1)} ({\bf C} P_{\rm emp})
,
\end{displaymath} (687)

assuming an inverse $(\phi)^{(-1)}$ of the mapping $P(\phi)$ exists.
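The constructions of Eqs. (684)-(687) can be sketched for the case $\phi = L = \log P$: the empirical density is regularized with a small $\epsilon$ (cf. Eq. (239)), smoothed with a kernel ${\bf C}$, renormalized, and mapped back through $(\phi)^{(-1)} = \log$. The Gaussian kernel, its width, and the value of $\epsilon$ are illustrative choices:

```python
import numpy as np

def initial_log_density(samples, grid, eps=1e-3, width=0.3):
    """phi^(0) = log(C P_emp) after regularisation with eps and
    renormalisation, in the spirit of Eqs. (239) and (687)."""
    dx = grid[1] - grid[0]
    # Empirical density P_emp as a normalised histogram on the grid
    P_emp, _ = np.histogram(samples, bins=len(grid),
                            range=(grid[0], grid[-1]), density=True)
    # Smoothing: (C P_emp)(x) = sum_j C(x, x_j) P_emp(x_j) dx
    C = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / width) ** 2)
    P = C @ P_emp * dx + eps      # regularised, cf. Eq. (239)
    P /= P.sum() * dx             # restore normalisation of P
    return np.log(P)              # phi^(0) = L^(0), cf. Eq. (687)

grid = np.linspace(-3.0, 3.0, 200)
L0 = initial_log_density(np.array([-0.5, 0.1, 0.4, 1.3]), grid)
```

The $\epsilon$-regularization keeps $L^{(0)}$ finite even where the empirical density vanishes, as discussed above.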

We will now discuss the cases $\phi=L$ and $\phi = P$ in some more detail.


Joerg_Lemm 2001-01-21