
Kernels for $P$

For $E_P$, the truncated equation

\begin{displaymath}
P = {{\bf C}}{\bf P}^{-1}N
,
\end{displaymath} (696)

is still nonlinear in $P$. If we solve this equation approximately by a one-step iteration $P^{(1)}$ = ${{\bf C}}({\bf P}^{(0)})^{-1}N$, starting from a uniform initial $P^{(0)}$ and normalizing afterwards, this corresponds, for a single $x$-value, to the classical kernel methods commonly used in density estimation. The resulting normalized density is
\begin{displaymath}
P(x,y)
= \frac{\sum_i {{\bf C}} (x,y ;x_i,y_i)}
{\int \! dy^\prime \sum_i {{\bf C}} (x,y^\prime ;x_i,y_i)}
= \sum_i \bar {{\bf C}} (x,y ;x_i,y_i),
\end{displaymath} (697)

i.e.,
\begin{displaymath}
P
= {\bf N}_{C,X}^{-1} {{\bf C}} N
= \bar {{\bf C}} N,
\end{displaymath} (698)

with the (data-dependent) normalized kernel $\bar {{\bf C}}$ = ${\bf N}_{{C},X}^{-1} {{\bf C}}$, where ${\bf N}_{{C},X}$ is the diagonal matrix with diagonal elements ${\bf I}_X {{\bf C}} N$. Again, ${{\bf C}} = {{\bf K}}^{-1}$ or similar invertible choices can be used to obtain a starting guess for $P$. The form of the Hessian (182) suggests in particular including a mass term on the data.
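As an illustration of Eqs. (697,698) (not part of the original formalism), the following minimal numerical sketch constructs the one-step kernel estimate on a small $(x,y)$-grid; the Gaussian kernel, the grid sizes, and the two data points are illustrative assumptions only.

\begin{verbatim}
import numpy as np

# Minimal sketch of the one-step kernel initialization of Eqs. (697,698):
# smooth the data occupation numbers N with a kernel C and normalize
# conditionally over y for each x. Kernel, grid and data are illustrative.

nx, ny = 10, 15                        # x in {1,...,10}, y in {1,...,15}
xs, ys = np.arange(1, nx + 1), np.arange(1, ny + 1)
data = [(3, 3), (7, 12)]               # training points (x_i, y_i)

def C(x, y, xp, yp, width=2.0):
    # unnormalized smoothing kernel C(x,y;x',y'), here a Gaussian
    return np.exp(-((x - xp) ** 2 + (y - yp) ** 2) / (2.0 * width ** 2))

# numerator of Eq. (697): sum_i C(x,y;x_i,y_i)
CN = np.zeros((nx, ny))
for xi, yi in data:
    CN += C(xs[:, None], ys[None, :], xi, yi)

# conditional normalization (the diagonal matrix N_{C,X}):
# divide each x-row by its sum over y, so that sum_y P(x,y) = 1 for all x
P0 = CN / CN.sum(axis=1, keepdims=True)
\end{verbatim}

For fixed $x$ this is just a conditionally normalized kernel density estimate in $y$, which is the correspondence to classical kernel methods mentioned above.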

It would be interesting to interpret Eq. (698) as the stationarity equation of a functional $\hat E_P$ containing the usual data term $\sum_i \ln P(x_i,y_i)$. To obtain the derivative ${\bf P}^{-1} N$ of this data term, we therefore multiply Eq. (698) by ${\bf P}^{-1} \bar {{\bf C}}^{-1}$, assuming that $\bar {{\bf C}}^{-1}$ exists and that $P\ne 0$ at the data points, and obtain

\begin{displaymath}
{\widetilde {{\bf C}}^{-1} P}
= {\bf P}^{-1}N
,
\end{displaymath} (699)

with data dependent
\begin{displaymath}
{\widetilde {{\bf C}}^{-1}} (x,y;x^\prime ,y^\prime)
=\frac{\bar {{\bf C}}^{-1} (x,y;x^\prime ,y^\prime) }
{\sum_i {\bar {{\bf C}}} (x,y;x_i ,y_i) }.
\end{displaymath} (700)
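For clarity (this step is only implicit in the text above): since Eq. (698) gives $\bar {{\bf C}}^{-1} P = N$, multiplying by ${\bf P}^{-1}$ identifies
\begin{displaymath}
{\widetilde {{\bf C}}^{-1}} = {\bf P}^{-1} \bar {{\bf C}}^{-1}
,\qquad
P(x,y) = \sum_i \bar {{\bf C}} (x,y;x_i ,y_i)
,
\end{displaymath}
where ${\bf P}$ is the diagonal operator built from $P$; written out as a kernel this is exactly Eq. (700).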

Thus, Eq. (698) is the stationarity equation of the functional
\begin{displaymath}
\hat E_P =
-(\,N,\,\ln P\,)
+\frac{1}{2}\,
(\,P,\, {\widetilde {{\bf C}}^{-1}} \, P \,)
.
\end{displaymath} (701)
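The stationarity claim can also be checked numerically. The following sketch discretizes $(x,y)$, builds $\bar {{\bf C}}$ and $\widetilde {{\bf C}}^{-1}$ as matrices, and verifies Eq. (699) for $P = \bar {{\bf C}} N$; the Gaussian kernel, the grid, the data points, and the small ridge term are illustrative assumptions.

\begin{verbatim}
import numpy as np

# Discretized check that P = Cbar N of Eq. (698) fulfills the stationarity
# condition (699), Ctilde^{-1} P = P^{-1} N. Grid points z = (x,y) are
# flattened so that kernels become ordinary matrices. Kernel, grid, data
# and the small ridge term (added to keep C invertible) are illustrative.

nx, ny = 10, 15
xs, ys = np.arange(1, nx + 1), np.arange(1, ny + 1)
X, Y = np.meshgrid(xs, ys, indexing="ij")
z = np.stack([X.ravel(), Y.ravel()], axis=1)         # all grid points (x, y)
data = [(3, 3), (7, 12)]

d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)  # squared distances
Cmat = np.exp(-d2 / (2.0 * 2.0 ** 2)) + 1e-3 * np.eye(len(z))

N = np.zeros(len(z))                                 # occupation numbers
for xi, yi in data:
    N[(z[:, 0] == xi) & (z[:, 1] == yi)] += 1

CN = Cmat @ N
# conditional normalization N_{C,X}: for each x, sum C N over all y
norm = np.array([CN[z[:, 0] == x0].sum() for x0 in z[:, 0]])
Cbar = Cmat / norm[:, None]                          # normalized kernel, Eq. (698)
P = Cbar @ N

Ctilde_inv = np.linalg.inv(Cbar) / P[:, None]        # Eq. (700): P^{-1} Cbar^{-1}
print(np.abs(Ctilde_inv @ P - N / P).max())          # ~ 0 up to rounding errors
\end{verbatim}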

To study the dependence on the number $n$ of training data for a given ${{\bf C}}$, consider a normalized kernel with $\int\! dy \, {{\bf C}} (x,y;x^\prime,y^\prime ) = \lambda$, $\forall x,x^\prime ,y^\prime$. For such a kernel the denominator of $\bar {{\bf C}}$ is equal to $n\lambda$, so we have

\begin{displaymath}
\bar {{\bf C}} = \frac{{{\bf C}}}{n\lambda}
,\quad
P= \frac{{{\bf C}} N}{n \lambda }
.
\end{displaymath} (702)

Assuming that for large $n$ the empirical average $(1/n)\sum_i {{\bf C}}(x,y;x_i ,y_i)$ in the denominator of $\widetilde {{\bf C}}^{-1}$ becomes $n$-independent, e.g., converging to the true average $\int \!\!dx^\prime dy^\prime \, p(x^\prime,y^\prime)\,
{{\bf C}}(x,y;x^\prime ,y^\prime)$, the regularizing term in functional (701) becomes proportional to $n$,
\begin{displaymath}
{\widetilde {{\bf C}}^{-1}} \propto n \lambda^2
.
\end{displaymath} (703)

According to Eq. (76), this would allow one to relate a saddle point approximation to a large-$n$ limit.
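Spelling out the scaling behind Eq. (703) (a short sketch; $A(x,y)$ denotes the assumed $n$-independent limit of the empirical average): inserting $\bar {{\bf C}} = {{\bf C}}/(n\lambda)$ from Eq. (702) into Eq. (700) gives
\begin{displaymath}
{\widetilde {{\bf C}}^{-1}} (x,y;x^\prime ,y^\prime)
= \frac{n\lambda \, {{\bf C}}^{-1} (x,y;x^\prime ,y^\prime)}
{\frac{1}{\lambda}\,\frac{1}{n}\sum_i {{\bf C}} (x,y;x_i ,y_i)}
\;\longrightarrow\;
\frac{n \lambda^2}{A(x,y)} \, {{\bf C}}^{-1} (x,y;x^\prime ,y^\prime)
,
\end{displaymath}
with $A(x,y)$ = $\int\! dx^\prime dy^\prime \, p(x^\prime,y^\prime)\, {{\bf C}}(x,y;x^\prime,y^\prime)$, so that the quadratic regularizing term in Eq. (701) indeed grows linearly with $n$.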

Again, a similar possibility is to start with the empirical density $\tilde P_{\rm emp}$ defined in Eq. (238). Analogously to Eq. (686), the empirical density can, for example, also be smoothed and then correctly normalized again, so that

\begin{displaymath}
P^{(0)}
= \tilde {\bf C} \tilde P_{\rm emp}
,
\end{displaymath} (704)

with $\tilde {\bf C}$ defined in Eq. (698).

Fig. 13 compares the initialization according to Eq. (697), where the smoothing operator $\tilde C$ acts on $N$, with an initialization according to Eq. (704), where the smoothing operator $\tilde C$ acts on the correctly normalized $\tilde P_{\rm emp}$.

Figure 13: Comparison of initial guesses $P^{(0)}(x,y)$ for a case with two data points located at $(3,3)$ and $(7,12)$ within the intervals $y\in [1,15]$ and $x\in [1,10]$ with periodic boundary conditions. First row: $P^{(0)}$ = $\tilde {\bf C} N$. (The smoothing operator acts on the unnormalized $N$. The following conditional normalization changes the shape more drastically than in the example shown in the second row.) Second row: $P^{(0)}$ = $\tilde {\bf C}\tilde P_{\rm emp}$. (The smoothing operator acts on the already conditionally normalized $\tilde P_{\rm emp}$.) The kernel $\tilde {\bf C}$ is given by Eq. (698) with ${\bf C}$ = $({\bf K}+m_C^2{\bf I})$, $m_C^2$ = $1.0$, and a ${\bf K}$ of the form of Eq. (705) with $\lambda _0$ = $\lambda _4$ = $\lambda _6$ = 0, and $\lambda _2$ = $0.1$ (figures on the l.h.s.) or $\lambda _2$ = $1.0$ (figures on the r.h.s.), respectively.
[Figure 13: four panels, image files ps/densI1.eps to ps/densI4.eps, showing the initial guesses described in the caption above.]
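A minimal numerical sketch of the two initializations compared in Fig. 13 follows; the Gaussian kernel, the open (non-periodic) boundaries, and the small offset used to define $\tilde P_{\rm emp}$ for $x$-values without data are simplifying assumptions.

\begin{verbatim}
import numpy as np

# Compare the two initial guesses of Fig. 13 on a small grid:
#   (a) P0 = normalize_y( smooth(N) )      smoothing acts on unnormalized N
#   (b) P0 = normalize_y( smooth(P_emp) )  smoothing acts on the conditionally
#                                          normalized empirical density
# Gaussian kernel and open boundaries are simplifying assumptions.

nx, ny = 10, 15
xs, ys = np.arange(1, nx + 1), np.arange(1, ny + 1)
data = [(3, 3), (7, 12)]

def smooth(field, width=2.0):
    # apply the kernel C(x,y;x',y') = exp(-d^2/(2 width^2)) to a grid field
    out = np.zeros_like(field, dtype=float)
    for ix, x in enumerate(xs):
        for iy, y in enumerate(ys):
            w = np.exp(-((xs[:, None] - x) ** 2 + (ys[None, :] - y) ** 2)
                       / (2.0 * width ** 2))
            out[ix, iy] = (w * field).sum()
    return out

def normalize_y(field):
    # conditional normalization: divide each x-row by its sum over y
    return field / field.sum(axis=1, keepdims=True)

N = np.zeros((nx, ny))
for xi, yi in data:
    N[xi - 1, yi - 1] += 1

P_emp = normalize_y(N + 1e-12)      # small offset: uniform rows where no data
P0_a = normalize_y(smooth(N))       # first row of Fig. 13
P0_b = normalize_y(smooth(P_emp))   # second row of Fig. 13
\end{verbatim}

The two variants differ mainly in how strongly the subsequent conditional normalization reshapes the smoothed field, as illustrated in Fig. 13.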

