
Kernels for $P$

For $E_P$, the truncated equation

\begin{displaymath}
P = {{\bf C}}{\bf P}^{-1}N
,
\end{displaymath} (696)

is still nonlinear in $P$. If we solve this equation approximately by a one-step iteration $P^{(1)}$ = ${{\bf C}}({\bf P}^{(0)})^{-1}N$, starting from a uniform initial $P^{(0)}$ and normalizing afterwards, this corresponds, for a single $x$-value, to the classical kernel methods commonly used in density estimation. The resulting normalized density is
\begin{displaymath}
P(x,y)
= \frac{\sum_i {{\bf C}} (x,y ;x_i,y_i)}
{\int \! dy^\prime \sum_i {{\bf C}} (x,y^\prime ;x_i,y_i)}
= \sum_i \bar {{\bf C}} (x,y ;x_i,y_i),
\end{displaymath} (697)

i.e.,
\begin{displaymath}
P
= {\bf N}_{C,X}^{-1} {{\bf C}} N
= \bar {{\bf C}} N,
\end{displaymath} (698)

with the (data-dependent) normalized kernel $\bar {{\bf C}}$ = ${\bf N}_{{C},X}^{-1} {{\bf C}}$, where ${\bf N}_{{C},X}$ is the diagonal matrix with diagonal elements ${\bf I}_X {{\bf C}} N$. Again, ${{\bf C}} = {{\bf K}}^{-1}$ or similar invertible choices can be used to obtain a starting guess for $P$. The form of the Hessian (182) suggests in particular including a mass term on the data.
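As an illustration of Eqs. (697,698) (not part of the original formalism), the following minimal numerical sketch constructs the one-step kernel estimate on a small $(x,y)$-grid; the Gaussian kernel, the grid sizes, and the two data points are illustrative assumptions only.

\begin{verbatim}
import numpy as np

# Minimal sketch of the one-step kernel initialization of Eqs. (697,698):
# smooth the data occupation numbers N with a kernel C and normalize
# conditionally over y for each x. Kernel, grid and data are illustrative.

nx, ny = 10, 15                        # x in {1,...,10}, y in {1,...,15}
xs, ys = np.arange(1, nx + 1), np.arange(1, ny + 1)
data = [(3, 3), (7, 12)]               # training points (x_i, y_i)

def C(x, y, xp, yp, width=2.0):
    # unnormalized smoothing kernel C(x,y;x',y'), here a Gaussian
    return np.exp(-((x - xp) ** 2 + (y - yp) ** 2) / (2.0 * width ** 2))

# numerator of Eq. (697): sum_i C(x,y;x_i,y_i)
CN = np.zeros((nx, ny))
for xi, yi in data:
    CN += C(xs[:, None], ys[None, :], xi, yi)

# conditional normalization (the diagonal matrix N_{C,X}):
# divide each x-row by its sum over y, so that sum_y P(x,y) = 1 for all x
P0 = CN / CN.sum(axis=1, keepdims=True)
\end{verbatim}

For fixed $x$ this is just a conditionally normalized kernel density estimate in $y$, which is the correspondence to classical kernel methods mentioned above.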

It would be interesting to interpret Eq. (698) as the stationarity equation of a functional $\hat E_P$ containing the usual data term $\sum_i \ln P(x_i,y_i)$. To obtain the derivative ${\bf P}^{-1} N$ of this data term, we therefore multiply Eq. (698) by ${\bf P}^{-1} \bar {{\bf C}}^{-1}$, assuming that $\bar {{\bf C}}^{-1}$ exists and that $P\ne 0$ at the data points, and obtain

\begin{displaymath}
{\widetilde {{\bf C}}^{-1} P}
= {\bf P}^{-1}N
,
\end{displaymath} (699)

with data dependent
\begin{displaymath}
{\widetilde {{\bf C}}^{-1}} (x,y;x^\prime ,y^\prime)
=\frac{\bar {{\bf C}}^{-1} (x,y;x^\prime ,y^\prime) }
{\sum_i {\bar {{\bf C}}} (x,y;x_i ,y_i) }.
\end{displaymath} (700)
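For clarity (this step is only implicit in the text above): since Eq. (698) gives $\bar {{\bf C}}^{-1} P = N$, multiplying by ${\bf P}^{-1}$ identifies
\begin{displaymath}
{\widetilde {{\bf C}}^{-1}} = {\bf P}^{-1} \bar {{\bf C}}^{-1}
,\qquad
P(x,y) = \sum_i \bar {{\bf C}} (x,y;x_i ,y_i)
,
\end{displaymath}
where ${\bf P}$ is the diagonal operator built from $P$; written out as a kernel this is exactly Eq. (700).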

Thus, Eq. (698) is the stationarity equation of the functional
\begin{displaymath}
\hat E_P =
-(\,N,\,\ln P\,)
+\frac{1}{2}\,
(\,P,\, {\widetilde {{\bf C}}^{-1}} \, P \,)
.
\end{displaymath} (701)
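The stationarity claim can also be checked numerically. The following sketch discretizes $(x,y)$, builds $\bar {{\bf C}}$ and $\widetilde {{\bf C}}^{-1}$ as matrices, and verifies Eq. (699) for $P = \bar {{\bf C}} N$; the Gaussian kernel, the grid, the data points, and the small ridge term are illustrative assumptions.

\begin{verbatim}
import numpy as np

# Discretized check that P = Cbar N of Eq. (698) fulfills the stationarity
# condition (699), Ctilde^{-1} P = P^{-1} N. Grid points z = (x,y) are
# flattened so that kernels become ordinary matrices. Kernel, grid, data
# and the small ridge term (added to keep C invertible) are illustrative.

nx, ny = 10, 15
xs, ys = np.arange(1, nx + 1), np.arange(1, ny + 1)
X, Y = np.meshgrid(xs, ys, indexing="ij")
z = np.stack([X.ravel(), Y.ravel()], axis=1)         # all grid points (x, y)
data = [(3, 3), (7, 12)]

d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)  # squared distances
Cmat = np.exp(-d2 / (2.0 * 2.0 ** 2)) + 1e-3 * np.eye(len(z))

N = np.zeros(len(z))                                 # occupation numbers
for xi, yi in data:
    N[(z[:, 0] == xi) & (z[:, 1] == yi)] += 1

CN = Cmat @ N
# conditional normalization N_{C,X}: for each x, sum C N over all y
norm = np.array([CN[z[:, 0] == x0].sum() for x0 in z[:, 0]])
Cbar = Cmat / norm[:, None]                          # normalized kernel, Eq. (698)
P = Cbar @ N

Ctilde_inv = np.linalg.inv(Cbar) / P[:, None]        # Eq. (700): P^{-1} Cbar^{-1}
print(np.abs(Ctilde_inv @ P - N / P).max())          # ~ 0 up to rounding errors
\end{verbatim}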

To study the dependence on the number $n$ of training data for a given ${{\bf C}}$, consider a normalized kernel with $\int\! dy \, {{\bf C}} (x,y;x^\prime,y^\prime ) = \lambda$, $\forall x,x^\prime ,y^\prime$. For such a kernel the denominator of $\bar {{\bf C}}$ is equal to $n\lambda$, so we have

\begin{displaymath}
\bar {{\bf C}} = \frac{{{\bf C}}}{n\lambda}
,\quad
P= \frac{{{\bf C}} N}{n \lambda }
.
\end{displaymath} (702)

Assuming that for large $n$ the empirical average $(1/n)\sum_i {{\bf C}}(x,y;x_i ,y_i)$ in the denominator of $\widetilde {{\bf C}}^{-1}$ becomes $n$-independent, e.g., converging to the true average $\int \!\!dx^\prime dy^\prime \, p(x^\prime,y^\prime)\,
{{\bf C}}(x,y;x^\prime ,y^\prime)$, the regularizing term in functional (701) becomes proportional to $n$,
\begin{displaymath}
{\widetilde {{\bf C}}^{-1}} \propto n \lambda^2
.
\end{displaymath} (703)

According to Eq. (76), this would allow one to relate a saddle point approximation to a large-$n$ limit.
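Spelling out the scaling behind Eq. (703) (a short sketch; $A(x,y)$ denotes the assumed $n$-independent limit of the empirical average): inserting $\bar {{\bf C}} = {{\bf C}}/(n\lambda)$ from Eq. (702) into Eq. (700) gives
\begin{displaymath}
{\widetilde {{\bf C}}^{-1}} (x,y;x^\prime ,y^\prime)
= \frac{n\lambda \, {{\bf C}}^{-1} (x,y;x^\prime ,y^\prime)}
{\frac{1}{\lambda}\,\frac{1}{n}\sum_i {{\bf C}} (x,y;x_i ,y_i)}
\;\longrightarrow\;
\frac{n \lambda^2}{A(x,y)} \, {{\bf C}}^{-1} (x,y;x^\prime ,y^\prime)
,
\end{displaymath}
with $A(x,y)$ = $\int\! dx^\prime dy^\prime \, p(x^\prime,y^\prime)\, {{\bf C}}(x,y;x^\prime,y^\prime)$, so that the quadratic regularizing term in Eq. (701) indeed grows linearly with $n$.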

Again, a similar possibility is to start with the empirical density $\tilde P_{\rm emp}$ defined in Eq. (238). Analogously to Eq. (686), the empirical density can, for example, also be smoothed and then correctly normalized again, so that

\begin{displaymath}
P^{(0)}
= \tilde {\bf C} \tilde P_{\rm emp}
,
\end{displaymath} (704)

with $\tilde {\bf C}$ defined in Eq. (698).

Fig. 13 compares the initialization according to Eq. (697), where the smoothing operator $\tilde C$ acts on $N$, with an initialization according to Eq. (704), where the smoothing operator $\tilde C$ acts on the correctly normalized $\tilde P_{\rm emp}$.

Figure 13: Comparison of initial guesses $P^{(0)}(x,y)$ for a case with two data points located at $(3,3)$ and $(7,12)$ within the intervals $y\in [1,15]$ and $x\in [1,10]$ with periodic boundary conditions. First row: $P^{(0)}$ = $\tilde {\bf C} N$. (The smoothing operator acts on the unnormalized $N$. The following conditional normalization changes the shape more drastically than in the example shown in the second row.) Second row: $P^{(0)}$ = $\tilde {\bf C}\tilde P_{\rm emp}$. (The smoothing operator acts on the already conditionally normalized $\tilde P_{\rm emp}$.) The kernel $\tilde {\bf C}$ is given by Eq. (698) with ${\bf C}$ = $({\bf K}+m_C^2{\bf I})$, $m_C^2$ = $1.0$, and a ${\bf K}$ of the form of Eq. (705) with $\lambda _0$ = $\lambda _4$ = $\lambda _6$ = 0, and $\lambda _2$ = $0.1$ (figures on the l.h.s.) or $\lambda _2$ = $1.0$ (figures on the r.h.s.), respectively.
[Figure 13: four panels, image files ps/densI1.eps to ps/densI4.eps, showing the initial guesses described in the caption above.]
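A minimal numerical sketch of the two initializations compared in Fig. 13 follows; the Gaussian kernel, the open (non-periodic) boundaries, and the small offset used to define $\tilde P_{\rm emp}$ for $x$-values without data are simplifying assumptions.

\begin{verbatim}
import numpy as np

# Compare the two initial guesses of Fig. 13 on a small grid:
#   (a) P0 = normalize_y( smooth(N) )      smoothing acts on unnormalized N
#   (b) P0 = normalize_y( smooth(P_emp) )  smoothing acts on the conditionally
#                                          normalized empirical density
# Gaussian kernel and open boundaries are simplifying assumptions.

nx, ny = 10, 15
xs, ys = np.arange(1, nx + 1), np.arange(1, ny + 1)
data = [(3, 3), (7, 12)]

def smooth(field, width=2.0):
    # apply the kernel C(x,y;x',y') = exp(-d^2/(2 width^2)) to a grid field
    out = np.zeros_like(field, dtype=float)
    for ix, x in enumerate(xs):
        for iy, y in enumerate(ys):
            w = np.exp(-((xs[:, None] - x) ** 2 + (ys[None, :] - y) ** 2)
                       / (2.0 * width ** 2))
            out[ix, iy] = (w * field).sum()
    return out

def normalize_y(field):
    # conditional normalization: divide each x-row by its sum over y
    return field / field.sum(axis=1, keepdims=True)

N = np.zeros((nx, ny))
for xi, yi in data:
    N[xi - 1, yi - 1] += 1

P_emp = normalize_y(N + 1e-12)      # small offset: uniform rows where no data
P0_a = normalize_y(smooth(N))       # first row of Fig. 13
P0_b = normalize_y(smooth(P_emp))   # second row of Fig. 13
\end{verbatim}

The two variants differ mainly in how strongly the subsequent conditional normalization reshapes the smoothed field, as illustrated in Fig. 13.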

