

Lagrange multipliers: Error functional $E_P$

We write $P(x,y)=p(y\vert x,{h})$ for the probability of $y$ conditioned on $x$ and ${h}$. We now consider a regularizing term which is quadratic in $P$ instead of $L$. This corresponds to a factor within the posterior probability (the specific prior) which is Gaussian with respect to $P$:

\begin{displaymath}
p({h}\vert f) \propto
e^{
\sum_i \ln P(x_i,y_i)
- \frac{1}{2} ( P ,\, {{\bf K}}\, P )
+ \int\!dx\, \Lambda_X (x) \left( 1 - \int\!dy\,P(x,y) \right)
+ \tilde c
},
\end{displaymath} (162)

or written in terms of $L=\ln P$ for comparison,
\begin{displaymath}
p({h}\vert f) \propto
e^{
\sum_i L(x_i,y_i)
- \frac{1}{2} ( e^L ,\, {{\bf K}}\, e^L )
+ \int\!dx\, \Lambda_X (x) \left( 1 - \int\!dy\, e^{L(x,y)} \right)
+ \tilde c
}.
\end{displaymath} (163)

Hence, the error functional is
\begin{displaymath}
E_P=\beta E_{\rm comb}=-(\ln P,N) + \frac{1}{2} (P,{{\bf K}}\,P)
+ (\,P-\delta(y)\,, \Lambda_X ).
\end{displaymath} (164)
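
Here the constant $1$ of the normalization term in Eq. (162) has been written as $\int\!dy\,\delta(y)$, so that the scalar product notation of the last term expands to
\begin{displaymath}
(\,P-\delta(y)\,, \Lambda_X )
= \int\!dx\,dy\, \Lambda_X(x) \left( P(x,y) - \delta(y) \right)
= \int\!dx\, \Lambda_X(x) \left( \int\!dy\, P(x,y) - 1 \right).
\end{displaymath}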

In particular, the choice ${{\bf K}} = \lambda {\bf I}$, i.e.,
\begin{displaymath}
\frac{1}{2}(P,{{\bf K}}\,P) = \frac{\lambda}{2}(P,\,P) = \frac{\lambda}{2}\vert\vert P\vert\vert^2,
\end{displaymath} (165)

can be interpreted as a smoothness prior with respect to the distribution function of $P$ (see Section 3.3).
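
This interpretation follows because $P$ is the $y$-derivative of the conditional distribution function $F(x,y) = \int_{-\infty}^{y}\!dy^\prime\, P(x,y^\prime)$, so that
\begin{displaymath}
\frac{\lambda}{2}\vert\vert P\vert\vert^2
= \frac{\lambda}{2} \int\!dx\,dy \left( \frac{\partial F(x,y)}{\partial y} \right)^2
\end{displaymath}
is a first-derivative smoothness term for $F$.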

In functional (164) we have implemented only the normalization condition for $P$ by a Lagrange multiplier, not the non-negativity constraint. This is sufficient if $P(x,y)>0$ (i.e., $P(x,y)$ nonzero) at the stationary point, because then $P(x,y)>0$ also holds in some neighborhood, and there are no components of the gradient pointing into regions of negative probabilities. In that case the non-negativity constraint is not active at the stationary point. Because probabilities have to be positive at the data points, smoothness priors typically result in probabilities that are positive everywhere, except where set to zero explicitly by boundary conditions. If, however, the stationary point has locations with $P(x,y)=0$ at non-boundary points, then the component of the gradient pointing into the region of negative probabilities has to be projected out by introducing a Lagrange multiplier for each such $P(x,y)$. This may happen, for example, if the regularizer rewards oscillatory behavior.

The stationarity equation for $E_P$ is

\begin{displaymath}
0 = {\bf P}^{-1} N -{{\bf K}} P - \Lambda_X
,
\end{displaymath} (166)

with the diagonal matrix ${\bf P} (x^\prime,y^\prime;x,y)$ = $\delta (x-x^\prime)\delta (y-y^\prime) P(x,y)$, or multiplied by ${\bf P}$
\begin{displaymath}
0 = N - {\bf P}{{\bf K}} P - {\bf P} \Lambda_X
.
\end{displaymath} (167)

Probabilities $P(x,y)$ are nonzero at the observed data points $(x_i,y_i)$, so ${\bf P}^{-1} N$ is well defined.
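
For reference, Eq. (166) results from setting the functional derivative of the functional (164) with respect to $P(x,y)$ to zero; term by term (for symmetric ${{\bf K}}$),
\begin{displaymath}
\frac{\delta \left( -(\ln P, N) \right)}{\delta P(x,y)} = -\frac{N(x,y)}{P(x,y)} ,
\qquad
\frac{\delta\, \frac{1}{2} (P,{{\bf K}}\,P)}{\delta P(x,y)} = ({{\bf K}} P)(x,y) ,
\qquad
\frac{\delta\, (\,P-\delta(y)\,, \Lambda_X )}{\delta P(x,y)} = \Lambda_X(x) .
\end{displaymath}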

Combining the normalization condition Eq. (135) with Eq. (166) or (167), the Lagrange multiplier function $\Lambda_X$ is found, for $\Lambda_X(x) \ne 0$, as

\begin{displaymath}
\Lambda_X
= {\bf I}_X \left( N - {\bf P}{{\bf K}} P \right)
= N_X - {\bf I}_X {\bf P}{{\bf K}} P,
\end{displaymath} (168)

where

\begin{displaymath}
{\bf I}_X {\bf P}{{\bf K}}P\, (x,y)
= \int \!dy^\prime\, dx^{\prime\prime}\, dy^{\prime\prime}\,
P(x,y^\prime)\,
{{\bf K}}(x,y^\prime;x^{\prime\prime},y^{\prime\prime})\,
P(x^{\prime\prime},y^{\prime\prime}).
\end{displaymath}

Eliminating $\Lambda_X$ in Eq. (166) by using Eq. (168) finally gives
\begin{displaymath}
0 = ({\bf I} - {\bf I}_X {\bf P})
({\bf P}^{-1}N - {{\bf K}} P)
,
\end{displaymath} (169)

or for Eq. (167)
\begin{displaymath}
0 = ({\bf I} - {\bf P}{\bf I}_X)
(N - {\bf P}{{\bf K}} P)
.
\end{displaymath} (170)
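
Both forms rely on pulling a factor ${\bf P}$ out of Eq. (168),
\begin{displaymath}
\Lambda_X = {\bf I}_X \left( N - {\bf P}{{\bf K}} P \right)
= {\bf I}_X {\bf P} \left( {\bf P}^{-1} N - {{\bf K}} P \right),
\end{displaymath}
so that inserting $\Lambda_X$ into Eq. (166) yields Eq. (169), while inserting ${\bf P}\Lambda_X = {\bf P}{\bf I}_X ( N - {\bf P}{{\bf K}} P )$ into Eq. (167) yields Eq. (170).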

For reasons similar to those discussed for Eq. (141), unnormalized solutions fulfilling $N-{\bf P}{{\bf K}} P = 0$ are possible. Defining
\begin{displaymath}
T_P = {\bf P}^{-1} N - \Lambda_X
= {\bf P}^{-1} N - N_X + {\bf I}_X {\bf P}{{\bf K}} P ,
\end{displaymath} (171)

the stationarity equation can be written analogously to Eq. (143) as
\begin{displaymath}
{{\bf K}} P = T_P,
\end{displaymath} (172)

with $T_P = T_P(P)$, suggesting, if ${{\bf K}}^{-1}$ exists, the iteration
\begin{displaymath}
P^{i+1} = {{\bf K}}^{-1} T_P(P^i)
,
\end{displaymath} (173)

starting from some initial guess $P^0$.
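
As a numerical illustration (a sketch, not part of the original derivation), the following Python fragment implements iteration (173) on a discrete $(x,y)$ grid. It assumes data restricted to grid points, a kernel ${{\bf K}} = \lambda {\bf I} - \kappa\, d^2\!/dy^2$ acting on $y$ at fixed $x$, and a damping factor $\eta$ added for numerical stability; all parameter values and variable names are illustrative.

import numpy as np

# Sketch of iteration (173): P^{i+1} = K^{-1} T_P(P^i) on a discrete grid.
# Assumptions: x purely discrete, y on a regular grid with spacing dy,
# K = lam*I - kappa*d^2/dy^2 acting on y at fixed x (symmetric, invertible).
n_x, n_y, dy = 5, 50, 0.1
lam, kappa, eta = 1.0, 0.1, 0.5   # eta: damping, not part of Eq. (173)

# K as an (n_y, n_y) matrix: lam*I minus kappa times a discrete Laplacian.
D2 = (np.diag(np.ones(n_y - 1), -1) - 2.0 * np.eye(n_y)
      + np.diag(np.ones(n_y - 1), 1)) / dy**2
K = lam * np.eye(n_y) - kappa * D2

# Synthetic data counts n(x,y); on the grid, delta(y - y_i) becomes 1/dy.
rng = np.random.default_rng(0)
counts = np.zeros((n_x, n_y))
for _ in range(200):
    counts[rng.integers(n_x), rng.integers(n_y)] += 1.0
N = counts / dy               # grid version of N(x,y)
N_X = counts.sum(axis=1)      # N_X(x): number of data points at x

P = np.full((n_x, n_y), 1.0 / (n_y * dy))   # normalized initial guess P^0

for _ in range(500):
    KP = P @ K                                  # (K P)(x,y); K is symmetric
    IX_PKP = (P * KP).sum(axis=1) * dy          # I_X P K P, a function of x
    T = N / P - N_X[:, None] + IX_PKP[:, None]  # T_P of Eq. (171)
    P_new = np.linalg.solve(K, T.T).T           # K^{-1} T_P(P^i), row-wise
    P = np.clip((1 - eta) * P + eta * P_new, 1e-10, None)  # damped, P > 0

# If the iteration converges to the normalized branch of Eq. (170),
# the y-integral of P is approximately 1 for each x.
print(P.sum(axis=1) * dy)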

