
Analytical solution of mixture models

For regression under a Gaussian mixture model the predictive density can be calculated analytically for fixed $\theta$. It can be expressed in terms of the likelihood of $\theta$ and $j$, marginalized over $h$,

\begin{displaymath}
p(y\vert x,D,D_0)
=
\sum_j \int \!d\theta\,
\frac{p(\theta,j)\, p(y_D\vert x_D,D_0,\theta,j)}
{\sum_{j^\prime} \int \!d\theta^\prime\,
p(\theta^\prime,j^\prime)\, p(y_D\vert x_D,D_0,\theta^\prime,j^\prime)}\,
p(y\vert x,D,D_0,\theta,j)
.
\end{displaymath} (576)
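Eq. (576) results from decomposing the predictive density over $\theta$ and $j$ and applying Bayes' theorem to their posterior, together with the marginalization over $h$ mentioned above; written out, the two ingredients are
\begin{displaymath}
p(y\vert x,D,D_0)
= \sum_j \int \!d\theta\;
p(y\vert x,D,D_0,\theta,j)\, p(\theta,j\vert D,D_0)
,\qquad
p(y_D\vert x_D,D_0,\theta,j)
= \int \!dh\; p(y_D\vert x_D,h)\, p(h\vert D_0,\theta,j)
.
\end{displaymath}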

(Here we concentrate on $\theta$. The parameter $\beta $ can be treated analogously.) According to Eq. (492) the likelihood can be written
\begin{displaymath}
p(y_D\vert x_D,D_0,\theta,j)
=
e^{-\beta \widetilde E_{0,j}(\theta)
+\frac{1}{2} \ln \det (\frac{\beta}{2\pi} \widetilde {\bf K}_{j} (\theta))
}
,
\end{displaymath} (577)

with
\begin{displaymath}
\widetilde E_{0,j}(\theta)
=
\frac{1}{2}
\big( t_D - t_j(\theta),\,
\widetilde {\bf K}_j (\theta)\, (t_D - t_j(\theta) )\big)
= V_j
,
\end{displaymath} (578)

and $\widetilde {\bf K}_{j}(\theta)$ = $({\bf K}_D^{-1}+{\bf K}_{j,DD}^{-1}(\theta))^{-1}$ being an $\tilde n\times\tilde n$ matrix in data space. The equality of $V_j$ and $\widetilde E_{0,j}$ can be seen using ${\bf K}_j-{\bf K}_j ({\bf K}_D + {\bf K}_j)^{-1}{\bf K}_j$ = ${\bf K}_D-{\bf K}_D ({\bf K}_D + {\bf K}_{j,DD})^{-1}{\bf K}_D$ = ${\bf K}_{j,DD}-{\bf K}_{j,DD}({\bf K}_D + {\bf K}_{j,DD})^{-1}{\bf K}_{j,DD}$ = $\widetilde {\bf K}_j$.
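This chain of equalities is an instance of the standard matrix identity (stated here as a reminder for invertible ${\bf A}$, ${\bf B}$ with invertible sum; the symbols ${\bf A}$, ${\bf B}$ are only used in this remark)
\begin{displaymath}
({\bf A}^{-1}+{\bf B}^{-1})^{-1}
= {\bf A} - {\bf A}({\bf A}+{\bf B})^{-1}{\bf A}
= {\bf B} - {\bf B}({\bf A}+{\bf B})^{-1}{\bf B}
,
\end{displaymath}
applied to ${\bf A}$ = ${\bf K}_D$ and ${\bf B}$ = ${\bf K}_{j,DD}(\theta)$.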
For the predictive mean, which is the optimal solution under squared-error loss and under log-loss (restricted to Gaussian densities with fixed variance), we therefore find
\begin{displaymath}
\bar y (x)
= \int\!dy\, y\, p(y\vert x,D,D_0)
= \sum_j
\int d\theta \; b_j(\theta )\, \bar t_j(\theta)
,
\end{displaymath} (579)

with, according to Eq. (327),
\begin{displaymath}
\bar t_j(\theta)
=
t_j + {\bf K}_j^{-1} \widetilde {\bf K}_j(t_D-t_j)
,
\end{displaymath} (580)

and mixture coefficients
$\displaystyle b_j(\theta)$ $\textstyle =$ $\displaystyle p(\theta,j\vert D)
=
\frac{p(\theta,j)\, p(y_D\vert x_D,D_0,\theta,j)}
{\sum_{j^\prime}\int \!d\theta^\prime\, p(\theta^\prime,j^\prime)\, p(y_D\vert x_D,D_0,\theta^\prime,j^\prime)}$  
  $\textstyle \propto$ $\displaystyle e^{-\beta \widetilde E_{0,j} (\theta) -E_{\theta,j}
+\frac{1}{2} \ln \det (\widetilde {\bf K}_j (\theta))}
,$ (581)

which defines $\widetilde E_j$ = $\beta \widetilde E_{0,j}$ + $E_{\theta,j}$. If the $\theta$-integral is solvable, the coefficients can therefore be obtained exactly.
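For fixed $\theta$ all quantities entering Eqs. (577)-(581) are finite-dimensional objects on the data space, so the coefficients $b_j$ and the predictive mean can be evaluated directly. The following minimal numerical sketch (not part of the original text; templates, covariances, $\beta $, and the prior weights are invented for illustration, and all quantities are restricted to the data points) evaluates Eqs. (577)-(581) for two components at a single fixed $\theta$:
\begin{verbatim}
import numpy as np

beta = 2.0
t_D  = np.array([0.9, 1.1, 0.2])          # data vector on the training points
K_D  = np.eye(3)                           # data inverse covariance K_D
templates = [np.zeros(3), np.ones(3)]      # hypothetical templates t_j
K_jDD     = [np.eye(3), 2.0 * np.eye(3)]   # hypothetical prior inverse covariances K_{j,DD}
w_j       = [0.5, 0.5]                     # prior weights p(theta,j) at the fixed theta

log_b, t_bar = [], []
for t_j, K_j, wj in zip(templates, K_jDD, w_j):
    K_tilde = np.linalg.inv(np.linalg.inv(K_D) + np.linalg.inv(K_j))  # tilde K_j
    d = t_D - t_j
    E0 = 0.5 * d @ K_tilde @ d                                        # Eq. (578)
    logdet = np.linalg.slogdet(beta / (2 * np.pi) * K_tilde)[1]
    log_b.append(np.log(wj) - beta * E0 + 0.5 * logdet)               # log of p(theta,j) times Eq. (577)
    t_bar.append(t_j + np.linalg.inv(K_j) @ K_tilde @ d)              # Eq. (580) on the data points

b = np.exp(np.array(log_b) - max(log_b))
b /= b.sum()                                                          # coefficients b_j, Eq. (581)
y_bar = sum(bj * tj for bj, tj in zip(b, t_bar))                      # predictive mean, Eq. (579)
print(b, y_bar)
\end{verbatim}
Subtracting the largest log-weight before exponentiating keeps the normalization of the $b_j$ numerically stable for large $\beta $.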

If $b_j$ is calculated in saddle point approximation at $\theta$ = $\theta^*$ it has the structure of $a_j$ in (552) with $E_{0,j}$ replaced by $\widetilde E_j$ and ${\bf K}_j$ by $\widetilde {\bf K}_j$. (The inverse temperature $\beta $ could be treated analogously to $\theta$. In that case $E_{\theta,j}$ would have to be replaced by $E_{\theta,\beta,j}$.)

If the likelihood for $j$, $\theta$ in Eq. (581) is also calculated in saddle point approximation, i.e., $p(y_D\vert x_D,D_0,\theta^*,j)
\approx
p(y_D\vert x_D,h^*)\, p(h^*\vert D_0,\theta^*,j)$, the terms $p(y_D\vert x_D,h^*)$ in numerator and denominator cancel, so that, skipping $D_0$ and $\beta $,

\begin{displaymath}
b_j(\theta^*)
= \frac{p(h^*\vert j,\theta^*)p(j,\theta^*)}{p(h^*,\theta^*)}
= a_j(h^*,\theta^*)
,
\end{displaymath} (582)

i.e., $b_j(\theta^*)$ becomes equal to the $a_j(\theta^*)$ of Eq. (552) at $h$ = $h^*$.
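Written out, with the denominator of Eq. (581) also evaluated at the saddle point, the cancellation reads
\begin{displaymath}
b_j(\theta^*)
\approx
\frac{p(\theta^*,j)\, p(y_D\vert x_D,h^*)\, p(h^*\vert D_0,\theta^*,j)}
{\sum_k p(\theta^*,k)\, p(y_D\vert x_D,h^*)\, p(h^*\vert D_0,\theta^*,k)}
=
\frac{p(h^*\vert j,\theta^*)\, p(j,\theta^*)}
{\sum_k p(h^*\vert k,\theta^*)\, p(k,\theta^*)}
,
\end{displaymath}
the denominator of the last expression being $p(h^*,\theta^*)$.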

Similarly to Eq. (494), Eq. (581) yields the stationarity equation for $\theta$

$\displaystyle 0$ $\textstyle =$ $\displaystyle \sum_j b_j
\left( \frac{\partial \widetilde E_j}{\partial \theta}
-\frac{1}{2} {\rm Tr}\left(\widetilde {\bf K}_{j}^{-1}
\frac{\partial \widetilde {\bf K}_{j}}{\partial \theta}\right)
\right)$ (583)
  $\textstyle =$ $\displaystyle \sum_j b_j
\Bigg(
\beta \left( \frac{\partial t_j(\theta)}{\partial \theta},\;
\widetilde {\bf K}_{j}(\theta)(t_j(\theta)-t_D)\right)$  
    $\displaystyle +\frac{\beta}{2} \left((t_D-t_j(\theta)),\,
\frac{\partial \widetilde {\bf K}_{j}(\theta)}
{\partial \theta}(t_D-t_j(\theta))\right)$  
    $\displaystyle -\frac{1}{2}\,{\rm Tr}\left(\widetilde {\bf K}_{j}^{-1}(\theta)
\frac{\partial \widetilde {\bf K}_{j}(\theta)}{\partial \theta}\right)
-\frac{1}{p(\theta,j)} \frac{\partial p(\theta,j)}{\partial \theta}
\Bigg)
.$ (584)
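The trace term originates from the determinant factor in Eq. (581), via the standard identity
\begin{displaymath}
\frac{\partial}{\partial \theta}
\ln \det \widetilde {\bf K}_j(\theta)
= {\rm Tr}\left(\widetilde {\bf K}_j^{-1}(\theta)\,
\frac{\partial \widetilde {\bf K}_j(\theta)}{\partial \theta}\right)
.
\end{displaymath}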

For fixed $\theta$ and $j$-independent covariances, the high temperature solution is a mixture of the component solutions weighted by their prior probabilities

\begin{displaymath}
\bar y \stackrel{\beta\rightarrow 0}{\longrightarrow}
\sum_j p(j)\;\bar t_{j}
= \sum_j a_j^0 \;\bar t_{j}
= \bar t
.
\end{displaymath} (585)
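Both limits can be read off from Eq. (581): for fixed $\theta$ and $j$-independent covariances the determinant factors are equal for all $j$ and the remaining prior weight of component $j$ is its probability $p(j)$, so that
\begin{displaymath}
b_j \propto p(j)\, e^{-\beta \widetilde E_{0,j}}
\;\stackrel{\beta\rightarrow 0}{\longrightarrow}\; p(j)
,\qquad
b_j \;\stackrel{\beta\rightarrow \infty}{\longrightarrow}\;
\delta_{j,j^*}
,\quad
j^* = {\rm argmin}_j \widetilde E_{0,j}
,
\end{displaymath}
which yields Eq. (585) and the low temperature limit Eq. (586) below.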

The low temperature solution becomes the component solution $\bar t_j$ with minimal distance between data and prior template
\begin{displaymath}
\bar y \stackrel{\beta\rightarrow \infty}{\longrightarrow} \bar t_{j^*}
,\qquad
j^* = {\rm argmin}_j\,
\big( t_D - t_j,\,
\widetilde {\bf K}_{j} (t_D - t_j )\big)
.
\end{displaymath} (586)

Fig. 11 compares the exact mixture coefficient $b_1$ with the maximum posterior coefficient $a_1$ of the dominant solution (see also [132]); according to Eq. (569) they are related by
\begin{displaymath}
a_j= \frac{e^{-\frac{\beta}{2}a {B}_j a-\widetilde E_j}}
{\sum_k e^{-\frac{\beta}{2}a {B}_k a-\widetilde E_k}}
= \frac{b_j\, e^{-\frac{\beta}{2}a {B}_j a}}
{\sum_k b_k\, e^{-\frac{\beta}{2} a {B}_k a}}
.
\end{displaymath} (587)

Figure 11: Exact $b_1$ and $a_1$ (dashed) vs. $\beta $ for two mixture components with equal covariances and $B_1(2,2)$ = $b$ = 2, $\widetilde E_1$ = 0.405, $\widetilde E_2$ = 0.605.
\begin{figure}\begin{center}
\epsfig{file=ps/cmp.eps, width=80mm}\end{center}\vspace{-0.7cm}
\end{figure}

