
Analytical solution of mixture models

For regression under a Gaussian mixture model the predictive density can be calculated analytically for fixed $\theta$. It can be expressed in terms of the likelihood of $\theta$ and $j$, marginalized over $h$,

\begin{displaymath}
p(y\vert x,D,D_0)
=
\sum_j \int \!d\theta\,
\frac{p(\theta,j)\, p(y_D\vert x_D,D_0,\theta,j)}
{\sum_{j^\prime} \int \!d\theta^\prime\,
p(\theta^\prime,j^\prime)\, p(y_D\vert x_D,D_0,\theta^\prime,j^\prime)}\,
p(y\vert x,D,D_0,\theta,j)
.
\end{displaymath} (576)
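Eq. (576) results from decomposing the predictive density over $\theta$ and $j$ and applying Bayes' theorem to their posterior, together with the marginalization over $h$ mentioned above; written out, the two ingredients are
\begin{displaymath}
p(y\vert x,D,D_0)
= \sum_j \int \!d\theta\;
p(y\vert x,D,D_0,\theta,j)\, p(\theta,j\vert D,D_0)
,\qquad
p(y_D\vert x_D,D_0,\theta,j)
= \int \!dh\; p(y_D\vert x_D,h)\, p(h\vert D_0,\theta,j)
.
\end{displaymath}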

(Here we concentrate on $\theta$. The parameter $\beta $ can be treated analogously.) According to Eq. (492) the likelihood can be written
\begin{displaymath}
p(y_D\vert x_D,D_0,\theta,j)
=
e^{-\beta \widetilde E_{0,j}(\theta)
+\frac{1}{2} \ln \det (\frac{\beta}{2\pi} \widetilde {\bf K}_{j} (\theta))
}
,
\end{displaymath} (577)

with
\begin{displaymath}
\widetilde E_{0,j}(\theta)
=
\frac{1}{2}
\big( t_D - t_j(\theta),\,
\widetilde {\bf K}_j (\theta)\, (t_D - t_j(\theta) )\big)
= V_j
,
\end{displaymath} (578)

and $\widetilde {\bf K}_{j}(\theta)$ = $({\bf K}_D^{-1}+{\bf K}_{j,DD}^{-1}(\theta))^{-1}$ being an $\tilde n\times\tilde n$ matrix in data space. The equality of $V_j$ and $\widetilde E_{0,j}$ can be seen using ${\bf K}_j-{\bf K}_j ({\bf K}_D + {\bf K}_j)^{-1}{\bf K}_j$ = ${\bf K}_D-{\bf K}_D ({\bf K}_D + {\bf K}_{j,DD})^{-1}{\bf K}_D$ = ${\bf K}_{j,DD}-{\bf K}_{j,DD}({\bf K}_D + {\bf K}_{j,DD})^{-1}{\bf K}_{j,DD}$ = $\widetilde {\bf K}_j$.
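This chain of equalities is an instance of the standard matrix identity (stated here as a reminder for invertible ${\bf A}$, ${\bf B}$ with invertible sum; the symbols ${\bf A}$, ${\bf B}$ are only used in this remark)
\begin{displaymath}
({\bf A}^{-1}+{\bf B}^{-1})^{-1}
= {\bf A} - {\bf A}({\bf A}+{\bf B})^{-1}{\bf A}
= {\bf B} - {\bf B}({\bf A}+{\bf B})^{-1}{\bf B}
,
\end{displaymath}
applied to ${\bf A}$ = ${\bf K}_D$ and ${\bf B}$ = ${\bf K}_{j,DD}(\theta)$.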
For the predictive mean, which is the optimal solution under squared-error loss and under log-loss (restricted to Gaussian densities with fixed variance), we therefore find
\begin{displaymath}
\bar y (x)
= \int\!dy\, y\, p(y\vert x,D,D_0)
= \sum_j
\int d\theta \; b_j(\theta )\, \bar t_j(\theta)
,
\end{displaymath} (579)

with, according to Eq. (327),
\begin{displaymath}
\bar t_j(\theta)
=
t_j + {\bf K}_j^{-1} \widetilde {\bf K}_j(t_D-t_j)
,
\end{displaymath} (580)

and mixture coefficients
$\displaystyle b_j(\theta)$ $\textstyle =$ $\displaystyle p(\theta,j\vert D)
=
\frac{p(\theta,j)\, p(y_D\vert x_D,D_0,\theta,j)}
{\sum_{j^\prime}\int \!d\theta^\prime\, p(\theta^\prime,j^\prime)\, p(y_D\vert x_D,D_0,\theta^\prime,j^\prime)}$  
  $\textstyle \propto$ $\displaystyle e^{-\beta \widetilde E_{0,j} (\theta) -E_{\theta,j}
+\frac{1}{2} \ln \det (\widetilde {\bf K}_j (\theta))}
,$ (581)

which defines $\widetilde E_j$ = $\beta \widetilde E_{0,j}$ + $E_{\theta,j}$. If the $\theta$-integral is solvable, the coefficients can therefore be obtained exactly.
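For fixed $\theta$ all quantities entering Eqs. (577)-(581) are finite-dimensional objects on the data space, so the coefficients $b_j$ and the predictive mean can be evaluated directly. The following minimal numerical sketch (not part of the original text; templates, covariances, $\beta $, and the prior weights are invented for illustration, and all quantities are restricted to the data points) evaluates Eqs. (577)-(581) for two components at a single fixed $\theta$:
\begin{verbatim}
import numpy as np

beta = 2.0
t_D  = np.array([0.9, 1.1, 0.2])          # data vector on the training points
K_D  = np.eye(3)                           # data inverse covariance K_D
templates = [np.zeros(3), np.ones(3)]      # hypothetical templates t_j
K_jDD     = [np.eye(3), 2.0 * np.eye(3)]   # hypothetical prior inverse covariances K_{j,DD}
w_j       = [0.5, 0.5]                     # prior weights p(theta,j) at the fixed theta

log_b, t_bar = [], []
for t_j, K_j, wj in zip(templates, K_jDD, w_j):
    K_tilde = np.linalg.inv(np.linalg.inv(K_D) + np.linalg.inv(K_j))  # tilde K_j
    d = t_D - t_j
    E0 = 0.5 * d @ K_tilde @ d                                        # Eq. (578)
    logdet = np.linalg.slogdet(beta / (2 * np.pi) * K_tilde)[1]
    log_b.append(np.log(wj) - beta * E0 + 0.5 * logdet)               # log of p(theta,j) times Eq. (577)
    t_bar.append(t_j + np.linalg.inv(K_j) @ K_tilde @ d)              # Eq. (580) on the data points

b = np.exp(np.array(log_b) - max(log_b))
b /= b.sum()                                                          # coefficients b_j, Eq. (581)
y_bar = sum(bj * tj for bj, tj in zip(b, t_bar))                      # predictive mean, Eq. (579)
print(b, y_bar)
\end{verbatim}
Subtracting the largest log-weight before exponentiating keeps the normalization of the $b_j$ numerically stable for large $\beta $.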

If $b_j$ is calculated in saddle point approximation at $\theta$ = $\theta^*$ it has the structure of $a_j$ in (552) with $E_{0,j}$ replaced by $\widetilde E_j$ and ${\bf K}_j$ by $\widetilde {\bf K}_j$. (The inverse temperature $\beta $ could be treated analogously to $\theta$. In that case $E_{\theta,j}$ would have to be replaced by $E_{\theta,\beta,j}$.)

If the likelihood for $j$, $\theta$ in Eq. (581) is also calculated in saddle point approximation, i.e., $p(y_D\vert x_D,D_0,\theta^*,j)
\approx
p(y_D\vert x_D,h^*)\, p(h^*\vert D_0,\theta^*,j)$, the terms $p(y_D\vert x_D,h^*)$ in numerator and denominator cancel, so that, skipping $D_0$ and $\beta $,

\begin{displaymath}
b_j(\theta^*)
= \frac{p(h^*\vert j,\theta^*)p(j,\theta^*)}{p(h^*,\theta^*)}
= a_j(h^*,\theta^*)
,
\end{displaymath} (582)

i.e., $b_j(\theta^*)$ becomes equal to the $a_j(\theta^*)$ of Eq. (552) at $h$ = $h^*$.
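Written out, with the denominator of Eq. (581) also evaluated at the saddle point, the cancellation reads
\begin{displaymath}
b_j(\theta^*)
\approx
\frac{p(\theta^*,j)\, p(y_D\vert x_D,h^*)\, p(h^*\vert D_0,\theta^*,j)}
{\sum_k p(\theta^*,k)\, p(y_D\vert x_D,h^*)\, p(h^*\vert D_0,\theta^*,k)}
=
\frac{p(h^*\vert j,\theta^*)\, p(j,\theta^*)}
{\sum_k p(h^*\vert k,\theta^*)\, p(k,\theta^*)}
,
\end{displaymath}
the denominator of the last expression being $p(h^*,\theta^*)$.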

Similarly to Eq. (494), Eq. (581) yields the stationarity equation for $\theta$

$\displaystyle 0$ $\textstyle =$ $\displaystyle \sum_j b_j
\left( \frac{\partial \widetilde E_j}{\partial \theta}
-\frac{1}{2} {\rm Tr}\left(\widetilde {\bf K}_{j}^{-1}
\frac{\partial \widetilde {\bf K}_{j}}{\partial \theta}\right)
\right)$ (583)
  $\textstyle =$ $\displaystyle \sum_j b_j
\Bigg(
\beta \left( \frac{\partial t_j(\theta)}{\partial \theta},\;
\widetilde {\bf K}_{j}(\theta)(t_j(\theta)-t_D)\right)$  
    $\displaystyle +\frac{\beta}{2} \left((t_D-t_j(\theta)),\,
\frac{\partial \widetilde {\bf K}_{j}(\theta)}
{\partial \theta}(t_D-t_j(\theta))\right)$  
    $\displaystyle -\frac{1}{2}\,{\rm Tr}\left(\widetilde {\bf K}_{j}^{-1}(\theta)
\frac{\partial \widetilde {\bf K}_{j}(\theta)}{\partial \theta}\right)
-\frac{1}{p(\theta,j)} \frac{\partial p(\theta,j)}{\partial \theta}
\Bigg)
.$ (584)
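The trace term originates from the determinant factor in Eq. (581), via the standard identity
\begin{displaymath}
\frac{\partial}{\partial \theta}
\ln \det \widetilde {\bf K}_j(\theta)
= {\rm Tr}\left(\widetilde {\bf K}_j^{-1}(\theta)\,
\frac{\partial \widetilde {\bf K}_j(\theta)}{\partial \theta}\right)
.
\end{displaymath}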

For fixed $\theta$ and $j$-independent covariances, the high temperature solution is a mixture of the component solutions weighted by their prior probabilities

\begin{displaymath}
\bar y \stackrel{\beta\rightarrow 0}{\longrightarrow}
\sum_j p(j)\;\bar t_{j}
= \sum_j a_j^0 \;\bar t_{j}
= \bar t
.
\end{displaymath} (585)
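Both limits can be read off from Eq. (581): for fixed $\theta$ and $j$-independent covariances the determinant factors are equal for all $j$ and the remaining prior weight of component $j$ is its probability $p(j)$, so that
\begin{displaymath}
b_j \propto p(j)\, e^{-\beta \widetilde E_{0,j}}
\;\stackrel{\beta\rightarrow 0}{\longrightarrow}\; p(j)
,\qquad
b_j \;\stackrel{\beta\rightarrow \infty}{\longrightarrow}\;
\delta_{j,j^*}
,\quad
j^* = {\rm argmin}_j \widetilde E_{0,j}
,
\end{displaymath}
which yields Eq. (585) and the low temperature limit Eq. (586) below.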

The low temperature solution becomes the component solution $\bar t_j$ with minimal distance between data and prior template
\begin{displaymath}
\bar y \stackrel{\beta\rightarrow \infty}{\longrightarrow} \bar t_{j^*}
,\qquad
j^* = {\rm argmin}_j\,
\big( t_D - t_j,\,
\widetilde {\bf K}_{j} (t_D - t_j )\big)
.
\end{displaymath} (586)

Fig. 11 compares the exact mixture coefficient $b_1$ with the maximum posterior coefficient $a_1$ of the dominant solution (see also [132]); according to Eq. (569) they are related by
\begin{displaymath}
a_j= \frac{e^{-\frac{\beta}{2}a {B}_j a-\widetilde E_j}}
{\sum_k e^{-\frac{\beta}{2}a {B}_k a-\widetilde E_k}}
= \frac{b_j\, e^{-\frac{\beta}{2}a {B}_j a}}
{\sum_k b_k\, e^{-\frac{\beta}{2} a {B}_k a}}
.
\end{displaymath} (587)

Figure 11: Exact $b_1$ and $a_1$ (dashed) vs. $\beta $ for two mixture components with equal covariances and $B_1(2,2)$ = $b$ = 2, $\widetilde E_1$ = 0.405, $\widetilde E_2$ = 0.605.
\begin{figure}\begin{center}
\epsfig{file=ps/cmp.eps, width=80mm}\end{center}\vspace{-0.7cm}
\end{figure}

