
Equal covariances

Especially interesting is the case of $j$-independent ${{\bf K}}_j(\theta)$ = ${{\bf K}}_0 (\theta )$ and $\theta$-independent $\det {{\bf K}}_0 (\theta)$. In that case the determinants of ${{\bf K}}_j$, which are often difficult to obtain, need not be calculated at all.

For $j$-independent inverse covariances the high temperature solution is according to Eqs.(555,561) a linear combination of the (potential) low temperature solutions

\begin{displaymath}
\bar t = \sum_j^m a^0_j \bar t_j
.
\end{displaymath} (566)

It is worth emphasizing that, because the solution $\bar t$ is a mixture of the component solutions $\bar t_j$ and not of the component templates $t_j$, even poor choices for the template functions $t_j$ can lead to good solutions, provided enough data are available. That is indeed the reason why the most common choice $t_0\equiv 0$ for a Gaussian prior can be successful.

Eq. (565) simplifies to

\begin{displaymath}
{h}
= \frac{\sum_j^m \bar t_j \, e^{-\beta E_{{h},j}({h})-E_{\theta,\beta,j}}}
       {\sum_k^m e^{-\beta E_{{h},k}({h})-E_{\theta,\beta,k}}}
= \sum_j^m a_j \bar t_j
= \bar t + \sum_j^m (a_j-a_j^0) \, \bar t_j
,
\end{displaymath} (567)

where
\begin{displaymath}
\bar t_j
= \left( {{\bf K}}_D + {{\bf K}}_0 \right)^{-1}
\left(
{{\bf K}}_D t_D + {{\bf K}}_0 t_j
\right)
,
\end{displaymath} (568)
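To get a concrete feel for Eq. (568), here is a minimal numerical sketch. All values are illustrative assumptions (a 5-point grid with diagonal covariances, templates $t_1=1$, $t_2=-1$, data template $t_D=0.1$, loosely following the one-dimensional model of Fig. 9), not taken from the derivation itself:

```python
import numpy as np

# Toy discretization (illustrative values): diagonal covariances on n points.
n = 5
K_D = np.eye(n)               # inverse covariance of the data term
K_0 = 2.0 * np.eye(n)         # j-independent prior inverse covariance K_0
t_D = 0.1 * np.ones(n)        # data template
templates = [np.ones(n), -np.ones(n)]   # component templates t_1, t_2

# Eq. (568): t_bar_j = (K_D + K_0)^{-1} (K_D t_D + K_0 t_j)
t_bar = [np.linalg.solve(K_D + K_0, K_D @ t_D + K_0 @ t) for t in templates]
# With these diagonal choices t_bar_1 = (0.1 + 2)/3 = 0.7 at every point,
# and t_bar_2 = (0.1 - 2)/3, i.e. each component solution interpolates
# between the data template t_D and its component template t_j.
```

Each $\bar t_j$ is thus a template-specific compromise between data and prior, which is what makes the convex combinations in Eq. (567) plausible candidates for solutions.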

and (for $j$-independent $d$)
\begin{displaymath}
a_j
= \frac{e^{-E_j}}
       {\sum_k e^{-E_k}}
= \frac{e^{-\frac{\beta}{2} a {B}_j a+d_j}}
       {\sum_k e^{-\frac{\beta}{2} a {B}_k a+d_k}}
,
\end{displaymath} (569)

introducing the vector $a$ with components $a_j$, the $m\times m$ matrices
\begin{displaymath}
{B}_j (k,l) =
\Big(\bar t_k-\bar t_j,\,\left( {{\bf K}}_D + {{\bf K}}_0\right)
\,(\bar t_l-\bar t_j)\Big)
\end{displaymath} (570)

and constants
\begin{displaymath}
d_j=
-\beta V_j-E_{\theta,\beta,j}
,
\end{displaymath} (571)

with $V_j$ given in (564). Eq. (567) is still a nonlinear equation for ${h}$; it shows, however, that the solutions must be convex combinations of the ${h}$-independent $\bar t_j$. Thus, it is sufficient to solve Eq. (569) for the $m$ mixture coefficients $a_j$ instead of Eq. (548) for the function ${h}$.
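The reduction to the coefficients $a_j$ can be sketched numerically by iterating Eq. (569) as a fixed-point map. The setup below reuses the illustrative toy model (5-point grid, diagonal covariances, templates $\pm 1$), and for simplicity sets $d_j = 0$, which corresponds to a hyperprior uniform in $j$ with $j$-independent $V_j$; these are assumptions for the sketch, not part of the derivation:

```python
import numpy as np

# Toy discretized model (illustrative values): n grid points, diagonal
# covariances, two component templates t_1 = 1, t_2 = -1.
n = 5
K_D, K_0 = np.eye(n), 2.0 * np.eye(n)
t_D = 0.1 * np.ones(n)
templates = [np.ones(n), -np.ones(n)]
A = K_D + K_0
# Eq. (568): component solutions t_bar_j
t_bar = [np.linalg.solve(A, K_D @ t_D + K_0 @ t) for t in templates]
m = len(t_bar)

# Eq. (570): B_j(k,l) = ( t_bar_k - t_bar_j , (K_D + K_0)(t_bar_l - t_bar_j) )
B = np.array([[[(t_bar[k] - t_bar[j]) @ (A @ (t_bar[l] - t_bar[j]))
                for l in range(m)] for k in range(m)] for j in range(m)])

def solve_coefficients(beta, d, a=None, n_iter=500):
    """Fixed-point iteration of the stationarity Eq. (569) for the a_j."""
    a = np.full(m, 1.0 / m) if a is None else np.asarray(a, float)
    for _ in range(n_iter):
        expo = np.array([-0.5 * beta * (a @ B[j] @ a) + d[j] for j in range(m)])
        expo -= expo.max()              # numerical stabilization of the softmax
        a = np.exp(expo) / np.exp(expo).sum()
    return a
```

At small $\beta$ the iteration stays at the high-temperature solution $a_j = 1/m$; at larger $\beta$ the solution reached depends on the starting point, as expected for a nonlinear stationarity equation with several stable solutions.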

The high temperature relation Eq. (553) becomes

\begin{displaymath}
a_j
\stackrel{\beta\rightarrow 0}{\longrightarrow}
a^0_j =
\frac{e^{-E_{\theta,\beta,j}}}
     {\sum_k^m e^{-E_{\theta,\beta,k}}}
,
\end{displaymath} (572)

or $a^0_j = 1/m$ for a hyperprior $p(\theta,\beta,j)$ uniform with respect to $j$. The low temperature relation Eq. (559) remains unchanged.

For $m=2$ Eq. (567) becomes

\begin{displaymath}
{h}
=\sum_j^2 a_j \bar t_j
=\frac{\bar t_1 + \bar t_2}{2}
+ \left(\tanh \Delta\right) \frac{\bar t_1 - \bar t_2}{2}
,
\end{displaymath} (573)

with $(\bar t_1+\bar t_2)/2$ = $\bar t$ in case $E_{\theta,\beta,j}$ is uniform in $j$ so that $a_j^0$ = $0.5$, and
\begin{displaymath}
\Delta
= \frac{E_2-E_1}{2}
\; = \; \beta\, \frac{E_{{h},2}-E_{{h},1}}{2}
+\frac{E_{\theta,\beta,2}-E_{\theta,\beta,1}}{2}
\; = \; -\frac{\beta}{4}\, a(B_1-B_2)a +\frac{d_1-d_2}{2}
\; = \; \frac{\beta}{4}\, b\,(2 a_1-1) + \frac{d_1-d_2}{2}
,
\end{displaymath} (574)

because the matrices $B_j$ are in this case zero except $B_1(2,2) = B_2(1,1) = b$. The stationarity Eq. (569) can be solved graphically (see Figs. 7, 8), the solution being given by the point where $a_1 e^{-\frac{\beta}{2} b a_1^2 + d_2} = (1-a_1) e^{-\frac{\beta}{2} b (1-a_1)^2+d_1}$, or, alternatively,
\begin{displaymath}
a_1 = \frac{1}{2} \left(\tanh \Delta + 1\right)
.
\end{displaymath} (575)

That equation is analogous to the celebrated mean field equation of the ferromagnet.
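Besides the graphical solution, one simple way to locate the stable solutions of Eq. (575) numerically is fixed-point iteration (an assumed solution technique for this sketch, not prescribed by the text), using the parameter values quoted in the caption of Fig. 7 ($b=2$, $d_1 = -0.2025\beta$, $d_2 = -0.3025\beta$):

```python
import numpy as np

# Iterating the self-consistency Eq. (575), a_1 = (tanh(Delta) + 1)/2, with
# Delta = (beta/4) b (2 a_1 - 1) + (d_1 - d_2)/2, cf. Eq. (574).
# Parameter values are those quoted in the caption of Fig. 7:
# b = 2, d_1 = -0.2025 beta, d_2 = -0.3025 beta.
def fixed_point(beta, a1, b=2.0, n_iter=500):
    """Iterate a_1 -> (tanh(Delta(a_1)) + 1)/2 from the start value a1."""
    d1, d2 = -0.2025 * beta, -0.3025 * beta
    for _ in range(n_iter):
        delta = 0.25 * beta * b * (2.0 * a1 - 1.0) + 0.5 * (d1 - d2)
        a1 = 0.5 * (np.tanh(delta) + 1.0)
    return a1
```

At high temperature ($\beta = 2$) every starting value reaches the same, unique stable solution; at low temperature ($\beta = 4$) starting points on the two sides of the unstable fixed point converge to the two different stable branches, in accordance with the bifurcation shown in Fig. 7.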

We conclude that in the case of equal component covariances, in addition to the linear low-temperature equations, only an $(m-1)$-dimensional nonlinear equation has to be solved to determine the `mixing coefficients' $a_1,\cdots , a_{m-1}$.

Figure 7: The solution of the stationarity Eq. (569) is given by the point where $a_1 e^{-\frac{\beta}{2} b a_1^2 + d_2}$ = $(1-a_1) e^{-\frac{\beta}{2} b (1-a_1)^2+d_1}$ (upper row) or, equivalently, $a_1$ = $\frac{1}{2} \left(\tanh \Delta + 1\right)$ (lower row). Shown are, from left to right, a situation at high temperature with one stable solution ($\beta $ = $2$), at a temperature ($\beta $ = $2.75$) near the bifurcation, and at low temperature with two stable and one unstable solution ($\beta $ = $4$). The values $b$ = $2$, $d_1 = -0.2025 \beta $ and $d_2 = -0.3025 \beta $ used for the plots correspond, for example, to the one-dimensional model of Fig. 9 with $t_1=1$, $t_2=-1$, $t_D = 0.1$. Notice, however, that the shown relation is valid for $m=2$ at arbitrary dimension.
\begin{figure}\vspace{-5cm}
\begin{center}
\epsfig{file=ps/b2pic.ps, width=110mm}\end{center}\vspace{-6.0cm}
\end{figure}

Figure 8: As in Fig. 7, the plots of $f_1(a_1)=a_1$ and $f_2(a_1)=\frac{1}{2} \left(\tanh \Delta + 1\right)$ are shown, here within the inverse temperature range $0\le \beta \le 4$.
\begin{figure}\vspace{-5cm}
\begin{center}
$\!\!\!\!\!\!\!\!\!\!$\epsfig{file=ps/tpic.ps, width=95mm}\end{center}\vspace{-3.5cm}
\end{figure}

Figure 9: Shown is the joint posterior density of $h$ and $\beta $, i.e., $p({h},\beta \vert D,D_0)$ $\propto p(y_D\vert{h},\beta )p({h}\vert\beta ,D_0)p(\beta )$ for a zero-dimensional example of a Gaussian prior mixture model with training data $y_D=0.1$ and prior data $y_{D_0}=\pm 1$ and inverse temperature $\beta $. L.h.s.: For uniform prior (middle) $p(\beta ) \propto 1$ with joint posterior $p \propto $ $e^{-\frac{\beta}{2} {h}^2 + \ln \beta}$ $\left(e^{-\frac{\beta}{2} ({h}-1)^2}
+ e^{-\frac{\beta}{2} ({h}+1)^2}\right)$ the maximum appears at finite $\beta $. (Here no factor $1/2$ appears in front of $\ln\beta$ because normalization constants for prior and likelihood term have to be included.) R.h.s.: For compensating hyperprior $p(\beta ) \propto 1/\sqrt {\beta }$ with $p \propto $ $e^{-\frac{\beta}{2} {h}^2 -\frac{\beta}{2} ({h}-1)^2}$ $+$ $e^{-\frac{\beta}{2} {h}^2 -\frac{\beta}{2} ({h}+1)^2}$ the maximum is at $\beta $ = $0$.
\begin{figure}\begin{center}
\epsfig{file=ps/betaMU.eps, width= 65mm}\epsfig{file=ps/betaMC.eps, width= 65mm}\end{center}\end{figure}

Figure 10: Same zero-dimensional prior mixture model for uniform hyperprior on $\beta $ as in Fig.9, but for varying data $x_d=0.3$ (left), $x_d=0.5$ (right).
\begin{figure}\begin{center}
\epsfig{file=ps/betaMUa.eps, width= 65mm}\epsfig{file=ps/betaMUb.eps, width= 65mm}\end{center}\end{figure}


Joerg_Lemm 2001-01-21