Gaussian mixture regression (cluster regression)

Generalizing Gaussian regression, the likelihoods may be modeled by a mixture of $m$ Gaussians

\begin{displaymath}
p(y\vert x,{h})
=
\frac{\sum_k^m p(k)\, e^{-\frac{\beta}{2} (y-h_k(x))^2}}
{\int \!dy\,\sum_k^m p(k)\, e^{-\frac{\beta}{2} (y-h_k(x))^2}}
,
\end{displaymath} (331)

where the normalization factor is found as $\sum_k p(k) \left(\frac{\beta}{2\pi}\right)^{\frac{m}{2}}$. Hence, $h$ is here specified by the mixing coefficients $p(k)$ and a vector of regression functions $h_k(x)$, which give the $x$-dependent location of the $k$th cluster centroid of the mixture model. A simple prior for $h_k(x)$ is a smoothness prior diagonal in the cluster components. Since any density $p(y\vert x,h)$ can be approximated arbitrarily well by a mixture with large enough $m$, such cluster regression models allow one to interpolate between Gaussian regression and more flexible density estimation.
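For concreteness, a minimal numerical sketch of how the mixture likelihood (331) can be evaluated is given below (in Python, assuming scalar $y$). The integration grid, the example component functions $h_k$, and all variable names are assumptions of this illustration, not part of the model specification.

\begin{verbatim}
import numpy as np

# Sketch of Eq. (331) for scalar y: a mixture of m Gaussians whose centroids
# h_k(x) depend on x.  beta, p_k and the example h_k are assumed values.
beta = 4.0
p_k = np.array([0.5, 0.3, 0.2])               # mixing coefficients p(k)
h_k = [lambda x: np.sin(x),                   # hypothetical h_1(x)
       lambda x: np.sin(x) + 1.0,             # hypothetical h_2(x)
       lambda x: 0.5 * x]                     # hypothetical h_3(x)

def numerator(y, x):
    # sum_k p(k) exp(-beta/2 (y - h_k(x))^2)
    return sum(pk * np.exp(-0.5 * beta * (y - hk(x)) ** 2)
               for pk, hk in zip(p_k, h_k))

def p_y_given_x(y, x):
    # Normalize by the y-integral of the numerator, here done numerically.
    y_grid = np.linspace(-10.0, 10.0, 2001)
    norm = np.sum(numerator(y_grid, x)) * (y_grid[1] - y_grid[0])
    return numerator(y, x) / norm

print(p_y_given_x(0.8, x=1.0))
\end{verbatim}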

For independent data the posterior density becomes

\begin{displaymath}
p(h\vert D,D_0)
=
\frac{p(h\vert D_0)}{p(y_D\vert x_D,D_0)}
\prod_{i=1}^{n}
\frac{\sum_k^m p(k)\, e^{-\frac{\beta}{2} (y_i-h_k(x_i))^2}}
{\sum_k^m p(k)\, \left(\frac{\beta}{2\pi}\right)^{\frac{m}{2}}}
.
\end{displaymath} (332)
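To make the maximization discussed next concrete, the following sketch writes down the negative logarithm of (332), up to $h$-independent constants (the normalization factors and $p(y_D\vert x_D,D_0)$), with the components $h_k$ discretized on a grid and a squared-first-derivative smoothness term acting on each component separately, i.e. diagonal in the cluster components as mentioned above. The grid, the data, and the prior weight lam are assumptions of this illustration.

\begin{verbatim}
import numpy as np

# Negative log of Eq. (332), dropping h-independent constants.  Each row of H
# holds one component h_k on the grid x_grid; the prior is diagonal in the
# cluster components (an independent smoothness term per h_k).
x_grid = np.linspace(0.0, 1.0, 51)
dx = x_grid[1] - x_grid[0]
beta, lam = 4.0, 0.1                      # assumed inverse variance and prior weight
p_k = np.array([0.5, 0.5])
x_D = np.array([0.1, 0.4, 0.9])           # assumed training inputs
y_D = np.array([0.0, 1.2, 0.3])           # assumed training outputs

def neg_log_posterior(H):
    idx = np.abs(x_grid[:, None] - x_D).argmin(axis=0)  # nearest grid point to each x_i
    lik = np.array([p_k @ np.exp(-0.5 * beta * (y - H[:, i]) ** 2)
                    for y, i in zip(y_D, idx)])
    data_term = -np.sum(np.log(lik))                     # -sum_i log sum_k p(k) e^{...}
    prior_term = 0.5 * lam * np.sum((np.diff(H, axis=1) / dx) ** 2) * dx
    return data_term + prior_term

H0 = np.vstack([np.zeros_like(x_grid), np.ones_like(x_grid)])
print(neg_log_posterior(H0))
\end{verbatim}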

For fixed $x$, uniform $p(k)$, and uniform $p(h\vert D_0)$, maximizing that posterior is equivalent to the clustering approach of Rose, Gurewitz, and Fox for squared distance costs [203].
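That connection can be illustrated by a small sketch of the resulting soft clustering iteration: with $x$ fixed, each $h_k$ collapses to a single centroid $\mu_k$, and alternating between responsibilities proportional to $e^{-\frac{\beta}{2}(y_i-\mu_k)^2}$ and responsibility-weighted centroid means is an EM-style ascent on the corresponding log-posterior, of the kind used in such deterministic-annealing clustering schemes. The data, the number of clusters, and the $\beta$ schedule below are assumptions of this illustration.

\begin{verbatim}
import numpy as np

# Soft clustering sketch for the fixed-x case: each h_k reduces to a single
# centroid mu_k, p(k) is uniform, and the prior is flat.  Responsibilities use
# the same Gaussian weights e^{-beta/2 (y_i - mu_k)^2} as the likelihood (331).
y = np.array([0.1, 0.3, 1.9, 2.1, 4.0, 4.2])   # assumed one-dimensional data
mu = np.array([0.0, 2.0, 5.0])                 # assumed initial centroids (m = 3)

for beta in [0.1, 1.0, 10.0]:                  # increasing beta ("cooling")
    for _ in range(50):
        # responsibilities r_{ik} ~ e^{-beta/2 (y_i - mu_k)^2}, normalized over k
        w = np.exp(-0.5 * beta * (y[:, None] - mu[None, :]) ** 2)
        r = w / w.sum(axis=1, keepdims=True)
        # centroid update: responsibility-weighted means of the data
        mu = (r * y[:, None]).sum(axis=0) / r.sum(axis=0)

print(mu)
\end{verbatim}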

