

Mixtures of Gaussian prior factors

Complex, non-Gaussian prior factors, for example multimodal ones, may be constructed or approximated by using mixtures of simpler prior components. In particular, it is convenient to use Gaussian densities as components or ``building blocks'', since then many useful results obtained for Gaussian processes survive the generalization to mixture models [132,133,134,135,136]. In the following we therefore discuss applications of mixtures of Gaussian priors. Other implementations of non-Gaussian priors will be discussed in Section 6.5.

In Section 5.1 we have seen that hyperparameters label components of mixture densities. Thus, if $j$ labels the components of a mixture model, then $j$ can be seen as a hyperparameter. In Section 5 we treated the corresponding hyperparameter integration entirely in saddle point approximation. In this section we assume the hyperparameters $j$ to be discrete and try to perform the corresponding summation exactly.

Hence, consider a discrete hyperparameter $j$, possibly in addition to continuous hyperparameters $\theta$. In contrast to the $\theta$-integral, we now aim at treating the analogous sum over $j$ exactly, i.e., we want to study mixture models

\begin{displaymath}
p(\phi,\theta \vert\tilde D_0)
= \sum_j^m p(\phi,\theta,j \vert\tilde D_0)
= \sum_j^m p(\phi\vert\tilde D_0,\theta,j)\, p(\theta,j)
.
\end{displaymath} (535)

In the following we concentrate on mixtures of Gaussian specific priors. Notice that such models do not correspond to Gaussian mixture models for $\phi $ as they are often used in density estimation. Indeed, the form of $\phi $ may be completely unrestricted; it is only its prior or posterior density which is modeled by a mixture. We also remark that a strict asymptotic justification of a saddle point approximation would require the introduction of a parameter $\tilde \beta$ so that $p(\phi,\theta \vert\tilde D_0)\propto e^{\tilde\beta \ln \sum_j p_j}$. If the sum reduces to a single term, then $\tilde \beta$ corresponds to $\beta $.
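As a concrete numerical illustration of Eq. (535) for fixed $\theta$, the following minimal sketch (in Python; all function and variable names as well as the toy values are our own, not taken from the text) evaluates the log density of a mixture of Gaussian prior factors for a function $\phi$ discretized on a grid, performing the sum over the discrete hyperparameter $j$ exactly via log-sum-exp:

\begin{verbatim}
import numpy as np

# Minimal sketch of a mixture-of-Gaussian-priors density, cf. Eq. (535):
# each component j has its own template t_j and inverse covariance K_j.
# All names and values are illustrative.

def gaussian_log_density(phi, t, K):
    """log N(phi; t, K^{-1}) for symmetric positive definite K."""
    d = phi - t
    _, logdet = np.linalg.slogdet(K / (2.0 * np.pi))
    return 0.5 * logdet - 0.5 * d @ K @ d

def mixture_log_prior(phi, templates, inv_covs, weights):
    """log p(phi) = log sum_j w_j N(phi; t_j, K_j^{-1}), the sum over
    the discrete hyperparameter j done exactly (log-sum-exp)."""
    logs = np.array([np.log(w) + gaussian_log_density(phi, t, K)
                     for t, K, w in zip(templates, inv_covs, weights)])
    m = logs.max()
    return m + np.log(np.exp(logs - m).sum())

# Toy example: two alternative templates on a 5-point grid.
n = 5
K = 2.0 * np.eye(n)                  # stand-in for a smoothness operator
t1, t2 = np.zeros(n), np.ones(n)
phi = 0.9 * np.ones(n)               # close to the second template
print(mixture_log_prior(phi, [t1, t2], [K, K], [0.5, 0.5]))
\end{verbatim}

Note that $\phi$ here is the discretized function itself; only its prior density is a mixture, in line with the remark above.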

We have already discussed briefly in Section 5.2 that, in contrast to a product of probabilities or a sum of error terms, which implements a probabilistic AND of approximation conditions, a sum over $j$ implements a probabilistic OR. Those alternative approximation conditions will in the sequel be represented by alternative templates $t_j$ and inverse covariances ${{\bf K}}_j$. A prior (or posterior) density in the form of a probabilistic OR means that the optimal solution does not necessarily have to approximate all but possibly only one of the $t_j$ (in a metric defined by ${{\bf K}}_j$); a numerical sketch of this effect follows the list below. For example, in an image reconstruction task we may expect blue or brown eyes, whereas a mixture between blue and brown might not be as likely. Prior mixture models are potentially useful for

1.
Ambiguous (prior) data. Alternative templates can for example represent different expected trends for a time series.
2.
Model selection. Here templates represent alternative reference models (e.g., different neural network architectures, decision trees), and determining the optimal $\theta$ corresponds to training such models.
3.
Expert knowledge. Assume a priori knowledge to be formulated in terms of conjunctions and disjunctions of simple components or building blocks (for example verbally). E.g., an image of a face is expected to contain certain constituents (eyes, mouth, nose; AND) appearing in various possible variants (OR). Representing the simple components/building blocks by Gaussian priors centered around a typical example (e.g., of an eye) results in Gaussian mixture models. This constitutes a possible interface between symbolic and statistical methods. Such an application of prior mixture models has some similarities with the quantification of ``linguistic variables'' by fuzzy methods [118,119].
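The following toy sketch (in Python; parameter values and names are our own choices, not from the text) illustrates the OR behavior announced above: minimizing the mixture energy $-\ln \sum_j e^{-E_j(\phi)}$ with quadratic component energies $E_j = \frac{\beta}{2}(\phi-t_j)^T {\bf K} (\phi-t_j)$ by plain gradient descent drives the solution toward whichever template $t_j$ is nearer, not toward their average:

\begin{verbatim}
import numpy as np

# Toy sketch of the probabilistic OR (illustrative names and values):
# the mixture energy E(phi) = -ln sum_j exp(-E_j(phi)) with quadratic
# E_j = beta/2 (phi - t_j)^T K (phi - t_j) has local minima near the
# individual templates t_j, not near their average.

beta = 4.0
K = np.eye(2)
templates = [np.zeros(2), np.ones(2)]

def energy(phi):
    e = [0.5 * beta * (phi - t) @ K @ (phi - t) for t in templates]
    m = min(e)
    return m - np.log(sum(np.exp(m - ej) for ej in e))  # stable -log-sum-exp

def grad(phi, eps=1e-5):
    # Central-difference gradient, adequate for this demonstration.
    g = np.zeros_like(phi)
    for i in range(len(phi)):
        d = np.zeros_like(phi); d[i] = eps
        g[i] = (energy(phi + d) - energy(phi - d)) / (2.0 * eps)
    return g

for start in (np.full(2, 0.1), np.full(2, 0.9)):
    phi = start.copy()
    for _ in range(200):                  # plain gradient descent
        phi -= 0.1 * grad(phi)
    print(start, "->", np.round(phi, 3))  # ends near the nearer template
\end{verbatim}

Starting points on either side of the midpoint end up near different templates; a single Gaussian prior with the averaged template would instead pull both runs toward the mixture of the two, which is exactly what the OR construction avoids.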

For a discussion of possible applications of prior mixture models see also [132,133,134,135,136]. An application of prior mixture models to image completion can be found in [137].

