The hyperparameters considered up to now
have been real numbers, or vectors of real numbers.
Such hyperparameters can describe continuous transformations,
like the translation, rotation or scaling of template functions
and the scaling of inverse covariance operators.
For real $\theta$
and a differentiable posterior,
stationarity conditions can be found by differentiating
the posterior with respect to
$\theta$.
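Spelled out in generic notation (a sketch; we write $p(\theta \mid D)$ for the $\theta$-dependent posterior, which is not notation fixed above):
\[
\frac{\partial}{\partial \theta_k} \log p(\theta \mid D) \;=\; 0
\qquad \text{for all components } \theta_k \text{ of } \theta .
\]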
Instead of a class of continuous transformations,
a finite number of alternative template functions or inverse covariances
may be given.
For example, an image to be reconstructed
might be expected to show a digit between zero and nine,
a letter from some alphabet,
or the face of someone
who is a member of a known group of people.
Similarly,
a particular time series may
be expected to be in either a high or a low variance regime.
In all these cases,
there exists a finite number of classes $j$
which could be represented by specific templates $t_j$
or inverse covariances $K_j$.
Such ``class'' variables
are
nothing but hyperparameters $\theta$
with integer values.
Binary parameters, for example,
allow one to select from two reference functions or two inverse covariances
the one which fits the data best.
E.g., for $\theta \in \{0,1\}$
one can write
\begin{equation}
t(\theta) = \theta\, t_1 + (1-\theta)\, t_0 , \tag{496}
\end{equation}
\begin{equation}
K(\theta) = \theta\, K_1 + (1-\theta)\, K_0 . \tag{497}
\end{equation}
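A minimal numerical sketch of this binary selection (our own illustration, not code from the text; the names `template`, `neg_log_prior`, and the Gaussian prior energy $\frac{1}{2}(h-t)^T K (h-t)$ are assumptions):

```python
import numpy as np

# A binary hyperparameter theta in {0, 1} selects one of two templates
# t0, t1 and one of two inverse covariances K0, K1, as in Eqs. (496), (497).
def template(theta, t0, t1):
    return theta * t1 + (1 - theta) * t0

def inv_covariance(theta, K0, K1):
    return theta * K1 + (1 - theta) * K0

def neg_log_prior(h, theta, t0, t1, K0, K1):
    """Gaussian prior energy 0.5 (h-t)^T K (h-t) for the selected class."""
    d = h - template(theta, t0, t1)
    return 0.5 * d @ inv_covariance(theta, K0, K1) @ d

# Pick the binary theta whose template/covariance fits h best.
t0, t1 = np.zeros(4), np.ones(4)
K0, K1 = 0.5 * np.eye(4), 2.0 * np.eye(4)
h = np.array([0.9, 1.1, 1.0, 0.8])
best = min((0, 1), key=lambda th: neg_log_prior(h, th, t0, t1, K0, K1))
print(best)  # 1: h lies close to t1
```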
For integer $\theta$ the integral over $\theta$
becomes a sum
(we will also use the letter $j$
and write $\theta = j$
for integer hyperparameters),
so that prior, posterior, and predictive density
have the form of a finite mixture
with components $j$.
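Written out for the prior, with mixture weights $p(\theta = j)$ (a generic sketch of this mixture form; conditioning on the data is suppressed):
\[
p(h) \;=\; \sum_j p(\theta = j)\, p(h \mid \theta = j) ,
\]
and analogously for the posterior and the predictive density, each a weighted finite sum over the component densities labeled by $j$.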
For a moderate number of components one may be able to include all of them explicitly. Such prior mixture models will be studied in Section 6.
If the number of mixture components is too large to
include them all explicitly,
one must again restrict oneself to a subset of them.
One possibility is to select a random sample of components
using Monte Carlo methods.
Alternatively, one may search for the $\theta$
with maximal posterior.
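A small sketch of both options (our illustration; `log_posterior` is a hypothetical user-supplied function of the component index $j$):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_subset(n_components, n_samples):
    """Select a random sample of component indices (Monte Carlo option)."""
    return rng.choice(n_components, size=n_samples, replace=False)

def best_component(log_posterior, candidates):
    """Search the candidates for the index j with maximal posterior."""
    return max(candidates, key=log_posterior)

# Example with a toy log-posterior over 1000 components.
log_posterior = lambda j: -abs(j - 700) / 100.0
j_best = best_component(log_posterior, random_subset(1000, 50))
```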
In contrast to typical optimization problems for real variables,
the corresponding integer optimization problems
are usually not very smooth with respect to $\theta$
(with smoothness defined in terms of differences instead of derivatives)
and are therefore often much harder to solve.
There exists, however, a variety of deterministic and stochastic integer optimization algorithms, which may be combined with ensemble methods like genetic algorithms [98,79,44,157,121,209,160] and with homotopy methods like simulated annealing [114,156,199,43,1,203,243,68,244,245]. Annealing methods are similar to (Markov chain) Monte Carlo methods, which aim at sampling many points from a specific distribution, for example at a fixed temperature. For Monte Carlo methods it is important to obtain (nearly) independent samples and the correct limiting distribution of the Markov chain. For annealing methods, in contrast, the aim is to find the correct minimum (i.e., the ground state at zero temperature) by smoothly lowering the temperature from a finite value to zero. Here it is less important to model the distributions at nonzero temperatures exactly, but it is important to use an adequate cooling scheme for lowering the temperature.
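A compact sketch of simulated annealing for a vector of binary hyperparameters (our illustration; `energy` stands for a hypothetical negative log-posterior as a function of $\theta$, and the exponential cooling scheme is one common choice among several):

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def anneal(energy, n_bits, n_steps=10_000, T0=1.0, T_end=1e-3):
    """Simulated annealing over a binary hyperparameter vector theta."""
    theta = rng.integers(0, 2, size=n_bits)
    E = energy(theta)
    for step in range(n_steps):
        # Exponential cooling scheme from T0 down to T_end.
        T = T0 * (T_end / T0) ** (step / max(n_steps - 1, 1))
        proposal = theta.copy()
        proposal[rng.integers(n_bits)] ^= 1  # flip one randomly chosen bit
        dE = energy(proposal) - E
        # Metropolis acceptance rule at temperature T.
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            theta, E = proposal, E + dE
    return theta, E
```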
Instead of an integer optimization problem
one may also try to solve a similar problem
for real $\theta$.
For example,
the binary $\theta$
in Eqs. (496) and (497)
may be extended to real
$\theta \in [0,1]$.
By smoothly increasing an appropriate additional hyperprior,
one can finally again enforce binary hyperparameters
$\theta \in \{0,1\}$.
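One concrete possibility for such a hyperprior (our example; the text does not fix its form) is a factor
\[
p_\beta(\theta) \;\propto\; \exp\bigl(-\beta\, \theta (1-\theta)\bigr),
\qquad \theta \in [0,1] ,
\]
which is flat for $\beta = 0$ and, since $\theta(1-\theta)$ vanishes only at the endpoints, concentrates on $\theta \in \{0,1\}$ as $\beta$ is increased.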