Solving learning problems numerically
by discretizing the $x$ and $y$ variables
allows one, in principle, to deal with
arbitrary non-Gaussian priors.
Compared to Gaussian priors, however,
the resulting stationarity equations are intrinsically nonlinear.
As a typical example let us formulate a prior
in terms of nonlinear and non-quadratic
``potential'' functions $\psi$
acting on ``filtered differences''

$$
\omega = {\bf W}\,(\phi - t),
$$

defined with respect to some
positive (semi-)definite inverse covariance

$$
{\bf K} = {\bf W}^{\rm T}\,{\bf W}.
$$
In particular, consider a prior factor
of the following form

$$
p(\phi) \;\propto\; e^{-E_\psi},
\qquad
E_\psi = \sum_k \psi(\omega_k).
$$

For differentiable $\psi$
the functional derivative with respect to $\phi$
becomes

$$
\delta_\phi E_\psi \;=\; {\bf W}^{\rm T}\, \psi'(\omega),
\qquad\qquad (596)
$$

where $\psi'$ acts componentwise on $\omega$.
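Since (596) is used repeatedly below, a quick finite-difference check of the identity $\delta_\phi E_\psi = {\bf W}^{\rm T}\psi'(\omega)$ may be helpful. The following sketch uses a random filter matrix and $\psi=\cosh$ purely as stand-ins; neither choice is prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 8
W = rng.normal(size=(n, n))    # arbitrary stand-in filter matrix
t = rng.normal(size=n)         # template function t
psi, dpsi = np.cosh, np.sinh   # smooth cup-shaped potential and its derivative

def E(phi):
    """Prior energy E_psi = sum_k psi(omega_k), with omega = W (phi - t)."""
    return psi(W @ (phi - t)).sum()

phi = rng.normal(size=n)
analytic = W.T @ dpsi(W @ (phi - t))   # Eq. (596)

# Central finite differences, one component of phi at a time.
eps = 1e-6
numeric = np.array([(E(phi + eps * e) - E(phi - eps * e)) / (2 * eps)
                    for e in np.eye(n)])
assert np.allclose(analytic, numeric, atol=1e-5)
```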
The potential functions $\psi$
may be fixed in advance for a given problem.
Typical choices to allow discontinuities
are symmetric ``cup'' functions
with minimum at zero and flat tails,
for which one large step is cheaper than many small ones
[238].
Examples are shown in Fig. 12 (a,b).
The cusp in (b), where the derivative does not exist,
requires special treatment [246].
Such functions can also be interpreted in the sense of robust statistics,
as their flat tails reduce
the sensitivity with respect to outliers
[100,101,67,26].
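To make these shapes concrete, the sketch below tabulates three symmetric potentials of this kind; they are standard robust-statistics choices, not necessarily the functions plotted in Fig. 12.

```python
import numpy as np

def psi_quadratic(w):
    """Gaussian reference case: unbounded, penalizes outliers heavily."""
    return 0.5 * w**2

def psi_abs(w, eps=1e-3):
    """Absolute value, with a cusp at zero as in Fig. 12 (b); the smoothed
    form sqrt(w^2 + eps^2) is one common workaround where a derivative
    is needed."""
    return np.sqrt(w**2 + eps**2)

def psi_geman_mcclure(w):
    """Geman-McClure potential: bounded, i.e., completely flat tails, so
    one large step is cheaper than many small ones."""
    return w**2 / (1.0 + w**2)
```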
Inverted ``cup'' functions, like those
shown in Fig. 12 (c),
have been obtained by optimizing a set of potentials $\psi$
with respect to a sample of natural images
[246].
(For the statistics of natural images
and their relation to wavelet-like filters and sparse coding
see also
[175,176].)
[Fig. 12: Examples of potential functions $\psi$: (a), (b) symmetric ``cup'' functions, (b) with a cusp at zero; (c) inverted ``cup'' functions obtained from natural images.]
While, for filters ${\bf W}$ which are differential operators,
cup functions promote smoothness,
inverted ``cup'' functions can be used to implement
structure.
For such $\psi$
the gradient algorithm for minimizing $E_\psi$ reads,

$$
\phi^{\rm new}
= \phi^{\rm old} - \eta\, \delta_\phi E_\psi \big|_{\phi^{\rm old}}
\qquad\qquad (600)
$$

$$
\phantom{\phi^{\rm new}}
= \phi^{\rm old} - \eta\, {\bf W}^{\rm T}\, \psi'\big( {\bf W}\,(\phi^{\rm old} - t) \big),
\qquad\qquad (601)
$$

with step size (learning rate) $\eta$.
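As an illustration, here is a minimal numerical sketch of iteration (600, 601) for a discretized one-dimensional $\phi$. The first-difference filter, the Geman-McClure potential, the step size, and the zero template are illustrative assumptions only; in a full learning problem this prior gradient would be combined with the gradient of the likelihood (training-data) terms.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 64
t = np.zeros(n)                  # template function t (assumed zero here)
phi = rng.normal(size=n)         # initial guess

# First-difference filter: K = W^T W is then a positive semi-definite,
# discrete-Laplacian-like inverse covariance.
W = np.diff(np.eye(n), axis=0)

def dpsi(w):
    """Derivative of the Geman-McClure cup potential w^2 / (1 + w^2)."""
    return 2.0 * w / (1.0 + w**2) ** 2

eta = 0.1                        # step size (learning rate)
for _ in range(500):
    omega = W @ (phi - t)        # filtered differences
    grad = W.T @ dpsi(omega)     # Eq. (596)
    phi -= eta * grad            # Eqs. (600), (601)
```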
Alternatively to fixing $\psi$ in advance,
or to approximating $\psi$ by sampling from the prior distribution
(which is sometimes possible for low-dimensional
discrete function spaces like images),
one may also introduce hyperparameters
and adapt the potentials $\psi$
to the data.
For example, attempting to adapt an unrestricted function $\psi$
with hyperprior $p(\psi)$
by
Maximum A Posteriori Approximation,
one has to solve the stationarity condition
$$
0 \;=\; \delta_{\psi(z)} \Big( E_\psi(\phi) + \ln Z_\psi - \ln p(\psi) \Big),
\qquad
Z_\psi = \int\! d\phi\; e^{-E_\psi(\phi)},
\qquad\qquad (602)
$$

where the functional derivative of the prior energy yields the ``histogram'' of filtered differences,

$$
\delta_{\psi(z)} E_\psi(\phi)
\;=\; \sum_k \delta\big(z - \omega_k\big)
\;=\; n(z;\phi),
\qquad\qquad (603)
$$

while the normalization contributes the expectation of that histogram under the prior,

$$
\delta_{\psi(z)} \ln Z_\psi
\;=\; -\,\big\langle n(z;\phi) \big\rangle_{p(\phi|\psi)},
\qquad\qquad (605)
$$

so that the stationarity condition becomes

$$
n(z;\phi) \;-\; \big\langle n(z;\phi) \big\rangle_{p(\phi|\psi)}
\;=\; \delta_{\psi(z)} \ln p(\psi).
\qquad\qquad (606)
$$
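As a rough illustration of how a condition of the type (606) can be used (in the spirit of [246]), the sketch below adapts a tabulated potential until the model histogram of filtered differences matches an empirical one. To stay self-contained it treats the components of $\omega$ as independent, with $p(z)\propto e^{-\psi(z)}$, which replaces the sampling over $p(\phi|\psi)$ that a full treatment would require; the heavy-tailed ``data'' are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

edges = np.linspace(-4.0, 4.0, 41)            # discretization of z
psi = np.zeros(len(edges) - 1)                # tabulated potential

def model_hist(psi):
    """Histogram implied by the independence assumption p(z) ~ exp(-psi)."""
    p = np.exp(-psi)
    return p / p.sum()

def empirical_hist(omega):
    h, _ = np.histogram(omega, bins=edges)
    return h / h.sum()

# Synthetic heavy-tailed "filtered differences" standing in for data.
h_data = empirical_hist(rng.standard_t(df=3, size=20_000))

# Gradient ascent on the psi-likelihood with a flat hyperprior:
# raise psi where the model over-produces, lower it where it under-produces.
for _ in range(200):
    psi += 0.5 * (model_hist(psi) - h_data)
psi -= psi.min()   # the constant offset of psi is irrelevant

# The result approximates -log of the data histogram: a flat-tailed cup.
```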
When introducing hyperparameters, one has to keep in mind that the resulting additional flexibility must be balanced by the number of available training data and by the hyperprior in order to be useful in practice.