Next: A numerical example
Up: Bayesian Field Theory Nonparametric
Previous: Basic definitions
In this paper, our aim will be to reconstruct
the field $\phi$ from observational data $D = \{(x_i,y_i)\,|\,1\le i\le n\}$
in Maximum A Posteriori Approximation (MAP),
i.e.,
by maximizing the posterior
\[
p(\phi|D) \;\propto\; p(\phi)\,\prod_{i=1}^n p(y_i|x_i,\phi)
\tag{2}
\]
with respect to $\phi$.
[Alternatively, one can use Monte Carlo methods to approximate
the Bayesian predictive density $p(y|x,D)$.]
Implementing the normalization conditions on $P(x,y)=p(y|x,\phi)$ (for all $x$)
by a Lagrange term with multiplier function $\Lambda_X(x)$,
introducing an additional Gaussian process prior term for $\phi$
[2]
(with zero mean and inverse covariance $K$,
real symmetric, positive definite),
and considering the usual case where the
positivity constraint $P\ge 0$ is not active at the stationary point
(or automatically satisfied by the choice of $\phi$)
so a corresponding Lagrange multiplier can be skipped,
we therefore have to minimize the error functional
\[
E_\phi \;=\; -\big(\ln P(\phi),\,N\big)
\;+\;\tfrac{1}{2}\big(\phi,\,K\phi\big)
\;+\;\big(P(\phi),\,\Lambda_X\big).
\tag{3}
\]
Here $N$
denotes the empirical density, i.e.,
$N(x,y) = \sum_{i=1}^n \delta(x-x_i)\,\delta(y-y_i)$,
and the scalar product notation
$(f,g) = \int\! dx\, dy\; f(x,y)\, g(x,y)$
stands for integration over $x$ and $y$.
(Thus, the error functional (3)
is, up to a $\phi$-independent constant
and the constraint implementation by Lagrange multiplier terms,
the negative logarithm of the posterior (2).
Prior terms can be made more flexible by including hyperparameters
[1,6].
See [4] for a more detailed discussion.)
Notice that a Gaussian prior term in the field $\phi$
is in general non-Gaussian in terms of the
conditional likelihood $P(\phi)=p(y|x,\phi)$.
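On a lattice, the quantities entering the error functional (3) become finite-dimensional and the functional can be evaluated directly. The following sketch is illustrative only (all names hypothetical): it assumes a one-dimensional $y$ without $x$ dependence, the parameterization $P = e^\phi$ with the normalization implemented explicitly instead of by the Lagrange term, and $K$ proportional to a negative periodic lattice Laplacian.

```python
import numpy as np

def error_functional(phi, data, grid, dy, lam):
    """Discretized analogue of Eq. (3) for a 1-d density P(y) = exp(phi(y))/Z.

    -(ln P, N): since the empirical density N is a sum of delta peaks,
    the integral just picks out -ln P at the data points.
    0.5*(phi, K phi): smoothness prior with K = -lam * lattice Laplacian.
    The normalization of P is enforced explicitly here instead of by the
    Lagrange multiplier term of Eq. (3) (illustrative simplification).
    """
    # explicit normalization: P = exp(phi) / Z, Z = integral of exp(phi)
    P = np.exp(phi)
    P /= P.sum() * dy
    # likelihood term: -ln P at the grid point nearest each datum
    idx = np.searchsorted(grid, data).clip(0, len(grid) - 1)
    likelihood = -np.log(P[idx]).sum()
    # prior term: 0.5 (phi, K phi) with a periodic lattice Laplacian
    lap = (np.roll(phi, -1) - 2.0 * phi + np.roll(phi, 1)) / dy**2
    prior = 0.5 * lam * np.dot(phi, -lap) * dy
    return likelihood + prior
```

A flat initial guess $\phi \equiv 0$ gives a vanishing prior term, so only the likelihood term contributes.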
The MAP equation to be solved
is easily found by setting the functional derivative
of (3) with respect
to $\phi$ to zero, yielding
\[
0 \;=\; P'(\phi)\,P^{-1} N \;-\; K\phi \;-\; P'(\phi)\,\Lambda_X,
\tag{4}
\]
where
$P' = \delta P(\phi)/\delta\phi$
and $P^{-1}$
denotes the (diagonal) operator of multiplication by $1/P(x,y)$.
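On the lattice the functional derivative in Eq. (4) becomes an ordinary gradient, which can be checked against finite differences. A small sketch (hypothetical names; assuming, for illustration, the parameterization $P = e^\phi$ with explicit normalization, for which the likelihood part of the gradient reduces to $-N + n\,P\,dy$):

```python
import numpy as np

def neg_log_likelihood(phi, counts, dy):
    """Lattice version of -(ln P, N) for P = exp(phi)/Z (explicit normalization).

    counts[j] is the number of data points at grid point j (lattice N)."""
    n = counts.sum()
    Z = np.sum(np.exp(phi)) * dy           # normalization constant of P
    return -np.dot(counts, phi) + n * np.log(Z)

def likelihood_gradient(phi, counts, dy):
    """Derivative of the term above w.r.t. phi_j: -counts_j + n * P_j * dy."""
    P = np.exp(phi)
    P /= P.sum() * dy
    return -counts + counts.sum() * P * dy
```

Perturbing a single lattice component of $\phi$ and comparing the difference quotient with the analytic gradient confirms the stationarity condition is just a finite-dimensional gradient equation.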
For existing
$P'^{-1}$
this can be written
\[
\Lambda_X \;=\; P^{-1} N \;-\; P'^{-1} K\phi.
\tag{5}
\]
From the normalization condition
$\int\! dy\, P(x,y) = 1$ for all $x$
[where, in operator notation, $I_X$ denotes integration over $y$,
$(I_X f)(x) = \int\! dy\, f(x,y)$,
and $\Lambda_X(x,y) = \lambda(x)$ is $y$-independent],
it follows for $y$-independent $\Lambda_X$
that $I_X P\,\Lambda_X = \Lambda_X$,
and thus, multiplying Eq. (5) by $I_X P$,
\[
\Lambda_X \;=\; N_X \;-\; I_X P\, P'^{-1} K\phi,
\tag{6}
\]
where
$N_X = I_X N = \sum_{i=1}^n \delta(x - x_i)$.
$\Lambda_X$ can now be eliminated by
inserting Eq. (6) into Eq. (5):
\[
K\phi \;=\; P'\,\Big( P^{-1} N \;-\; N_X \;+\; I_X P\, P'^{-1} K\phi \Big).
\tag{7}
\]
In contrast to regression
with Gaussian process priors [12]
(where MAP is exact),
and similarly to classification for a specific choice of likelihood
[11],
the solution of Eq. (7) cannot be found
in a finite-dimensional space defined by the training data $D$.
Thus,
Eq. (7) has to be solved by discretization,
analogously, for example, to solving
field theories on a lattice [7].
Starting with an initial guess $\phi^0$,
a possible iteration scheme is given by
\[
\phi^{r+1} \;=\; \phi^{r} \;+\; \eta\, A^{-1}
\Big( P' P^{-1} N \;-\; P' N_X \;+\; P' I_X P\, P'^{-1} K\phi \;-\; K\phi \Big)
\Big|_{\phi=\phi^{r}},
\tag{8}
\]
with some positive definite matrix $A$,
and a number $\eta > 0$, both possibly changing during the iteration.
A convenient $\phi$-independent choice for $A$
is often the inverse covariance $K$.
(Quasi-Newton methods, for example, use
a $\phi$-dependent $A$; the standard gradient algorithm
corresponds to choosing for $A$ the identity matrix.)
For each calculated gradient,
the factor $\eta$ can be determined
by a one-dimensional line search algorithm.
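An iteration of type (8) can be sketched in a few lines. The sketch below is illustrative, not the paper's implementation (all names hypothetical): it assumes a one-dimensional $y$ without $x$ dependence, $P = e^\phi$ normalized explicitly instead of by the Lagrange term, $K$ a multiple of the negative periodic lattice Laplacian, a preconditioner $A = K + \epsilon I$ regularized to stay positive definite, and a crude backtracking line search for $\eta$.

```python
import numpy as np

def lattice_K(m, dy, lam):
    """Inverse covariance K = -lam * (periodic lattice Laplacian), symmetric,
    positive semidefinite; the scalar-product factor dy is included."""
    L = np.zeros((m, m))
    for j in range(m):
        L[j, j] = -2.0
        L[j, (j - 1) % m] = 1.0
        L[j, (j + 1) % m] = 1.0
    return -lam * L / dy**2 * dy

def map_iteration(data, grid, lam=0.1, steps=200):
    """Minimize the discretized functional (3) by a scheme of type Eq. (8):
    phi <- phi - eta * A^{-1} * gradient, with A = K + eps*I and a
    backtracking line search for eta (illustrative choices throughout)."""
    m, dy, n = len(grid), grid[1] - grid[0], len(data)
    idx = np.searchsorted(grid, data).clip(0, m - 1)
    counts = np.bincount(idx, minlength=m).astype(float)   # lattice N
    K = lattice_K(m, dy, lam)
    A = K + 1e-3 * np.eye(m)            # positive definite preconditioner
    phi = np.zeros(m)

    def energy(p):
        Z = np.sum(np.exp(p)) * dy      # explicit normalization of P
        return -p[idx].sum() + n * np.log(Z) + 0.5 * p @ K @ p

    for _ in range(steps):
        P = np.exp(phi)
        P /= P.sum() * dy
        grad = -counts + n * P * dy + K @ phi   # gradient of the functional
        step = np.linalg.solve(A, grad)         # preconditioned direction
        eta, e0 = 1.0, energy(phi)
        while energy(phi - eta * step) > e0 and eta > 1e-10:
            eta *= 0.5                          # crude backtracking search
        phi = phi - eta * step
    P = np.exp(phi)
    P /= P.sum() * dy
    return P                                    # reconstructed density
```

Because the line search only accepts non-increasing energies, the iteration is a descent method; replacing $A$ by the identity recovers plain gradient descent.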
An iteration scheme similar to Eq. (8)
can be obtained using Eq. (4),
which then requires adapting the Lagrange multiplier function $\Lambda_X$
during the iteration.
Clearly, a direct discretization can only be useful
for low-dimensional $x$ and $y$ variables
(say, one- or two-dimensional, as for time series or images).
Due to increasing computing capabilities, however,
many low-dimensional problems
are now directly solvable by discretization [3].
For variables $x$ or $y$
living in higher-dimensional spaces,
additional approximations are necessary [4].
Joerg Lemm
2000-09-12