Exact predictive density
For Gaussian regression the predictive density under training data D and prior data D_0 can be found analytically, without resorting to a saddle point approximation. The predictive density is defined as the integral

p(y|x,D,D_0) = \int \! dh \; p(y|x,h) \, p(h|D,D_0) .

Denoting the training data values by t_D, sampled with inverse covariance K_D concentrated on the training points x_D, and analogously the test data values by y, sampled with inverse (co)variance K concentrated on the test points x, we have for the Gaussian likelihood of the test data

p(y|x,h) = \big(\det{}_x (K/2\pi)\big)^{1/2} \, e^{-\frac{1}{2}\langle y-h \,|\, K \,|\, y-h \rangle}   (308)

and, with \hat{K} = K_D + K_0 and \bar{t} = \hat{K}^{-1}(K_D t_D + K_0 t_0), for the Gaussian posterior

p(h|D,D_0) = \big(\det (\hat{K}/2\pi)\big)^{1/2} \, e^{-\frac{1}{2}\langle h-\bar{t} \,|\, \hat{K} \,|\, h-\bar{t} \rangle} ,   (309)

hence

p(y|x,D,D_0) = \big(\det{}_x (K/2\pi) \, \det(\hat{K}/2\pi)\big)^{1/2} \int \! dh \; e^{-\frac{1}{2}\left(\langle y-h|K|y-h\rangle + \langle h-\bar{t}|\hat{K}|h-\bar{t}\rangle\right)} .   (310)
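The defining integral can be checked numerically in one dimension, where h is a single number. The scalar values chosen below for K, \hat{K} and \bar{t} are illustrative assumptions, not taken from the text:

```python
import numpy as np

# 1-d toy check of p(y|x,D,D_0) = \int dh p(y|x,h) p(h|D,D_0).
# K, K_hat, t_bar are illustrative scalar choices.
K, K_hat, t_bar = 2.0, 0.5, 1.0

def gauss(u, mean, inv_var):
    # normalized 1-d Gaussian with inverse variance inv_var
    return np.sqrt(inv_var / (2 * np.pi)) * np.exp(-0.5 * inv_var * (u - mean) ** 2)

h = np.linspace(-20.0, 20.0, 200001)   # integration grid over h
dh = h[1] - h[0]
y = 0.7
integral = np.sum(gauss(y, h, K) * gauss(h, t_bar, K_hat)) * dh

# closed form: Gaussian with mean t_bar and variance K^{-1} + K_hat^{-1}
closed = gauss(y, t_bar, 1.0 / (1.0 / K + 1.0 / K_hat))
print(np.isclose(integral, closed))  # True
```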
Here we have this time written \det{}_x explicitly for a determinant calculated in that space where K is invertible, i.e., the space of test data. This is useful because, for example, in general \det{}_x(AB) \neq \det{}_x(A)\,\det{}_x(B) when the operators act on the full space.
Using the generalized `bias-variance' decomposition (230) yields

\langle y-h|K|y-h\rangle + \langle h-\bar{t}|\hat{K}|h-\bar{t}\rangle = \langle h-\tilde{t}|\tilde{K}|h-\tilde{t}\rangle + \langle y-\bar{t}|K\tilde{K}^{-1}\hat{K}|y-\bar{t}\rangle   (311)

with

\tilde{K} = K + \hat{K} = K + K_D + K_0 , \qquad \tilde{t} = \tilde{K}^{-1}(K y + K_D t_D + K_0 t_0) .

Now the h-integration can be performed,

p(y|x,D,D_0) = \left(\frac{\det{}_x(K/2\pi)\,\det(\hat{K}/2\pi)}{\det(\tilde{K}/2\pi)}\right)^{1/2} e^{-\frac{1}{2}\langle y-\bar{t}|K\tilde{K}^{-1}\hat{K}|y-\bar{t}\rangle} .   (316)
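The completion-of-squares step behind this integration can be verified in a finite-dimensional setting; random symmetric positive definite matrices stand in for K, K_D and K_0 (an illustrative assumption, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

def spd(rng, k):
    # random symmetric positive definite matrix
    A = rng.standard_normal((k, k))
    return A @ A.T + k * np.eye(k)

K, K_D, K_0 = spd(rng, n), spd(rng, n), spd(rng, n)
t_D, t_0 = rng.standard_normal(n), rng.standard_normal(n)

K_hat = K_D + K_0                                      # posterior inverse covariance
t_bar = np.linalg.solve(K_hat, K_D @ t_D + K_0 @ t_0)  # posterior mean

K_tilde = K + K_hat
y, h = rng.standard_normal(n), rng.standard_normal(n)
t_tilde = np.linalg.solve(K_tilde, K @ y + K_hat @ t_bar)

def quad(v, M, w):
    # quadratic form <v - w | M | v - w>
    return (v - w) @ M @ (v - w)

lhs = quad(y, K, h) + quad(h, K_hat, t_bar)
rhs = quad(h, K_tilde, t_tilde) + quad(y, K @ np.linalg.solve(K_tilde, K_hat), t_bar)
print(np.isclose(lhs, rhs))  # True
```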
Canceling common factors, this becomes

p(y|x,D,D_0) = \big(\det{}_x(K_y/2\pi)\big)^{1/2} \, e^{-\frac{1}{2}\langle y-\bar{t} \,|\, K_y \,|\, y-\bar{t}\rangle} .   (317)

Here we introduced

K_y = K \tilde{K}^{-1} \hat{K} = K \tilde{K}^{-1} (K_D + K_0)

and used that the determinant

\det{}_x(K_y) = \det{}_x(K) \, \det(\hat{K}) / \det(\tilde{K})   (318)

can be calculated in the space of test data x.
This follows from \tilde{K} = K + \hat{K} and the equality

\det(\mathbf{1} + \hat{K}^{-1} K) = \det{}_x(\mathbf{1} + I_x \hat{K}^{-1} K I_x) ,   (319)

with K = I_x K I_x, \mathbf{1} the identity, and I_x denoting the projector into the space of test data x.
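A finite-dimensional sketch of this determinant identity: with K supported only on the first m coordinates, the full-space determinant reduces to an m x m determinant. The dimensions and random matrices below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 2                         # full space dim, test-space dim (first m coords)

A = rng.standard_normal((n, n))
K_hat = A @ A.T + n * np.eye(n)     # K_D + K_0, invertible on the full space

Kx = rng.standard_normal((m, m))
Kx = Kx @ Kx.T + m * np.eye(m)
K = np.zeros((n, n))
K[:m, :m] = Kx                      # K = I_x K I_x: concentrated on the test space

B = np.linalg.solve(K_hat, K)       # K_hat^{-1} K, nonzero only in test columns
det_full = np.linalg.det(np.eye(n) + B)
det_x_side = np.linalg.det(np.eye(m) + B[:m, :m])
print(np.isclose(det_full, det_x_side))  # True
```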
Finally,

\int \! dy \; e^{-\frac{1}{2}\langle y-\bar{y}|K_y|y-\bar{y}\rangle} = \big(\det{}_x(K_y/2\pi)\big)^{-1/2}   (320)

yields the correct normalization of the predictive density,

p(y|x,D,D_0) = \big(\det{}_x(K_y/2\pi)\big)^{1/2} \, e^{-\frac{1}{2}\langle y-\bar{y}|K_y|y-\bar{y}\rangle} ,   (321)

with mean and covariance

\bar{y} = I_x \bar{t} = I_x (K_D+K_0)^{-1}(K_D t_D + K_0 t_0) ,   (322)

K_y^{-1} = I_x \big( K^{-1} + (K_D+K_0)^{-1} \big) I_x .   (323)

It is useful to express the posterior covariance (K_D+K_0)^{-1} in terms of the prior covariance K_0^{-1}.
According to the operator identity

(A+B)^{-1} = A^{-1} - A^{-1}(A^{-1}+B^{-1})^{-1}A^{-1} ,   (324)

with A = K_0 and B = K_D, we find

(K_D+K_0)^{-1} = K_0^{-1} - K_0^{-1}(K_0^{-1}+K_D^{-1})^{-1}K_0^{-1} .   (325)
Notice that while K_D = I_D K_D I_D is concentrated on the space of training data, in general (K_D+K_0)^{-1} \neq I_D (K_D+K_0)^{-1} I_D. This means, for example, that K_0^{-1} has to be known on the whole space to find (K_D+K_0)^{-1}; it is not enough to invert the restriction I_D (K_D+K_0) I_D.
In data space K_D = I_D K_D I_D, so Eq. (325) can be manipulated to give

(K_D+K_0)^{-1} = K_0^{-1} - K_0^{-1} I_D \big( K_D^{-1} + I_D K_0^{-1} I_D \big)^{-1} I_D K_0^{-1} ,   (326)

where now only a matrix in the space of training data has to be inverted.
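This is a Woodbury-type inversion and can be sketched numerically; the dimensions and random matrices below are illustrative assumptions, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 6, 3                        # full space dim, number of training points

def spd(rng, k):
    # random symmetric positive definite matrix
    A = rng.standard_normal((k, k))
    return A @ A.T + k * np.eye(k)

K_0 = spd(rng, n)                  # prior inverse covariance (full space)
K_Dd = spd(rng, d)                 # noise inverse covariance in the d-dim data space
K_D = np.zeros((n, n))
K_D[:d, :d] = K_Dd                 # K_D = I_D K_D I_D

C_0 = np.linalg.inv(K_0)           # prior covariance
lhs = np.linalg.inv(K_D + K_0)     # posterior covariance, full-space inversion

# Eq. (326): only a d x d matrix has to be inverted
M = np.linalg.inv(K_Dd) + C_0[:d, :d]        # K_D^{-1} + I_D K_0^{-1} I_D
rhs = C_0 - C_0[:, :d] @ np.linalg.solve(M, C_0[:d, :])
print(np.allclose(lhs, rhs))  # True
```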
This now allows us to express the predictive mean (322) and covariance (323) by the prior covariance K_0^{-1}:

\bar{y} = I_x \Big( t_0 + K_0^{-1} I_D \big( K_D^{-1} + I_D K_0^{-1} I_D \big)^{-1} I_D (t_D - t_0) \Big) ,   (327)

K_y^{-1} = I_x \Big( K^{-1} + K_0^{-1} - K_0^{-1} I_D \big( K_D^{-1} + I_D K_0^{-1} I_D \big)^{-1} I_D K_0^{-1} \Big) I_x .   (328)

Thus, for given prior covariance K_0^{-1}, both \bar{y} and K_y^{-1} can be calculated by inverting the n x n matrix K_D^{-1} + I_D K_0^{-1} I_D, where n denotes the number of training data.
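A minimal numerical sketch of Eqs. (327, 328), assuming a squared-exponential prior covariance, zero prior mean, and white noise (all illustrative choices, not taken from the text):

```python
import numpy as np

def rbf(a, b, scale=1.0):
    # assumed prior covariance K_0^{-1}(x, x') between point sets a and b
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / scale**2)

x_D = np.array([-1.0, 0.0, 1.0, 2.0])   # training points (illustrative)
t_D = np.sin(x_D)                        # training values (illustrative)
x = np.array([0.5, 1.5])                 # test points
t_0 = 0.0                                # zero prior mean
var_noise = 0.1                          # K_D^{-1} = var_noise * identity

C_DD = rbf(x_D, x_D)                     # I_D K_0^{-1} I_D
C_xD = rbf(x, x_D)                       # I_x K_0^{-1} I_D
C_xx = rbf(x, x)                         # I_x K_0^{-1} I_x

# n x n matrix to invert: K_D^{-1} + I_D K_0^{-1} I_D
M = var_noise * np.eye(len(x_D)) + C_DD

y_bar = t_0 + C_xD @ np.linalg.solve(M, t_D - t_0)        # Eq. (327)
cov_y = var_noise * np.eye(len(x)) + C_xx \
        - C_xD @ np.linalg.solve(M, C_xD.T)                # Eq. (328)
print(y_bar, np.diag(cov_y))
```

Only the 4 x 4 matrix M over the training points is inverted, regardless of where or how often the prediction is evaluated.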
Comparison of Eqs. (327, 328) with the maximum posterior solution of Eq. (277) now shows that for Gaussian regression the exact predictive density and its maximum posterior approximation have the same mean,

\bar{y} = I_x h^* .   (329)

The variances, however, differ by the term I_x (K_D+K_0)^{-1} I_x.
According to the results of Section 2.2.2, the mean of the predictive density is the optimal choice under squared-error loss (51). For Gaussian regression, therefore, the optimal regression function is the same for squared-error loss in the exact and in the maximum posterior treatment, and thus also for log-loss (for Gaussians with fixed variance):

a^*(x) = \bar{y}(x) = h^*(x) .   (330)
In case the space of possible p(y|x,a) is not restricted to Gaussian densities with fixed variance, the variance of the optimal density under log-loss, K_y^{-1} = I_x \big( K^{-1} + (K_D+K_0)^{-1} \big) I_x, differs by the term I_x (K_D+K_0)^{-1} I_x from its maximum posterior approximation K^{-1}.
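This variance relation can be sketched in finite dimensions: marginalizing h from the joint quadratic form of Eq. (310) via a Schur complement reproduces K_y, whose inverse exceeds the maximum posterior variance K^{-1} by exactly (K_D+K_0)^{-1}. The random matrices below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4

def spd(rng, k):
    # random symmetric positive definite matrix
    A = rng.standard_normal((k, k))
    return A @ A.T + k * np.eye(k)

K = spd(rng, n)          # test-data inverse covariance
K_hat = spd(rng, n)      # posterior inverse covariance K_D + K_0

# The joint quadratic form in (h, y) has block precision
# [[K + K_hat, -K], [-K, K]]; marginalizing h leaves the Schur complement
K_y = K - K @ np.linalg.solve(K + K_hat, K)

# exact predictive covariance = MAP variance K^{-1} plus (K_D + K_0)^{-1}
exact_cov = np.linalg.inv(K_y)
map_cov = np.linalg.inv(K)
print(np.allclose(exact_cov, map_cov + np.linalg.inv(K_hat)))  # True
```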
Joerg Lemm, 2001-01-21