
### Exact predictive density

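For Gaussian regression the predictive density under training data $D$ and prior $D_0$ can be found analytically, without resorting to a saddle point approximation. The predictive density is defined as the $h$-integral

$$
p(y|x,D,D_0) = \int\! dh\, p(y|x,h)\, p(h|D,D_0)
= \frac{\int\! dh\, p(y|x,h)\, p(y_D|x_D,h)\, p(h|D_0)}{\int\! dh\, p(y_D|x_D,h)\, p(h|D_0)} .
\tag{307}
$$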

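Denoting training data values $y_i$ by $t_i$, sampled with inverse covariance $K_i$ concentrated on $x_i$, and analogously test data values $y$ = $y_{n+1}$ by $t_{n+1}$, sampled with inverse (co-)variance $K_{n+1}$, we have for $1 \le i \le n+1$
$$
p(y_i|x_i,h) = \det{}_i (K_i/2\pi)^{1/2}\, e^{-\frac{1}{2}\left(h-t_i,\, K_i (h-t_i)\right)}
\tag{308}
$$

and
$$
p(h|D_0) = \det (K_0/2\pi)^{1/2}\, e^{-\frac{1}{2}\left(h-t_0,\, K_0 (h-t_0)\right)} ,
\tag{309}
$$

hence
$$
p(y|x,D,D_0) =
\frac{\int\! dh\, \exp\left\{-\frac{1}{2}\sum_{i=0}^{n+1}\left(h-t_i,\, K_i(h-t_i)\right) + \frac{1}{2}\sum_{i=0}^{n+1}\ln\det{}_i(K_i/2\pi)\right\}}
     {\int\! dh\, \exp\left\{-\frac{1}{2}\sum_{i=0}^{n}\left(h-t_i,\, K_i(h-t_i)\right) + \frac{1}{2}\sum_{i=0}^{n}\ln\det{}_i(K_i/2\pi)\right\}} .
\tag{310}
$$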

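Here we have this time written $\det_i$ explicitly for a determinant calculated in that space where $K_i$ is invertible (with $\det_0$ = $\det$). This is useful because, for example, in general $\det_i K_i \, \det K_0 \neq \det_i (K_i K_0)$. Using the generalized `bias-variance' decomposition (230) yields
$$
p(y|x,D,D_0) =
\frac{\int\! dh\, \exp\left\{-\frac{1}{2}\left(h-t_+,\, K_+(h-t_+)\right) - \frac{1}{2}V_+ + \frac{1}{2}\sum_{i=0}^{n+1}\ln\det{}_i(K_i/2\pi)\right\}}
     {\int\! dh\, \exp\left\{-\frac{1}{2}\left(h-t,\, K(h-t)\right) - \frac{1}{2}V + \frac{1}{2}\sum_{i=0}^{n}\ln\det{}_i(K_i/2\pi)\right\}}
\tag{311}
$$

with
$$
K = \sum_{i=0}^{n} K_i , \qquad t = K^{-1}\sum_{i=0}^{n} K_i t_i ,
\tag{312}
$$
$$
K_+ = \sum_{i=0}^{n+1} K_i , \qquad t_+ = K_+^{-1}\sum_{i=0}^{n+1} K_i t_i ,
\tag{313}
$$
$$
V = \sum_{i=0}^{n} (t_i,\, K_i t_i) - (t,\, K t) ,
\tag{314}
$$
$$
V_+ = \sum_{i=0}^{n+1} (t_i,\, K_i t_i) - (t_+,\, K_+ t_+) ,
\tag{315}
$$
where $V$ and $V_+$ collect the $h$-independent remainder terms of the decomposition.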
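As a numerical illustration of this decomposition, the following minimal sketch discretizes $h$ on a small grid, so that every $K_i$ becomes a matrix, and checks that the sum of quadratic terms equals a single quadratic form in $h$ plus an $h$-independent remainder. The grid size, noise variance, and data values are hypothetical, and the remainder is written without any normalization factor, which is an assumption about the convention.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5  # number of grid points; h is represented by a vector in R^m (assumed discretization)

# Prior term: template t_0 and a positive definite inverse covariance K_0.
A = rng.normal(size=(m, m))
K0 = A @ A.T + m * np.eye(m)
t0 = rng.normal(size=m)

# Data terms: K_i = e_i e_i^T / sigma^2 is concentrated on grid point x_i.
sigma2 = 0.5
x_train = [0, 2, 3]          # hypothetical training locations (grid indices)
y_train = [1.0, -0.3, 0.8]   # hypothetical training values

Ks, ts = [K0], [t0]
for xi, yi in zip(x_train, y_train):
    Ki = np.zeros((m, m))
    Ki[xi, xi] = 1.0 / sigma2
    Ks.append(Ki)
    ts.append(np.full(m, yi))  # only the component at x_i contributes

# Summed inverse covariance K, weighted template t, and remainder V.
K = sum(Ks)
t = np.linalg.solve(K, sum(Ki @ ti for Ki, ti in zip(Ks, ts)))
V = sum(ti @ Ki @ ti for Ki, ti in zip(Ks, ts)) - t @ K @ t

def energy_sum(h):
    """Sum over i of (h - t_i, K_i (h - t_i))."""
    return sum((h - ti) @ Ki @ (h - ti) for Ki, ti in zip(Ks, ts))

def energy_decomposed(h):
    """Single quadratic form around t plus the h-independent remainder V."""
    return (h - t) @ K @ (h - t) + V

h = rng.normal(size=m)
print(energy_sum(h), energy_decomposed(h))
```

The two printed numbers should agree up to floating-point rounding for any choice of `h`.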

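Now the $h$-integration can be performed
$$
p(y|x,D,D_0) =
\frac{\exp\left\{-\frac{1}{2}V_+ + \frac{1}{2}\sum_{i=0}^{n+1}\ln\det{}_i(K_i/2\pi) - \frac{1}{2}\ln\det(K_+/2\pi)\right\}}
     {\exp\left\{-\frac{1}{2}V + \frac{1}{2}\sum_{i=0}^{n}\ln\det{}_i(K_i/2\pi) - \frac{1}{2}\ln\det(K/2\pi)\right\}} .
\tag{316}
$$

Canceling common factors, writing again $y$ for $t_{n+1}$, $K_x$ for $K_{n+1}$, $\det_x$ for $\det_{n+1}$, and using $K_+ t_+$ = $K t + K_x y$, this becomes
$$
p(y|x,D,D_0) = \exp\left\{-\frac{1}{2}(y,\, K_y\, y) + (y,\, K_y\, \bar y) + \frac{1}{2}\left(t,\, (K K_+^{-1} K - K)\, t\right) + \frac{1}{2}\ln\det{}_x\!\left(K_x K_+^{-1} K/2\pi\right)\right\} .
\tag{317}
$$

Here we introduced $K_y$ = $K_x - K_x K_+^{-1} K_x$ and $\bar y$ = $K_y^{-1} K_x K_+^{-1} K\, t$, and used that
$$
\det (K_+^{-1} K) = \det (I - K_+^{-1} K_x) = \det{}_x (I_x - I_x K_+^{-1} K_x)
\tag{318}
$$

can be calculated in the space of test data $x$. This follows from $K$ = $K_+ - K_x$ and the equality
$$
\det \begin{pmatrix} 1 - A & 0 \\ -B & 1 \end{pmatrix} = \det (1 - A)
\tag{319}
$$

with $A$ = $I_x K_+^{-1} K_x$, $B$ = $(I - I_x) K_+^{-1} K_x$, and $I_x$ denoting the projector into the space of test data $x$. Finally
$$
K_y = K_x - K_x K_+^{-1} K_x = K_x K_+^{-1} K = K - K K_+^{-1} K
\tag{320}
$$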
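The two operator identities used in this step can be checked numerically. The sketch below is a hedged finite-dimensional illustration: a random positive definite matrix `K` stands in for the summed inverse covariance of prior and training data, and a single grid index `x` with noise variance `sigma2` plays the role of the test point (all of these are assumptions made for the example).

```python
import numpy as np

rng = np.random.default_rng(1)
m = 6
A = rng.normal(size=(m, m))
K = A @ A.T + m * np.eye(m)   # stand-in for the inverse covariance of prior plus training data

x = 4                         # hypothetical test location (grid index)
sigma2 = 0.3                  # hypothetical noise variance at the test point
Kx = np.zeros((m, m))
Kx[x, x] = 1.0 / sigma2       # inverse covariance concentrated on x
Kplus = K + Kx
Kplus_inv = np.linalg.inv(Kplus)

# Determinant identity: the full-space determinant of K_+^{-1} K equals a
# determinant taken in the (here one-dimensional) space of the test point.
det_full = np.linalg.det(Kplus_inv @ K)
det_x = 1.0 - (Kplus_inv @ Kx)[x, x]

# Three expressions for the predictive inverse variance, read off at the x-component.
Ky_a = (Kx - Kx @ Kplus_inv @ Kx)[x, x]
Ky_b = (Kx @ Kplus_inv @ K)[x, x]
Ky_c = (K - K @ Kplus_inv @ K)[x, x]

# Equivalent statement for the inverse: noise variance plus posterior variance at x.
Ky_inv = sigma2 + np.linalg.inv(K)[x, x]

print(det_full, det_x)
print(Ky_a, Ky_b, Ky_c, 1.0 / Ky_inv)
```

Because the test term is confined to one grid point, the matrix `Kx @ Kplus_inv @ K` has a single nonzero entry, which is why reading off the `[x, x]` component is sufficient here.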

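yields the correct normalization of the predictive density
$$
p(y|x,D,D_0) = \det{}_x (K_y/2\pi)^{1/2}\, e^{-\frac{1}{2}\left(y - \bar y,\, K_y (y - \bar y)\right)}
\tag{321}
$$

with mean and covariance
$$
\bar y = K_y^{-1} K_x K_+^{-1} K\, t = I_x t = t(x) ,
\tag{322}
$$
$$
K_y^{-1} = \left(K_x - K_x K_+^{-1} K_x\right)^{-1} = K_x^{-1} + I_x K^{-1} I_x ,
\tag{323}
$$
where the inverses of $K_y$ and $K_x$ are taken in the space of test data $x$.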
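The whole derivation can be cross-checked on a grid. The following sketch, under the assumption of a discretized $h$ with hypothetical training data and a single test point, evaluates the predictive density once as the ratio of the two Gaussian integrals and once as the closed-form Gaussian whose mean is the posterior mean at the test point and whose variance is the noise variance plus the posterior variance there.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 5
A = rng.normal(size=(m, m))
K0 = A @ A.T + m * np.eye(m)   # prior inverse covariance (assumed example)
t0 = rng.normal(size=m)        # prior template
sigma2 = 0.4                   # noise variance, same for all points (assumption)
x_train, y_train = [0, 1, 3], [0.5, -1.0, 2.0]   # hypothetical training data
x_test = 2

def point_term(idx, value):
    """Inverse covariance concentrated on grid index idx and a template with that value."""
    Ki = np.zeros((m, m))
    Ki[idx, idx] = 1.0 / sigma2
    return Ki, np.full(m, value)

def combine(terms):
    """Return K = sum K_i, t = K^{-1} sum K_i t_i, V = sum (t_i, K_i t_i) - (t, K t)."""
    K = sum(Ki for Ki, _ in terms)
    t = np.linalg.solve(K, sum(Ki @ ti for Ki, ti in terms))
    V = sum(ti @ Ki @ ti for Ki, ti in terms) - t @ K @ t
    return K, t, V

train_terms = [(K0, t0)] + [point_term(i, y) for i, y in zip(x_train, y_train)]
K, t, V = combine(train_terms)

def predictive_from_ratio(y):
    """Ratio of Gaussian integrals with and without the test term."""
    Kx, ty = point_term(x_test, y)
    Kplus, _, Vplus = combine(train_terms + [(Kx, ty)])
    det_ratio = np.linalg.det(np.linalg.solve(Kplus, K))   # det(K_+^{-1} K)
    return np.sqrt(det_ratio / (2 * np.pi * sigma2)) * np.exp(-0.5 * (Vplus - V))

def predictive_closed_form(y):
    """Gaussian with mean t(x) and variance sigma2 + (K^{-1})_{xx}."""
    var = sigma2 + np.linalg.inv(K)[x_test, x_test]
    return np.exp(-0.5 * (y - t[x_test]) ** 2 / var) / np.sqrt(2 * np.pi * var)

ys = np.linspace(-2.0, 2.0, 5)
ratio_vals = np.array([predictive_from_ratio(y) for y in ys])
closed_vals = np.array([predictive_closed_form(y) for y in ys])
print(np.max(np.abs(ratio_vals - closed_vals)))
```

Agreement of the two evaluations at several values of `y` tests both the exponent and the normalization factor.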

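It is useful to express the posterior covariance $K^{-1}$ by the prior covariance $K_0^{-1}$. Here $K$ = $K_0 + K_D$, where $K_D = \sum_{i=1}^{n} K_i$ collects the training data terms and $I_D$ denotes the projector into the space of training data. According to
$$
\begin{pmatrix} 1 + A & B \\ 0 & 1 \end{pmatrix}^{-1}
= \begin{pmatrix} (1 + A)^{-1} & -(1 + A)^{-1} B \\ 0 & 1 \end{pmatrix}
\tag{324}
$$

with $A$ = $K_D K_{0,DD}^{-1}$, $B$ = $K_D K_{0,D\bar D}^{-1}$, and $K_{0,DD}^{-1}$ = $I_D K_0^{-1} I_D$, $K_{0,D\bar D}^{-1}$ = $I_D K_0^{-1} I_{\bar D}$, $I_{\bar D}$ = $I - I_D$ we find
$$
K^{-1} = K_0^{-1}\left(1 + K_D K_0^{-1}\right)^{-1}
= K_0^{-1} \begin{pmatrix} \left(1 + K_D K_{0,DD}^{-1}\right)^{-1} & -\left(1 + K_D K_{0,DD}^{-1}\right)^{-1} K_D K_{0,D\bar D}^{-1} \\ 0 & 1 \end{pmatrix} .
\tag{325}
$$

Notice that while $K_D$ = $I_D K_D I_D$, in general $K_{0,DD}^{-1}$ = $I_D K_0^{-1} I_D \neq (I_D K_0 I_D)^{-1}$. This means for example that $K_0^{-1}$ has to be known to find $K_{0,DD}^{-1}$, and it is not enough to invert $I_D K_0 I_D$ = $K_{0,DD}$. In data space $\left(1 + K_D K_{0,DD}^{-1}\right)^{-1}$ = $\left(K_D^{-1} + K_{0,DD}^{-1}\right)^{-1} K_D^{-1}$, so Eq. (325) can be manipulated to give
$$
K^{-1} = K_0^{-1} - K_0^{-1} I_D \left(K_D^{-1} + K_{0,DD}^{-1}\right)^{-1} I_D K_0^{-1} .
\tag{326}
$$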

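This now allows the predictive mean (322) and covariance (323) to be expressed by the prior covariance
$$
\bar y = t_0(x) + K_{0,xD}^{-1} \left(K_D^{-1} + K_{0,DD}^{-1}\right)^{-1} (t_D - t_0) ,
\tag{327}
$$
$$
K_y^{-1} = K_x^{-1} + K_{0,xx}^{-1} - K_{0,xD}^{-1} \left(K_D^{-1} + K_{0,DD}^{-1}\right)^{-1} K_{0,Dx}^{-1} ,
\tag{328}
$$
where $K_{0,xD}^{-1}$ = $I_x K_0^{-1} I_D$, $K_{0,xx}^{-1}$ = $I_x K_0^{-1} I_x$, $K_{0,Dx}^{-1}$ = $I_D K_0^{-1} I_x$, and $t_D$ = $K_D^{-1} \sum_{i=1}^{n} K_i t_i$. For $t_0 \equiv 0$ the mean reduces to $\bar y$ = $K_{0,xD}^{-1} \left(K_D^{-1} + K_{0,DD}^{-1}\right)^{-1} t_D$.

Thus, for a given prior covariance $K_0^{-1}$, both $\bar y$ and $K_y^{-1}$ can be calculated by inverting the matrix $\widetilde K$ = $K_D^{-1} + K_{0,DD}^{-1}$, which acts only in the space of training data.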
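A minimal sketch of this data-space computation follows. It assumes a grid discretization, an exponential kernel as an example prior covariance, a linear prior template, and identical noise variance for all points; none of these choices come from the text. The result of inverting only the small data-space matrix is compared with the full-space route through the summed inverse covariance.

```python
import numpy as np

def exp_kernel(a, b, length=1.0):
    """Exponential covariance exp(-|x - x'| / length); an assumed example for the prior covariance."""
    return np.exp(-np.abs(a[:, None] - b[None, :]) / length)

grid = np.linspace(0.0, 4.0, 9)
C0 = exp_kernel(grid, grid)        # prior covariance on the grid
t0 = 0.1 * grid                    # hypothetical prior template
D = np.array([0, 3, 5, 8])         # hypothetical training locations (grid indices)
y_D = np.array([0.2, 1.1, -0.4, 0.7])
x = 4                              # test location (grid index)
sigma2 = 0.1                       # noise variance (assumption: same for all points)

# Data-space route: only a matrix of size len(D) x len(D) is inverted.
K_tilde = C0[np.ix_(D, D)] + sigma2 * np.eye(D.size)
mean_data = t0[x] + C0[x, D] @ np.linalg.solve(K_tilde, y_D - t0[D])
var_data = sigma2 + C0[x, x] - C0[x, D] @ np.linalg.solve(K_tilde, C0[D, x])

# Full-space route: build the summed inverse covariance and invert it on the whole grid.
K0 = np.linalg.inv(C0)
K_D = np.zeros_like(C0)
K_D[D, D] = 1.0 / sigma2           # diagonal entries at the training indices
b = K0 @ t0
b[D] += y_D / sigma2               # adds the data contribution K_D t_D
K = K0 + K_D
t = np.linalg.solve(K, b)
mean_full = t[x]
var_full = sigma2 + np.linalg.inv(K)[x, x]

print(mean_data, mean_full)
print(var_data, var_full)
```

The data-space route is the one that scales to fine grids or continuous $x$, since its cost depends on the number of training points rather than on the size of the function space.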

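Comparison of Eqs. (327, 328) with the maximum posterior solution $h^*$ of Eq. (277) now shows that for Gaussian regression the exact predictive density $p(y|x,D,D_0)$ and its maximum posterior approximation $p(y|x,h^*)$ have the same mean

$$
t(x) = \int\! dy\, y\, p(y|x,D,D_0) = \int\! dy\, y\, p(y|x,h^*) .
\tag{329}
$$

The variances, however, differ by the term $I_x K^{-1} I_x$.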
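The difference between the two variances can also be seen by sampling. In the sketch below, a random positive definite matrix stands in for the posterior inverse covariance and a random vector for the posterior mean (both hypothetical). Functions are drawn from the posterior and a noisy observation at the test point is drawn for each of them, which samples the exact predictive density, whereas the maximum posterior approximation would add noise to the single function $h^*$.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 4
A = rng.normal(size=(m, m))
K = A @ A.T + np.eye(m)        # stand-in posterior inverse covariance
t = rng.normal(size=m)         # stand-in posterior mean, equal to the maximum posterior solution
x, sigma2 = 1, 0.25            # hypothetical test index and noise variance

cov = np.linalg.inv(K)
cov = 0.5 * (cov + cov.T)      # symmetrize against rounding before sampling

# Exact predictive density: integrate over h by sampling h, then add observation noise.
n_samples = 200_000
h = rng.multivariate_normal(t, cov, size=n_samples)
y = h[:, x] + rng.normal(scale=np.sqrt(sigma2), size=n_samples)

var_exact = sigma2 + cov[x, x]  # noise variance plus posterior variance at x
var_map = sigma2                # variance when h is fixed to the maximum posterior solution

print(y.mean(), t[x])
print(y.var(), var_exact, var_map)
```

The empirical mean should match the posterior mean at the test point, while the empirical variance should exceed the noise variance by the posterior variance at that point, up to Monte Carlo error.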

According to the results of Section 2.2.2, the mean $\bar y$ of the predictive density is the optimal choice under squared-error loss (51). For Gaussian regression, therefore, the optimal regression function $a^*(x)$ is the same for squared-error loss in exact and in maximum posterior treatment, and thus also for log-loss (for Gaussian $p(y|x,a)$ with fixed variance)

$$
a^*_{\rm MPA,\,log} = a^*_{\rm exact,\,log} = a^*_{\rm MPA,\,sq.} = a^*_{\rm exact,\,sq.} = h^* = t .
\tag{330}
$$

In case the space of possible $p(y|x,a)$ is not restricted to Gaussian densities with fixed variance, the variance of the optimal density under log-loss, $p(y|x,a^*_{\rm exact,\,log})$ = $p(y|x,D,D_0)$, differs by $I_x K^{-1} I_x$ from that of its maximum posterior approximation $p(y|x,a^*_{\rm MPA,\,log})$ = $p(y|x,h^*)$.

Joerg_Lemm 2001-01-21