In the previous sections the error functionals we will try to minimize in the following have been given a Bayesian interpretation in terms of the log-posterior density. There is, however, an alternative justification of error functionals using the Frequentist approach of empirical risk minimization [224,225,226].
Common to both approaches is the aim to minimize the expected risk for an action $a$,

$$
r(a) = \int \! dx \, dy \; p(x,y) \, l(x,y,a),
\qquad (104)
$$

where $p(x,y)$ is the true, but unknown, joint density of the data and $l(x,y,a)$ denotes the loss of action $a$ for data point $(x,y)$. Since $p(x,y)$ is not available, it is in practice replaced by the empirical density of the $n$ training data, turning the expected risk into the empirical risk

$$
\hat r(a) = \frac{1}{n} \sum_{i=1}^{n} l(x_i,y_i,a).
\qquad (105)
$$
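As a minimal numerical illustration of the gap between (104) and (105), the following Python sketch compares the empirical risk on a small training sample with a Monte Carlo estimate of the expected risk; the squared-error loss, the linear model, and all numbers are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(x, y, a):
    # Squared-error loss l(x, y, a) for a linear action a*x
    # (an illustrative choice; the text fixes no particular loss).
    return (y - a * x) ** 2

def sample_data(n):
    # Hypothetical "true state of Nature": y = 2x + Gaussian noise.
    x = rng.uniform(-1.0, 1.0, size=n)
    return x, 2.0 * x + rng.normal(0.0, 0.1, size=n)

a = 1.8  # a fixed action to be evaluated

# Empirical risk (105): average loss over n = 20 training data.
x_train, y_train = sample_data(20)
print("empirical risk:", np.mean(loss(x_train, y_train, a)))

# A large independent sample approximates the expected risk (104).
x_big, y_big = sample_data(100_000)
print("expected risk (approx.):", np.mean(loss(x_big, y_big, a)))
```

On small samples the two values can differ noticeably, which is precisely why the minimization problem needs the additional structure discussed next.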
From that Frequentist point of view one is not restricted to the logarithmic data terms that arise from the posterior-related Bayesian interpretation.
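To make the distinction concrete: a squared-error data term equals, up to constants, the negative logarithm of a Gaussian likelihood, whereas, for instance, Vapnik's ε-insensitive loss is a standard non-logarithmic data term in empirical risk minimization. The sketch below contrasts the two; both concrete forms are illustrative choices rather than prescriptions of the text.

```python
import numpy as np

def neg_log_gaussian(y, pred, sigma=1.0):
    # Logarithmic data term: negative Gaussian log-likelihood
    # (up to an additive constant), i.e., a squared-error term.
    return 0.5 * ((y - pred) / sigma) ** 2

def eps_insensitive(y, pred, eps=0.1):
    # Non-logarithmic data term: epsilon-insensitive loss,
    # zero inside a tube of half-width eps, linear outside.
    return np.maximum(0.0, np.abs(y - pred) - eps)
```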
However, as in the Bayesian approach, training data terms alone are not enough to make the minimization problem well defined. Indeed, this is a typical inverse problem [224,115,226] which can, according to the classical regularization approach [220,221,165], be treated by including additional regularization (stabilizer) terms in the loss function. Those regularization terms, which correspond to the prior terms in a Bayesian approach, are thus, from the point of view of empirical risk minimization, a technical tool to make the minimization problem well defined.
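A minimal sketch of such a regularized loss, assuming a squared-error data term and a quadratic stabilizer (the ridge regression setting); the text itself does not fix these particular forms:

```python
import numpy as np

def regularized_loss(w, X, y, lam):
    # Data term plus lam times a quadratic stabilizer term.
    data_term = np.sum((X @ w - y) ** 2)
    stabilizer = np.sum(w ** 2)
    return data_term + lam * stabilizer

def ridge_fit(X, y, lam):
    # Closed-form minimizer: w* = (X^T X + lam I)^{-1} X^T y.
    # For lam = 0 and fewer data than parameters, X^T X is singular
    # and the data term alone has no unique minimizer; lam > 0
    # makes the minimization problem well defined.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```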
The empirical generalization error for a test or validation data set independent of the training data, on the other hand, is measured using only the data terms of the error functional, without the regularization terms.
In empirical risk minimization this empirical generalization error is used, for example, to determine adaptive (hyper-)parameters of regularization terms. A typical example is a factor multiplying the regularization terms, controlling the trade-off between data and regularization terms. Common techniques using the empirical generalization error to determine such parameters are cross-validation or bootstrap-like techniques [166,6,230,216,217,81,39,228,54].
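A minimal sketch of cross-validation for the trade-off factor, reusing the hypothetical ridge setting from the sketch above; note that the validation score contains only the data term, not the stabilizer:

```python
import numpy as np

def ridge_fit(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def cross_validate(X, y, lambdas, k=5, seed=0):
    # k-fold cross-validation: the empirical generalization error
    # on each held-out fold is the pure data term (mean squared error).
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for lam in lambdas:
        errs = []
        for i in range(k):
            val = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            w = ridge_fit(X[train], y[train], lam)
            errs.append(np.mean((X[val] @ w - y[val]) ** 2))
        scores.append(np.mean(errs))
    return lambdas[int(np.argmin(scores))]

# Hypothetical usage on synthetic data:
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 10))
y = X @ rng.normal(size=10) + 0.3 * rng.normal(size=40)
print("selected factor:", cross_validate(X, y, [0.01, 0.1, 1.0, 10.0]))
```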
From a strict Bayesian point of view those parameters would have to be integrated out after defining an appropriate prior [16,147,149,24].
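In symbols, writing $\lambda$ for such a hyperparameter (a generic placeholder, not notation taken from the text), the strict Bayesian treatment replaces the single cross-validated value by a marginalization over a prior $p(\lambda)$,

$$
p(y\,|\,x,D) = \int \! d\lambda \; p(y\,|\,x,D,\lambda)\, p(\lambda\,|\,D),
\qquad
p(\lambda\,|\,D) \propto p(D\,|\,\lambda)\, p(\lambda),
$$

in contrast to empirical risk minimization, which fixes $\lambda$ at the value minimizing the empirical generalization error.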