As a first step, empirical dependency control requires an explicit and readable formulation of prior knowledge. By explicit formulation we mean in the following the expression of prior knowledge directly in terms of the function values $f(x)$, as is done in regularization theory or for stochastic processes.
In an implicit implementation of dependencies, on the other hand,
single function values
are not parameterized independently.
Examples include neural networks and linear, additive, or tensor models. The realization of learning algorithms can also induce dependencies, e.g., through restricted initial conditions and stopping rules.
In the regularization framework an approximating function $f$ is chosen to minimize a regularization or error functional $E[f]$. Prior knowledge can be represented by a regularization term added to the training error. A typical example of a smoothness-related regularization functional for $d$-dimensional $x = (x_1,\dots,x_d)$ is
\[
E_{\rm reg}[f] \;=\; \frac{1}{2} \int\! dx \sum_{i=1}^{d} \left( \frac{\partial f(x)}{\partial x_i} \right)^{2}
\;=\; \frac{1}{2}\, \big( f,\, -\Delta f \big) .
\]
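As a minimal numerical sketch (not part of the formal development; function and variable names are illustrative), the smoothness functional above can be evaluated on a one-dimensional periodic lattice, where it reduces to the quadratic form $\tfrac{1}{2}(f, -\Delta f)$ with the discrete Laplacian:
\begin{verbatim}
# Sketch: discretized smoothness functional on a periodic 1-d lattice.
import numpy as np

def neg_laplacian_1d(n):
    """Negative discrete Laplacian -Delta (periodic boundaries)."""
    shift = np.roll(np.eye(n), 1, axis=1)          # (S f)_i = f_{i+1}
    return 2 * np.eye(n) - shift - shift.T

def smoothness_energy(f):
    """E_reg[f] = 1/2 * sum_i (f_{i+1} - f_i)^2."""
    return 0.5 * np.sum((np.roll(f, -1) - f) ** 2)

n = 64
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
f = np.sin(x)
K = neg_laplacian_1d(n)                            # concept operator for smoothness
assert np.allclose(smoothness_energy(f), 0.5 * f @ K @ f)
\end{verbatim}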
Definition (Quadratic concept). A quadratic concept is a pair $(t, K)$ consisting of a template function $t$ and a real symmetric, positive semi-definite operator $K$ on the space of functions $f$, the concept operator. The operator $K$ defines a concept distance
\[
d_K(f, t) \;=\; \sqrt{ \big( f - t,\, K (f - t) \big) }
\]
on subspaces where it is positive definite. The maximal subspace in which the positive semi-definite $K$ is positive definite is the concept space of $(t, K)$. The corresponding hermitian projector onto this subspace is the concept projector.
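A small sketch (assuming a finite, lattice-approximated function space; names such as concept_projector are illustrative and not from the text) of the objects introduced in the definition:
\begin{verbatim}
# Sketch: quadratic concept (t, K), concept distance, concept space, projector.
import numpy as np

def concept_distance(f, t, K):
    """sqrt( (f - t, K (f - t)) ); a distance where K is positive definite."""
    d = f - t
    return np.sqrt(d @ K @ d)

def concept_projector(K, tol=1e-10):
    """Hermitian projector onto the concept space, i.e., the span of
    eigenvectors of the positive semi-definite K with positive eigenvalue."""
    lam, U = np.linalg.eigh(K)
    V = U[:, lam > tol]
    return V @ V.T

n = 8
shift = np.roll(np.eye(n), 1, axis=1)
K = 2 * np.eye(n) - shift - shift.T           # smoothness concept operator (-Delta)
t = np.zeros(n)                               # template
f = np.random.default_rng(0).normal(size=n)
P = concept_projector(K)                      # here: projects out constant functions
print(concept_distance(f, t, K), np.trace(P))  # trace(P) = n - 1
\end{verbatim}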
Remark 1 (Approximate symmetries):
Typical concept operators are related to symmetries. Let for example $(Sf)(x) = f(\sigma(x))$ with $\sigma$ a permutation of the arguments $x$, i.e., one-to-one. Then $K = (S - I)^{T} (S - I)$, with identity $I$ and $T$ denoting the transpose, defines a symmetry concept: $(f, K f)$ vanishes exactly for functions invariant under $S$.
Similarly, assume $S(\theta)$ to be a continuous symmetry (Lie) group, parameterized by $\theta$ and with infinitesimal generators $s_i$. Then an infinitesimal symmetry concept for $S(\theta)$ is defined by $K = \sum_i s_i^{T} s_i$. For infinitesimal translations (smoothness) the generators are the derivatives $\partial/\partial x_i$, so that $K = -\Delta$, the negative Laplacian.
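The permutation and translation cases can be checked numerically; the following sketch (illustrative names, small lattice) verifies that functions invariant under $S$ have vanishing concept energy and that $\sum_i s_i^{T} s_i$ reduces to the discrete negative Laplacian for translations:
\begin{verbatim}
# Sketch: symmetry concept K = (S - I)^T (S - I) from a permutation sigma.
import numpy as np

n = 6
sigma = np.array([1, 0, 3, 2, 5, 4])        # pairwise swaps, one-to-one
S = np.eye(n)[sigma]                        # (S f)(x) = f(sigma(x))
I = np.eye(n)
K = (S - I).T @ (S - I)                     # symmetry concept operator

f_sym = np.array([1., 1., 2., 2., 3., 3.])  # invariant under sigma
f_gen = np.arange(n, dtype=float)           # not invariant
print(f_sym @ K @ f_sym)                    # 0.0: no penalty for the symmetric f
print(f_gen @ K @ f_gen)                    # > 0: penalized by the concept

# Translations: generator d/dx as a forward difference; D^T D = -Delta.
D = np.roll(np.eye(n), 1, axis=1) - np.eye(n)
neg_laplacian = (2 * np.eye(n) - np.roll(np.eye(n), 1, axis=1)
                 - np.roll(np.eye(n), -1, axis=1))
print(np.allclose(D.T @ D, neg_laplacian))  # True
\end{verbatim}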
Remark 2 (Gaussian processes):
A quadratic concept defines a Gaussian process according to
\[
p(f) \;\propto\; \exp\!\Big( -\tfrac{1}{2}\, \big( f - t,\, K (f - t) \big) \Big)
\]
with mean $t$ and covariance operator $K^{-1}$. We remark, however, that while Gaussian processes can also be defined for continuous $x$ [2,7], we do not discuss continuum limits for the non-Gaussian extensions below. In these cases we refer to a lattice approximation.
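On a lattice the corresponding Gaussian process can be sampled directly; the following sketch (illustrative, with a small ridge added so that $K$ is invertible on the full space) draws smooth fluctuations around the template:
\begin{verbatim}
# Sketch: lattice Gaussian process with mean t and covariance K^{-1}.
import numpy as np

rng = np.random.default_rng(0)
n = 32
shift = np.roll(np.eye(n), 1, axis=1)
K = 2 * np.eye(n) - shift - shift.T + 1e-3 * np.eye(n)    # smoothness + ridge
t = np.sin(np.linspace(0, 2 * np.pi, n, endpoint=False))  # template = mean

cov = np.linalg.inv(K)                        # covariance operator K^{-1}
samples = rng.multivariate_normal(mean=t, cov=cov, size=3)
print(samples.shape)                          # (3, 32): smooth curves around t
\end{verbatim}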
Remark 3 (Support vector machine):
Expanding $f - t$ in a basis of eigenfunctions $\psi_k$ of $K$, i.e., $f - t = \sum_k c_k \psi_k$ with $K \psi_k = \lambda_k \psi_k$, one obtains $\big( f - t,\, K (f - t) \big) = \sum_k \lambda_k c_k^2$. Replacing the mean-square training error by Vapnik's $\epsilon$-insensitive error yields a support vector machine with kernel $K^{-1}$ [6]. In general, flat regions of the error function, as they appear in the $\epsilon$-insensitive error and other robust error functions, have the technical advantage that they do not contribute to the gradient and can be ignored within a saddle point approximation.
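The flat-region property is easy to see from the $\epsilon$-insensitive error itself; a short sketch (standard definition, illustrative names):
\begin{verbatim}
# Sketch: Vapnik's epsilon-insensitive error and its subgradient.
import numpy as np

def eps_insensitive(residual, eps=0.1):
    """max(|r| - eps, 0): flat (zero) for |r| <= eps."""
    return np.maximum(np.abs(residual) - eps, 0.0)

def eps_insensitive_grad(residual, eps=0.1):
    """Subgradient w.r.t. the residual: 0 in the flat region, +-1 outside."""
    return np.where(np.abs(residual) <= eps, 0.0, np.sign(residual))

r = np.array([-0.5, -0.05, 0.0, 0.08, 0.3])
print(eps_insensitive(r))       # [0.4 0.  0.  0.  0.2]
print(eps_insensitive_grad(r))  # [-1.  0.  0.  0.  1.]
\end{verbatim}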
Remark 4 (Templates):
Templates can be constructed directly
by experts. They can also represent a structural hypothesis realized by a parameterized learning system like a neural network.
Templates can be used for transfer
by choosing them as the output of a learning system trained
for a similar situation.
For finite spaces templates can in principle be estimated by sampling.
Remark 5 (Covariances):
Covariances can be given directly by experts.
They do not necessarily have to be local
but can include non-local correlations.
As already remarked, they are often constructed from symmetry considerations.
For finite spaces covariances can in principle also be
estimated by sampling.
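For a finite space, estimation by sampling as mentioned in Remarks 4 and 5 amounts to forming the empirical mean and covariance of sampled example functions; a sketch (illustrative, with synthetic data):
\begin{verbatim}
# Sketch: estimating template and covariance by sampling on a finite space.
import numpy as np

rng = np.random.default_rng(1)
n, m = 10, 200                                # dimension of the space, number of samples
true_cov = np.eye(n) + 0.5 * np.ones((n, n))  # includes non-local correlations
F = rng.multivariate_normal(mean=np.zeros(n), cov=true_cov, size=m)

t_hat = F.mean(axis=0)                        # estimated template
cov_hat = np.cov(F, rowvar=False)             # estimated covariance (n x n)
K_hat = np.linalg.pinv(cov_hat)               # corresponding concept operator
print(t_hat.shape, cov_hat.shape, K_hat.shape)
\end{verbatim}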