
Introduction

In the setting of empirical learning, available training data are used to obtain information about new test situations. Clearly, this generalization from training to test data requires knowledge about their dependencies. In this paper, such knowledge about dependencies between training and test data, in some contexts also known as rules or axioms, will be called prior knowledge.

To be specific, we consider a typical function approximation problem. Assume we are given a set of training data $D_T = \{(x_i,y_i)\,\vert\, 1\le i \le n\}$ sampled i.i.d. from an unknown but fixed ``true state of Nature''. The aim is to obtain an approximation function $h(x)$ which predicts unknown outcomes $y$ for test situations $x\in {\cal X}$ by $y=h(x)$.
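
As a minimal illustration of this setting, the following sketch fits a simple model to i.i.d. samples and predicts at a new test point. The specific ``true state of Nature'', the noise level, and the polynomial model are hypothetical choices for illustration, not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true state of Nature": fixed, but unknown to the learner
def true_f(x):
    return np.sin(3 * x)

# Training data D_T = {(x_i, y_i) | 1 <= i <= n}, sampled i.i.d.
n = 50
x_train = rng.uniform(-1.0, 1.0, size=n)
y_train = true_f(x_train) + rng.normal(0.0, 0.1, size=n)  # noisy outcomes

# One possible approximation h(x): a least-squares polynomial fit
coeffs = np.polyfit(x_train, y_train, 5)
h = np.poly1d(coeffs)

# Predict the unknown outcome y for a new test situation x
x_test = 0.3
print("prediction:", h(x_test), "truth:", true_f(x_test))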

Since the generalization ability of any learning system rests crucially on the dependencies it implements, our goal must be a strict empirical measurement and control of the prior or ``dependency'' data $D_P$ which represent our prior knowledge. Note that in the common situation of an infinite number of potential test situations $x\in {\cal X}$, the number of dependencies to be controlled empirically also becomes infinite. Empirical measurement of an infinite number of data, however, seems at first glance impossible. On the other hand, for an infinite set ${\cal X}$ of test situations, any learning system has to use an infinite number of data, either explicitly or implicitly. To discuss this empirical measurement problem, let us take a closer look at two examples:

1.
A simple bound on $h(x)$ for every $x$, like $h(x)\le a$, $\forall x\in {\cal X}$, corresponds for infinite ${\cal X}$ to an infinite number of data.
2.
Similarly, deviations from exact symmetries may be bounded. Let $\sigma$ denote a one-to-one transformation on ${\cal X}$. Then $\Vert h(x)-h(\sigma (x))\Vert<a$ describes a bound on the deviation from an exact symmetry under $\sigma$, again corresponding to an infinite number of data. The prototypical example is smoothness, i.e., approximate symmetry under infinitesimal translations $\sigma_\epsilon(x)=x+\epsilon$. (A finite empirical check of both constraints is sketched below.)
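
As a sketch of what such an empirical check can look like, the following code tests both example constraints on a finite sample of test situations. The bound $a$, the transformation $\sigma$ (a mirror symmetry), and the example function are hypothetical choices, not taken from the paper.

import numpy as np

a = 2.0                  # hypothetical bound (an assumption, not from the paper)
sigma = lambda x: -x     # hypothetical one-to-one transformation: mirror symmetry

def satisfies_priors(h, xs):
    """Empirically check both example constraints, but only on the finite sample xs."""
    bound_ok = bool(np.all(h(xs) <= a))                           # example 1: h(x) <= a
    symmetry_ok = bool(np.all(np.abs(h(xs) - h(sigma(xs))) < a))  # example 2: |h(x) - h(sigma(x))| < a
    return bound_ok and symmetry_ok

# The constraints quantify over all x in X; any measurement can
# only probe finitely many test situations.
xs = np.linspace(-1.0, 1.0, 200)
print(satisfies_priors(np.tanh, xs))  # True: tanh stays below a, and its deviation
                                      # from mirror symmetry, 2|tanh(x)|, stays below a = 2
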
Just as the number of training data can only be finite for practical reasons, the number of test situations actually appearing in the future can also only be finite. The key point is that, of all possible dependencies, only those related to actual test situations have to be controlled empirically. Measurement devices, for example, only have to be active at the (always finite) number of actually appearing test situations. Thus, there is an easy way to enforce bounds without actually measuring an infinite number of times. To be specific, bounds are often the consequence of using realistic, non-ideal measurement devices, as the sketch following this list illustrates:
1.
A simple bound $h(x)<a$ is implemented by using a measurement device with cut-off at $a$.
2.
Assume that, in addition to an upper and a lower bound on $h$, input noise or input averaging with respect to $\sigma$ is present in the measurement device we use. That means we do not have perfect control over the value of $x$: fixing $x$ at the measurement device still allows $\sigma(x)$ to produce the observable result $y$. The resulting effective function is necessarily smooth with respect to the transformations $\sigma$ which generate the input noise. (Compare [1,4,5].)
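
The following sketch illustrates such a non-ideal device: a cut-off keeps observed values within $[-a,a]$, and Gaussian input noise plays the role of $\sigma$, so the effective (averaged) function becomes smooth even though the underlying function is not. The underlying function, noise level, and cut-off are hypothetical choices for illustration only.

import numpy as np

a = 1.0            # hypothetical cut-off of the measurement device
noise_std = 0.2    # hypothetical input noise scale; sigma is a random shift x -> x + eps

def h(x):
    return 3.0 * np.sign(np.sin(5.0 * x))  # discontinuous and exceeding the cut-off

def measure(x, rng, n_samples=10_000):
    """Effective function seen through a non-ideal device: the cut-off
    enforces |y| <= a, the input noise averages h over a neighborhood of x."""
    eps = rng.normal(0.0, noise_std, size=n_samples)
    y = np.clip(h(x + eps), -a, a)   # cut-off at a (and -a)
    return y.mean()                  # input averaging -> smooth, bounded effective function

rng = np.random.default_rng(1)
for x in np.linspace(-0.5, 0.5, 5):
    print(f"x = {x:+.2f}  effective y = {measure(x, rng):+.3f}")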

Thus, one can say: infinite a-priori information can be empirically measured by a-posteriori control at the time of testing. From this point of view, which is related to that of constructivism, a-priori information (even if infinite) can and should be explicitly related to empirical control of the application situation.

