
Basic definitions

A Bayesian approach to empirical learning problems requires two main inputs [1,10]:

1. a (conditional) likelihood model $p(y\vert x,\phi)$, representing the probability of measuring $y$ given the (visible) condition $x$ and the (hidden) state of Nature $\phi$;

2. a prior model $p_0(\phi)$, encoding the a priori information available on $\phi$.

We assume that the joint probability factorizes according to

\begin{displaymath}
p(x,y,\phi) = p(y\vert x,\phi) \, p(x) \, p_0(\phi) .
\end{displaymath} (1)

Furthermore, we will consider $n$ independent training data points $D$ = $\{(x_i,y_i)\vert 1\le i\le n\}$. Denoting by $y_D$ ($x_D$) the vector of the $y_i$ ($x_i$), we assume $p(y_D\vert x_D,\phi)$ = $\prod_{i=1}^n p(y_i\vert x_i,\phi)$ and $p(x_D)$ = $\prod_{i=1}^n p(x_i)$. Different classes of likelihood functions define different problem types. In the following we will be interested in general density estimation problems [9]; the other problems listed in Tab. 1 are special cases thereof.
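As a minimal numerical sketch (not from the original text; all names are illustrative), the factorization of $p(y_D\vert x_D,\phi)$ means the training log-likelihood is simply a sum over data points. Here the Gaussian regression likelihood of Tab. 1 stands in for $p(y\vert x,\phi)$:

\begin{verbatim}
import numpy as np

def log_likelihood(y_D, x_D, phi, sigma=1.0):
    """log p(y_D|x_D,phi) = sum_i log p(y_i|x_i,phi) for i.i.d. data.

    As a concrete p(y|x,phi), this uses the Gaussian likelihood of
    Tab. 1 with regression function phi(x) and fixed width sigma.
    """
    y_D, x_D = np.asarray(y_D), np.asarray(x_D)
    mu = phi(x_D)                           # phi evaluated at the x_i
    return np.sum(-0.5 * ((y_D - mu) / sigma) ** 2
                  - np.log(sigma * np.sqrt(2.0 * np.pi)))

# Example: noisy samples of y = sin(x), scored under the true phi.
x_D = np.linspace(0.0, 3.0, 10)
y_D = np.sin(x_D) + 0.1 * np.random.randn(10)
print(log_likelihood(y_D, x_D, phi=np.sin, sigma=0.1))
\end{verbatim}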

Within nonparametric approaches, the function values $P(x,y)$ = $p(y\vert x,\phi)$ themselves, specifying a state of Nature $\phi$, are considered the primary degrees of freedom, their only constraints being the positivity and normalization conditions for $p(y\vert x,\phi)$. In particular, in this paper we will study likelihoods $P$ that are functionals of fields $\phi(x,y)$, i.e., $P$ = $P(\phi)$. Examples of such functionals $P(\phi)$ are shown in Tab. 2. While the first and third rows show the `local' functionals $P(\phi)$ corresponding to $\phi(x,y)$ = $P(x,y)$ and $\phi(x,y)$ = $\ln P(x,y)$, respectively, the other functionals are nonlocal in $y$.

Depending on the choice of the functional $P(\phi)$, the normalization and positivity constraints for $P$ take different forms in terms of $\phi$. For example, the normalization condition is automatically fulfilled in rows two and four of Tab. 2. In the last row, where $\phi(x,y)$ = $\int^y_{-\infty} dy^\prime \,P(x,y^\prime)$, the normalization constraint for $P$ becomes the boundary condition $\phi(x,\infty)$ = 1, and positivity of $P$ means monotonicity of $\phi(x,\cdot)$.
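The practical difference between these parametrizations is easy to check numerically. The following sketch (illustrative only; $y$ is discretized on a grid so integrals over $y^\prime$ become sums) contrasts row one of Tab. 2, where both constraints must be imposed on $\phi$ by hand, with row four, where any real-valued field already yields a valid density:

\begin{verbatim}
import numpy as np

# Discretize y at fixed x; integrals over y' become sums times dy.
y, dy = np.linspace(0.0, 1.0, 101, retstep=True)
phi = np.random.randn(y.size)      # arbitrary real-valued field phi(x,.)

# Row 1: P = phi. Positivity and normalization are genuine constraints
# on phi; an arbitrary field generally violates both.
P1 = phi
print(np.all(P1 > 0), np.isclose(np.sum(P1) * dy, 1.0))  # typically False False

# Row 4: P = exp(phi) / int exp(phi) dy'. Both constraints hold
# automatically for any real-valued phi.
P4 = np.exp(phi) / (np.sum(np.exp(phi)) * dy)
assert np.all(P4 > 0) and np.isclose(np.sum(P4) * dy, 1.0)
\end{verbatim}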


Table 1: Special cases of density estimation. For classification and regression see [12,11,3,4] and references therein, for clustering see [8], and for Bayesian approaches to inverse quantum mechanics, aiming for example at the reconstruction of potentials from observational data, see [5]. ($\rho$ denotes the density operator of a quantum system and $\pi_{(y,x)}$ the projector onto the eigenstate with eigenvalue $y$ of the observable $x$, which is represented by a Hermitian operator.)
(conditional) likelihood $p(y\vert x,\phi)$ | problem type
------------------------------------------- | -------------------------
of general form | density estimation
discrete $y$ | classification
$\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{[y-\phi(x)]^2}{2\sigma^2}}$ (Gaussian, $\sigma$ fixed) | regression
$\sum_k\frac{1}{\sigma_k\sqrt{2\pi}}e^{-\frac{[y-\phi_k(x)]^2}{2\sigma_k^2}}$ (mixture of Gaussians) | clustering
Trace($\rho(\phi) \,\pi_{(y,x)}$) | inverse quantum mechanics
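As a side note on the clustering row: Tab. 1 writes the mixture without mixture coefficients. The sketch below (illustrative names, not from the paper) adds equal weights $1/K$ so that the result integrates to one over $y$; this equal-weight choice is an assumption made here for illustration only:

\begin{verbatim}
import numpy as np

def gauss(y, mu, sigma):
    # Normalized Gaussian density in y.
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_likelihood(y, x, phi_ks, sigma_ks):
    # p(y|x,phi): mixture of Gaussians centered at phi_k(x); the equal
    # weights 1/K are an assumption added so the density normalizes.
    K = len(phi_ks)
    return sum(gauss(y, phi_k(x), s) for phi_k, s in zip(phi_ks, sigma_ks)) / K

# Two-cluster example with illustrative component functions phi_k.
print(mixture_likelihood(0.2, 1.0, phi_ks=[np.sin, np.cos], sigma_ks=[0.1, 0.2]))
\end{verbatim}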



Table 2: Constraints on $\phi$ for some specific choices of the functional $P(\phi)$. A `--' means the corresponding constraint is fulfilled automatically.
$P(\phi)$ | normalization constraint | positivity constraint
---------- | ------------------------ | ----------------------
$P(x,y)=\phi(x,y)$ | norm | positivity
$P(x,y)=\frac{\phi(x,y)}{\int\!\phi(x,y^\prime)\,dy^\prime}$ | -- | positivity
$P(x,y)=e^{\phi(x,y)}$ | norm | --
$P(x,y)=\frac{e^{\phi(x,y)}}{\int\!e^{\phi(x,y^\prime)}\,dy^\prime}$ | -- | --
$P(x,y)=\frac{d\phi(x,y)}{dy}$ | boundary | monotonicity
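For the last row of Tab. 2, a short sketch (again on an illustrative grid, with the lower integration bound $-\infty$ replaced by the grid's left edge) shows how normalization of $P$ turns into the boundary condition on $\phi$ and positivity of $P$ into monotonicity of $\phi(x,\cdot)$:

\begin{verbatim}
import numpy as np

y, dy = np.linspace(0.0, 1.0, 101, retstep=True)
P = np.exp(-0.5 * ((y - 0.5) / 0.1) ** 2)
P /= np.sum(P) * dy                 # normalized example density P(x,.)

phi = np.cumsum(P) * dy             # phi(x,y) = int_{-inf}^y P(x,y') dy'
assert np.isclose(phi[-1], 1.0)     # normalization -> boundary condition
assert np.all(np.diff(phi) >= 0)    # positivity of P <-> monotone phi
P_back = np.gradient(phi, dy)       # recover P = d phi / d y
                                    # (up to discretization error)
\end{verbatim}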



Joerg Lemm, 2000-09-12