Choosing actions in specific situations often requires the use of specific loss functions. Such loss functions may, for example, contain additional terms measuring costs of choosing an action that are not related to the approximation of the predictive density. Such costs can quantify aspects like the simplicity, implementability, production costs, sparsity, or understandability of an action.
Furthermore, instead of approximating a whole density it often suffices to extract some of its features, like identifying clusters of similar $y$-values, finding independent components of a multidimensional $y$, or mapping to an approximating density over a lower dimensional space. This kind of exploratory data analysis is the Bayesian analogue of unsupervised learning methods. Such methods are, on the one hand, often utilized as a preprocessing step, but are, on the other hand, also important for choosing actions in situations where specific loss functions can be defined.
From a Bayesian point of view, general loss functions require in general an explicit two-step procedure [132]:

1. Calculate (an approximation of) the predictive density.
2. Minimize the expectation of the loss function under that (approximated) predictive density.

(Empirical risk minimization, on the other hand, minimizes the empirical average of the (possibly regularized) loss function; see Section 2.5. For a related example see for instance [139].)
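As a minimal sketch of this two-step procedure: the example below represents the predictive density by Monte Carlo samples from a stand-in Gaussian (not any particular model from the text), then minimizes the expected loss over a grid of candidate actions. The density, the two loss functions, and the grid are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: obtain (an approximation of) the predictive density p(y|x, D).
# Here it is simply represented by samples from a Gaussian; in practice
# the samples would come from the actual posterior predictive (assumption).
y_samples = rng.normal(loc=2.0, scale=1.0, size=20_000)

# Step 2: minimize the expectation of the loss under that density.
# Squared-error loss -> optimal action is the predictive mean;
# absolute-error loss -> optimal action is the predictive median.
candidates = np.linspace(-2.0, 6.0, 401)
risk_sq = np.array([np.mean((y_samples - a) ** 2) for a in candidates])
risk_abs = np.array([np.mean(np.abs(y_samples - a)) for a in candidates])

a_sq = candidates[np.argmin(risk_sq)]    # minimizer of expected squared loss
a_abs = candidates[np.argmin(risk_abs)]  # minimizer of expected absolute loss
print(a_sq, a_abs)  # both close to 2.0 for this symmetric density
```

For a symmetric predictive density the two optimal actions coincide; an asymmetric loss (or density) would pull them apart, which is exactly why the choice of loss function matters for the chosen action.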
For a Bayesian version of cluster analysis, for example, partitioning
a predictive density obtained from empirical data
into several clusters,
a possible loss function is
$$
l(x,y,a) \;=\; \min_{1\le k\le m}\,(y-a_k)^2 ,
\qquad\qquad (64)
$$

where the action $a=(a_1,\ldots,a_m)$ collects the cluster centers $a_k$.
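A sketch of minimizing such a clustering loss: assuming the loss takes the form $\min_k (y-a_k)^2$ (one standard reading of a clustering loss; the original equation is not fully recoverable here) and approximating the predictive density by samples from a stand-in two-component mixture, alternating updates of assignments and centers (k-means on predictive samples) decrease the expected loss.

```python
import numpy as np

rng = np.random.default_rng(1)

# Samples standing in for the predictive density p(y|x, D); the
# two-component Gaussian mixture is an assumption made for this sketch.
y = np.concatenate([rng.normal(-2.0, 0.5, 5000), rng.normal(3.0, 0.5, 5000)])

# Minimize the expected loss E[min_k (y - a_k)^2] over cluster centers a_k
# with Lloyd-style alternating updates (k-means on predictive samples).
a = np.array([-1.0, 1.0])                                  # initial centers
for _ in range(25):
    k = np.argmin((y[:, None] - a[None, :]) ** 2, axis=1)  # nearest center
    a = np.array([y[k == j].mean() for j in range(a.size)])

a = np.sort(a)
print(a)  # centers settle near the two modes, about [-2.0, 3.0]
```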
For a multidimensional $y$, a space of actions can be chosen that depends only on a (possibly adaptable) lower dimensional projection of $y$.
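One common adaptable lower dimensional projection is the leading principal component, i.e. the direction of maximal variance. The sketch below uses synthetic data as a stand-in for samples of a multidimensional $y$; the data-generating process is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Samples of a multidimensional y whose variation is essentially
# one-dimensional (an assumption made for this sketch).
t = rng.normal(size=2000)
y = np.column_stack([t,
                     2.0 * t + 0.1 * rng.normal(size=2000),
                     0.1 * rng.normal(size=2000)])

# An adaptable lower dimensional projection: the leading principal
# component, obtained from the eigendecomposition of the covariance.
y0 = y - y.mean(axis=0)
cov = y0.T @ y0 / len(y0)
eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
w = eigvecs[:, -1]                       # direction of maximal variance
z = y0 @ w                               # one-dimensional projection of y

frac = eigvals[-1] / eigvals.sum()       # fraction of variance retained
print(frac)
```

Actions that depend only on $z$ instead of the full $y$ then live in a much smaller space, at the cost of the discarded variance.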
For a multidimensional $y$ with components $y_i$ it is often useful to identify independent components. One may look, say, for a linear mapping $\tilde y = W y$ minimizing the correlations between different components $\tilde y_i$ of the `source' variables $\tilde y$, by minimizing the loss function
$$
l(x,y,W) \;=\; \sum_{i\ne j} \tilde y_i\,\tilde y_j
\;=\; \sum_{i\ne j} (Wy)_i\,(Wy)_j ,
\qquad\qquad (65)
$$

whose expectation under the predictive density is the sum of the cross-correlations of the $\tilde y_i$ (for zero-mean $\tilde y$); a normalization of the components $\tilde y_i$, for example to unit variance, excludes the trivial solution $W=0$.
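Second-order decorrelation of this kind can be achieved by whitening. In the sketch below (the mixing matrix and data are assumptions for illustration), $W$ is taken as the inverse matrix square root of the sample covariance, which removes all cross-correlations and normalizes the variances to one. Full independent component analysis would additionally exploit higher-order statistics, since decorrelation alone does not imply independence.

```python
import numpy as np

rng = np.random.default_rng(3)

# Correlated zero-mean data standing in for samples of y (an assumption):
# independent 'sources' s passed through a mixing matrix A.
s = rng.normal(size=(10_000, 2))
A = np.array([[1.0, 0.8],
              [0.2, 1.0]])
y = s @ A.T

# A linear map W removing second-order correlations: whitening with the
# inverse matrix square root of the sample covariance of y.
cov = np.cov(y, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
W = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
y_tilde = y @ W.T                        # decorrelated 'source' estimates

C = np.cov(y_tilde, rowvar=False)
print(C)  # approximately the identity matrix
```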