I. Introduction
The problem of regression analysis is one of the fundamental problems in supervised machine learning. It can be stated as estimating a real-valued function from a sample of noisy observations. In the usual supervised learning setting, the data are obtained as independent and identically distributed (i.i.d.) pairs of feature vectors and corresponding targets. For regression problems in particular, the real-valued targets are regarded as corrupted versions of a set of unobserved values under an additive noise model: each target equals its unobserved value plus a noise term, where the noise random variables are distributed according to some density function. Prior knowledge about this noise distribution is crucial for quantifying the approximation accuracy, i.e., for choosing an adequate loss function.
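To make the link between the noise density and the loss function concrete, the following minimal sketch simulates the additive noise model and compares negative log-densities with two common losses. It relies on the standard maximum-likelihood correspondence (Gaussian noise with the squared loss, Laplace noise with the absolute loss) and is not meant to represent the method developed in this paper. All names (`sample_targets`, `gaussian_nll`, `laplace_nll`), the choice of `math.sin` as the unobserved function, and the noise scales are assumptions made purely for illustration.

```python
import math
import random


def squared_loss(residual):
    """Squared loss on a residual (target minus prediction)."""
    return residual ** 2


def absolute_loss(residual):
    """Absolute loss on a residual (target minus prediction)."""
    return abs(residual)


def gaussian_nll(residual, sigma=1.0):
    """Negative log-density of zero-mean Gaussian noise at `residual`."""
    return 0.5 * (residual / sigma) ** 2 + math.log(sigma * math.sqrt(2.0 * math.pi))


def laplace_nll(residual, b=1.0):
    """Negative log-density of zero-mean Laplace noise at `residual`."""
    return abs(residual) / b + math.log(2.0 * b)


def sample_targets(f, xs, noise, rng):
    """Additive noise model: each target is f(x) plus an independent noise draw."""
    return [f(x) + noise(rng) for x in xs]


def gaussian_noise(rng, sigma=0.1):
    """One draw of zero-mean Gaussian noise (sigma is an illustrative choice)."""
    return rng.gauss(0.0, sigma)


def laplace_noise(rng, b=0.1):
    """One draw of zero-mean Laplace noise, built as the difference of two
    i.i.d. exponential draws with rate 1/b (b is an illustrative choice)."""
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


# Hypothetical setup: math.sin stands in for the unobserved function.
rng = random.Random(0)
xs = [i / 10.0 for i in range(50)]
ys_gaussian = sample_targets(math.sin, xs, gaussian_noise, rng)
ys_laplace = sample_targets(math.sin, xs, laplace_noise, rng)

# Up to an additive constant and a positive scale, each negative log-density
# is one of the losses, so minimizing the loss maximizes the likelihood.
r1, r2 = 2.0, 1.0
gaussian_gap = gaussian_nll(r1) - gaussian_nll(r2)
squared_gap = 0.5 * (squared_loss(r1) - squared_loss(r2))
laplace_gap = laplace_nll(r1) - laplace_nll(r2)
absolute_gap = absolute_loss(r1) - absolute_loss(r2)
```

With unit scale parameters, `gaussian_gap` matches `squared_gap` and `laplace_gap` matches `absolute_gap`. In this sense, assuming a particular noise density amounts to committing to a particular loss function.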