I. Introduction
The field of image processing and computer vision is, in layman's terms, the science of making computers perceive and understand real-world scenes. One of the first and foremost steps in any computer vision task is the preprocessing of the digital images created by a sensor [for example, a camera, a medical ultrasound (US) scanning system, or a magnetic resonance imaging (MRI) system]. Preprocessing involves noise cleaning and enhancement; the images, so enhanced, are passed through the subsequent steps that realize the computer vision task.

A typical computer vision system is illustrated in Fig. 1. An ellipse represents an algorithm, procedure, or method acting upon a data object represented by a rectangle. A digital image formed by sensing the real world is preprocessed to clean up the noise and enhance the image content for subsequent processing. The enhanced image might then be processed to extract features that are of interest in a particular application. The features of interest could be gradient magnitude and direction; edge, ridge, and/or valley pixels; or other information such as histogram statistics, to name a few. This step is usually referred to as feature extraction. Together, these two steps constitute the low-level computer vision task. Once these low-level features are detected, the next step could be to group them together and label them as more meaningful entities. This process could involve, for example, connecting detected edge pixels to form longer edge segments according to some criteria. This step is referred to as grouping and labeling, or transformation to higher-level entities, and is sometimes called the mid-level computer vision task. Finally, these features are input to the high-level computer vision task of object recognition to detect the object(s) of interest. As one can see, the output of one step forms the input to the next. Consequently, any errors occurring in one step propagate to the subsequent steps, and the high-level vision task may not produce the desired results.

In recent years, increased attention has been paid to the development of algorithms for mid- and high-level vision tasks, and researchers have concentrated less on the low-level tasks of noise estimation and enhancement. For example, recent works on boundary detection and three-dimensional reconstruction of the organs of the body using deformable models and templates [1]–[6] concentrated on finding the object of interest without regard to the accuracy of the results under the chosen noise model. They perform Gaussian smoothing of the images assuming white noise with an assumed variance, that is, a covariance matrix σ²I: equal variance for all the data samples and no correlation among them. They did not even estimate this variance from the data. Others went on to estimate the noise variance by techniques such as least squares fitting of the observed image data. The white noise assumption does not reflect the true correlation between samples observed in the real world. Therefore, this modeling error carries over to the higher-level computer vision tasks, such as feature extraction, perceptual grouping, and final scene reconstruction, thereby producing suboptimal results.
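To make the consequence of this assumption concrete, the following sketch (a minimal illustration of our own, not code from the cited works; the noise level and the smoothing used to synthesize correlated noise are assumed values) fits the single variance σ² that the white-noise model uses and then measures the correlation between neighboring noise samples, which that model takes to be zero:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

# A flat (constant) patch plus noise. Correlated noise is simulated here
# by low-pass filtering white noise, a stand-in for the blur that sensing
# and reconstruction introduce; the sigma values below are illustrative.
signal = 100.0 * np.ones((256, 256))
white = rng.normal(0.0, 5.0, size=signal.shape)
noise = gaussian_filter(white, sigma=1.0)
noise *= white.std() / noise.std()          # restore the original noise power
observed = signal + noise

residual = observed - signal                # pure noise samples

# White-noise model: covariance sigma^2 * I, i.e., one number describes it.
sigma2 = residual.var()

# Empirical check of the independence assumption: correlation between
# horizontally adjacent residuals. True white noise would give about 0.
rho = np.corrcoef(residual[:, :-1].ravel(), residual[:, 1:].ravel())[0, 1]

print(f"fitted variance sigma^2    = {sigma2:.2f}")
print(f"lag-1 neighbor correlation = {rho:.2f}  (white-noise model assumes 0)")
```

The fitted σ² alone cannot reveal the substantial nonzero neighbor correlation, so any algorithm that smooths or fits under the σ²I model ignores exactly the structure this measurement exposes.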
We believe that, by properly modeling the correlation among the samples, it is possible to reduce the modeling errors that would otherwise propagate through the various steps of a computer vision algorithm, and thereby to produce optimal results. Our belief is that a computer vision system without proper modeling of the noise, and without estimation of the parameters of these models at each stage, will not produce optimal and stable results.