I. Introduction
In general, image processing applications that output images for human consumption can benefit from models of perceived image quality. Such models are often created by machine learning, which typically requires training sets of images annotated with perceived quality. To this end, human subjects are presented with images and asked to rate their visual quality, usually on an absolute category rating (ACR) scale, i.e., they are asked to select a quality from the scale. Because subjects may disagree, many ratings are collected for each image. The collected distribution of ratings for an image is summarized by the mean opinion score (MOS), i.e., the average rating [1].
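As a sketch of this definition, suppose the ACR categories are coded as consecutive integers and image $i$ receives $N_i$ ratings $r_{i,1}, \dots, r_{i,N_i}$. The integer coding and this notation are assumptions introduced here for illustration and are not taken from [1]. The MOS of image $i$ is then the sample mean of its ratings:

\[
\mathrm{MOS}_i = \frac{1}{N_i} \sum_{j=1}^{N_i} r_{i,j}.
\]

For instance, five hypothetical ratings of 4, 5, 3, 4, and 4 would give $\mathrm{MOS}_i = 20/5 = 4$.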