I. Introduction
The healthcare industry, which is distributed worldwide to provide health services to patients, has never before faced such massive amounts of electronic data or experienced such a sharp growth rate of data as it does today. As stated by the Institute for Health Technology Transformation (iHT²), U.S. health care data alone reached 150 exabytes ($10^{18}$ bytes) in 2011 and would soon reach the zettabyte ($10^{21}$ bytes) scale and even the yottabyte ($10^{24}$ bytes) scale in the future [1]. However, if no appropriate technique is developed to extract the great potential economic value of big healthcare data, these data might not only become meaningless but also require a large amount of space to store and manage. Over the past two decades, the rapid evolution of data mining techniques, which convert stored data into meaningful information, has had a major impact on people's lifestyles by predicting behaviors and future trends. These techniques are well suited to providing decision support in the healthcare setting. To shorten diagnosis time and improve diagnosis accuracy, a new system in the healthcare industry should provide a much cheaper and faster way to diagnose. The clinical decision support system (CDSS), which applies various data mining techniques to assist physicians in diagnosing patients with similar symptoms, has received great attention recently [2]–[4]. The naïve Bayesian classifier, one of the popular machine learning tools, has been widely used recently to predict various diseases in CDSS [27]. Despite its simplicity, it is more appropriate for medical diagnosis in healthcare than some more sophisticated techniques [6], [7].