I. Introduction
Clinicians and researchers in the Houston Methodist Hospital (HMH) system must draw on multiple data sources for research and quality improvement, because no single infrastructure or database can readily provide the data they require. The HMH system is home to seven hospitals and operates approximately nine major categories of clinical databases. Current methods of obtaining data from all these locations and vendors for preparatory-to-research questions often involve laborious, time-consuming manual extraction and cleansing of data for each specific project. This process is recognized as cumbersome, costly, and slow, and it adds no intrinsic value to the research being undertaken. As a result, investigators spend a great deal of unproductive time negotiating and waiting for data instead of conducting research. Worse, the data ultimately delivered are often incomplete, depending on the understanding and knowledge of the person retrieving them. In many institutions, a “gray market” for data could develop as researchers find unofficial workarounds to obtain the data they need. Such a “gray market” could create compliance and security risks, as isolated silos of patient data evolve in different parts of the healthcare organization without formal oversight for Health Insurance Portability and Accountability Act (HIPAA) and Institutional Review Board (IRB) compliance, and outside the processes for protecting data from misuse or breach. HMH researchers need access to vast pools of patient data to develop and test their scientific hypotheses, so creating a single integrated data system would open a significant opportunity to expand the number of biomedical research projects, including large-scale projects.