1 Introduction
Data integration over autonomous web databases has drawn much attention in recent years, as more and more data in back-end databases becomes accessible via web servers. A mediator provides a unified query interface in the form of a global schema over the underlying databases. Queries on the global schema are then rewritten as queries over the autonomous databases through their web interfaces. Current mediator systems [5], [3] return to the user only certain answers, that is, tuples that exactly satisfy all of the user's query predicates. Tuples that are otherwise highly relevant to the query will not be retrieved if they have null values on any of the attributes constrained by the query. For example, in a used car trading application, if a user asks for convertible cars, all the returned answers must have the value “convt” for the attribute body style. Even though all Z4s are convertibles, a BMW Z4 whose body style is null will not be returned. This is particularly problematic when the data sources have a significant fraction of incomplete tuples, and/or the user requires high recall (consider, for example, a law-enforcement scenario, where a potential crime suspect goes unidentified because of information that is fortuitously missing in the database).
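The convertible example can be made concrete with a minimal sketch. It assumes a single local SQLite table standing in for one autonomous source; the table name `cars`, its columns, and the three rows are hypothetical and chosen only for illustration, and no mediator or query rewriting is modeled. It shows that an exact-match selection returns only the certain answers and skips the Z4 tuple whose body style is null.

```python
import sqlite3

# Hypothetical used-car source; schema and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (make TEXT, model TEXT, body_style TEXT)")
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [
        ("BMW", "Z4", "convt"),
        ("BMW", "Z4", None),         # body style missing in the source
        ("Honda", "Civic", "sedan"),
    ],
)

# Certain answers: tuples whose body_style is exactly 'convt'.
# A comparison with NULL never evaluates to true, so the incomplete
# Z4 tuple is not returned.
certain = conn.execute(
    "SELECT make, model, body_style FROM cars WHERE body_style = 'convt'"
).fetchall()

# Tuples the exact-match query skips because the constrained attribute is null.
skipped = conn.execute(
    "SELECT make, model, body_style FROM cars WHERE body_style IS NULL"
).fetchall()

print("certain answers:", certain)
print("skipped (null body_style):", skipped)
```

The second query is included only to expose the tuples the first one misses; it is not a proposal for how such tuples should be ranked or retrieved.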