1. Introduction
The proliferation of a wide variety of sensors (video cameras, microphones, infrared badges, RFID tags, etc.) in public places such as airports, train stations, streets, parking lots, hospitals, governmental buildings, and shopping malls, as well as in homes, has created opportunities for the development of security and business applications. Surveillance for threat detection, monitoring sensitive areas to detect prohibited or unusual events, tracking customers in airports and in retail stores, monitoring movements of assets, and monitoring elderly and sick people at home are examples of applications that require the ability to automatically detect, recognize, and track people and other objects by analyzing multiple streams of often noisy and poorly synchronized sensory data. A scalable system built for this class of tasks should also be able to integrate this sensory data with contextual information and domain knowledge provided by both humans and the physical environment in order to maintain a coherent picture of the world over time. While video surveillance has been in use for several decades, building systems that can automatically detect and track people (or objects) using multiple streams of heterogeneous and noisy sensory data is still a great challenge and an active research area. Since the performance of these systems is not yet at a level at which they can work autonomously, human experts remain part of the loop. Many approaches have been proposed for object tracking in recent years.
They differ in various aspects, such as the number of cameras used (one [1], two [2], or more [3]–[5]), the type of cameras and their speed and resolution, the type of environment (indoors or outdoors), the area covered (a room or a hall, a hallway, several connected rooms, a parking lot, a highway, etc.), and the location of cameras (with or without overlapping fields of view), as well as in their approaches to background modeling, object modeling (2D or 3D representations, color and/or shape models), and inference. However, the performance of most systems is still far from what is required for real-world applications.