1 Introduction
OCR (Optical Character Recognition) and document image analysis systems today go far beyond the simple transformation of textual document images into sequences of characters and words. The focus has shifted to correctly detecting the structure of the document, since this is the first step in almost every document image understanding task, such as information retrieval, document routing and archiving, and perhaps even duplicate document detection. In this paper, we present a methodology for formulating and solving the document structure detection problem.