Abstract:
Document image understanding encompasses the technology required to make paper documents equivalent to other computer exchange media like floppies, tapes, and CDROMs. The...Show MoreMetadata
First Page of the Article
Abstract:
Document image understanding encompasses the technology required to make paper documents equivalent to other computer exchange media like floppies, tapes, and CDROMs. The physical reader of the paper document is the scanner just like the physical reader of the floppy is the floppy drive and the physical reader of the tape cartridge is the tape cartridge drive, and the physical reader of the CDROM is the CDROM drive. In the survey presented, we restrict ourselves to documents such as business letters, forms, and scientific and technical articles such as those found in archival journals and technical conferences. Understanding such documents involves estimating the rotation skew of each document page, determining the geometric page layout, labeling blocks as text or non-text, determining the read order for text blocks, recognizing the text of text blocks through an OCR system, determining the logical page layout, and formatting the data and information of the document in a suitable way for use by a word processing system or by an information retrieval system.<>
Date of Conference: 21-23 June 1994
Date Added to IEEE Xplore: 06 August 2002
Print ISBN:0-8186-5825-8
Print ISSN: 1063-6919
First Page of the Article