Document Image Analysis and Recognition


  Simone Marinai 


Outline

Document image analysis and recognition (DIAR) is a research field rooted in the first Optical Character Recognition (OCR) systems, which were applied to reading numeric check codes. Today, DIAR technology is used in a broad range of applications in which information has to be extracted from structured documents stored on different media. Typical applications include handwritten character recognition, processing of textual web images, and information extraction from digital libraries. In the digital library community, considerable effort has been devoted to digitizing paper collections in order to archive them as document image collections. Large digital archives are currently available; however, they can be fully exploited only by accessing the information embedded in the digital images. Simply applying OCR packages solves these problems only partially, both because clean converted text is difficult to obtain and because the output lacks a structural description of the document. To tackle these problems, either layout analysis methods or document image retrieval approaches can be considered. This tutorial will provide a first introduction to the most important tasks in DIAR, from low-level document image processing to high-level applications (see the tutorial summary). Some applications in the digital library field will be described in more detail. The tutorial is supported by slides distributed to participants and by an extensive bibliography. In addition, when appropriate, commercial products and publicly available software for the tasks described will be discussed.

Audience


This introductory tutorial is addressed to researchers, students, and technical staff interested in an introduction to the problems, solutions, and research directions in this field. System integrators may appreciate the discussion of the features of commercial products used for document image processing and OCR, whereas researchers and students may be interested in pointers to the state of the art in research on the aspects common to DIAR and digital library applications. A general background in computer science is required; the basic concepts of document imaging will be provided in the first part of the tutorial.

Program

Schedule: September 15th (Sunday) 9:00 - 12:30


Scanning and storage


Image pre-processing


Layout analysis


OCR and handwriting recognition


Document image retrieval


Digital library applications
 

Speaker

 Simone Marinai

Simone Marinai received the Laurea in Electronic Engineering in 1992 from the University of Florence, Italy. He obtained the PhD degree in computer science in 1996 with a thesis on the extraction of information from structured documents. In 1995 he was a visiting scientist at the CENPARMI lab (Concordia University, Montreal, Canada). His main research interests are in pattern recognition, neural networks, and document processing applications. He is currently Assistant Professor at the University of Florence, where he teaches, among other topics, DIAR methods in the Artificial Intelligence course. In 2001 he was co-author of the tutorial "Artificial Neural Networks for Document Analysis and Recognition", which was organized in conjunction with the Int. Conference on Document Analysis and Recognition (Seattle, USA). The same tutorial will be organized in conjunction with the next ICPR (Quebec City, Canada, Aug. 2002). Simone Marinai is the technical representative of DSI in METAe ("The Metadata Engine Project", an EU-funded project), which is aimed at the automatic extraction of structural metadata from scanned documents belonging to digital libraries. He was the chairman of the workshop "Document Analysis and Understanding for Document Databases" (DAUDD), held in 1999 in conjunction with the DEXA conference. He is a member of several conference program committees and is currently Associate Editor of the "Electronic Letters on Computer Vision and Image Analysis" journal.
 


ECDL 2002 Home page
 

Simone Marinai  -- Jan 10 2001