Printed documents are still widely used in many industries, including medical insurance and healthcare. Capturing the information locked in printed documents, which are formatted in diverse ways with tables, pictures, text, handwriting, and more, remains a challenge that must be solved before automated processing can be applied. Traditional methods for recognizing document layout structures rely on pre-defined rules and extensive manual, often error-prone work.
In this session, we will present a novel approach to recognizing document layout structures and table structures using deep neural networks. We employ a three-stage framework for extracting information from a scanned or photographed document: Stage 1, recognize the document layout structure to extract the text bodies, tables, graphs, etc.; Stage 2, for tables, recognize the table structure and extract the elements from each column and row together with their labels; and Stage 3, send the extracted elements for character-level recognition. We trained a Faster R-CNN with a ResNet backbone on a few collections of document data sets, pre-tagged or generated with tags, to recognize both document layout structures and table structures. Character-level recognition on the extracted text and table elements was done via an OCR engine.