Optical Character Recognition - OCRWeb by CVIT @ IIIT-H

Document Analysis and Recognition

The rapid growth in document digitization demands solutions that make valuable archived materials searchable and usable. Our team has developed robust and efficient solutions to fulfil this objective.

Free Web Framework for OCR

A web framework for optical character recognition on 15 Indic scripts as well as English has been introduced. Anyone can use it for text recognition free of cost. An API is also available for use as a third-party tool in standalone or other applications.

Scene Text Understanding

Recognizing scene text is more challenging than recognizing scanned documents. Given the rapid growth of camera-based applications readily available on mobile phones, understanding scene text is more important than ever. Our goal is to fill this gap in scene understanding.

The Optical Character Recognition Tool

Free and Easy to Use.
Supports 12 Indic scripts and English.
Script and Language Identification at Word and Line Level
State-of-the-art accuracy across languages.
Segmentation Free.
Page layout analysis and correction.
Area selection on images for recognition.
Spell correction and editing options.
Supports all kinds of documents (scanned or otherwise).
Supports low resolution images.
Supports book uploads.
OCR API for use as a third-party recognition tool.

Input File Format

JPEG, JFIF, PNG, GIF, BMP, PBM, PGM, PPM, PCX, TIFF
Multiple images in ZIP archive
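
As an illustration of the input formats listed above, the following is a minimal sketch of how a client might pre-check files before uploading them. It is not part of the OCR tool itself: the function names `is_accepted_image` and `images_in_zip` are hypothetical, and the short extension spellings `.jpg` and `.tif` are an assumption added alongside the listed formats.

```python
import zipfile
from pathlib import PurePosixPath

# Extensions derived from the input-format list above. The short spellings
# ".jpg" and ".tif" are an assumption, not stated by the source.
ACCEPTED_EXTENSIONS = {
    ".jpeg", ".jpg", ".jfif", ".png", ".gif", ".bmp",
    ".pbm", ".pgm", ".ppm", ".pcx", ".tiff", ".tif",
}


def is_accepted_image(name: str) -> bool:
    """Return True if the file name ends in one of the listed image extensions."""
    return PurePosixPath(name).suffix.lower() in ACCEPTED_EXTENSIONS


def images_in_zip(zip_source) -> list[str]:
    """List ZIP members that look like accepted images, skipping directories."""
    with zipfile.ZipFile(zip_source) as archive:
        return [
            info.filename
            for info in archive.infolist()
            if not info.is_dir() and is_accepted_image(info.filename)
        ]
```

Calling `images_in_zip("book_pages.zip")` on a local archive (the file name here is only an example) returns the member names that would qualify as page images. The check looks at file names only and does not validate image contents.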

Output File Format

Plain Text Document (TXT)
Microsoft Word Compatible / OpenOffice (DOC/DOCX/ODT)
Adobe Acrobat (PDF)

The OCR System Supports 12 Indic Scripts and English

Assamese
Bangla
English
Gujarati
Gurumukhi
Hindi
Kannada
Malayalam
Manipuri
Marathi
Odiya
Tamil
Telugu

Figure 1: (A) The architecture of a traditional OCR, which starts with symbol/character extraction and classification. (B) Our approach. We bypass the two harder modules of the traditional OCR. We directly output a Unicode sequence, given a word image.
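
To make the idea of emitting a Unicode sequence directly from a word image more concrete, here is a minimal sketch of one common way a segmentation-free recognizer can turn its per-frame outputs into text. It assumes CTC-style greedy decoding with a reserved blank label; the source does not state which sequence model or decoding scheme the system uses, so `BLANK`, `greedy_decode`, and the toy alphabet are all hypothetical.

```python
BLANK = 0  # hypothetical index reserved for the "no character" label


def greedy_decode(frame_scores, alphabet):
    """Collapse per-frame best labels into a Unicode string (CTC-style).

    frame_scores: one list of label scores per horizontal frame of the word image.
    alphabet: maps a label index to a Unicode character; index BLANK is unused.
    """
    text = []
    previous = BLANK
    for scores in frame_scores:
        # Pick the highest-scoring label for this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit a character only when the label changes and is not the blank.
        if best != BLANK and best != previous:
            text.append(alphabet[best])
        previous = best
    return "".join(text)
```

No character boundaries are computed at any point: repeated labels across neighbouring frames collapse into one character, and the blank label separates genuine repetitions. That is what allows the symbol segmentation and classification stages of the traditional pipeline to be skipped in a design of this kind.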

Check it out!

Try out sample pages from all the languages.

Upload

Get text for your own document images.
