Welcome To Document Analysis Tools!
A repertoire of tools maintained by the IIT Bombay team to help you with various document analysis tasks, focused mainly on Indic languages.

Our Projects

Indic Document Digitization
Previous research in document analysis and OCR has focused on English text, while Indian languages have largely been ignored. Our aim is to create end-to-end document analysis tools for Indic-language documents, collectively called Indic Document Analysis, that enable effective and accurate OCR and digitization.
Leap OCR
A key objective of the project is to develop a user-friendly interface for converting scanned images and PDF files into machine-encoded text. This interface can be further refined to recognize more complex characters and to detect languages more reliably. The tool can also be used to post-edit documents and make them more effective and engaging.
OCR NER Extractor
Named Entity Recognition (NER) is an information extraction method used to obtain important information from unstructured text documents. In this project, we aim to develop an NER extractor for scanned medical documents using NLP techniques.

Professors, Students & Interns

Prof Ganesh Ramakrishnan

Principal Investigator

Prof Parag Chaudhuri

Principal Investigator

Dr Venkatapathy Subramanian

Senior Project Research Scientist

Badri Vishal Kasuba

Masters Student, IIT Bombay

Dhruv Kudale

Masters Student, IIT Bombay

Interns
Sagarika Raje
Pooja Aryamane
Dishant Padalia
Sujay Torvi
Aditya Motwani
Saurabh Baghel
Raja Harsh Vardhan Singh
Shivani Shenai
Diksha Rani
Om Surve
Gauri Sharma

Check out our tutorials here!

11 Vision Future of DECILE

Published on May 19, 2022

Research Publications