Aim : The aim of this project is to develop a tool that takes an image as input and extracts the characters (letters, digits and symbols) it contains, a task known as optical character recognition (OCR). The image can be of a handwritten or a printed document. Such a tool can be used for data entry from printed records.
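Before any character can be recognized, the input image has to be split into individual characters. The sketch below illustrates one simple approach in Python: vertical projection, which cuts the line at fully blank columns. It assumes the image is already binarized into a list of strings, with '#' for ink and '.' for background, and that it holds a single line of text. All function names and the sample image are hypothetical and do not come from any OCR library.

```python
from typing import List, Tuple

Image = List[str]  # one string per pixel row: '#' = ink, '.' = background


def column_profile(image: Image) -> List[int]:
    """Count the ink pixels in each column of a binarized image."""
    width = len(image[0])
    return [sum(row[x] == "#" for row in image) for x in range(width)]


def segment_columns(image: Image) -> List[Tuple[int, int]]:
    """Split one line of text into character spans at fully blank columns.

    Returns half-open (start, end) column ranges, one per character.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(column_profile(image)):
        if count > 0 and start is None:
            start = x                  # first inked column of a character
        elif count == 0 and start is not None:
            spans.append((start, x))   # a blank column ends the character
            start = None
    if start is not None:              # character touching the right edge
        spans.append((start, len(image[0])))
    return spans


def crop(image: Image, span: Tuple[int, int]) -> Image:
    """Cut the columns of one character out of the line image."""
    start, end = span
    return [row[start:end] for row in image]


# Hypothetical 5-row line image containing "HI".
LINE = [
    "#.#..###.",
    "#.#...#..",
    "###...#..",
    "#.#...#..",
    "#.#..###.",
]

if __name__ == "__main__":
    for span in segment_columns(LINE):
        print(span, crop(LINE, span))
```

This approach cuts only at fully blank columns, so it cannot separate touching or slanted characters. Those are common in handwriting and need more robust segmentation.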
Tool : This project is based on machine learning. A large data set of labeled examples is given to the tool as input, the machine learns from it, and similar patterns are then recognized in new images. MATLAB or Octave can be used to build the product, but Octave is recommended in the initial stage because it is free and easy to use.
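To make "learning from a data set" concrete, here is a minimal sketch of one of the simplest learners, a k-nearest-neighbour classifier. It stores labeled example glyphs and labels a new glyph by majority vote among the k most similar stored ones. The tiny 3x3 bitmaps, the Hamming distance on raw pixels and all names are illustrative assumptions. The example is written in Python rather than Octave only to keep it self-contained.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Vector = Tuple[int, ...]


def to_vector(rows: Sequence[str]) -> Vector:
    """Flatten a glyph drawn with '#' (ink) and '.' into a 0/1 pixel vector."""
    return tuple(1 if ch == "#" else 0 for row in rows for ch in row)


def hamming(a: Vector, b: Vector) -> int:
    """Number of pixels on which two equally sized glyphs differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train: List[Tuple[Vector, str]], sample: Vector, k: int = 3) -> str:
    """Label `sample` by majority vote among its k nearest training glyphs."""
    nearest = sorted(train, key=lambda item: hamming(item[0], sample))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: two hand-drawn variants each of "L" and "T".
TRAIN = [
    (to_vector(["#..", "#..", "###"]), "L"),
    (to_vector(["#..", "#..", "##."]), "L"),
    (to_vector(["###", ".#.", ".#."]), "T"),
    (to_vector(["###", ".#.", "..."]), "T"),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, to_vector(["#..", "#..", "#.#"])))  # noisy "L" -> L
```

Adding more labeled variants to TRAIN is how the tool "learns" in this sketch. No parameters are fitted, so recognition quality depends directly on how representative the stored examples are.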
Research : A lot of research has been done on this problem, and it is still ongoing. Related research areas include image processing, natural language processing, artificial intelligence and machine learning.
Implementation : The implementation of such a tool depends on two components: feature extraction and the classification algorithm. You can use one of the many classifiers available online, and you should also read about basic feature extraction algorithms. A basic version of the product (with lower accuracy) can be implemented in Octave with a limited training data set and simple component analysis. Refer to the links below for more information about implementation and ongoing research.
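The two components can be sketched together as follows. This minimal Python example uses zoning features (the fraction of ink pixels in each cell of a 2x2 grid) as the feature extractor and a nearest-centroid classifier. Both are stand-ins chosen for brevity, not methods the project prescribes, and the 4x4 glyphs are made up. Glyph height and width are assumed to be divisible by the number of zones.

```python
import math
from typing import Dict, List, Sequence, Tuple

Features = List[float]


def zoning_features(rows: Sequence[str], zones: int = 2) -> Features:
    """Feature extraction: fraction of ink ('#') pixels in each cell of a
    zones x zones grid laid over the glyph (height and width must be
    divisible by `zones`)."""
    zh, zw = len(rows) // zones, len(rows[0]) // zones
    feats: Features = []
    for zy in range(zones):
        for zx in range(zones):
            cell = [rows[y][x]
                    for y in range(zy * zh, (zy + 1) * zh)
                    for x in range(zx * zw, (zx + 1) * zw)]
            feats.append(sum(ch == "#" for ch in cell) / len(cell))
    return feats


def train_centroids(samples: List[Tuple[Sequence[str], str]]) -> Dict[str, Features]:
    """Training: average the feature vectors of each class."""
    grouped: Dict[str, List[Features]] = {}
    for rows, label in samples:
        grouped.setdefault(label, []).append(zoning_features(rows))
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in grouped.items()}


def classify(centroids: Dict[str, Features], rows: Sequence[str]) -> str:
    """Classification: pick the class whose centroid is nearest (Euclidean)."""
    feats = zoning_features(rows)
    return min(centroids, key=lambda label: math.dist(centroids[label], feats))


# Hypothetical 4x4 training glyphs.
SAMPLES = [
    (["#...", "#...", "#...", "####"], "L"),
    (["#...", "#...", "#...", "###."], "L"),
    (["####", "...#", "..#.", "..#."], "7"),
    (["####", "...#", "...#", "..#."], "7"),
]
CENTROIDS = train_centroids(SAMPLES)

if __name__ == "__main__":
    print(classify(CENTROIDS, ["###.", "...#", "..#.", ".#.."]))  # a noisy "7"
```

Keeping the two stages separate means a stronger classifier, such as an SVM or a neural network, can replace train_centroids and classify while the feature vectors stay the same.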
There are also online tools that recognize the characters in an image and convert them to machine-encoded text in doc or txt format, for example http://www.onlineocr.net/
The field of such tools is very large. You can learn a lot about the technologies above by contributing to ongoing projects or by creating your own from scratch.
This idea was contributed by Utkarsh Trivedi. If you also wish to showcase your project idea here, please send an email to email@example.com.