A new step in OCR: Google’s Answer
Posted by decipherinfosys on November 1, 2008
I read a Google post today about their OCR (Optical Character Recognition) technology, which lets their search engine read scanned documents saved in Adobe’s PDF format. So, scanned images of words and pictures can now be indexed and made searchable. The linked post explains how searching within such indexed documents is different, so we won’t go into that here. It does require a lot of processing power, since scanned documents do not contain any text data that spiders can index.
We use our own OCR parser for our DVMS (Decipher Vaccine Management System) and Business Intelligence Suite for the healthcare product, and it is only 75% accurate, since text handwritten by doctors and nurses is hardly legible even to the human eye 🙂 let alone to a machine. So we eagerly look forward to this approach, and if it can be used within our product to improve the accuracy of the data, that will be very good.
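For readers wondering what a figure like 75% can mean in practice, one common way to score OCR output is character-level accuracy: compare the OCR text against a hand-typed reference and count the edits needed to turn one into the other. The sketch below is only an illustration of that idea, not a description of how our parser is evaluated; the function names `levenshtein` and `char_accuracy` are hypothetical and exist only for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(reference: str, ocr_output: str) -> float:
    """Hypothetical metric: 1 - (edit distance / reference length), floored at 0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = levenshtein(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    # One wrong character out of four gives an accuracy of 0.75.
    print(char_accuracy("abcd", "abxd"))
```

A score computed this way depends heavily on the reference text chosen, so it is best read as a rough comparison between parsers on the same documents rather than an absolute number.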