Questions about accuracy of OCR
How to improve the accuracy of the OCR text from Tesseract?
The Tesseract API class provides an isValidWord method that checks whether a string is a valid word in the loaded language's dictionary. You can run the recognized words through it and flag or reject those that fail, which can improve the accuracy of the final output.
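As a rough illustration of that filtering step, here is a minimal Python sketch. The function name flag_suspect_words and the is_valid_word callable are hypothetical stand-ins; in a real program the callable would wrap whatever isValidWord binding your Tesseract wrapper exposes.

```python
import string

def flag_suspect_words(text, is_valid_word):
    """Return the words in `text` that the validator rejects.

    `is_valid_word` is a hypothetical stand-in for a dictionary check
    such as Tesseract's isValidWord; any truthy result counts as valid.
    """
    suspects = []
    for token in text.split():
        word = token.strip(string.punctuation)  # drop surrounding punctuation
        if word and not is_valid_word(word):
            suspects.append(word)
    return suspects

# Demo with a tiny word list standing in for the OCR engine's dictionary.
known = {"the", "quick", "brown", "fox"}
print(flag_suspect_words("The qu1ck brown f0x.", lambda w: w.lower() in known))
```

The flagged words can then be re-recognized, corrected, or shown to a user for review.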
Inaccurate results might be due to the text size. The referenced source says "Accuracy drops off below 10pt x 300dpi, rapidly below 8pt x 300dpi."
Further, failing to detect more than 4 words can depend on many factors: the kind of test image (and how many features it contains), the size of the image, the platform, and so on.
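To see how far an image is from that guideline, one option is to compute the scale factor that would bring the text up to the 10pt x 300dpi equivalent. This is only a sketch: upscale_factor is a hypothetical helper, and it assumes "10pt x 300dpi" means the product of point size and resolution.

```python
def upscale_factor(point_size, dpi, min_point=10.0, target_dpi=300.0):
    """Scale factor needed to reach the min_point x target_dpi guideline.

    Assumption: the guideline is read as point_size * dpi, so 8pt text
    scanned at 150dpi has to be enlarged to match 10pt at 300dpi.
    Returns 1.0 when the image already meets the guideline.
    """
    if point_size <= 0 or dpi <= 0:
        raise ValueError("point_size and dpi must be positive")
    return max(1.0, (min_point * target_dpi) / (point_size * dpi))

# 8pt text scanned at 150dpi would need to be enlarged 2.5x.
print(upscale_factor(8, 150))
```

Enlarging an image does not add detail that was never captured, so rescanning at a higher resolution is preferable when that is possible.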
Most accurate open-source OCR for Japanese?
nhocr is the most accurate open-source OCR for Japanese.
How to improve accuracy - existing libraries for removing non-text 'furniture', shapes, etc to avoid confusing OCR?
There's probably not going to be a free off-the-shelf solution for this, but coding your own shouldn't be too hard, since it's probably safe to assume that a rectangle will never be a valid character in your font's alphabet and can therefore be removed safely. It also helps that all your rectangle borders are exactly one pixel wide.
So search for a contiguous horizontal line that is joined to another, parallel line of the same length by exactly two vertical lines. Repeat the search until you have found all the rectangles in the image, then render them all transparent with Graphics.DrawRectangle and Pens.Transparent (set Graphics.CompositingMode to CompositingMode.SourceCopy first, because with the default blending a transparent pen leaves the pixels unchanged). Don't render a rectangle transparent until you've finished searching, or you risk wiping out parts of overlapping rectangles before you've found them.
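The search described above can be sketched in Python on a binarized image stored as a list of rows (1 = ink, 0 = background). The names find_rectangles and erase_rectangles are hypothetical, and the sketch assumes one-pixel-wide borders whose top edge is a complete horizontal run; it would also match solid filled blocks, so treat it as a starting point rather than a finished filter.

```python
def find_rectangles(grid, min_size=3):
    """Return (x0, y0, x1, y1) for each rectangle outline found."""
    height = len(grid)
    width = len(grid[0]) if grid else 0
    found = []
    for y in range(height):
        x = 0
        while x < width:
            if not grid[y][x]:
                x += 1
                continue
            # Extend a contiguous horizontal run of ink pixels.
            x0 = x
            while x < width and grid[y][x]:
                x += 1
            x1 = x - 1
            if x1 - x0 + 1 < min_size:
                continue
            # Follow both vertical sides down until a parallel line
            # of the same length closes the rectangle.
            yy = y + 1
            while yy < height and grid[yy][x0] and grid[yy][x1]:
                if yy - y + 1 >= min_size and all(grid[yy][x0:x1 + 1]):
                    found.append((x0, y, x1, yy))
                    break
                yy += 1
    return found

def erase_rectangles(grid, rects):
    """Clear the border pixels of every rectangle, after the search is done."""
    for x0, y0, x1, y1 in rects:
        for x in range(x0, x1 + 1):
            grid[y0][x] = 0
            grid[y1][x] = 0
        for y in range(y0, y1 + 1):
            grid[y][x0] = 0
            grid[y][x1] = 0
```

Keeping the search and the erase as two separate passes follows the advice above: every rectangle is located before any pixel is cleared, so overlapping rectangles are not damaged mid-search.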
Correct the OCR errors as accurately as possible
Fuzzy Lookup is part of SSIS (Integration Services), which is itself part of the SQL Server BI platform. SSAS has nothing to do with it, except that you can run a data mining model in an SSAS project. Text data mining in SQL Server 2005 and later is implemented in SSIS.
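For readers who want to see the general idea of a fuzzy lookup outside SSIS, here is a small Python sketch using the standard library's difflib. It is not the SSIS Fuzzy Lookup algorithm, only a stand-in for the same concept: match each OCR token against a reference list and take the closest entry. The function name correct_word and the 0.8 cutoff are assumptions.

```python
import difflib

def correct_word(word, reference_words, cutoff=0.8):
    """Return the closest reference word, or the input if nothing is close.

    Uses difflib's similarity ratio as a stand-in for a fuzzy lookup;
    the cutoff value is an arbitrary choice for this sketch.
    """
    matches = difflib.get_close_matches(word, reference_words, n=1, cutoff=cutoff)
    return matches[0] if matches else word

reference = ["recognition", "accuracy", "document"]
print(correct_word("recogniti0n", reference))
```

A higher cutoff makes fewer corrections but with less risk of replacing a correct word by a similar one.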
How accurate are OCR Scanners?
It depends a lot on the OCR software you use as well as the document being scanned. Some OCR software is better than others, but all of it has problems unless the document is clean, with sharp, crisp text and no extraneous marks within or around the text.
If you use OCR on clean, laser-printed paper with text in a standard font, the result will be near perfect if not perfect, but if you try it on newspaper print, the results will be quite disappointing.