Why am I getting a lot of mistakes in my OCRed text?
If you’re new to OCR, you may have come to it with the idea that OCR is almost perfect and makes only a few mistakes now and then. No. It’s slightly amazing that OCR works at all, and when it does, it isn’t perfect. You might reasonably expect to average anything up to 10 errors per page for typical PG work; if you’re seeing more, then there is a problem with a) your printed book, b) your scan, or c) your OCR package.

Problems with the printed book fall into three categories: bad printing, age, and unusual fonts. Bad printing consists of problems like too much or too little ink on the press at the time the book was printed, and irregularities in the print where the metal type was damaged. Age causes yellowing, even browning, of the paper, and faded print. Unusual fonts may be hard for OCR to recognize, and very tightly spaced print may make adjacent letters seem to touch, which confuses OCR software.

There are many ways for you to have problems with your scan. Obviously, if your scanner is de