Photographing and scanning receipts
Receipts are generally printed on low-quality printers and paper, which makes them difficult to recognize, so some care is needed when preparing the input images. The intelligent image preprocessing technology corrects most image defects before OCR. However, if the source receipt is in very poor condition or the image quality is very low, recognition accuracy will deteriorate and some data may not be extracted.
Please make sure that the receipt fits entirely within the frame. The header and footer of the receipt typically contain information about the vendor, which is also used to determine the purchase type. If the header or footer is cut off, the overall result will suffer.
Straighten the receipt out and position the camera parallel to the plane of the receipt, to avoid perspective distortion. Good lighting with no shadows is also important.
The recommended camera requirements are the same as for other documents:
- 5-megapixel sensor
- Flash disable feature
- Manual aperture control or aperture priority mode
- Manual focusing
- Anti-shake system
- Optical zoom
The receipt folds are not straightened out
Before taking a picture, try to smooth out a receipt that has been folded. The edges of the receipt should be as straight as possible.
If the picture is taken at a noticeable angle, part of the receipt may be too blurred, or its letters too small, to be recognized reliably. Try to position the camera so that the receipt in the photo looks more like a rectangle than a trapezoid.
Avoid crumpling the receipt you are going to photograph. Creases damage the printing and create random shadows that interfere with OCR.
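The "rectangle rather than trapezoid" advice can be checked roughly before an image is submitted. The sketch below is only an illustration, not part of the OCR technology: it assumes the four corners of the receipt have already been located in the photo by some other means, and the function names and the 0.9 threshold are hypothetical choices. It compares the lengths of opposite edges; on a photo taken with the camera parallel to the receipt they should be nearly equal.

```python
import math

def edge_ratio(a, b):
    """Return shorter/longer for two edge lengths (1.0 means equal)."""
    return min(a, b) / max(a, b)

def trapezoid_score(corners):
    """Score how rectangle-like the receipt outline is.

    corners: (top_left, top_right, bottom_right, bottom_left),
    each an (x, y) pair in pixels.
    Returns the smaller of the two opposite-edge length ratios:
    1.0 when opposite edges match, lower when the outline tapers.
    """
    tl, tr, br, bl = corners
    top = math.dist(tl, tr)
    bottom = math.dist(bl, br)
    left = math.dist(tl, bl)
    right = math.dist(tr, br)
    return min(edge_ratio(top, bottom), edge_ratio(left, right))

def looks_rectangular(corners, min_score=0.9):
    """min_score is an assumed threshold, not a documented value."""
    return trapezoid_score(corners) >= min_score

# A receipt photographed head-on versus one photographed at an angle.
head_on = ((0, 0), (100, 0), (100, 300), (0, 300))
tilted = ((20, 0), (80, 0), (100, 300), (0, 300))
print(looks_rectangular(head_on))
print(looks_rectangular(tilted))
```

A low score suggests retaking the photo with the camera held parallel to the receipt.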
As with photographs, the original receipt should not be crumpled or stained, and it should fit entirely on the scanning surface. We recommend the following settings for receipt scanning:
- high contrast of the image
- no highlights
- high resolution: at least 300 dpi; because receipts are often printed in small fonts, an even higher resolution is often required
- no additional background images or text
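The 300 dpi recommendation can be turned into a pixel count with simple arithmetic. The sketch below is a minimal illustration; the function names are hypothetical, and the 80 mm paper width is only an assumed example, so measure your own receipts.

```python
import math

MM_PER_INCH = 25.4

def effective_dpi(pixels_across, physical_width_mm):
    """Resolution actually achieved across the receipt width."""
    return pixels_across / (physical_width_mm / MM_PER_INCH)

def min_pixels_for(physical_width_mm, dpi=300):
    """Smallest pixel width that reaches the requested dpi."""
    return math.ceil(physical_width_mm / MM_PER_INCH * dpi)

# Assumed example: a receipt printed on 80 mm wide paper.
width_mm = 80
print(min_pixels_for(width_mm))             # pixels needed for 300 dpi
print(round(effective_dpi(600, width_mm)))  # dpi if the receipt spans only 600 px
```

The same calculation applies to photographs: if the receipt occupies too few pixels across its width, the effective resolution falls below the recommended minimum even when the sensor itself has a high pixel count.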
An example of acceptable quality (scaled down, the real resolution should be higher):