March 1, 2026 8 min read

    PDF OCR Accuracy Guide: DPI, PSM, OEM and Language Tuning

    Improve OCR output quality for invoices, receipts, and scanned forms.

    1. DPI matters most

    For low-quality scans, raise dpi; values in the 200–300 range usually help. A higher DPI renders each character with more pixels, which improves recognition, but it also increases processing time.
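
    Processing cost roughly tracks pixel count, and pixel count grows with the square of DPI. The sketch below only does that arithmetic, assuming a page is rasterized at exactly the requested dpi. A US Letter page (8.5 x 11 in) is used as an example; actual processing time also depends on the OCR engine and the server.

```python
def raster_size(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page of the given size (in inches) rendered at dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def pixel_count(width_in: float, height_in: float, dpi: int) -> int:
    """Total pixels the OCR engine has to process for one page."""
    width_px, height_px = raster_size(width_in, height_in, dpi)
    return width_px * height_px


# US Letter (8.5 x 11 in) at a few candidate settings.
for dpi in (150, 200, 240, 300):
    w, h = raster_size(8.5, 11, dpi)
    print(f"{dpi} dpi: {w} x {h} px, {pixel_count(8.5, 11, dpi) / 1e6:.1f} MP")
# 150 dpi: 1275 x 1650 px, 2.1 MP
# 200 dpi: 1700 x 2200 px, 3.7 MP
# 240 dpi: 2040 x 2640 px, 5.4 MP
# 300 dpi: 2550 x 3300 px, 8.4 MP
```

    Going from 150 to 300 dpi quadruples the number of pixels per page, so each DPI increase should earn its cost in measured accuracy.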

    2. Tune PSM (page segmentation mode) to the document layout

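    • Single-block invoices: use a simpler mode that treats the page as one uniform block of text (psm 6 in Tesseract's numbering).
    • Multi-column reports: use fully automatic layout analysis so the engine can detect the columns (psm 3, Tesseract's default).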
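
    If the API's psm parameter follows Tesseract's page segmentation modes (an assumption; the example request uses psm 3), the layout advice fits in a small lookup table. The layout labels and the choose_psm helper are hypothetical; besides the two modes above, the table lists two other Tesseract modes that can suit receipts and sparse forms.

```python
# Hypothetical layout labels mapped to Tesseract page segmentation modes.
# Assumes the API passes "psm" through to Tesseract unchanged.
PSM_BY_LAYOUT = {
    "multi_column": 3,   # fully automatic page segmentation (Tesseract's default)
    "single_column": 4,  # a single column of text of variable sizes
    "single_block": 6,   # a single uniform block of text
    "sparse": 11,        # find as much text as possible, in no particular order
}


def choose_psm(layout: str) -> int:
    """Return the psm value for a layout label, failing loudly on unknown labels."""
    if layout not in PSM_BY_LAYOUT:
        raise ValueError(f"unknown layout {layout!r}; expected one of {sorted(PSM_BY_LAYOUT)}")
    return PSM_BY_LAYOUT[layout]


print(choose_psm("single_block"))  # 6
```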

    3. Set correct language packs

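    Set lang to match the language(s) in your document, e.g. eng for English or eng+hin for English mixed with Hindi. Loading the right language data reduces substitution errors, where characters or words are recognized as the wrong ones.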
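
    Multiple languages are joined with + in the lang value. A hypothetical helper, build_lang, that assembles the string from a list, drops duplicates, and keeps the order given:

```python
def build_lang(codes: list[str]) -> str:
    """Join language codes into a lang value such as "eng+hin".

    Blank entries and duplicates are dropped; the given order is kept.
    """
    cleaned = [code.strip() for code in codes if code.strip()]
    for code in cleaned:
        # A code containing "+" or a space would corrupt the joined value.
        if "+" in code or " " in code:
            raise ValueError(f"invalid language code: {code!r}")
    if not cleaned:
        raise ValueError("at least one language code is required")
    # dict.fromkeys removes duplicates while preserving insertion order.
    return "+".join(dict.fromkeys(cleaned))


print(build_lang(["eng", "hin", "eng"]))  # eng+hin
```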

    curl -X POST https://pdfapihub.com/api/v1/pdf/ocr/parse \
      -H "CLIENT-API-KEY: your_api_key_here" \
      -H "Content-Type: application/json" \
      -d '{
        "url": "https://example.com/scanned.pdf",
        "pages": "1-3",
        "lang": "eng",
        "dpi": 240,
        "psm": 3,
        "oem": 3
      }'
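
    The curl request above can also be built with Python's standard library. This sketch mirrors its endpoint, header, and JSON fields; build_ocr_request is a hypothetical helper, and its range checks assume Tesseract's numbering (psm 0 to 13, oem 0 to 3, where oem 3 selects the default engine based on what is available). It builds the request without sending it, since the response format isn't described in this guide.

```python
import json
import urllib.request

API_URL = "https://pdfapihub.com/api/v1/pdf/ocr/parse"


def build_ocr_request(api_key: str, pdf_url: str, *, pages: str = "1-3",
                      lang: str = "eng", dpi: int = 240, psm: int = 3,
                      oem: int = 3) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    # Range checks assume the API follows Tesseract's psm/oem numbering.
    if not 0 <= psm <= 13:
        raise ValueError("psm outside Tesseract's 0-13 range")
    if not 0 <= oem <= 3:
        raise ValueError("oem outside Tesseract's 0-3 range")
    payload = {"url": pdf_url, "pages": pages, "lang": lang,
               "dpi": dpi, "psm": psm, "oem": oem}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"CLIENT-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_ocr_request("your_api_key_here", "https://example.com/scanned.pdf")

# Sending it is a network call (not run here); the timeout value is arbitrary:
# with urllib.request.urlopen(req, timeout=120) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```

    urllib normalizes header-name capitalization (CLIENT-API-KEY is stored as Client-api-key). HTTP header names are case-insensitive, so a compliant server should treat both the same.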

    4. Run post-processing

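    Raw OCR output needs cleanup before use. Normalize whitespace, rejoin words split across line breaks, and validate fields against known patterns (invoice numbers, totals, dates) so misreads are flagged rather than stored silently.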
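
    A minimal post-processing sketch using only Python's re module. The field patterns are hypothetical examples (an INV- prefixed invoice number, a total with two decimals, an ISO date) and need adapting to your own documents. Rejoining hyphenated line breaks also merges genuinely hyphenated words that happen to wrap, so check it against your data.

```python
import re

# Hypothetical field patterns; adapt them to your own documents.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"INV-\d{4,}"),
    "total": re.compile(r"(\d{1,3}(,\d{3})+|\d+)\.\d{2}"),  # 1,234.50 or 1234.50
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),  # format check only, not a calendar check
}


def normalize_ocr_text(text: str) -> str:
    """Collapse whitespace, rejoin hyphenated line breaks, and trim blank-line runs."""
    text = re.sub(r"[ \t]+", " ", text)                           # spaces/tabs -> one space
    text = "\n".join(line.strip() for line in text.splitlines())  # trim each line
    text = re.sub(r"(?<=\w)-\n(?=\w)", "", text)                  # "inv-\noice" -> "invoice"
    text = re.sub(r"\n{3,}", "\n\n", text)                        # at most one blank line
    return text.strip()


def validate_fields(fields: dict[str, str]) -> dict[str, bool]:
    """Check each extracted field against its pattern; unknown field names are skipped."""
    return {
        name: FIELD_PATTERNS[name].fullmatch(value.strip()) is not None
        for name, value in fields.items()
        if name in FIELD_PATTERNS
    }


print(validate_fields({"invoice_number": "INV-0042", "total": "1,23A.50", "date": "2026-03-01"}))
# {'invoice_number': True, 'total': False, 'date': True}
```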

    Conclusion

    Tuning DPI, language, and segmentation strategy together delivers most OCR gains. Iterate on a small golden dataset (a handful of documents with verified reference text) before scaling. Start from the PDF OCR API.
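
    One common way to score a golden dataset is character error rate (CER): the edit distance between the OCR output and the verified reference, divided by the reference length. A sketch, assuming you have already collected OCR text for each configuration you want to compare; the sample strings and setting names are made up.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum single-character insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def corpus_cer(pairs: list[tuple[str, str]]) -> float:
    """Total edit distance over total reference characters, for (reference, ocr) pairs."""
    total_chars = sum(len(ref) for ref, _ in pairs)
    if total_chars == 0:
        raise ValueError("reference texts are empty")
    return sum(levenshtein(ref, hyp) for ref, hyp in pairs) / total_chars


# Made-up golden references and OCR outputs for two hypothetical settings.
golden = ["Invoice 1042", "Total 99.50"]
runs = {
    "dpi=200": ["Invoice 1O42", "Total 99.50"],
    "dpi=300": ["Invoice 1042", "Total 99.50"],
}
scores = {name: corpus_cer(list(zip(golden, outputs))) for name, outputs in runs.items()}
best = min(scores, key=scores.get)
print(best)  # dpi=300
```

    Pooling edits across all documents keeps short pages from dominating the score, which a plain average of per-page CER would allow.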