What DPI should I use for OCR quality?

A DPI around 220 is a strong default for scanned text, balancing accuracy and processing cost.
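The cost side of that trade-off can be made concrete with a little arithmetic. The sketch below assumes the service rasterizes each page at the requested dpi before recognition, which this page does not state explicitly. The helper names are illustrative only.

```python
def raster_size(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page of the given physical size rendered at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


def megapixels(width_in: float, height_in: float, dpi: int) -> float:
    """Total pixels (in millions) the OCR engine would have to scan."""
    width_px, height_px = raster_size(width_in, height_in, dpi)
    return width_px * height_px / 1_000_000


# US Letter page (8.5 x 11 inches), hypothetical comparison of two settings.
letter_220 = megapixels(8.5, 11, 220)  # 1870 x 2420 pixels
letter_300 = megapixels(8.5, 11, 300)  # 2550 x 3300 pixels
cost_ratio = letter_300 / letter_220   # grows with the square of the DPI ratio
```

Because pixel count grows with the square of the DPI, moving a Letter page from 220 to 300 DPI gives the engine roughly 1.86 times as many pixels to process.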

Can I OCR only selected pages?

Yes. Use the pages field with values like all, 1-3, or 1,3,5-7.
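To check a selection string on the client before sending it, something like the following may help. It is a minimal sketch of how the three documented forms could be expanded. The server's actual parsing rules (whitespace, duplicates, out-of-range pages) are not documented here, so the validation below is an assumption, and expand_pages is a hypothetical helper name.

```python
def expand_pages(spec: str, page_count: int) -> list[int]:
    """Expand "all", "1-3", or "1,3,5-7" into sorted 1-based page numbers."""
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(1, page_count + 1))

    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty item in page selection: {spec!r}")
        if "-" in part:
            # A range such as "5-7" covers both endpoints.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start or end > page_count:
            raise ValueError(f"page range {part!r} is outside 1-{page_count}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

For a 10-page document, expand_pages("1,3,5-7", 10) yields pages 1, 3, 5, 6, and 7. A selection that reaches past the last page raises ValueError instead of being sent.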

When should I tune PSM and OEM?

Tune PSM for page layout type and OEM for engine behavior when default extraction quality is not sufficient.
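As a rough guide, the sketch below maps a few layout descriptions to Tesseract's page segmentation modes and engine modes. The numeric values are Tesseract's own. It is an assumption that this API passes psm and oem through to Tesseract unchanged and accepts every value listed. The dictionary keys and the ocr_settings helper are illustrative names.

```python
# Subset of Tesseract page segmentation modes (--psm).
PSM_BY_LAYOUT = {
    "auto": 3,           # fully automatic segmentation, no orientation detection
    "single_column": 4,  # one column of text of variable sizes
    "single_block": 6,   # one uniform block of text
    "sparse": 11,        # sparse text, found in no particular order
}

# Tesseract OCR engine modes (--oem).
OEM_BY_ENGINE = {
    "legacy": 0,    # legacy engine only
    "lstm": 1,      # neural-net LSTM engine only
    "combined": 2,  # legacy and LSTM together
    "default": 3,   # whichever engine is available
}


def ocr_settings(layout: str = "auto", engine: str = "default") -> dict[str, int]:
    """Return the psm/oem fields for a request, given a layout and engine hint."""
    if layout not in PSM_BY_LAYOUT:
        raise ValueError(f"unknown layout hint: {layout!r}")
    if engine not in OEM_BY_ENGINE:
        raise ValueError(f"unknown engine hint: {engine!r}")
    return {"psm": PSM_BY_LAYOUT[layout], "oem": OEM_BY_ENGINE[engine]}
```

For example, a receipt with scattered text might start from ocr_settings("sparse"), while a dense single-column report might try ocr_settings("single_column", "lstm").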

PDF OCR API for Developers

PDF OCR API built for teams that need reliable document automation at scale. Convert and process files with simple REST requests, predictable output quality, and production-grade uptime. Use it for invoice extraction, archive digitization, and search indexing pipelines. Includes clear docs, SDK-ready endpoints, and quick testing in your browser.

What it does

Runs OCR on scanned PDFs, optionally limited to a selected page range.

Configure lang, dpi, psm, and oem to trade accuracy against processing speed.

Perfect for invoice parsing, archives, and search indexing pipelines.

Endpoint & Example

POST /v1/pdf/ocr/parse

url / file (required): Input scanned PDF via public URL or multipart file upload.

pages (optional): Page selection string like all, 1-3, or 1,3,5-7.

lang, dpi, psm, oem (optional): OCR language and Tesseract engine settings for quality and speed control.

curl -X POST https://pdfapihub.com/api/v1/pdf/ocr/parse \
  -H "CLIENT-API-KEY: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/scanned.pdf",
    "pages": "1-3",
    "lang": "eng",
    "dpi": 220,
    "psm": 3,
    "oem": 3
  }'
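The same request can be sketched in Python using only the standard library. The URL, header name, and JSON fields are taken from the curl example. The helper names are hypothetical, and the assumption that the response body is JSON is not confirmed by this page.

```python
import json
import urllib.request

API_URL = "https://pdfapihub.com/api/v1/pdf/ocr/parse"


def build_ocr_request(api_key, pdf_url, pages="all", lang="eng",
                      dpi=220, psm=3, oem=3):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "url": pdf_url,
        "pages": pages,
        "lang": lang,
        "dpi": dpi,
        "psm": psm,
        "oem": oem,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "CLIENT-API-KEY": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_ocr(request, timeout=60):
    """Send the request and decode the body, assuming the API returns JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling run_ocr(build_ocr_request("your_api_key_here", "https://example.com/scanned.pdf", pages="1-3")) would send the request. Keeping construction separate from sending makes the payload easy to inspect or log before any network call.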

Run OCR on your first document

Get an API key and validate OCR quality in the playground before wiring the API into your workflow.

Related resources

API Docs

Pricing

Image OCR API

OCR accuracy guide

PDF OCR in n8n