How to Integrate ABBYY Cloud OCR SDK into Your App — Step-by-Step
This guide shows a minimal, practical integration path using ABBYY Cloud OCR SDK (v2 where possible). It assumes you need server-side processing (recommended) and uses REST calls; sample code snippets use curl and Node.js (axios). Adjust for your language of choice.
Prerequisites
- ABBYY Cloud OCR SDK account — create an Application to get ApplicationID and ApplicationPassword.
- Decide processing location (region) shown in your ABBYY dashboard; it determines the host you call under .ocrsdk.com.
- Files to send (images or PDFs).
- Basic HTTP client (curl, axios, fetch, requests).
High-level flow
- Authenticate requests with Basic auth using ApplicationID:ApplicationPassword (Base64).
- Submit an image/document for processing (processImage or submitImage + processDocument).
- Poll task status until Completed.
- Download result from the resultUrl returned in the completed task.
- Handle errors, retries, and rate limits.
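As a minimal sketch of the first step in this flow, the Basic auth header value can be built by Base64-encoding the `ApplicationID:ApplicationPassword` pair. The function name `basicAuthHeader` is hypothetical, and most HTTP clients (including curl's `-u` and axios's `auth` option) will do this for you.

```javascript
// Build the HTTP Basic Authorization header value from the two credentials.
// basicAuthHeader is an illustrative helper name, not part of any SDK.
function basicAuthHeader(applicationId, applicationPassword) {
  const pair = `${applicationId}:${applicationPassword}`;
  const token = Buffer.from(pair, 'utf8').toString('base64');
  return `Basic ${token}`;
}

// Usage with placeholder credentials (not real values).
const headers = { Authorization: basicAuthHeader('ApplicationID', 'ApplicationPassword') };
console.log(headers.Authorization.startsWith('Basic '));
```

Building the header by hand is mainly useful with clients that have no built-in Basic auth option.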
Step 1 — Create and secure credentials
- Register an Application in ABBYY Cloud OCR SDK portal. Save ApplicationID and ApplicationPassword.
- Store credentials securely (server-side secrets vault / environment variables). Do not embed in client-side code.
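One way to follow this advice on the server is to read both values from environment variables at startup and fail fast when either is missing. This is a sketch only: `loadCredentials` is a hypothetical helper, and the variable names `ABBYY_ID` and `ABBYY_PW` are the ones used in this guide's Node.js example rather than anything ABBYY mandates.

```javascript
// Read credentials from an environment-like object and fail fast if any is missing.
// ABBYY_ID / ABBYY_PW are this guide's variable names, not names required by ABBYY.
function loadCredentials(env = process.env) {
  const applicationId = env.ABBYY_ID;
  const applicationPassword = env.ABBYY_PW;
  const missing = [];
  if (!applicationId) missing.push('ABBYY_ID');
  if (!applicationPassword) missing.push('ABBYY_PW');
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return { applicationId, applicationPassword };
}

// Usage with an explicit object instead of the real environment.
const creds = loadCredentials({ ABBYY_ID: 'placeholder-id', ABBYY_PW: 'placeholder-pw' });
console.log(Object.keys(creds));
```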
Step 2 — Choose processing method & parameters
- For single images: processImage (starts processing immediately).
- For multi-page documents: submitImage (upload pages) then processDocument.
- For business cards, small fields, barcodes: use dedicated methods (processBusinessCard, processTextField, etc.).
- Common parameters:
  - language (e.g., English, Russian)
  - exportFormat (txt, pdfSearchable, docx, xlsx)
  - profile (documentConversion, textExtraction, etc.)
  - description (optional)
Example query parameters: language=English&exportFormat=pdfSearchable&profile=documentConversion
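Rather than concatenating the query string by hand, it can be assembled with Node's built-in `URLSearchParams`, which also takes care of URL encoding. The sketch below assumes a hypothetical helper `buildMethodUrl`; the base URL in the usage example is a placeholder, not a real ABBYY host.

```javascript
// Append OCR parameters to a method URL, skipping unset values.
// buildMethodUrl is an illustrative helper, not part of any SDK.
function buildMethodUrl(baseUrl, method, params = {}) {
  const query = new URLSearchParams();
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined && value !== null && value !== '') {
      query.append(key, String(value));
    }
  }
  const qs = query.toString();
  return qs ? `${baseUrl}/${method}?${qs}` : `${baseUrl}/${method}`;
}

// The base URL is a placeholder; substitute the host for your processing location.
const methodUrl = buildMethodUrl('https://ocr.example.invalid', 'processImage', {
  language: 'English',
  exportFormat: 'pdfSearchable',
  profile: 'documentConversion',
  description: undefined, // optional, omitted from the query string
});
console.log(methodUrl);
```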
Step 3 — Send the request (example: processImage)
Endpoint: POST http://
curl example:
```bash
curl -u "ApplicationID:ApplicationPassword" -F "file=@/path/to/image.jpg" "http://.ocrsdk.com/processImage?language=English&exportFormat=pdfSearchable"
```
Response: XML/JSON task object containing task id and status=“Queued” or “InProgress”.
Node.js (axios + form-data) example:
```js
const FormData = require('form-data');
const axios = require('axios');
const fs = require('fs');

// Attach the image as a multipart form field.
const form = new FormData();
form.append('file', fs.createReadStream('./image.jpg'));

// The host prefix before .ocrsdk.com depends on your processing location.
const url = 'http://.ocrsdk.com/processImage?language=English&exportFormat=pdfSearchable';
const auth = { username: process.env.ABBYY_ID, password: process.env.ABBYY_PW };

axios.post(url, form, { auth, headers: form.getHeaders() })
  .then(res => console.log(res.data)) // contains task id
  .catch(err => console.error(err.response?.data || err.message));
```
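If the response arrives as XML, the task id and status have to be pulled out of it before polling can start. The sketch below assumes the task is returned as a single `<task .../>` element with double-quoted attributes; the sample string is hypothetical, so check a real response for your API version. For anything beyond this simple shape, a proper XML parser is the safer choice.

```javascript
// Extract the attributes of the first <task ...> element from an XML response body.
// Assumes double-quoted attribute values; only the &amp; entity is decoded.
function parseTaskAttributes(xml) {
  const tag = /<task\b([^>]*)>/.exec(xml);
  if (!tag) return null;
  const attrs = {};
  const attrPattern = /([\w:-]+)="([^"]*)"/g;
  let match;
  while ((match = attrPattern.exec(tag[1])) !== null) {
    attrs[match[1]] = match[2].replace(/&amp;/g, '&');
  }
  return attrs;
}

// Hypothetical sample shaped like a task response; not captured from a real call.
const sample = '<response><task id="abc-123" status="Queued" filesCount="1"/></response>';
const task = parseTaskAttributes(sample);
console.log(task.id, task.status);
```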
Step 4 — Poll task status
Use getTaskStatus with the returned taskId. Poll no more often than every 2–3 seconds, in line with ABBYY's recommendations.
GET http://
Example loop (pseudo):
- If status = Queued or InProgress → wait 2–3s and poll again.
- If status = Completed → read resultUrl attribute.
- If status = ProcessingFailed or NotEnoughCredits → handle error.
Important: task status updates ~every 2–3 seconds; avoid aggressive polling.
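The loop above could be sketched as follows. `pollTask` and its options are hypothetical names; `getStatus` is a function you supply that calls getTaskStatus and resolves to an object such as `{ status, resultUrl }`. Injecting it (and `sleep`) keeps the loop independent of the HTTP client and easy to test.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll until the task leaves the Queued/InProgress states.
// Resolves with the completed task; rejects on any other status or when
// maxAttempts polls have been made without completion.
async function pollTask(getStatus, { intervalMs = 3000, maxAttempts = 100, sleep = defaultSleep } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const task = await getStatus();
    if (task.status === 'Completed') return task;
    if (task.status !== 'Queued' && task.status !== 'InProgress') {
      // e.g. ProcessingFailed or NotEnoughCredits
      throw new Error(`Task ended with status ${task.status}`);
    }
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Task not finished after ${maxAttempts} polls`);
}
```

The `maxAttempts` cap is a design choice of this sketch so that a stuck task cannot keep a worker polling forever.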
Step 5 — Download result
The task’s resultUrl points to a publicly downloadable blob (no auth). Download and store it.
curl example:
```bash
curl -o result.pdf "https://.blob.core.windows.net/files/xxxxx.result"
```
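The same download can be done from Node.js. This sketch assumes Node.js 18 or later, where `fetch` is available globally; `downloadResult` is a hypothetical helper, and `fetchImpl` is injectable so the function can be exercised without a network call.

```javascript
const fsp = require('node:fs/promises');

// Download resultUrl to destPath and return the number of bytes written.
// fetchImpl defaults to the global fetch (Node.js 18+).
async function downloadResult(resultUrl, destPath, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(resultUrl);
  if (!res.ok) {
    throw new Error(`Download failed with HTTP ${res.status}`);
  }
  const bytes = Buffer.from(await res.arrayBuffer());
  await fsp.writeFile(destPath, bytes);
  return bytes.length;
}
```

A call would look like `await downloadResult(task.resultUrl, './result.pdf')`, where `task` is the completed task object from the polling step.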
Step 6 — Parse and use output
- Output formats:
  - Plain text (TXT), searchable PDF (pdfSearchable), DOCX, XLSX, PPTX, ALTO, XML, CSV, vCard.
- If you requested structured XML/ALTO, parse for zones, coordinates, and confidence.
- For downstream automation, convert DOCX/XLSX to your app’s storage format or extract data fields.
Error handling & best practices
- Use server-side integration to keep credentials secret.
- Respect rate limits and credits — handle NotEnoughCredits gracefully.
- Retry transient network errors with exponential backoff.
- Validate input images (resolution, orientation). Preprocess (deskew, crop) if needed to improve accuracy.
- Monitor task statuses and log taskIds for traceability.
- For multi-page PDFs, consider submitImage + processDocument to build a single output file.
- Use secure (HTTPS) endpoints for production where supported.
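The retry advice above can be sketched as a small wrapper around any async call. `retryWithBackoff` and its options are hypothetical, and the default `isTransient` check assumes axios-style errors, where `err.response` is absent for network failures and carries an HTTP `status` otherwise. Adjust it to your client and to whatever ABBYY documents as retryable.

```javascript
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry an async operation with exponential backoff: baseMs, 2*baseMs, 4*baseMs, ...
// Non-transient errors, and the last failed attempt, are rethrown to the caller.
async function retryWithBackoff(operation, {
  retries = 4,
  baseMs = 500,
  sleep = wait,
  // Assumption: axios-style error shape (no response = network error).
  isTransient = (err) => !err.response || err.response.status >= 500,
} = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await operation();
    } catch (err) {
      if (attempt >= retries || !isTransient(err)) throw err;
      await sleep(baseMs * 2 ** attempt);
    }
  }
}
```

Permanent failures such as a 4xx response are surfaced immediately rather than retried, which keeps error categorization (transient vs permanent) in one place.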
Quick checklist for production rollout
- Store ApplicationID/ApplicationPassword in secrets manager.
- Implement server-side request/response flow and polling with backoff.
- Validate and sanitize uploaded files.
- Implement retries and error categorization (transient vs permanent).
- Implement logging/metrics for tasks, errors, and credit usage.
- Provide user feedback (processing progress) and result download links.
Useful references
- ABBYY Cloud OCR SDK API docs and quick-start guides (code samples for multiple languages) — consult ABBYY support/docs for the exact endpoint names, parameter lists, and updated examples for your API version and processing location.