Deep dive May 22, 2026 · 5 min read

PaddleOCR in the browser — running 60 MB ML models without uploads

PaddleOCR is one of the most accurate OCR engines available under a permissive Apache 2.0 license. Loft runs it entirely in the browser, downloading the model weights once to your device. Here is how that works, what the engineering tradeoffs are, and why it is the right shape for privacy-sensitive OCR.

By Khine

When I first tried to run PaddleOCR in a browser tab, it took eighteen seconds to OCR a single page of typewritten English. The page froze for the full eighteen seconds. The browser’s “this tab is using significant memory” warning fired. I gave up on browser-side OCR for about a month and used Tesseract.js instead, which has roughly half the accuracy but is at least usable.

This post is the story of how we got back to PaddleOCR and made it work — and what was actually slow the first time.

The starting state

PaddleOCR is a Chinese-origin OCR engine, released under the Apache 2.0 license, that benchmarks at or near the top for text recognition accuracy on English, CJK languages, and most Latin-script European languages. It is mature, well maintained, and used in production by enterprises that have GPUs available.

The “GPUs available” qualifier is the important one. PaddleOCR was designed for server-side inference on PaddlePaddle’s native runtime, ideally with CUDA. Running it in a browser tab on a phone’s CPU is asking the model to operate under conditions nobody designed it for.

That’s why my first attempt was eighteen seconds.

What was actually slow

Profiling the slow path revealed three separable problems:

The model files were in PaddlePaddle’s native format. Loading them in a JS runtime required a heavy compatibility shim that itself was slow.

The detection model — the part that finds text regions on the page — was the highest-quality version, which was also the largest version (around 95 MB). It downloaded slowly on first run.

The recognition model, the part that reads text in each detected region, ran in WASM via ONNX Runtime Web in single-threaded mode. Multi-threaded WASM (via SharedArrayBuffer) was unavailable, either because the page’s headers did not permit it or because it was not enabled.

Each problem was solvable individually. Together they were producing the eighteen-second outcome.

What we did about it

Three fixes, in order:

Convert to ONNX format up front. PaddleOCR ships native PaddlePaddle weights; ONNX Runtime Web reads ONNX weights. The official Paddle-to-ONNX conversion tool was the bridge. We moved the conversion out of runtime entirely — the ppu-paddle-ocr package ships pre-converted ONNX weights from its build pipeline.

Quantize to INT8. Full-precision (FP32) ONNX weights for the detection + recognition pair were large. INT8 quantization brought each per-language set down to roughly 10–15 MB at the cost of about 2–3% accuracy on benchmark sets. The accuracy loss is real but typically invisible for routine office documents; the file-size win is huge.

Enable cross-origin isolation. Loft’s /tools/* pages serve Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: credentialless. That unlocks SharedArrayBuffer, which ONNX Runtime Web uses for multi-threaded WASM. Inference time dropped by a factor of three on devices with multiple cores.
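
To make the cross-origin isolation step concrete, here is a minimal sketch of the two headers named above plus a thread-count decision. The helper name `pickWasmThreads` and the cap of four threads are my own assumptions for illustration, not Loft’s actual code. In a browser the inputs would come from `crossOriginIsolated` and `navigator.hardwareConcurrency`, and the result would be assigned to ONNX Runtime Web’s `ort.env.wasm.numThreads` before the session is created.

```typescript
// Response headers described above for the /tools/* pages.
const ISOLATION_HEADERS: Record<string, string> = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "credentialless",
};

// Hypothetical helper: decide how many WASM threads to request.
// Without cross-origin isolation SharedArrayBuffer is unavailable,
// so the only safe answer is a single thread.
// The default cap of 4 is an assumption, not a measured optimum.
function pickWasmThreads(isolated: boolean, cores: number, maxThreads = 4): number {
  if (!isolated) return 1;
  const usable = Number.isFinite(cores) && cores >= 1 ? Math.floor(cores) : 1;
  return Math.min(usable, maxThreads);
}

// Browser usage (not executed here):
//   ort.env.wasm.numThreads =
//     pickWasmThreads(crossOriginIsolated, navigator.hardwareConcurrency);
```

Keeping the decision in a pure function means it can be checked without a browser, and the single-thread fallback keeps the tool working on pages that are not isolated.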

After all three: roughly three seconds for a single page on a modern desktop, six on a phone. Acceptable.
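
For readers who have not met INT8 quantization before, the sketch below shows the basic idea behind the second fix: each 4-byte float is stored as a 1-byte integer plus one shared scale factor. This is a simplified symmetric, per-tensor scheme written for illustration. It is an assumption on my part, not necessarily the scheme used to produce the shipped weights, and the function names are hypothetical.

```typescript
interface QuantizedTensor {
  values: Int8Array; // one byte per weight
  scale: number;     // shared factor to map integers back to floats
}

// Symmetric per-tensor quantization: map [-maxAbs, maxAbs] onto [-127, 127].
function quantizeInt8(weights: Float32Array): QuantizedTensor {
  let maxAbs = 0;
  for (let i = 0; i < weights.length; i++) {
    maxAbs = Math.max(maxAbs, Math.abs(weights[i]));
  }
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const values = new Int8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    const q = Math.round(weights[i] / scale);
    values[i] = Math.max(-127, Math.min(127, q));
  }
  return { values, scale };
}

// Approximate reconstruction; the rounding error is where accuracy is lost.
function dequantizeInt8(t: QuantizedTensor): Float32Array {
  const out = new Float32Array(t.values.length);
  for (let i = 0; i < out.length; i++) {
    out[i] = t.values[i] * t.scale;
  }
  return out;
}
```

The storage drops from four bytes per weight to one, and each reconstructed weight is off by at most about half a quantization step, which is the kind of small error that shows up as the modest accuracy loss described above.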

The architecture now

Three pieces, executing as a pipeline:

The detection model identifies bounding boxes around text regions on the input image. Output: list of polygons.

For each polygon, the image gets cropped and passed to the recognition model, which produces the text content. Output: list of (polygon, text) pairs.

The post-processing step assembles the output into a structured representation — usually plain text in reading order, sometimes with positional metadata for tools that need it (e.g. the OCR PDF tool, which embeds extracted text back into the PDF at the correct coordinates).

All three pieces run inside a Web Worker so the main thread stays responsive while OCR runs.
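
The three stages above can be sketched as a small generic function. Everything here is illustrative: the type names, `runOcrPipeline`, and the row-bucketing heuristic for reading order are my assumptions, and the stages are kept synchronous for brevity where real inference calls would return promises and run inside the worker.

```typescript
type Point = { x: number; y: number };
type Polygon = Point[];

interface OcrLine {
  polygon: Polygon;
  text: string;
}

// Top-left corner of a polygon's bounding box.
function topLeft(p: Polygon): Point {
  let x = Infinity;
  let y = Infinity;
  for (const pt of p) {
    x = Math.min(x, pt.x);
    y = Math.min(y, pt.y);
  }
  return { x, y };
}

// detect -> crop + recognize -> sort into a naive reading order.
// rowHeight buckets boxes into rows; 20 px is an arbitrary illustrative value.
function runOcrPipeline<I>(
  image: I,
  detect: (image: I) => Polygon[],
  crop: (image: I, polygon: Polygon) => I,
  recognize: (crop: I) => string,
  rowHeight = 20,
): OcrLine[] {
  const lines: OcrLine[] = detect(image).map((polygon) => ({
    polygon,
    text: recognize(crop(image, polygon)),
  }));
  const row = (line: OcrLine) => Math.floor(topLeft(line.polygon).y / rowHeight);
  return lines.sort(
    (a, b) => row(a) - row(b) || topLeft(a.polygon).x - topLeft(b.polygon).x,
  );
}

// Post-processing for the plain-text case; the polygons stay available
// for tools that need positional metadata.
function toPlainText(lines: OcrLine[]): string {
  return lines.map((l) => l.text).join("\n");
}
```

Passing the detector, cropper, and recognizer in as functions keeps the ordering logic independent of the model runtime, so it can be exercised with stubs.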

What I underestimated

The first-visit model download is the single biggest UX hurdle. Downloading 10–15 MB per language causes a noticeable pause on a 4G connection, and a longer one on anything slower. We mitigate by lazy-loading the model only when the OCR tool is opened (not on first visit to any Loft page) and by showing a clear progress indicator during download. We also offer Tesseract.js as a fallback for users who don’t want to download the PaddleOCR weights.
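
The lazy-loading idea can be sketched in a few lines: nothing is downloaded until the first call, and repeated calls share the same in-flight download. `createLazyLoader` and `progressPercent` are hypothetical names for illustration, and the `load` callback stands in for whatever actually fetches the weights.

```typescript
// Wraps a loader so it runs at most once, and only on first use.
function createLazyLoader<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = load().catch((err) => {
        pending = undefined; // allow a retry after a failed download
        throw err;
      });
    }
    return pending;
  };
}

// Value for a progress indicator, clamped to 0..100.
// Returns 0 when the total size is unknown or invalid.
function progressPercent(loadedBytes: number, totalBytes: number): number {
  if (!(totalBytes > 0)) return 0;
  return Math.min(100, Math.round((loadedBytes / totalBytes) * 100));
}
```

Calling the returned function from the OCR tool’s open handler, rather than at page load, is what keeps the weights off every other page.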

The mitigation works, but there’s no escaping that “before the tool works, your phone needs to download the model weights” is a worse first-run experience than “this tool just works.” Native OCR apps don’t have this problem because the user downloads them upfront via the app store.

What I’d do differently

Two things, with the benefit of hindsight:

Ship the Tesseract fallback first. The lower-accuracy fallback is more than good enough for most users’ actual documents. Shipping it as the default and letting users opt into PaddleOCR for higher accuracy would have given us a better day-one experience and a reason to download the heavy model only when the user asked for the upgrade.

Cache the model more aggressively. The current caching is service-worker-driven and works, but I’ve heard of (and seen, in my own browsing data) cases where the cache gets evicted on iOS Safari after a few weeks of disuse. The next iteration should treat the model as a “persistent” storage class via StorageManager.persist().
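
A minimal sketch of that persistence request, assuming the standard `persisted()` and `persist()` methods on `navigator.storage`. The wrapper name `ensurePersistentStorage` and the `PersistableStorage` interface are hypothetical and exist only so the logic can run outside a browser. The browser decides whether to grant the request, so the return value has to be checked rather than assumed.

```typescript
// The subset of StorageManager this sketch relies on.
interface PersistableStorage {
  persisted(): Promise<boolean>;
  persist(): Promise<boolean>;
}

// Resolves to true if storage is (or becomes) persistent, false otherwise.
async function ensurePersistentStorage(
  storage: PersistableStorage | undefined,
): Promise<boolean> {
  if (!storage) return false;                 // API not available
  if (await storage.persisted()) return true; // already persistent, nothing to ask
  return storage.persist();                   // may still be denied by the browser
}

// Browser usage (not executed here):
//   const granted = await ensurePersistentStorage(navigator.storage);
```

If the request is denied, the existing service-worker cache still works; the model is simply eligible for eviction as before.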

What’s still hard

Handwritten text. Loft’s PaddleOCR setup is excellent for printed text in supported languages; it’s meaningfully behind cloud-managed OCR on handwritten documents, particularly cursive English and free-form notes. The training data gap is real and we don’t have a path to close it without a fundamentally different model.

Rare scripts. The general-purpose PaddleOCR weights handle common Latin and CJK scripts. For Arabic, Hebrew, Thai, or Devanagari we fall back to Tesseract. Same gap.

Very large documents on phones. Each page is its own inference cycle, and on a memory-constrained device the cumulative state can hit the tab ceiling before the document finishes. Mitigation: process serially with explicit cleanup, warn the user before starting.
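
The “process serially with explicit cleanup” mitigation can be sketched as a loop that never has more than one page in flight and always releases a page’s resources, even when OCR fails. The function and parameter names are hypothetical; `release` stands in for whatever frees the page’s bitmap or tensors in a real implementation.

```typescript
// Runs OCR one page at a time and releases each page before starting the next.
async function ocrPagesSerially<P, R>(
  pages: readonly P[],
  ocrPage: (page: P) => Promise<R>,
  release: (page: P) => void,
): Promise<R[]> {
  const results: R[] = [];
  for (const page of pages) {
    try {
      // Awaiting inside the loop is deliberate: it prevents pages
      // from piling up in memory the way Promise.all would.
      results.push(await ocrPage(page));
    } finally {
      release(page); // runs on success and on failure
    }
  }
  return results;
}
```

This trades throughput for a flat memory profile, which is the right trade on a phone where the tab ceiling, not the CPU, ends the run.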

PaddleOCR’s repo: github.com/PaddlePaddle/PaddleOCR. ONNX Runtime Web docs: onnxruntime.ai/docs/tutorials/web/. The pillar at /docs/how-it-works/ covers the ML stack briefly in §4.

References

PaddleOCR, PaddlePaddle on GitHub (accessed 2026-05-27)
ONNX Runtime Web, Microsoft (accessed 2026-05-27)
