# Extract Text from PDF

> Pulls the embedded text layer out of a PDF via PDF.js, preserving reading order for tagged PDFs (PDF 1.4 and newer). Untagged files fall back to text-run position, which can scramble multi-column layouts. Output copies to the clipboard or downloads as TXT. The source PDF is read-only and never leaves the tab, so it is usable on contracts or legal discovery without third-party exposure.


Live tool: https://lofttools.com/tools/pdf-tools/extract-pdf-text

Category: PDF & Documents

## How it works

1. **Select PDF** — Drop the PDF you want text out of, or click to choose a file
2. **Extract text** — Text is automatically extracted from all pages
3. **Copy or download** — Copy extracted text to clipboard or download as a TXT file
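
As a rough illustration of step 2: PDF.js exposes each page's text as a list of items, each with a `str` field and a `hasEOL` flag marking the end of a line. The sketch below joins such items into plain text. The helper names `pageItemsToText` and `pagesToText` are hypothetical, and the blank line between pages is an assumption; the tool's actual joining logic is not documented here.

```typescript
// Minimal shape of a PDF.js text item. Real items carry more fields
// (transform, width, fontName, ...), which this sketch ignores.
interface TextItemLike {
  str: string;      // the text of one run
  hasEOL?: boolean; // true when the run is followed by a line break
}

// Hypothetical helper: join one page's runs into plain text.
function pageItemsToText(items: TextItemLike[]): string {
  let out = "";
  for (const item of items) {
    out += item.str;
    if (item.hasEOL) out += "\n";
  }
  return out.trimEnd();
}

// Hypothetical helper: separate pages with a blank line (an assumption).
function pagesToText(pages: TextItemLike[][]): string {
  return pages.map(pageItemsToText).join("\n\n");
}

// Hand-written sample items standing in for two extracted pages.
const demoText = pagesToText([
  [
    { str: "Master Services", hasEOL: false },
    { str: " Agreement", hasEOL: true },
    { str: "Section 1", hasEOL: true },
  ],
  [{ str: "Signature page" }],
]);
console.log(demoText);
```

In a browser, the items would come from PDF.js's `page.getTextContent()` rather than from hand-written sample data.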

## FAQ

### Can it extract text from scanned PDFs?

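Not directly. This tool reads only the embedded text layer, and a scanned PDF consists of page images with no text layer. For scanned PDFs, use our PDF OCR tool, which uses optical character recognition to read text from images.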

### Does this preserve bookmarks and form fields?

The PDF is read-only here: text is extracted, and the source file and its structure, including bookmarks and form fields, stay exactly as you opened them.

### What PDF versions are supported?

Text extraction works on PDF 1.0 through 2.0. Tagged PDFs (1.4+) extract with proper reading order; untagged files, including everything older than 1.4, use the position of each text run on the page to determine order.
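
To see why position-based ordering can scramble columns, here is a minimal sketch of one common approach: group runs into lines by their vertical position, then read each line left to right. The `TextRun` type, the `orderByPosition` helper, and the 2-unit line tolerance are assumptions made for illustration, not the tool's actual algorithm. In PDF.js, a run's position would typically be taken from the last two entries of its `transform` array.

```typescript
// One text run with its page position. PDF coordinates start at the
// bottom-left corner, so a larger y means higher on the page.
interface TextRun {
  str: string;
  x: number;
  y: number;
}

// Hypothetical helper: order runs top-to-bottom, then left-to-right.
function orderByPosition(runs: TextRun[], lineTolerance = 2): string[] {
  // Sort a copy so the caller's array is left untouched.
  const byHeight = [...runs].sort((a, b) => b.y - a.y);
  const lines: { y: number; runs: TextRun[] }[] = [];
  for (const run of byHeight) {
    const last = lines[lines.length - 1];
    // Runs whose baselines are within the tolerance share a line.
    if (last !== undefined && Math.abs(last.y - run.y) <= lineTolerance) {
      last.runs.push(run);
    } else {
      lines.push({ y: run.y, runs: [run] });
    }
  }
  return lines.map((line) =>
    [...line.runs].sort((a, b) => a.x - b.x).map((r) => r.str).join(" ")
  );
}

// A two-column page, listed here in logical reading order.
const twoColumnPage: TextRun[] = [
  { str: "Alpha one", x: 50, y: 700 },
  { str: "Alpha two", x: 50, y: 680 },
  { str: "Beta one", x: 300, y: 700 },
  { str: "Beta two", x: 300, y: 680 },
];

const orderedLines = orderByPosition(twoColumnPage);
console.log(orderedLines);
// The left and right columns interleave:
// [ 'Alpha one Beta one', 'Alpha two Beta two' ]
```

A tagged PDF sidesteps this because its structure tree records the logical order directly, so the extractor does not have to infer it from coordinates.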

## Tips

- **PDF parsed entirely in your browser** — PDF.js reads the file locally; nothing is uploaded. Contracts, medical records, or legal discovery PDFs can be turned into plain text without third-party exposure.
- **Tagged PDFs preserve reading order** — PDF 1.4+ files with structure tags extract in logical reading order. Untagged files fall back to text-run position, which can scramble multi-column layouts.
- **Scanned PDFs need OCR first** — This tool reads the embedded text layer. If your PDF is a scan (pure images), use OCR PDF to add a text layer, then come back here.
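
One simple way to tell whether a PDF is probably a scan is to check whether any page produced visible text. The sketch below is a heuristic under stated assumptions: `looksScanned` and its `minCharsPerPage` threshold are hypothetical names, and nothing here claims to be how this tool or the OCR tool decides.

```typescript
// Hypothetical heuristic: treat the document as a likely scan when every
// page yields fewer than `minCharsPerPage` non-whitespace characters.
function looksScanned(pageTexts: string[], minCharsPerPage = 1): boolean {
  if (pageTexts.length === 0) return false; // nothing to judge
  return pageTexts.every(
    (text) => text.replace(/\s/g, "").length < minCharsPerPage
  );
}

// Pages with only whitespace suggest image-only content.
const scanLike = looksScanned(["", "  \n"]);
// A single page with real text is enough to rule out a pure scan.
const textLike = looksScanned(["Invoice 42", ""]);
console.log(scanLike, textLike);
```

A mixed document (some scanned pages, some text pages) would pass this check, so a per-page variant may be more useful when only part of a file needs OCR.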

## Privacy — what we do not do

This tool runs entirely in the browser via PDF.js. Your file never reaches a Loft Tools server. Specifically:

- **No upload.** The file bytes load into the browser tab's memory and process on your own CPU. Open DevTools → Network and observe zero outbound requests carrying file data while Extract Text from PDF runs.
- **No AI training on your file.** Loft does not train models. We could not train on a file we cannot see.
- **No content scanning.** No virus, copyright, or content-moderation pass against your file. The bytes are not accessible to us.
- **No server-side log of file contents, filenames, or document metadata.** Cloudflare edge captures URL and truncated IP for abuse defense (standard CDN behaviour). Cloudflare Web Analytics records anonymous page hits, no cookies, no PII. Nothing about your file content reaches any log.
- **No retention.** Close the tab and the file leaves browser memory. No backups exist on our side because no copy ever existed on our side.
- **No account.** No email, no signup, no auth, no telemetry tied to you.
- **Offline-capable after first visit** (PWA). Once you've loaded a tool, it caches; later sessions work without internet. For high-sensitivity files, run the tool once online to warm the cache, then disconnect before processing.

Compare with upload-based services: each transmits your file to a processing server. Even over HTTPS, each has logs, retention windows, and subpoena exposure. Loft has none of these because the server architecture does not include your file.

## More

- All tools: https://lofttools.com/tools
- Category: https://lofttools.com/tools/pdf-tools
- LLM index: https://lofttools.com/llms.txt