How to Make a Scanned PDF Searchable With Free OCR
Scanned PDFs Are Just Pictures
You scanned a stack of documents — contracts, receipts, old records — and now you have a PDF. You try to Ctrl+F for a name or dollar amount. Nothing happens. You try to select text. Can't do it. You try to copy-paste into an email. Nope.
That's because your "PDF" is actually a collection of images wrapped in a PDF container. The scanner took photos of your pages. There's no actual text data in the file — just pixels arranged to look like text.
This is the difference between a text-based PDF (created digitally in Word, Google Docs, or any app) and an image-based PDF (created by a scanner or camera). Text-based PDFs contain real characters you can search, select, and copy. Image-based PDFs contain photographs of characters. Your computer can display them but can't read them.
OCR fixes this.
What OCR Actually Does
OCR stands for Optical Character Recognition. It analyzes the images in your scanned PDF, identifies each character, and creates a hidden text layer behind the original image. The result is a PDF that looks identical to the original but now contains searchable, selectable, copyable text.
Think of it like this: OCR reads your scanned document the same way you do — visually — then writes down what it sees in a machine-readable format. The original scan stays intact as the visual layer. The recognized text sits invisibly behind it.
After OCR processing, you can:
- Search the document with Ctrl+F / Cmd+F
- Select and copy specific text
- Extract text for use in other documents
- Index the file so desktop search tools (Spotlight, Windows Search) can find content inside it
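To make the hidden text layer concrete, here is a minimal Python sketch of the idea: each recognized word is stored with its page and the rectangle it occupies on the scan, and a search looks through those words. The `Word` class, the `find` function, and the sample invoice data are illustrative assumptions, not the internal format of any particular OCR tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One recognized word plus where it sits on the scanned page."""
    text: str
    page: int                                # 1-based page number
    box: tuple[float, float, float, float]   # (x0, y0, x1, y1) on the page image


def find(words: list[Word], query: str) -> list[Word]:
    """Return every recognized word that contains the query, ignoring case."""
    needle = query.lower()
    return [w for w in words if needle in w.text.lower()]


# Hypothetical OCR output for one scanned invoice page.
layer = [
    Word("Invoice", 1, (72.0, 60.0, 140.0, 78.0)),
    Word("Total:", 1, (72.0, 700.0, 118.0, 716.0)),
    Word("$1,250.00", 1, (124.0, 700.0, 200.0, 716.0)),
]

for hit in find(layer, "1,250"):
    # A viewer would highlight this rectangle on top of the original image.
    print(f"page {hit.page}: {hit.text!r} at {hit.box}")
```

The image never changes. Search and selection work because the viewer can map a text match back to a position on the page.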
How to Make Your Scanned PDF Searchable
- Open the OCR tool on PDFShift
- Upload your scanned PDF — up to 50 pages, 250MB max (Pro)
- Click Process — the AI reads every page and builds a searchable text layer
- Download your newly searchable PDF
The output file looks exactly the same as the original. Same pages, same images, same layout. But now there's text data embedded in the file, so search and selection work.
What About Free Desktop Options?
There are a few ways to run OCR outside of PDFShift. Each has trade-offs.
Tesseract (Open-Source, Local)
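Tesseract is the most widely used open-source OCR engine. It was originally developed at HP, Google sponsored its development for years, and it is now maintained by an open-source community. It's powerful but not user-friendly: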
- Requires installation via command line (Homebrew on Mac, apt on Linux, installer on Windows)
- Doesn't accept PDFs as input: you need to convert pages to images first, run OCR on each one (Tesseract can write a searchable PDF per page), then combine the results into a single PDF
- Accuracy varies with scan quality and font types
- No GUI unless you install a third-party wrapper like gImageReader
If you're comfortable with terminal commands, Tesseract produces solid results for clean scans with standard fonts. For handwriting, low-resolution scans, or unusual layouts, accuracy drops fast.
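For readers who want to try the Tesseract route, the convert, recognize, and reassemble steps can be sketched in Python as below. This is one possible setup, not the only one: it assumes the `tesseract` command plus Poppler's `pdftoppm` and `pdfunite` are installed and on your PATH, and the function names and the `page` file prefix are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def rasterize_cmd(pdf: Path, prefix: Path, dpi: int = 300) -> list[str]:
    """pdftoppm (Poppler) writes one PNG per page, named <prefix>-<number>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def ocr_cmd(image: Path, lang: str = "eng") -> list[str]:
    """Tesseract's 'pdf' config writes <outputbase>.pdf with a hidden text layer."""
    return ["tesseract", str(image), str(image.with_suffix("")), "-l", lang, "pdf"]


def merge_cmd(pages: list[Path], output: Path) -> list[str]:
    """pdfunite (Poppler) concatenates the per-page PDFs in the order given."""
    return ["pdfunite", *map(str, pages), str(output)]


def page_number(image: Path) -> int:
    """Pull the trailing page number out of a name like 'page-07.png'."""
    return int(image.stem.rsplit("-", 1)[1])


def make_searchable(pdf: Path, output: Path, workdir: Path) -> None:
    """Run all three steps; workdir should be an existing, empty folder."""
    subprocess.run(rasterize_cmd(pdf, workdir / "page"), check=True)
    images = sorted(workdir.glob("page-*.png"), key=page_number)
    for image in images:
        subprocess.run(ocr_cmd(image), check=True)
    pages = [image.with_suffix(".pdf") for image in images]
    if len(pages) == 1:
        shutil.copyfile(pages[0], output)   # single page: nothing to merge
    else:
        subprocess.run(merge_cmd(pages, output), check=True)
```

Calling `make_searchable(Path("scan.pdf"), Path("scan-searchable.pdf"), Path("work"))` would run the whole pipeline. Keeping the command builders separate from the code that runs them makes it easy to print and inspect each command before executing anything.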
Adobe Acrobat (Paid, Best Accuracy)
Adobe Acrobat Pro has OCR built in under Scan & OCR > Recognize Text. It handles messy scans better than most tools because it has decades of PDF-specific tuning. But it costs $20+/month and the OCR is buried several menus deep.
Google Drive (Free, Limited)
Upload a scanned PDF to Google Drive, then open it with Google Docs. Google runs OCR automatically and dumps the extracted text into a document. This works for simple, clean scans but destroys your layout entirely — you get raw text in a Google Doc, not a searchable PDF. Good for extracting content, not for keeping the original format.
Preview on Mac
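Older versions of Preview don't do OCR. This surprises a lot of people. You can view scanned PDFs, but there's no built-in way to make them searchable. Recent macOS releases add Live Text, which lets you select and copy text from scanned pages in Preview, so check what your version supports before reaching for another tool.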
When OCR Accuracy Matters
OCR isn't perfect. It's reading images, and images are noisy. A few things that affect accuracy:
Scan resolution is the biggest factor. 300 DPI is the sweet spot, and many scanners default to it. Below 200 DPI, accuracy drops noticeably. Above 400 DPI, you get diminishing returns and much larger files.
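The trade-off is easy to see with some arithmetic. The Python sketch below assumes a US Letter page (8.5 x 11 inches) scanned in 8-bit grayscale and reports uncompressed sizes; real files are smaller after compression, and the function names are just for illustration. It also converts a font's nominal point size to pixels, using the fact that one point is 1/72 of an inch.

```python
def scan_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_megabytes(width_in: float, height_in: float, dpi: int,
                  bytes_per_pixel: int = 1) -> float:
    """Uncompressed image size in MB (1 byte per pixel = 8-bit grayscale)."""
    width_px, height_px = scan_pixels(width_in, height_in, dpi)
    return width_px * height_px * bytes_per_pixel / 1_000_000


def text_height_px(point_size: float, dpi: int) -> float:
    """Nominal height of a font in pixels; one point is 1/72 inch."""
    return point_size / 72 * dpi


for dpi in (150, 300, 600):
    width_px, height_px = scan_pixels(8.5, 11, dpi)
    print(f"{dpi} DPI: {width_px} x {height_px} px, "
          f"{raw_megabytes(8.5, 11, dpi):.1f} MB raw, "
          f"8pt text is about {text_height_px(8, dpi):.0f} px tall")
```

Doubling the resolution quadruples the pixel count, which is why file size climbs much faster than any accuracy gain. In the other direction, halving the resolution halves the height of every character, leaving the OCR engine far fewer pixels to work with on small text.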
Contrast matters more than you'd think. Faded printouts, yellowed paper, or light pencil marks all reduce accuracy. If you're scanning specifically for OCR, use black-and-white or grayscale mode instead of color — it produces cleaner character edges.
Skewed pages confuse OCR engines. If pages went through the scanner crooked, the text recognition suffers. Straighten pages before scanning, or use the Rotate tool to fix orientation issues afterward.
Font type affects results. Standard printed fonts (Arial, Times New Roman, Calibri) get recognized with 99%+ accuracy on clean scans. Decorative fonts, very small text (below 8pt), and handwriting are harder. Some modern OCR engines handle neat handwriting reasonably well, but don't expect perfection on cursive or sloppy notes.
OCR + Other PDF Tools: Common Workflows
Once your scanned PDF is searchable, you'll probably want to do more with it:
- Extract the text into a plain text file for use in Word, email, or a spreadsheet
- Compress the file: scanned PDFs are typically tens of times larger than digital PDFs because they contain full-page images. A 20-page scan might be 30-50MB. Compression can cut that to under 5MB
- Remove unnecessary pages — scanned batches often include blank pages, cover sheets, or pages you don't need
- Merge multiple scans — combine separate scanning sessions into one document
- Add page numbers — scanned documents almost never have them, and they're useful for reference
Making Old Archives Searchable
If you're sitting on a pile of scanned documents (years of tax returns, old contracts, medical records), making them searchable turns them from a filing cabinet where you can never find anything into an archive you can actually use.
The process is straightforward: run each PDF through OCR, save the searchable version, and replace the original file. Now when you need to find that one invoice from 2023, you can search for the vendor name or dollar amount across all your files instead of opening each one and scrolling through pages.
For bulk jobs, PDFShift Pro supports files up to 50 pages each. Process them one at a time — it takes seconds per document.
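If you script this yourself with a local OCR tool, the "save the searchable version, then replace the original" step deserves some care so that a failed run never destroys a scan. Here is a minimal Python sketch under that assumption. The `ocr` argument is a hypothetical function you supply (for example, a wrapper around whatever OCR tool you use) that reads one PDF and writes the searchable result to a second path.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def ocr_archive(root: Path, ocr: Callable[[Path, Path], None]) -> list[Path]:
    """Run ocr(source, destination) on every PDF under root, then swap it in.

    The searchable copy is written to a temporary file in the same folder and
    only replaces the original after OCR finishes without raising an error.
    """
    done = []
    # sorted() lists the files up front, so temporary PDFs are never picked up.
    for pdf in sorted(root.rglob("*.pdf")):
        fd, tmp_name = tempfile.mkstemp(suffix=".pdf", dir=pdf.parent)
        os.close(fd)
        tmp = Path(tmp_name)
        try:
            ocr(pdf, tmp)           # hypothetical OCR step supplied by the caller
            os.replace(tmp, pdf)    # swap in the searchable version
            done.append(pdf)
        finally:
            tmp.unlink(missing_ok=True)   # clean up if OCR failed midway
    return done
```

Because the temporary file lives in the same folder as the original, the final swap is a plain rename rather than a copy. Keeping a backup of the untouched scans until you have spot-checked the results is still a sensible habit.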
Scanned PDF vs. Born-Digital: How to Tell
Not sure if your PDF is scanned or digital? Quick test:
- Open the PDF in any viewer
- Try to select text by clicking and dragging
- If you can highlight individual words — it's already text-based. No OCR needed
- If clicking selects the entire page as one image — it's a scanned PDF. OCR will help
Another clue: check the file size. A 10-page text-based PDF is typically 50-200 KB. A 10-page scanned PDF is usually 5-30 MB. If the file seems huge for its page count, it's probably scans.
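The file-size clue can be turned into a rough rule of thumb. The Python sketch below divides file size by page count and compares it with a cutoff. The 100 KB per page threshold is an assumption chosen to sit between the figures above (roughly 5-20 KB per page for text-based PDFs, 0.5-3 MB per page for scans), and a text PDF with many embedded photos would fool it.

```python
def kb_per_page(size_bytes: int, pages: int) -> float:
    """Average file size per page, in kilobytes."""
    return size_bytes / 1024 / pages


def guess_kind(size_bytes: int, pages: int, threshold_kb: float = 100.0) -> str:
    """Rough guess only: large pages suggest images, small pages suggest text."""
    if kb_per_page(size_bytes, pages) > threshold_kb:
        return "probably scanned"
    return "probably text-based"


# A 10-page file at 200 KB versus a 10-page file at 5 MB.
print(guess_kind(200 * 1024, 10))
print(guess_kind(5 * 1024 * 1024, 10))
```

Treat the result as a hint and confirm with the text-selection test, which checks directly for what you care about.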
When You Need More
If you're scanning documents daily and need OCR as part of a larger document management workflow — with batch processing, automatic filing, and full editing — Adobe Acrobat Pro handles that well. It's the most complete tool for organizations that live in PDFs.
For occasional OCR on scanned documents, PDFShift's OCR tool handles the job without installing anything. Upload, process, download — your scanned PDF is now searchable.