PDF Image + Searchable Text
PDF Image + Searchable Text (formerly known as PDF plus hidden text) contains a bitmapped image of the original and a hidden layer of searchable text. The conversion process involves scanning the hardcopy original, performing OCR (Optical Character Recognition) to capture the text of the document, and distilling the two layers into a PDF searchable image file. Although the text can be searched, hyperlinks and bookmarks are not fully functional in this format. As with PDF image-only files, PDF searchable image files are only as legible as the original. PDF searchable image files also have the largest file size of the three types, which can be a significant issue if the PDF document is bound for the Internet.
Because each page is displayed as an image of the original, visual accuracy is inherently high.
Text resulting from an OCR (Optical Character Recognition) process may be “bonded” to the originating image to create a PDF/Searchable Image file. When you search for words or phrases, they will be highlighted in the image.
This background text makes the document searchable, but its accuracy depends on the quality of your originals and on other factors, such as small or italicized type. For the background text, you have two options:
- PDF Image + Text (raw, uncorrected OCR text)
- PDF Image + Text (corrected and proofread OCR text)
For many applications, the raw conversion with uncorrected text is accurate enough. For clients who need higher accuracy, SunTec will correct and proofread the OCR output. This process is often vital for documents containing italicized characters or small text, and for poor-quality originals.
PDF/Searchable Image files may be indexed for full-text retrieval by any search engine capable of indexing PDF files.
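To illustrate how a hidden text layer supports both on-image highlighting and full-text indexing, here is a minimal Python sketch. It assumes the OCR engine reports each recognized word together with a pixel bounding box on the page image. The names `OcrWord`, `SearchablePage`, `find_phrase` and `build_index` are hypothetical and do not belong to any PDF library; an actual PDF stores the words as invisible text positioned over the image, and this sketch only models that idea in memory.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrWord:
    """One OCR-recognized word and its bounding box on the page image, in pixels."""
    text: str
    x: int
    y: int
    width: int
    height: int


@dataclass
class SearchablePage:
    """A scanned page: the bitmap the reader sees plus the hidden OCR text layer."""
    image_path: str  # hypothetical path to the scanned bitmap
    words: list      # OcrWord objects in reading order


def find_phrase(page, phrase):
    """Return an (x, y, width, height) highlight box for each match of phrase.

    Matching is case-insensitive and compares whole words in reading order;
    each box spans all matched words, so a viewer could highlight that area.
    """
    target = phrase.lower().split()
    n = len(target)
    boxes = []
    if n == 0:
        return boxes
    for i in range(len(page.words) - n + 1):
        window = page.words[i:i + n]
        if [w.text.lower() for w in window] == target:
            left = min(w.x for w in window)
            top = min(w.y for w in window)
            right = max(w.x + w.width for w in window)
            bottom = max(w.y + w.height for w in window)
            boxes.append((left, top, right - left, bottom - top))
    return boxes


def build_index(pages):
    """Map each lower-cased word to the set of page numbers (1-based) containing it."""
    index = defaultdict(set)
    for page_no, page in enumerate(pages, start=1):
        for w in page.words:
            index[w.text.lower()].add(page_no)
    return index


# Usage: one line of a scanned delivery challan (made-up coordinates).
page = SearchablePage("challan_p1.tif", [
    OcrWord("Delivery", 100, 200, 80, 20),
    OcrWord("Challan", 190, 200, 70, 20),
    OcrWord("No.", 270, 200, 30, 20),
])
print(find_phrase(page, "delivery challan"))   # [(100, 200, 160, 20)]
print(sorted(build_index([page])["challan"]))  # [1]
```

The index answers "which pages contain this word", while the stored coordinates let a viewer draw the highlight on the image. The sketch also shows why raw OCR errors matter: if a word is misread (for example, "Challan" recognized as "Chal1an"), the search simply finds no match.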
Typical applications include:
- Business records
- Academic journals
- Advertising and promotional materials
- Historical materials
- Handwritten materials, including color or grayscale images
PDF/Searchable Image is used globally by governments and businesses for the electronic storage and retrieval of:
- Business records
- CD-ROM publishing
- Electronic publishing
- Manufacturing and design documentation
- Online / intranet content
- Records retention / legacy data conversion
- Delivery challans, shipping notes, and invoices
PDF File Type Comparison

| | PDF Image | PDF Image + Searchable Text | PDF Normal (Formatted Text & Graphics) |
| --- | --- | --- | --- |
| Accuracy | Very high (page is retained as image) | Very high (page is retained as image) | High (in effect, re-authoring the document) |
| Text searchability | No | Yes | Yes |
| File size | Large (typically 40–50 KB per page at 300 dpi, without grayscale or color images) | Large (typically 50–60 KB per page at 300 dpi, without grayscale or color images) | Small (typically 4–6 KB per page for simple documents) |
| Typical application | Budget-friendly archiving | Full-text search for bitonal files | Tiny but rich files, great for the web |
| Cost | Low | Medium | High |
SunTec Web Services Pvt. Ltd.
Floor 3, Vardhman Times Plaza
Plot 13, DDA Community Center
Road 44, Pitampura
New Delhi - 110 034, INDIA
Phone: +91 11 4264 4425
+91 11 4264 4426
+91 11 4264 4427
+91 11 4264 4428
+91 11 4264 4429
Fax (India): +91 11 4264 4430
Fax (US): +1 646 365 3077
Email: info@data-entry-india.com
