How to scan documents
Brian Wilson (talk | contribs) mNo edit summary |
Revision as of 03:26, 5 December 2010
Scan
I have access to a Brother scanner with a document feeder. It scans a multipage document and uploads the output as a PDF file to my server via FTP. For double-sided originals:
- Scan the odd pages, front to back, resulting in a single PDF file.
- Scan the even pages, back to front, resulting in a second PDF file.
- If you have it, you can use Adobe Acrobat to collate the files into a single PDF.
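Without Acrobat, the same collation can probably be done with pdfseparate and pdfunite from poppler-utils. The following is a minimal sketch, not a tested recipe: the function names and the odd-N.pdf / even-N.pdf file names are made up for illustration, and it assumes both scans contain the same number of pages and that the even-side scan really is in reverse order.

```shell
#!/bin/sh
# Print the single-page files in reading order for N sheets.
# odd-K.pdf is page K of the odd-side scan (front to back).
# even-K.pdf is page K of the even-side scan (back to front),
# so the last even file belongs right after the first odd file.
interleave_order() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        printf 'odd-%d.pdf\n' "$i"
        printf 'even-%d.pdf\n' "$((n - i + 1))"
        i=$((i + 1))
    done
}

# Split both scans into single pages and merge them in reading order.
# Arguments: odd-side PDF, even-side PDF, number of sheets, output PDF.
collate_scans() {
    pdfseparate "$1" odd-%d.pdf
    pdfseparate "$2" even-%d.pdf
    # Word splitting is intentional: the generated names contain no spaces.
    pdfunite $(interleave_order "$3") "$4"
}
```

For a 12-sheet manual this would be called as collate_scans odd.pdf even.pdf 12 manual.pdf. Check the result by eye, since one misfed sheet shifts every page after it.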
I scan at 300 DPI greyscale; the scanner converts to PDF and uploads to the server via FTP. 300 DPI is probably overkill for text, but line art looks good. In the manuals I just scanned, I can even see ghosts of what is printed on the other side of double-sided pages.
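To get a feel for what 300 DPI means in raw data, here is a rough back-of-the-envelope sketch. It assumes a US Letter page (8.5 x 11 inches) and 8 bits per pixel; neither is stated above, so adjust for your paper size. The function name is only illustrative.

```shell
#!/bin/sh
# Uncompressed size in bytes of one 8-bit greyscale US Letter page
# (8.5 x 11 inches, an assumed page size) at the given DPI.
raw_page_bytes() {
    dpi=$1
    width=$((dpi * 85 / 10))   # 8.5 inches wide, integer arithmetic
    height=$((dpi * 11))       # 11 inches tall
    echo $((width * height))   # one byte per pixel
}
```

At 300 DPI that is 2550 x 3300 pixels, or about 8.4 MB per page before compression, which is why the choice of output format matters.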
Compress
I am trying out the DjVu format; it seems like a good way to manage the scanned pages. Compression is very good. See http://djvu.org/
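- Convert the scanned PDF documents to DjVu documents, one per page: mkdir f && pdf2djvu -i f/index.djvu frontpages.pdf (the -i option takes the name of the index file, and the page files are written next to it)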
- Optionally perform any additional processing on the individual pages, such as image filtering or contrast enhancement.
- Perform OCR on the individual page files so they can be searched separately.
- Merge the page files into one DJVU bundle, making sure they get into the right order. (QA!)
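The OCR and merge steps above could look roughly like the sketch below. It assumes the page files sit in the current directory with hypothetical names of the form p<number>.djvu, that names contain no spaces, and that ddjvu and djvm (DjVuLibre) and tesseract are installed. Note that ocr_page only writes a plain text file next to each page; it does not embed a text layer in the DjVu file. The function names are illustrative.

```shell
#!/bin/sh
# Read page file names on stdin and print them sorted by the first
# number in each name, so that p2.djvu comes before p10.djvu.
sort_pages() {
    sed 's/^\([^0-9]*\)\([0-9][0-9]*\)/\2 &/' | sort -n | cut -d' ' -f2-
}

# OCR one page: render it to TIFF, then let tesseract write <page>.txt.
ocr_page() {
    base=${1%.djvu}
    ddjvu -format=tiff "$1" "$base.tif" && tesseract "$base.tif" "$base"
}

# Merge all p*.djvu pages in the current directory into one bundle.
# Argument: name of the bundled output file.
bundle_pages() {
    # Word splitting is intentional: assumes names without spaces.
    djvm -c "$1" $(ls p*.djvu | sort_pages)
}
```

Sorting numerically rather than relying on the shell's glob order is the QA point: with unpadded names, a plain p*.djvu would put page 10 ahead of page 2.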
Notes
Packages under Ubuntu are poppler-utils and psutils. Installing the gscan2pdf package also pulled in various useful tools such as tesseract and djvu2pdf.
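A quick way to see which command-line tools are still missing is a small check like this one. It is only a sketch; the function name is made up, and the tool list in the usage note is my guess at what the steps on this page need.

```shell
#!/bin/sh
# Print the name of every listed tool that is not found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The packages named above would be installed with:
#   sudo apt-get install poppler-utils psutils gscan2pdf
```

For example, missing_tools pdfunite pdf2djvu djvm tesseract prints nothing if everything is installed.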
On the Mac I use the djview-libre viewer, which is also available for Linux and Windows.