How to scan documents
Scan
I have access to a Brother scanner with a document feeder. It scans a multipage document and puts the output into a PDF file on my server via FTP.
- Scan the odd pages, front to back, resulting in a single PDF file.
- Scan the even pages, back to front, resulting in a second PDF file.
- If you have Adobe Acrobat, you can use it to collate the two files into a single PDF.
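Without Acrobat, the collation step is still easy to reason about. The following is a minimal Python sketch of the interleaving logic only, not of the Acrobat procedure. The function name collate and the integer page labels are made up for illustration, and plain lists stand in for real pages. It assumes the second scan holds the even pages in reverse order, as in the steps above.

```python
from itertools import chain


def collate(odd_pages, even_pages_reversed):
    """Interleave odd pages with even pages that were scanned back to front."""
    # Undo the back-to-front scan order of the even pages.
    even_pages = list(reversed(even_pages_reversed))
    # Allow equal counts, or one extra odd page if a blank last side was dropped.
    if len(odd_pages) - len(even_pages) not in (0, 1):
        raise ValueError("page counts do not match a two-sided scan")
    result = list(chain.from_iterable(zip(odd_pages, even_pages)))
    # zip() stops at the shorter list, so append a trailing odd page if any.
    if len(odd_pages) > len(even_pages):
        result.append(odd_pages[-1])
    return result


# Odd pages scanned as 1, 3, 5; even pages scanned as 6, 4, 2.
print(collate([1, 3, 5], [6, 4, 2]))  # [1, 2, 3, 4, 5, 6]
```

The same ordering applies whatever tool does the actual merge: reverse the second file, then alternate pages from the two files.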
I scan at 300 DPI greyscale. 300 DPI is probably overkill for text, but line art looks good. On the manuals I just scanned, I can actually see ghosts of what is printed on the other side of double-sided pages.
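To put the 300 DPI figure in context, the arithmetic below estimates the uncompressed size of one greyscale page. It is a rough sketch under assumptions not stated in these notes: US Letter paper (8.5 x 11 inches) and 8 bits per pixel. The function name is made up for illustration.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel=8):
    """Uncompressed size in bytes of one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed page size: US Letter, 8.5 x 11 inches.
size_300 = raw_page_bytes(8.5, 11, 300)
size_150 = raw_page_bytes(8.5, 11, 150)
print(size_300)  # 8415000 bytes, about 8.4 MB per page
print(size_150)  # 2103750 bytes; halving the DPI quarters the size
```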
Compress
I am trying out the DjVu format, which seems like a good way to manage the scanned pages. Compression is very good. See http://djvu.org/
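- Convert the scanned PDF documents to DjVu documents, one per page: mkdir f && pdf2djvu -i f frontpages.pdf (newer pdf2djvu releases expect -i to name an index file inside an existing directory, for example -i f/index.djvu).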
- Optionally perform any additional processing on the individual pages, such as image filtering or contrast enhancement.
- Perform OCR on the individual page files so they can be searched separately.
- Merge the page files into one DjVu bundle, making sure they end up in the right order. (QA!)
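Getting the order right is the error-prone part of the merge. The Python sketch below sorts page files by the number embedded in their names and builds a djvm -c command line for bundling them. The file names (f/p1.djvu and so on) and the output name bundle.djvu are hypothetical; the files pdf2djvu actually writes may be named differently, so check the directory first. The sketch only prints the command and does not run it.

```python
import os
import re
import shlex


def page_number(path):
    """Return the last run of digits in the file name (without extension)."""
    stem = os.path.splitext(os.path.basename(path))[0]
    digits = re.findall(r"\d+", stem)
    if not digits:
        raise ValueError(f"no page number in {path!r}")
    return int(digits[-1])


def bundle_command(page_files, output="bundle.djvu"):
    """Build a 'djvm -c' argument list with pages in numeric order."""
    ordered = sorted(page_files, key=page_number)
    return ["djvm", "-c", output, *ordered]


# Plain string sorting would put p10 before p2; numeric sorting does not.
pages = ["f/p10.djvu", "f/p2.djvu", "f/p1.djvu"]  # hypothetical names
print(shlex.join(bundle_command(pages)))
```

Printing the command first gives a chance to eyeball the page order before anything is written, which covers part of the QA step.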
Notes
The relevant packages under Ubuntu are poppler-utils and psutils. Installing the gscan2pdf package also pulled in various useful tools, such as tesseract and djvu2pdf.
On the Mac I use the viewer djview-libre, which is also available for Linux and Windows.