How to scan documents
From Wildsong
I have a Brother scanner with a document feeder. It scans a multipage document and puts the output into a PDF file on my server via FTP.
I am trying out the DjVu format; it seems like a good way to manage the scanned pages, and the compression is very good.
- Scan the odd pages, front to back, resulting in a single PDF file.
- Scan the even pages, back to front, resulting in a second PDF file.
- Convert the two PDF documents to DjVu documents, one file per page, for example: mkdir f && pdf2djvu -i f frontpages.pdf
- Optionally perform any additional processing on the individual pages, such as image filtering or contrast enhancement.
- Perform OCR on the individual page files so they can be searched separately.
- Merge the page files into one DJVU bundle, making sure they get into the right order. (QA!)
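The merge order in the last step is easy to get wrong, because the second scan runs back to front. Here is a minimal sketch of the interleaving. It assumes both scans contain the same number of pages and that the page files in each directory are named p0001.djvu, p0002.djvu, and so on. The function name page_order, the directory b for the even pages, and the file naming pattern are all assumptions for illustration; check what your converter actually wrote before relying on them.

```shell
# Print page files in reading order, one per line.
# Usage: page_order COUNT FRONT_DIR BACK_DIR
#   COUNT      number of pages in each scan (assumed equal)
#   FRONT_DIR  odd pages, scanned front to back (p0001 = page 1)
#   BACK_DIR   even pages, scanned back to front (p0001 = last even page)
# The p%04d.djvu naming pattern is an assumption, not a guarantee.
page_order() {
    n=$1
    front=$2
    back=$3
    i=1
    while [ "$i" -le "$n" ]; do
        # i-th odd page, in scan order
        printf '%s/p%04d.djvu\n' "$front" "$i"
        # matching even page, counted from the end of the second scan
        printf '%s/p%04d.djvu\n' "$back" "$((n - i + 1))"
        i=$((i + 1))
    done
}
```

For a 12-sheet document, running page_order 12 f b prints the 24 file names in reading order. That list can then be handed to a bundling tool, for instance djvm -c book.djvu $(page_order 12 f b), where book.djvu is a placeholder name. Paging through the result is still the QA step.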
Notes
The relevant packages under Ubuntu are poppler-utils and psutils. Installing the gscan2pdf package pulled in various other useful things, such as tesseract and djvu2pdf.
On the Mac I use the viewer djview-libre, which is also available for Linux and Windows.