How to scan documents

From Wildsong
Revision as of 03:02, 8 November 2009 by Brian Wilson (talk | contribs)

I have a Brother scanner with an automatic document feeder. It scans a multipage document and uploads the output as a PDF file to my server via FTP.

I am trying out the DjVu format, which seems like a good way to manage the scanned pages. Its compression is very good.

  1. Scan the odd pages, front to back, resulting in a single PDF file.
  2. Scan the even pages, back to front, resulting in a second PDF file.
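  3. Convert the two PDF documents to DjVu documents, one file per page. For the front pages: mkdir f && pdf2djvu -i f/index.djvu frontpages.pdf (the -i option takes the name of an index file, and the page files are written alongside it). Do the same for the back pages in a second directory.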
  4. Optionally perform any additional processing on the individual pages, such as image filtering or contrast enhancement.
  5. Perform OCR on the individual page files so they can be searched separately.
  6. Merge the page files into one DJVU bundle, making sure they get into the right order. (QA!)
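
Getting the order right in step 6 is the fiddly part, because the even pages were scanned back to front and so arrive in reverse. Here is a minimal sketch of the interleaving. It assumes the front pages are in a directory f and the back pages in a directory b, and that the page files are named p0001.djvu, p0002.djvu and so on (which I believe is what pdf2djvu writes by default, but check your output). The function name interleave_pages is made up for this example.

```shell
# Print the page files in reading order, one per line.
#   $1 = directory holding the odd (front) pages, scanned first to last
#   $2 = directory holding the even (back) pages, scanned last to first
#   $3 = number of sheets (each sheet contributes one front and one back page)
# Assumes page files are named p0001.djvu, p0002.djvu, ... in each directory.
interleave_pages() {
    front_dir=$1
    back_dir=$2
    sheets=$3
    i=1
    while [ "$i" -le "$sheets" ]; do
        # Front of sheet i is the i-th page of the front scan.
        printf '%s/p%04d.djvu\n' "$front_dir" "$i"
        # Back of sheet i is counted from the end of the back scan.
        printf '%s/p%04d.djvu\n' "$back_dir" "$((sheets - i + 1))"
        i=$((i + 1))
    done
}

# Possible use, assuming 3 sheets and that djvm from DjVuLibre is installed:
#   djvm -c merged.djvu $(interleave_pages f b 3)
```

Print the list and eyeball it before bundling; that is the QA part. If the two scans have different page counts (a double feed, say), this sketch will not notice, so compare the counts first.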

Notes

The relevant packages under Ubuntu are poppler-utils and psutils. Installing the gscan2pdf package also pulled in various useful tools, such as tesseract and djvu2pdf.

On the Mac I use the viewer djview-libre, which is also available for Linux and Windows.