How to scan documents: Difference between revisions

Revision as of 03:02, 8 November 2009

I have a Brother scanner with a document feeder. It scans a multipage document and writes the output as a PDF file to my server via FTP.

I am trying out the DjVu format; it seems like a good way to manage the scanned pages, and the compression is very good.

1. Scan the odd pages, front to back, resulting in a single PDF file.
2. Scan the even pages, back to front, resulting in a second PDF file.
3. Convert the two PDF documents to DjVu documents, one file per page: mkdir f && pdf2djvu -i f frontpages.pdf
4. Optionally perform any additional processing on the individual pages, such as image filtering or contrast enhancement.
5. Perform OCR on the individual page files so they can be searched separately.
6. Merge the page files into one DjVu bundle, making sure they end up in the right order. (QA!)
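
The page order in the merge step is easy to get wrong, so here is a minimal Python sketch of it. It assumes both scans contain the same number of pages and that each list is given in scan order. The function names and the file names (f/p0001.djvu, b/p0001.djvu, book.djvu) are made up for illustration. The sketch only builds and prints an argument list for djvm -c, the DjVuLibre command that creates a bundle; it does not run anything.

```python
from itertools import chain


def interleave_duplex(front_pages, back_pages):
    """Return page files in reading order.

    front_pages: odd pages in scan order (pages 1, 3, 5, ...).
    back_pages:  even pages in scan order, which is back to front
                 (last even page first), so the list is reversed
                 before it is interleaved with the odd pages.
    """
    if len(front_pages) != len(back_pages):
        raise ValueError(
            f"page count mismatch: {len(front_pages)} odd, {len(back_pages)} even"
        )
    evens_in_order = reversed(back_pages)
    return list(chain.from_iterable(zip(front_pages, evens_in_order)))


def djvm_create_command(bundle, ordered_pages):
    """Build the argument list for `djvm -c BUNDLE PAGE...`."""
    return ["djvm", "-c", bundle, *ordered_pages]


if __name__ == "__main__":
    # Hypothetical file names for a six-page document.
    fronts = ["f/p0001.djvu", "f/p0002.djvu", "f/p0003.djvu"]  # pages 1, 3, 5
    backs = ["b/p0001.djvu", "b/p0002.djvu", "b/p0003.djvu"]   # pages 6, 4, 2
    order = interleave_duplex(fronts, backs)
    print(" ".join(djvm_create_command("book.djvu", order)))
```

Raising an error on a page-count mismatch is a deliberate choice: a missed or double-fed sheet would otherwise shift every later page silently, which is exactly what the QA check is meant to catch.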

Notes

The relevant Ubuntu packages are poppler-utils and psutils. Installing the gscan2pdf package also pulled in various useful tools, such as tesseract and djvu2pdf.

On the Mac I use the djview-libre viewer, which is also available for Linux and Windows.

@@ Line 1: / Line 1: @@
 I have a Brother scanner that has a document feed on it.
 It scans a multipage doc and puts the output into a PDF file on my server via FTP.
+I am trying out the djvu format, it seems like a good way to manage the scanned pages. Compression is very good.
 # Scan the odd pages, front to back, resulting in a single PDF file.
 # Scan the even pages, back to front, resulting in a second PDF file.
-# Convert the 2 PDF documents to 2 PS documents. ''pdftops infile.pdf outfile.ps''
+# Convert the 2 PDF documents to DJVU documents, 1 per page. ''mkdir f && pdf2djvu -i f frontpages.pdf''
-# Split the PS documents into separate files, one page per file
+# Optionally perform any additional processing on the individual pages, such as image filtering or contrast enhancement.
-# Optionally perform any additional processing on the individual pages, such as image compression
+# Perform OCR on the individual page files so they can be searched separately
-# For WEB version
+# Merge the page files into one DJVU bundle, making sure they get into the right order. (QA!)
-## Perform OCR on the individual page files so they can be searched separately
-## Convert individual pages into PNG files for viewing
-## Put all pages into a book viewer collection
-# Merge the page files into one PS document
-# Convert the merged document back into PDF document
-# Perform OCR on the PDF doc
-Notes:
-Commands with '2' like 'pdf2ps' are from the psutils package.
-Commands with 'to' like 'pdftops' are from the poppler-utils package.
-I am not sure if there are any advantages to using one or the other when there are equivalent commands (for example 'pdf2ps' versus 'pdftops')
-== Supporting scripts ==
-=== paginate.pl ===
-I split the documents into separate pages with this perl script.
-<pre>
-#!/usr/bin/perl
-#
-# Separate a postscript file called file.ps into separate pages named file.N.ps
-# -r will reverse output page order
-#
-$pages = 0;
-$rev = 0;
-$fname = shift;
-if ($fname eq '-r') {
-    $rev = 1;
-    $fname = shift;
-}
-print "Processing file $fname\n";
-$fname =~ /(.*)\.ps$/;
-$base = $1 || die;
-# Find number of pages in doc
-open(IN,$fname)||die;
-while (<IN>){
-    if (/^%%Pages: (.*)/) {
-	$pages = $1;
-	last;
-    }
-}
-close IN;
-$p = 0;
+== Notes ==
-$r = $pages;
-while ($pages--){
-    $p++;
-    # This handles docs up to 999 pages
+Packages under Ubuntu are poppler-utils, psutils. Installing the gscan2pdf package pulled in sundry and various useful things such as tesseract and djvu2pdf.
-    $p0 = ($rev)? $r : $p;
-    if ($p0 < 10) {
-	$p0 = '00' . $p0;
-    } elsif ($p0 < 100) {
-	$p0 = '0' . $p0;
-    }
-    $cmd = "psselect -p $p $fname $base.$p0.ps";
-    print $cmd;
-    system($cmd);
-    print "\n";
-    $r--;
+On the Mac I use the viewer djview-libre which is also available for Linux and Windows.
-}
-</pre>
