How to scan documents: Difference between revisions

From Wildsong
Brian Wilson (talk | contribs)

Revision as of 19:43, 7 November 2009

I have a Brother scanner with a document feeder. It scans a multipage document and puts the output into a PDF file on my server via FTP.

  1. Scan the odd pages, front to back, resulting in a single PDF file.
  2. Scan the even pages, back to front, resulting in a second PDF file.
  3. Convert the two PDF documents to two PS documents: pdftops infile.pdf outfile.ps
  4. Split the PS documents into separate files, one page per file.
  5. Optionally perform any additional processing on the individual pages, such as image compression.
  6. For the web version:
    1. Perform OCR on the individual page files so they can be searched separately.
    2. Convert the individual pages into PNG files for viewing.
    3. Put all pages into a book viewer collection.
  7. Merge the page files into one PS document, alternating odd and even pages. The even pages were scanned back to front, so take them in reverse order.
  8. Convert the merged document back into a PDF document.
  9. Perform OCR on the PDF document.
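
The merge order in step 7 is easy to get wrong, so here is a minimal sketch of it. It assumes the split pages are named odd-1.ps, odd-2.ps, ... and even-1.ps, even-2.ps, ... in scan order; those file names and the function name merge_order are made up for illustration and are not part of the original workflow. The function only prints the file names in final reading order and does not touch any files.

```shell
#!/bin/sh
# Print the per-page files in final reading order.
# Assumed (hypothetical) naming:
#   odd-1.ps  .. odd-N.ps   odd pages, scanned front to back
#   even-1.ps .. even-N.ps  even pages, scanned back to front,
#                           so even-1.ps is the LAST even page
merge_order() {
    n=$1    # number of sheets, i.e. pages per side
    i=1
    while [ "$i" -le "$n" ]; do
        echo "odd-$i.ps"
        # The matching even page is counted from the end of the even scan.
        echo "even-$((n - i + 1)).ps"
        i=$((i + 1))
    done
}
```

For a three-sheet document, merge_order 3 lists odd-1.ps, even-3.ps, odd-2.ps, even-2.ps, odd-3.ps, even-1.ps. The resulting list can then be handed to whatever tool does the actual merge.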

Notes: Commands with '2' in the name, like 'pdf2ps', are from the ghostscript package. Commands with 'to' in the name, like 'pdftops', are from the poppler-utils package. I am not sure whether there is any advantage to using one over the other when equivalent commands exist (for example, 'pdf2ps' versus 'pdftops').
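
Since either converter may be installed on a given machine, a script can simply use whichever one it finds. This is a rough sketch, not a recommendation of one tool over the other; the function name pick_converter is made up for illustration, and the preference for pdftops is arbitrary.

```shell
#!/bin/sh
# Print the name of the first PDF-to-PS converter found on PATH.
# Tries poppler's pdftops first, then Ghostscript's pdf2ps.
# The order is an arbitrary choice, not a statement about quality.
pick_converter() {
    for tool in pdftops pdf2ps; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            return 0
        fi
    done
    return 1    # neither converter is installed
}
```

Both commands take the input file followed by the output file, so the result can be used as "$(pick_converter)" infile.pdf outfile.ps once the function has reported success.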