Document File Formats

From Wildsong
Brian Wilson
Latest revision as of 18:40, 21 September 2023

What is the best format to keep a given document in?

Source of document: paper

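TIFF: this is a common scanner output format, and it supports compression (for example LZW, PackBits, or CCITT Group 4 fax compression for black-and-white pages).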
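Whether a scan actually came out compressed is recorded in the TIFF Compression tag (tag 259) of the image file directory. Here is a rough sketch using only the Python standard library; the function name `tiff_compression` is made up for this page, it only looks at the first page of a multi-page scan, and it does not handle BigTIFF.

```python
import struct

# Compression codes from the TIFF 6.0 spec plus two common extensions.
COMPRESSION_NAMES = {
    1: "none",
    2: "CCITT modified Huffman RLE",
    3: "CCITT Group 3 fax",
    4: "CCITT Group 4 fax",
    5: "LZW",
    6: "JPEG (old-style)",
    7: "JPEG",
    8: "Deflate (Adobe)",
    32773: "PackBits",
    32946: "Deflate",
}


def tiff_compression(data: bytes) -> str:
    """Return the compression scheme named in the first IFD of a TIFF file."""
    if data[:2] == b"II":
        bo = "<"  # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        bo = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(bo + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (entry_count,) = struct.unpack(bo + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, _type, _count = struct.unpack(bo + "HHI", data[start:start + 8])
        if tag == 259:  # Compression
            # A single SHORT value sits left-justified in the 4-byte value field.
            (code,) = struct.unpack(bo + "H", data[start + 8:start + 10])
            return COMPRESSION_NAMES.get(code, f"unknown ({code})")
    return "none"  # the spec's default when the tag is absent
```

Usage would be something like `tiff_compression(open("scan.tif", "rb").read())`, where `scan.tif` is a hypothetical file name.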

Many people use PDF for document storage. There is also Adobe's XML-based PDF format; see the Adobe Mars project (http://labs.adobe.com/downloads/mars.html). It seems to be going nowhere.

There is the Microsoft XPS format, which is, well, a Microsoft standard, and I want to be able to open my documents for many years. Smirk.

I wonder what ODF can do.

  1. PDF is a document format; TIFF is more of an image format. Programs to view PDFs are a little more user-friendly and more widely available.
  2. Both standards are pretty much open and universal (PDF is published as ISO 32000).
  3. Sizes?
  4. PDFs can be encrypted; encryption is part of the spec.
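When sorting a pile of scanned files it helps to identify PDF versus TIFF by content rather than by extension. A minimal sketch, assuming each signature sits at byte 0 (some PDF readers tolerate junk before `%PDF-`). The encryption check is only a heuristic byte search for the `/Encrypt` key that encrypted PDFs carry in their trailer or cross-reference stream dictionary; it is not a PDF parser and can give false positives. Both function names are hypothetical.

```python
from typing import Optional

# Leading "magic" bytes for the formats discussed on this page.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]


def sniff_format(data: bytes) -> Optional[str]:
    """Guess a file's format from its first bytes, ignoring the extension."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


def pdf_looks_encrypted(data: bytes) -> bool:
    """Heuristic: True if the data is a PDF and contains an /Encrypt key anywhere."""
    return sniff_format(data) == "pdf" and b"/Encrypt" in data
```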

Wikipedia entry on PDF

Source of document: digital

This is mostly dictated by the format of the source file, but I am inclined to think I should settle on a few standards and transcode everything into those formats.
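"Settle on a few standards" could be written down as a lookup table from source extension to archive format, following the choices listed below. This is a sketch only: the table and function names are hypothetical, and the `.wav`, `.bmp` and `.doc`/`.docx` entries are my assumptions rather than decisions already made on this page.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical policy: source extension -> (source format, archive format).
RULES = {
    ".mp3": ("mp3", "mp3"),
    ".wav": ("wav", "mp3"),     # assumption: e.g. voicemail recordings
    ".jpg": ("jpeg", "jpeg"),
    ".jpeg": ("jpeg", "jpeg"),
    ".tif": ("tiff", "tiff"),
    ".tiff": ("tiff", "tiff"),
    ".png": ("png", "png"),
    ".bmp": ("bmp", "png"),     # assumption
    ".gif": ("gif", "gif"),
    ".doc": ("doc", "odt"),     # assumption: move away from MS-Word
    ".docx": ("docx", "odt"),   # assumption
    ".txt": ("text", "text"),   # plain text stays plain text
}


def archive_format(path: str) -> Optional[str]:
    """Format the archive copy should be in, or None if there is no rule yet."""
    rule = RULES.get(PurePath(path).suffix.lower())
    return None if rule is None else rule[1]


def needs_transcode(path: str) -> bool:
    """True only when the archive format differs from the source format."""
    rule = RULES.get(PurePath(path).suffix.lower())
    return rule is not None and rule[0] != rule[1]
```

Comparing format names rather than extensions keeps `.jpeg` and `.jpg` from being treated as a transcode; files with no rule (movies, for now) are simply left alone.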

audio: mp3 (yes, I know it was a patent-encumbered format, but those patents have expired and it's ubiquitous). This will include voicemail if I ever go over to an Asterisk PBX here at home.
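Voicemail would probably not arrive as mp3, so it would need a transcode step. A sketch, assuming the recordings are WAV files and that an `ffmpeg` built with the LAME encoder (`libmp3lame`) is installed; the function names are hypothetical. `-n` tells ffmpeg never to overwrite an existing output file, and `-qscale:a` selects LAME's variable-bitrate quality (0 is best, 9 is smallest).

```python
import subprocess
from pathlib import Path


def mp3_command(src: Path, quality: int = 2) -> list[str]:
    """Build an ffmpeg command that encodes src to an .mp3 beside it."""
    dst = src.with_suffix(".mp3")
    return [
        "ffmpeg",
        "-n",                       # refuse to overwrite an existing file
        "-i", str(src),
        "-codec:a", "libmp3lame",   # LAME mp3 encoder
        "-qscale:a", str(quality),  # VBR quality, 0 (best) to 9 (smallest)
        str(dst),
    ]


def transcode_to_mp3(src: Path, quality: int = 2) -> None:
    """Run ffmpeg; raises CalledProcessError if the encode fails."""
    subprocess.run(mp3_command(src, quality), check=True)
```

Keeping the command builder separate from the `subprocess.run` call makes it easy to print and check the command before pointing it at a whole voicemail directory.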

photo: jpeg or tiff - General rule: do not transcode TIFF to JPEG, because JPEG compression is lossy.
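If transcoding is ever automated, the TIFF-to-JPEG rule is easy to enforce as a guard. This sketch (hypothetical names) generalizes the rule to any lossless image source. Note that it treats every TIFF as lossless, which holds for uncompressed, LZW, Deflate or PackBits TIFFs but not for TIFFs that use JPEG compression internally.

```python
# Image formats whose usual compression keeps every pixel exactly.
LOSSLESS_IMAGE = {"tiff", "png", "bmp", "gif"}
# Image formats whose compression throws data away.
LOSSY_IMAGE = {"jpeg"}


def transcode_allowed(src_format: str, dst_format: str) -> bool:
    """Refuse any conversion from a lossless image format to a lossy one."""
    src, dst = src_format.lower(), dst_format.lower()
    return not (src in LOSSLESS_IMAGE and dst in LOSSY_IMAGE)
```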

other image files: png - Generally I like png because it allows transparency. gif is for when I want a flying pelican or spinning gears on my loading page!
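To find which PNGs actually use transparency (and so must stay PNG), it is enough to read the chunk headers: colour types 4 and 6 in the IHDR chunk carry an alpha channel, and a tRNS chunk adds transparency to palette or single-colour images. A stdlib-only sketch with a hypothetical function name; it skips CRC checks and only reports whether transparency is declared, not whether any pixel is actually transparent.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_has_transparency(data: bytes) -> bool:
    """True if the PNG declares an alpha channel or a tRNS chunk."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"IHDR" and body[9] in (4, 6):
            return True  # colour type: greyscale+alpha or truecolour+alpha
        if chunk_type == b"tRNS":
            return True  # palette or single-colour transparency
        if chunk_type in (b"IDAT", b"IEND"):
            break  # tRNS must come before the image data
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return False
```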

movie: I have so few movies right now that this is not relevant yet.

text files: I don't want to store formatted text files for long-term access in MS-Word format! What format does OO (OpenOffice) use? 2023 update: OpenDocument Format (ODF), which is a set of XML files inside a ZIP archive. Fortunately, both OO and MS Office now cleanly handle each other's formats.
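One reassuring property of ODF for long-term storage is that the text can be recovered with nothing but a ZIP reader and an XML parser: the document body lives in `content.xml`. A minimal sketch with a hypothetical function name; it returns the text of each paragraph and heading, and it ignores spacing elements such as `<text:s/>` and tabs, so some whitespace may be lost.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of OpenDocument text elements such as <text:p> and <text:h>.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odt_paragraphs(path_or_file) -> list[str]:
    """Return the text of each paragraph and heading in an .odt file."""
    with zipfile.ZipFile(path_or_file) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    wanted = (f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h")
    # root.iter() walks the tree in document order.
    return ["".join(elem.itertext()) for elem in root.iter() if elem.tag in wanted]
```

For example, `odt_paragraphs("notes.odt")` (hypothetical file) would list the document's paragraphs as plain strings.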

Plain text files should stay that way.

email: I think email should be stored in a MySQL database when it comes in and purged automatically after about a year unless I tag messages for archiving. This goes for both sent and received email. I might want to automatically tag/archive mail with certain addresses.
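The retention scheme above (store everything, auto-tag certain addresses, purge untagged mail after a year) fits in one table and one DELETE. This sketch uses Python's built-in `sqlite3` as a stand-in for MySQL so it runs without a server; the table, column and function names and the `AUTO_ARCHIVE` address are all hypothetical. With MySQL the SQL would be much the same, although most Python MySQL drivers use `%s` placeholders instead of `?`.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS message (
    id         INTEGER PRIMARY KEY,
    direction  TEXT NOT NULL CHECK (direction IN ('sent', 'received')),
    address    TEXT NOT NULL,              -- the other party's address
    stored_at  TEXT NOT NULL,              -- ISO-8601 UTC, second precision
    archived   INTEGER NOT NULL DEFAULT 0, -- 1 = keep forever
    raw        BLOB NOT NULL               -- the complete message
)
"""

# Hypothetical: correspondents whose mail is archived automatically.
AUTO_ARCHIVE = {"family@example.com"}


def _stamp(when: datetime) -> str:
    # One fixed format for every row, so string comparison orders by time.
    return when.astimezone(timezone.utc).isoformat(timespec="seconds")


def store(conn: sqlite3.Connection, direction: str, address: str, raw: bytes,
          when: Optional[datetime] = None) -> int:
    """Insert one message, auto-tagging listed addresses; return its id."""
    when = when or datetime.now(timezone.utc)
    addr = address.lower()
    cur = conn.execute(
        "INSERT INTO message (direction, address, stored_at, archived, raw) "
        "VALUES (?, ?, ?, ?, ?)",
        (direction, addr, _stamp(when), int(addr in AUTO_ARCHIVE), raw),
    )
    return cur.lastrowid


def tag_for_archive(conn: sqlite3.Connection, message_id: int) -> None:
    """Mark a message so purge() never deletes it."""
    conn.execute("UPDATE message SET archived = 1 WHERE id = ?", (message_id,))


def purge(conn: sqlite3.Connection, now: Optional[datetime] = None,
          keep_days: int = 365) -> int:
    """Delete untagged messages older than keep_days; return how many."""
    now = now or datetime.now(timezone.utc)
    cutoff = _stamp(now - timedelta(days=keep_days))
    cur = conn.execute(
        "DELETE FROM message WHERE archived = 0 AND stored_at < ?", (cutoff,)
    )
    return cur.rowcount
```

Running `purge()` from a nightly cron job would give the "about a year" behaviour without touching tagged mail.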