
File format suggestions
CookieRevised
RE: File format suggestions
With the approach you're taking (chunks) you can include anything you want, so that seems OK...

Though, "File Author" and "Comments" can be a chunk too, a chunk which is "required". Call it the "FileInfo" chunk or something :D Of course, you can decide not to put it in a special chunk and leave it in the main format part. But don't forget that the comments field needs a length identifier. Otherwise you'll waste valuable space on short comments:
[actual comment] = "my new file format" + 237 unused bytes = 255 bytes!
compared to:
[length-identifier][actual comment] = &H12 + "my new file format" = 19 bytes
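
To make the byte counts above concrete, here is a minimal sketch in Python. The function names and the choice of zero bytes as padding are assumptions for illustration, not part of the proposed format.

```python
def encode_fixed(comment: str, width: int = 255) -> bytes:
    """Fixed-width field: pad the comment with zero bytes (padding byte is an assumption)."""
    raw = comment.encode("ascii")
    if len(raw) > width:
        raise ValueError("comment too long for the fixed field")
    return raw.ljust(width, b"\x00")


def encode_prefixed(comment: str) -> bytes:
    """Length-prefixed field: one length byte followed by the comment itself."""
    raw = comment.encode("ascii")
    if len(raw) > 255:
        raise ValueError("comment does not fit a 1-byte length")
    return bytes([len(raw)]) + raw


def decode_prefixed(buf: bytes) -> str:
    """Read the length byte, then exactly that many comment bytes."""
    length = buf[0]
    return buf[1:1 + length].decode("ascii")


comment = "my new file format"
print(len(encode_fixed(comment)))     # 255
print(len(encode_prefixed(comment)))  # 19
```

The length byte of the prefixed form is &H12 (18), matching the example above.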

Chunks need a small structure too. Otherwise you couldn't identify what kind of chunk you have. And you can add checksums or something as well if you like...

Also, the "File Type" should have a version number if you want to make it perfect. In the future you could improve a certain application and decide to use better compression or something (I dunno). The "File Type" will be the same, but the file's data will be different, so add a version number to it and that's solved too (example: GIF87a and GIF89a; both are GIF, but they have different file structures).
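
As an illustration of the GIF example, a reader can tell the two variants apart from the first 6 bytes of the file: the 3-byte signature "GIF" followed by the 3-byte version. The function name below is made up for this sketch.

```python
def gif_version(header: bytes) -> str:
    """Return "87a" or "89a" from the first 6 bytes of a GIF file."""
    if len(header) < 6 or header[:3] != b"GIF":
        raise ValueError("not a GIF signature")
    version = header[3:6].decode("ascii", errors="replace")
    if version not in ("87a", "89a"):
        raise ValueError("unknown GIF version: " + version)
    return version


print(gif_version(b"GIF89a"))  # 89a
```

The same idea carries over to your own format: same type ID, different version, different parsing path.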

So:

  • File Type (fixed-length string, 5 bytes long) - It's a kind of ID. Since different applications use this same format, there must be a way to recognize which file is yours and which isn't. Its length is fixed. E.g.: I have an application, GIF editor; my ID will be "GIFED" or "EDITG", or anything similar.
  • File Type Version (1 byte) - one byte can identify up to 256 different versions (0 through 255).
    Comments: This can be used to identify what kind of chunks the file uses. For example, in a future version you might want to enhance your chunk layout. The "File Type" would be the same, but the actual chunk layout is different, so this "version" identifier can be checked to know whether the "file type" is the old version or the new enhanced version...
  • File Chunks: Unlimited packs of data which each have 7 parts:
    • Name (limited to 255 bytes) - Chunk name, should be a unique identifier, not forced tho.
      Comments: Make the Name shorter. 255 bytes is too much of a waste of space. Make it like the "File Type", so 5 bytes. That's more than enough to make all kinds of different chunks... Also, make it forced; I mean it should be required IMO (to be consistent with the overall global file format you're creating).
    • Version (1 byte) - Same as "File Type Version". Adds the same advantages, but for individual chunk types...
      Comments: For example, future types of the chunk called "graph" could be compressed. So then you have an uncompressed "graph" version 1, and a compressed "graph" version 2 or something like that. Also, concerning the chunk checksum (see below): you could reserve bit 8 (the highest bit) in this version field to indicate whether there is a checksum or not. That way the checksum could be optional (at the cost of leaving 7 bits, so 128 values, for the version itself).
    • Comments Length (1 byte) - To identify the length of the comment; otherwise you won't know where the comment begins or ends, unless you always use 255 bytes. But for short comments that is a waste of space.
    • Comments (limited to 255 bytes) - Like the file comments, additional information for this chunk.
    • Data Length (unsigned double word = 4 bytes) - The length of the actual chunk data. 4 bytes means a maximum length of 4 GB!
      Comments: This is needed as you have no other means of telling where the chunk data starts and where it ends. You can use fewer bytes for the length, but that of course means your maximum chunk data length will be lower. Reading a double word is also much easier than reading 3 bytes, for example. You could choose to limit it to 2 bytes (max chunk data length is then 65535 bytes, 64 KB), but that would limit what you can have as chunk data. So 4 bytes, an unsigned double word, is the best choice (and, btw, also used in almost all existing file types)...
    • Data (limited to 16777215 bytes, or &HFFFFFF, which is approx. 16.7 kb) - The actual content of the chunk; can be a configuration file, anything, really.
      Comments: 16777215 bytes = +-16 MB, not 16.7 KB. But anyway, since I would use 4 bytes for the chunk's "data length" identifier, this would be +-4 GB...
    • Checksum (x bytes; depends on what kind of checksum you use) - You could add a checksum to the chunk to make it possible to verify the integrity of your data.
      Comments: But that would imply reading/saving/checking the data, which could mean slow processing. On the other hand, you can create your own type of checksum (only take the hash of byte 10 thru byte 100 or something). This has some advantages: since it only checks some bytes and not all, it wouldn't be as slow as checking the whole chunk data (though damage outside those bytes would go unnoticed). And people who wanna "hack" your file format will have a harder time doing it, because they don't know how the checksum is calculated.
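
The layout above can be sketched roughly like this in Python. Several details are assumptions the post does not specify: little-endian byte order, CRC-32 over the whole chunk data as the checksum, "bit 8" meaning the highest bit (&H80) of the version byte, and all function names.

```python
import struct
import zlib

CHECKSUM_FLAG = 0x80  # assumption: "bit 8" is the highest bit of the version byte


def pack_chunk(name: bytes, version: int, comment: bytes, data: bytes,
               with_checksum: bool = True) -> bytes:
    """Serialize one chunk: name, version, comment length, comment, data length, data, [checksum]."""
    if len(name) != 5:
        raise ValueError("chunk name must be exactly 5 bytes")
    if not 0 <= version < 0x80:
        raise ValueError("version must fit in 7 bits when bit 8 is the checksum flag")
    if len(comment) > 255:
        raise ValueError("comment does not fit a 1-byte length")
    if len(data) > 0xFFFFFFFF:
        raise ValueError("data does not fit a 4-byte length")
    flags = version | (CHECKSUM_FLAG if with_checksum else 0)
    out = name + bytes([flags, len(comment)]) + comment
    out += struct.pack("<I", len(data)) + data  # unsigned double word, little-endian
    if with_checksum:
        out += struct.pack("<I", zlib.crc32(data))  # CRC-32 is a stand-in choice
    return out


def unpack_chunk(buf: bytes, offset: int = 0):
    """Parse one chunk starting at offset; return (chunk dict, offset of the next chunk)."""
    name = buf[offset:offset + 5]
    flags = buf[offset + 5]
    comment_len = buf[offset + 6]
    pos = offset + 7
    comment = buf[pos:pos + comment_len]
    pos += comment_len
    (data_len,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    data = buf[pos:pos + data_len]
    if len(data) != data_len:
        raise ValueError("truncated chunk data")
    pos += data_len
    if flags & CHECKSUM_FLAG:
        (stored,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        if stored != zlib.crc32(data):
            raise ValueError("checksum mismatch")
    chunk = {"name": name, "version": flags & 0x7F, "comment": comment, "data": data}
    return chunk, pos


raw = pack_chunk(b"GRAPH", 1, b"demo", b"\x01\x02\x03")
chunk, next_offset = unpack_chunk(raw)
print(chunk["name"], chunk["version"], next_offset)
```

A partial checksum as suggested above would only change the argument passed to the checksum function (for example a slice of the data instead of all of it).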

A special (and required?) chunk would be the "FileInfo" chunk, which would hold:
  • Name: FINFO
  • Version: &H01
  • Comments Length: &H11
  • Comments: "General File Info"
  • Data Length: xxxx
  • Data: consists of some "mini-chunks":
    [Type (1 byte)] [Length (1 byte)] [actual string (max 255 bytes)]
    Type:
    &H01 = defines the "Author"-string
    &H02 = defines the "Application"-string
    &H03 = defines the ...
    etc...
    Length:
    Length of the actual string.
  • Checksum: xxxx
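
The "mini-chunks" inside the FileInfo data are simple type/length/value records. A minimal Python sketch, assuming ASCII strings and using made-up constant and function names (only the type codes &H01 and &H02 come from the list above):

```python
AUTHOR = 0x01       # &H01 = "Author" string
APPLICATION = 0x02  # &H02 = "Application" string


def pack_info(entries: dict) -> bytes:
    """Build the FileInfo data: [type (1 byte)][length (1 byte)][string] per entry."""
    out = bytearray()
    for type_id, text in entries.items():
        raw = text.encode("ascii")
        if len(raw) > 255:
            raise ValueError("string does not fit a 1-byte length")
        out += bytes([type_id, len(raw)]) + raw
    return bytes(out)


def unpack_info(data: bytes) -> dict:
    """Walk the mini-chunks and return {type: string}."""
    entries = {}
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated mini-chunk header")
        type_id, length = data[pos], data[pos + 1]
        pos += 2
        if pos + length > len(data):
            raise ValueError("truncated mini-chunk string")
        entries[type_id] = data[pos:pos + length].decode("ascii")
        pos += length
    return entries


info = pack_info({AUTHOR: "Cookie", APPLICATION: "GIF editor"})
print(unpack_info(info))
```

Because every record carries its own length, a reader can step over a type code it does not recognize.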

Maybe you noticed that this file format approach is very similar to PNG. Well, there is a reason for it. The format has a very large potential and you can do whatever you want with it.

Some will say: "Cookie, you're again making it more difficult than it is". Well, in fact, again, it isn't. It seems difficult, but it really isn't. This approach lets you do whatever you want with the format and store whatever you want in it. And, most important, you are "safe" for any future developments you want to make, without reinventing/recreating a new file format. This means your old applications could even read files from your new applications without errors. (Whether they can interpret the data is something else; that depends on how "compatible" you make your new applications.)
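
To show why an old application can still read a newer file, here is a rough Python sketch of a reader that steps over chunk names it doesn't know. It assumes the chunk layout described earlier with little-endian lengths, a 4-byte checksum when the highest bit of the version byte is set, and hypothetical names (`KNOWN`, `iter_chunks`, `read_known`).

```python
import struct

KNOWN = {b"FINFO", b"GRAPH"}  # chunk names this (old) application understands


def iter_chunks(buf: bytes, offset: int = 0):
    """Yield (name, version, data) for every chunk, using only the length fields to advance."""
    while offset < len(buf):
        name = buf[offset:offset + 5]
        flags = buf[offset + 5]
        comment_len = buf[offset + 6]
        pos = offset + 7 + comment_len          # skip the comment
        (data_len,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        data = buf[pos:pos + data_len]
        pos += data_len
        if flags & 0x80:                        # assumed checksum flag
            pos += 4                            # skip the assumed 4-byte checksum
        yield name, flags & 0x7F, data
        offset = pos


def read_known(buf: bytes):
    """Keep the chunks this reader understands and silently skip the rest."""
    return [(name, data) for name, _version, data in iter_chunks(buf) if name in KNOWN]
```

The reader never needs to understand a new chunk type; the data length alone tells it how far to jump.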

In fact, this general format I just described is used by many, many existing companies because of its versatile use. (And I use just the same for some of my applications.)

This post was edited on 06-12-2004 at 12:04 AM by CookieRevised.
06-11-2004 11:51 PM
