
On Wed January 26 2005 14:57, Eric A. Hall wrote:
The ADs for the applications area are going to move this into last-call again, but we all want to run it back up the flagpole one more time.
This version departs from earlier releases by specifying a "format" parameter to the application/mbox media-type, by defining a "default" value for the parameter,
Several things are missing related to that:
1. A registration procedure for registering new format value keywords (that could be, and probably should be, a separate document).
2. An IANA Considerations section related to establishment of a format value keyword registry (containing the "default" entry), and maintenance of that registry in conjunction with the registration procedure.
3. Location of the format value keyword registry (so that implementors can find the registry). That should be coordinated with IANA.
4. Syntax rules and ABNF for the format keyword values, unless "anything goes".
5. Semantic rules for format value keywords, e.g. are they case-insensitive.
6. Provision, if any, for private-use or experimental format value keywords (e.g. reservation of keywords beginning with "x-" for such purposes).
[...]
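To make items 4 through 6 concrete, here is a rough sketch of the checks an implementation would need once those rules exist. Everything in it is an assumption for illustration: the keyword syntax (a letter followed by letters, digits, or hyphens), case-insensitive matching, the "x-" private-use prefix, and the names `classify_format` and `REGISTERED` are all hypothetical, since the draft specifies none of them.

```python
import re

# Hypothetical syntax: a letter followed by letters, digits, or hyphens.
# The draft gives no ABNF, so this pattern is only a placeholder.
_KEYWORD_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]*\Z")

# Hypothetical registry contents; "default" is the only value the draft names.
REGISTERED = {"default"}


def classify_format(value):
    """Return 'registered', 'private', or 'unknown' for a format keyword.

    Raises ValueError if the value does not match the assumed syntax.
    """
    if not _KEYWORD_RE.match(value):
        raise ValueError(f"malformed format keyword: {value!r}")
    keyword = value.lower()  # assumes keywords compare case-insensitively
    if keyword.startswith("x-"):  # assumes "x-" is reserved for private use
        return "private"
    if keyword in REGISTERED:
        return "registered"
    return "unknown"
```

Under these assumptions `classify_format("Default")` is treated the same as `"default"`, and a receiver has a defined answer for a keyword it has never seen. Without the missing rules, each of those decisions is left to the implementor.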
The "default" format uses a sequence of 822 messages, with the exception that line-endings are LF instead of CR/LF (this only applies to the canonical database, and doesn't affect the transfer protocol or anything else). [...] Another thing that is specified here is that separator lines (at the least) must be encoded to prevent local collisions, when an mbox attachment is saved into an existing local folder (messages can become irreversibly mingled if some kind of escaping is not performed).
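For reference, one reversible escaping convention in existing practice is the "mboxrd" style of ">From " quoting. The draft does not name an algorithm, so the sketch below is an assumption about what could be meant, not what the draft specifies; the function names are hypothetical. It operates on a single message whose lines are already LF-delimited.

```python
import re

# A line needs quoting if it is "From " preceded by zero or more ">".
_QUOTE_RE = re.compile(rb"^(>*From )", re.MULTILINE)
# A quoted line is "From " preceded by one or more ">".
_UNQUOTE_RE = re.compile(rb"^>(>*From )", re.MULTILINE)


def escape_from_lines(message: bytes) -> bytes:
    """Prefix one '>' to every line that could be taken for a separator."""
    return _QUOTE_RE.sub(rb">\1", message)


def unescape_from_lines(message: bytes) -> bytes:
    """Remove exactly one '>' from every quoted line, reversing the escape."""
    return _UNQUOTE_RE.sub(rb"\1", message)
```

Because lines that already begin with ">From " also gain a ">", the decoder can strip one level unconditionally and recover the original octets. Schemes that quote only bare "From " lines cannot be reversed that way.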
Since the format differs from canonical message format, and as there appears to be provision for encoding parts of the media type (using an unspecified encoding algorithm), it appears that several items are missing regarding such encoding:
1. Encoding algorithm(s) and corresponding decoding algorithm(s).
2. How the particular encoding algorithm used by the originator is specified with the media type so that it can be reversed by the recipient.
3. Interaction between any transfer encoding (RFC 2045) which may be present in messages and the encoding algorithms above.
4. If it is possible to have the entire media type encoded or only portions ("at the least") encoded, how the recipient can determine which is the case, and how to identify which portions are encoded so that appropriate decoding -- of those portions only -- can be performed w/o mangling unencoded portions, even if those unencoded portions contain content which has octet sequences resembling encoded portions. [I suspect that partial encoding won't work, and that the entire media type would have to be encoded/decoded as a unit.]
5. Interaction of encoding mechanisms and modifications that may occur during transport (message/partial fragmentation, addition of spurious whitespace, removal of trailing whitespace, etc.).
6. Since the media type format contains lone LF octets, it is unsuitable for transfer w/o transfer encoding (RFC 2822 section 2.3); it is therefore possible that:
   a) a message within an mbox may have had RFC 2045 transfer encoding applied to a body MIME-part, with a corresponding Content-Transfer-Encoding field
   b) CRLF sequences delimiting lines may have been changed to LF
   c) some encoding may be applied to all or portions of the media type for the purpose of escaping "separator lines"
   d) transfer encoding may have to be applied to the media type for transfer, as it would otherwise contain non-conforming octet sequences (LF not immediately preceded by CR (RFC 2822 sect. 2.3))
In order to recover content end-to-end, it is necessary to specify the order of the various transformations and the corresponding decoding sequence, to prevent undesirable interaction between encoding/decoding operations that would alter message content.
In the registration template, the magic number could be indicated as 0x46726F6D20 ("From ").
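To illustrate why the order matters, here is a minimal sketch of one possible ordering, with decoding applied in exactly the reverse sequence. The ordering itself, the use of ">From " quoting for separator escaping, the choice of base64 as the transfer encoding, and the names `pack` and `unpack` are all assumptions; the draft fixes none of them. It handles a single message and assumes the original contained no lone LF, which is the only case in which the last decoding step restores the original octets.

```python
import base64
import re

MAGIC = bytes.fromhex("46726F6D20")  # the five octets of "From "


def pack(message_crlf: bytes, separator: bytes) -> str:
    """Apply the transformations in one fixed (assumed) order."""
    body = message_crlf.replace(b"\r\n", b"\n")                       # 1. CRLF -> LF
    body = re.sub(rb"^(>*From )", rb">\1", body, flags=re.MULTILINE)  # 2. escape separators
    mbox = separator + b"\n" + body                                   # 3. add separator line
    return base64.encodebytes(mbox).decode("ascii")                   # 4. transfer-encode


def unpack(encoded: str) -> bytes:
    """Undo the steps of pack() in reverse order."""
    mbox = base64.decodebytes(encoded.encode("ascii"))                 # undo 4
    if not mbox.startswith(MAGIC):
        raise ValueError("data does not begin with the 'From ' magic number")
    _separator, _, body = mbox.partition(b"\n")                        # undo 3
    body = re.sub(rb"^>(>*From )", rb"\1", body, flags=re.MULTILINE)   # undo 2
    return body.replace(b"\n", b"\r\n")                                # undo 1
```

If a recipient undid step 2 before step 3, for instance, it could no longer tell the real separator line from an escaped body line, which is the kind of interaction that an explicit ordering in the specification would rule out.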