[Fwd: I-D ACTION:draft-hall-mime-app-mbox-03.txt]

The ADs for the applications area are going to move this into last-call again, but we all want to run it back up the flagpole one more time.

This version departs from earlier releases by specifying a "format" parameter to the application/mbox media-type, by defining a "default" value for the parameter, and by defining a default mbox database format that must be supported. The use of a format parameter allows different implementations to be documented, and for them to be specified in the transfer (e.g., a "SunOS" format might specify the use of content-length and other things). Unrecognized formats are to be treated as application/octet-stream.

The "default" format uses a sequence of 822 messages, with the exception that line-endings are LF instead of CR/LF (this only applies to the canonical database, and doesn't affect the transfer protocol or anything else). Inheriting 822 rules means that email addresses must be qualified, encodings must be specified, etc. All implementations have to support this default format, and unspecified formats must be treated as "default" (this is mostly for the benefit of protocols like HTTP, where parameters are not always (or even usually) defined).

Another thing that is specified here is that separator lines (at the least) must be encoded to prevent local collisions when an mbox attachment is saved into an existing local folder (messages can become irreversibly mingled if some kind of escaping is not performed).

There's one problem in this that I caught after submission, which is that ">From" escaping is specified, but shouldn't be. ">From" escaping is not needed for transfers, it's mostly a local matter anyway (having to do with local parsing), and it's damn near impossible to deal with all the potential exceptions (such as doubled escapes against quoted text), so it's really best not to specify anything. I'll roll this change into any other comments that I get for the -04 version.
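The fallback rules for the "format" parameter described above (a missing parameter means "default", an unrecognized value means application/octet-stream) can be sketched roughly as follows. This is only an illustration, not text from the draft: the function name, the KNOWN_FORMATS set, and the exact-match comparison of format values are all assumptions.

```python
from email.message import Message
from email.utils import collapse_rfc2231_value

# Hypothetical set of formats this receiver understands. Only "default"
# is required by the draft as described in the message above.
KNOWN_FORMATS = frozenset({"default"})


def resolve_mbox_handling(content_type, known=KNOWN_FORMATS):
    """Return (media_type, format) that a receiver should act on.

    Assumption: format values are compared exactly; whether they should
    be case-insensitive is not settled by this sketch.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()
    if media_type != "application/mbox":
        return media_type, None
    # A missing parameter is treated as "default".
    fmt = collapse_rfc2231_value(msg.get_param("format", failobj="default"))
    if fmt not in known:
        # Unrecognized formats are handled as opaque data.
        return "application/octet-stream", None
    return media_type, fmt
```

For example, resolve_mbox_handling("application/mbox") falls back to the "default" format, while a receiver that only knows "default" would handle "application/mbox; format=SunOS" as application/octet-stream.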
Anyway, any other comments are appreciated, as usual.

--
Eric A. Hall http://www.ehsco.com/
Internet Core Protocols http://www.oreilly.com/catalog/coreprot/

On Wed January 26 2005 14:57, Eric A. Hall wrote:
The ADs for the applications area are going to move this into last-call again, but we all want to run it back up the flagpole one more time.
This version departs from earlier releases by specifying a "format" parameter to the application/mbox media-type, by defining a "default" value for the parameter,
Several things are missing related to that:

1. A registration procedure for registering new format value keywords (that could be, and probably should be, a separate document).
2. An IANA Considerations section related to establishment of a format value keyword registry (containing the "default" entry), and maintenance of that registry in conjunction with the registration procedure.
3. Location of the format value keyword registry (so that implementors can find the registry). That should be coordinated with IANA.
4. Syntax rules and ABNF for the format keyword values, unless "anything goes".
5. Semantic rules for format value keywords, e.g. are they case-insensitive.
6. Provision, if any, for private-use or experimental format value keywords (e.g. reservation of keywords beginning with "x-" for such purposes).

[...]
The "default" format uses a sequence of 822 messages, with the exception that line-endings are LF instead of CR/LF (this only applies to the canonical database, and doesn't affect the transfer protocol or anything else). [...] Another thing that is specified here is that separator lines (at the least) must be encoded to prevent local collisions, when an mbox attachment is saved into an existing local folder (messages can become irreversibly mingled if some kind of escaping is not performed).
Since the format differs from canonical message format, and as there appears to be provision for encoding parts of the media type (using an unspecified encoding algorithm), it appears that several items are missing regarding such encoding:

1. encoding algorithm(s) and corresponding decoding algorithm(s)
2. how the particular encoding algorithm used by the originator is specified with the media type so that it can be reversed by the recipient.
3. interaction between any transfer encoding (RFC 2045) which may be present in messages and the encoding algorithms above
4. if it is possible to have the entire media type encoded or only portions ("at the least") encoded, how the recipient can determine which is the case, and how to identify which portions are encoded so that appropriate decoding -- of those portions only -- can be performed w/o mangling unencoded portions, even if those unencoded portions contain content which has octet sequences resembling encoded portions. [I suspect that partial encoding won't work, and that the entire media type would have to be encoded/decoded as a unit.]
5. Interaction of encoding mechanisms and modifications that may occur during transport (message/partial fragmentation, addition of spurious whitespace, removal of trailing whitespace, etc.).
6. Since the media type format contains lone LF octets, it is unsuitable for transfer w/o transfer encoding (RFC 2822 section 2.3); it is therefore possible that:
   a) a message within an mbox may have had RFC 2045 transfer encoding applied to a body MIME-part, with a corresponding Content-Transfer-Encoding field
   b) CRLF sequences delimiting lines may have been changed to LF
   c) some encoding may be applied to all or portions of the media type for the purpose of escaping "separator lines"
   d) transfer encoding may have to be applied to the media type for transfer, as it would otherwise contain non-conforming octet sequences (LF not immediately preceded by CR; RFC 2822 sect. 2.3)

In order to recover content end-to-end, it is necessary to specify the order of the various transformations and the corresponding decoding sequence, to prevent undesirable interaction between encoding/decoding operations that would alter message content.

In the registration template, the magic number could be indicated as 0x46726F6D20 ("From ").
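As a quick check of the suggested magic number, the hex value can be decoded and used to sniff the start of a file. This is a minimal sketch; the helper name is hypothetical, and it only inspects the first five octets, which says nothing about whether the rest of the data is a well-formed mbox.

```python
# The five octets suggested as the magic number.
MBOX_MAGIC = bytes.fromhex("46726F6D20")


def looks_like_mbox(data):
    """Return True if the data begins with the ASCII octets 'From '."""
    return data.startswith(MBOX_MAGIC)
```

The constant decodes to the ASCII string "From " (capital F, trailing space), so the comparison is case-sensitive.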

On 1/30/2005 11:24 AM, Bruce Lilly wrote:
2. An IANA Considerations section related to establishment of a format value keyword registry (containing the "default" entry), and maintenance of that registry in conjunction with the registration procedure.
Yeah probably.
4. Syntax rules and ABNF for the format keyword values, unless "anything goes".
5. Semantic rules for format value keywords, e.g. are they case-insensitive.
That's already covered in section 5.1 of RFC 2045.
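For reference, RFC 2045 section 5.1 defines a parameter value as either a token or a quoted-string. A rough sketch of the token check follows, assuming that format keywords would be limited to the token form; that restriction is an assumption made for this example and is not something the draft or this thread states.

```python
# tspecials as listed in RFC 2045 section 5.1.
TSPECIALS = frozenset('()<>@,;:\\"/[]?=')


def is_token(value):
    """Return True if value is a non-empty RFC 2045 token.

    A token is one or more US-ASCII characters, excluding SPACE,
    control characters, and tspecials.
    """
    if not value:
        return False
    return all(0x20 < ord(ch) < 0x7F and ch not in TSPECIALS for ch in value)
```

Under this check "default", "SunOS" and "x-private" are tokens, while "my format" (contains a space) and "a/b" (contains a tspecial) would need the quoted-string form.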
Another thing that is specified here is that separator lines (at the least) must be encoded to prevent local collisions, when an mbox attachment is saved into an existing local folder (messages can become irreversibly mingled if some kind of escaping is not performed).
Since the format differs from canonical message format, and as there appears to be provision for encoding parts of the media type (using an unspecified encoding algorithm), it appears that several items are missing regarding such encoding:
1. encoding algorithm(s) and corresponding decoding algorithm(s)
this is just transfer encoding, although I guess this point needs clarification
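To illustrate the point about transfer encoding, here is a small sketch using Python's standard email package: the LF-delimited mbox data is carried as a base64-encoded application/mbox part, the serialized message is rewritten with CRLF line endings as a stand-in for transport, and the payload is recovered byte-for-byte. The sample message, the address, and the choice of base64 are assumptions made for the example.

```python
from email import message_from_bytes
from email.mime.application import MIMEApplication

# Hypothetical one-message mbox in the "default" format (LF line endings).
mbox_bytes = (
    b"From alice@example.com Wed Jan 26 14:57:00 2005\n"
    b"From: alice@example.com\n"
    b"Subject: test\n"
    b"\n"
    b"Hello\n"
)

# MIMEApplication applies base64 transfer encoding by default.
part = MIMEApplication(mbox_bytes, _subtype="mbox", format="default")

# Simulate transport by using CRLF line endings on the wire.
wire = part.as_bytes().replace(b"\n", b"\r\n")

# The receiver parses the message and reverses the transfer encoding.
received = message_from_bytes(wire)
recovered = received.get_payload(decode=True)
```

Because the lone LF octets are inside the base64 body, the CRLF conversion on the wire does not touch them, and recovered equals mbox_bytes.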
In the registration template, the magic number could be indicated as 0x46726F6D20 ("From ").
good idea

--
Eric A. Hall http://www.ehsco.com/
Internet Core Protocols http://www.oreilly.com/catalog/coreprot/