SVG12: charset parameter for image/svg+xml

Dear Scalable Vector Graphics Working Group,

http://www.w3.org/TR/2004/WD-SVG12-20041027/mimereg.html attempts to register the "image/svg+xml" MIME type, but the registration lacks the charset parameter as defined in RFC 3023. This defeats the purpose of the +xml convention, which attempts to provide a means for generic XML processing. Generic XML tools such as validators, editors, XSLT and XQuery processors, and full-text XML search engines would need to maintain special knowledge to ignore the charset parameter for image/svg+xml documents, which is expensive and unlikely to happen. In fact, a number of deployed tools already don't do that; for example, the W3C Markup Validator would need to be updated with special code for image/svg+xml in order to comply with the registration.

Thus, please change the registration to be consistent with application/xml as defined in RFC 3023.

regards.

"Bjoern Hoehrmann" <derhoermi@gmx.net> wrote in message news:41b49119.563677485@smtp.bjoern.hoehrmann.de...
> http://www.w3.org/TR/2004/WD-SVG12-20041027/mimereg.html attempts to register the "image/svg+xml" MIME Type but the registration lacks the charset parameter as defined in RFC 3023. This defeats the purpose of the +xml convention which attempts to provide a means for generic XML processing.
>
> Thus, please change the registration to be consistent with application/xml as defined in RFC 3023.

I fully agree with this. I would also like the Working Group to consider registering application/svg+xml, with the distinction between image and application being scripting and sXBL within the document. The reasons I've raised before, but if you want me to elaborate further, I can do.

Cheers.
Jim.

Jim Ley wrote:
<snip>
|
| > Thus, please change the registration to be consistent with
| > application/xml as defined in RFC 3023.
|
| I fully agree with this. I would also like the Working Group
| to consider registering application/svg+xml, with the
| distinction between image and application being scripting and
| sXBL within the document. The reasons I've raised before,
| but if you want me to elaborate further, I can do.

I agree with Jim here, with one caveat. The distinction between applications and images is not quite so clear-cut. An author can create a declarative (SMIL) application, such as a game or an interactive document, that requires no script. This will be especially true with such features as editable text (combined with the snapshot save of ASV). Similarly, you could have sXBL content that does not require scripting, but will merely render a static or declarative version of some other XML (say, an org chart). Is this an application? I don't think so; it's merely a graphical representation of XML data.

Pragmatically speaking, however, I think Jim is correct. For the present, any SVG document with a 'script' element or reference, be it part of sXBL or not, should be declared an application; however, what constitutes the distinction should remain an open issue. I think that the core issue is the degree of threat posed to the user, and that this should be the real shibboleth.

Regards-
-Doug

On Monday, November 1, 2004, 2:49:32 AM, Bjoern wrote:

BH> Dear Scalable Vector Graphics Working Group,

BH> http://www.w3.org/TR/2004/WD-SVG12-20041027/mimereg.html attempts to
BH> register the "image/svg+xml" MIME Type but the registration lacks the
BH> charset parameter as defined in RFC 3023. This defeats the purpose of
BH> the +xml convention which attempts to provide a means for generic XML
BH> processing.

On the contrary! The +xml convention clearly indicates, for an unknown media type, that it is xml; thus, that an XML processor should be used; which will correctly determine the encoding from the xml encoding declaration or lack thereof.

I was able to discuss this with Murata-san in Tokyo at SVG Open, and he agreed that the +xml convention, plus the deprecation of text/xml and associated charset handling weirdness of required us-ascii fallback, allows consistent handling.

BH> Generic XML tools such as Validators, Editors, XSLT and XQuery
BH> processors, and full-text XML search engines would need to maintain
BH> special knowledge to ignore the charset parameter for image/svg+xml
BH> documents

No, they would not. RFC 3023 already allows the charset to be omitted, and gives rules to follow for this case. SVG follows those rules, as the registration document makes plain.

However, if such a parameter were to be added, anything that downloaded an SVG document (or any other type of document that used a charset parameter) would have to know to change the encoding declaration in the xml; otherwise it would not be well formed when read from local disk. This is expensive and unlikely to happen.

BH> which is expensive and unlikely to happen. In fact, a number
BH> of deployed tools already don't do that, for example the W3C Markup
BH> Validator would need to be updated with special code for image/svg+xml
BH> in order to comply with the registration.

Incorrect; see above.

BH> Thus, please change the registration to be consistent with
BH> application/xml as defined in RFC 3023.

In fact it is consistent already, but refers to specific cases while excluding other cases.

Speaking as one of the co-editors of the revision to RFC 3023, which removes this requirement and deprecates text/xml, thus bringing it into line with the good practice notes in the AWWW document:

http://www.w3.org/TR/2004/WD-webarch-20040816/Overview.html#xml-media-types

  Good practice: XML and "text/*"
  In general, a representation provider SHOULD NOT assign Internet media types beginning with "text/" to XML representations

  Good practice: XML and character encodings
  In general, a representation provider SHOULD NOT specify the character encoding for XML data in protocol headers since the data is self-describing

I can confirm that the idea is to apply this more generally. In other words, the omission of the charset parameter was not an oversight; it was a deliberate design choice following discussions in the TAG over the last year and a half.

--
Chris Lilley mailto:chris@w3.org
Chair, W3C SVG Working Group
Member, W3C Technical Architecture Group
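As a rough illustration of the handling described in the message above (recognise any +xml media type, then take the encoding from the XML encoding declaration or its absence), here is a minimal Python sketch. The helper names `is_xml_media_type` and `declared_encoding` are hypothetical, and the sketch assumes an ASCII-compatible encoding; the byte-order-mark and UTF-16 detection that a real XML processor performs is left out.

```python
import re

def is_xml_media_type(media_type: str) -> bool:
    """Return True for application/xml, text/xml, or any type with the +xml suffix."""
    # Drop parameters such as ";charset=..." and normalise case.
    essence = media_type.split(";", 1)[0].strip().lower()
    return essence in ("application/xml", "text/xml") or essence.endswith("+xml")

# Matches the encoding pseudo-attribute inside a leading XML declaration.
_ENCODING_DECL = re.compile(
    rb"<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)

def declared_encoding(body: bytes, default: str = "utf-8") -> str:
    """Encoding named in the XML declaration, or the default when none is given.

    Simplification (assumption): no BOM or UTF-16 sniffing is attempted.
    """
    match = _ENCODING_DECL.match(body)
    return match.group(1).decode("ascii").lower() if match else default
```

A generic tool could call `is_xml_media_type` on a media type it has never seen before and, if it returns True, hand the bytes to an XML processor without consulting a per-type table.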

* Chris Lilley wrote:
> BH> which is expensive and unlikely to happen. In fact, a number
> BH> of deployed tools already don't do that, for example the W3C Markup
> BH> Validator would need to be updated with special code for image/svg+xml
> BH> in order to comply with the registration.
>
> Incorrect; see above.

The W3C Markup Validator considers resources such as

  Content-Type: image/svg+xml;charset=iso-8859-1

  <?xml version='1.0' encoding='utf-8'?>
  ...

ISO-8859-1 encoded. If it is incorrect that the W3C Markup Validator needs to change, you either want all processors to treat the resource as ISO-8859-1 encoded, in which case there is not really a point in not allowing an optional charset parameter, or you want two classes of non-interoperable processors, where some consider it ISO-8859-1 and others consider it UTF-8. Which one is it?
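The two behaviours contrasted in this message can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, `encoding_header_wins` follows the RFC 3023 rule for application/xml that a charset parameter, when present, takes precedence, and `encoding_declaration_only` follows one reading of the draft registration in which only the document itself is consulted. BOM and UTF-16 detection are omitted.

```python
import re
from email.message import Message

def charset_param(content_type: str):
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    value = msg.get_param("charset")
    return value.lower() if isinstance(value, str) else None

def declared_encoding(body: bytes):
    """Return the encoding named in the XML declaration, or None if absent."""
    m = re.match(rb"<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z0-9._-]+)[\"']", body)
    return m.group(1).decode("ascii").lower() if m else None

def encoding_header_wins(content_type: str, body: bytes) -> str:
    """Charset parameter first, then the encoding declaration, then UTF-8."""
    return charset_param(content_type) or declared_encoding(body) or "utf-8"

def encoding_declaration_only(content_type: str, body: bytes) -> str:
    """Ignore any charset parameter; use the declaration, else UTF-8."""
    return declared_encoding(body) or "utf-8"

header = "image/svg+xml;charset=iso-8859-1"
body = b"<?xml version='1.0' encoding='utf-8'?><svg/>"
print(encoding_header_wins(header, body), encoding_declaration_only(header, body))
```

For the example resource the two functions disagree, which is the interoperability split the message asks about.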

On Monday, November 1, 2004, 9:14:02 PM, Bjoern wrote:

BH> * Chris Lilley wrote:
> BH> which is expensive and unlikely to happen. In fact, a number
> BH> of deployed tools already don't do that, for example the W3C Markup
> BH> Validator would need to be updated with special code for image/svg+xml
> BH> in order to comply with the registration.
>
> Incorrect; see above.

BH> The W3C Markup Validator considers resources such as

BH> Content-Type: image/svg+xml;charset=iso-8859-1

BH> <?xml version='1.0' encoding='utf-8'?>
BH> ...

BH> ISO-8859-1 encoded.

Yes, there are two inconsistent pieces of metadata and the markup validator correctly applies the rules to determine which to use. Note that the example above, if saved to disk and re-opened, is not well formed. This is undesirable. Lots of content is processed from local disk, on servers and on clients.

The markup validator does the correct thing also in this case

  Content-Type: image/svg+xml

  <?xml version='1.0' encoding='utf-8'?>
  ...

and this one

  Content-Type: image/svg+xml

  <?xml version='1.0'?>
  ...

and this one

  Content-Type: image/svg+xml

  <?xml version='1.0' encoding="koi-8-r"?>
  ...

as per RFC 3023.

BH> If it is incorrect that the W3C Markup Validator needs to change you
BH> either want that all processors treat the resource ISO-8859-1
BH> encoded in which case there is not really a point in not allowing an
BH> optional charset parameter,

I agree, there is no point in an optional charset parameter. That's why the registration for svg doesn't have one.

BH> or you want two classes of non-interoperable processors, where some
BH> consider it ISO-8859-1 and others consider it UTF-8. Which one is
BH> it?

That is the current situation, yes. Some processors do one and some do the other. As you note, it's undesirable. As you note, I don't want this. The TAG would like to see mutually inconsistent duplicate metadata reduced or eliminated.

--
Chris Lilley mailto:chris@w3.org
Chair, W3C SVG Working Group
Member, W3C Technical Architecture Group
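The point about saving to disk can be illustrated with a small Python sketch. It assumes a document whose bytes really are ISO-8859-1 (as the charset parameter says) while its declaration says utf-8; once the HTTP header is gone, a parser has only the declaration to go on. The sample document and the helper name `is_well_formed` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the declaration claims utf-8, but it contains a non-ASCII character.
text = "<?xml version='1.0' encoding='utf-8'?><svg><title>caf\u00e9</title></svg>"

# Bytes as they would be served with "charset=iso-8859-1" in the header.
served_as_latin1 = text.encode("iso-8859-1")

# Bytes that match the document's own encoding declaration.
served_as_utf8 = text.encode("utf-8")

def is_well_formed(data: bytes) -> bool:
    """Parse the bytes with no out-of-band encoding information, as a local file would be."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(served_as_latin1), is_well_formed(served_as_utf8))
```

The ISO-8859-1 bytes are rejected because the lone byte for the accented character is not valid UTF-8, which is what the declaration promises.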

* Chris Lilley wrote:
> BH> The W3C Markup Validator considers resources such as
> BH> Content-Type: image/svg+xml;charset=iso-8859-1
> BH> <?xml version='1.0' encoding='utf-8'?>
> BH> ...
> BH> ISO-8859-1 encoded.
>
> Yes, there are two inconsistent pieces of metadata and the markup validator correctly applies the rules to determine which to use.

What makes you think there is inconsistent metadata here? The charset parameter in the example above has no semantics that could be inconsistent with other information.

On Monday, November 1, 2004, 10:19:01 PM, Bjoern wrote:

BH> * Chris Lilley wrote:
> BH> The W3C Markup Validator considers resources such as
> BH> Content-Type: image/svg+xml;charset=iso-8859-1
> BH> <?xml version='1.0' encoding='utf-8'?>
> BH> ...
> BH> ISO-8859-1 encoded.
>
> Yes, there are two inconsistent pieces of metadata and the markup validator correctly applies the rules to determine which to use.

BH> What makes you think there is inconsistent metadata here?

The encoding is declared in two places, and they are different. One is the charset parameter, and one is the xml encoding declaration.

BH> The charset parameter in the example above has no semantics that
BH> could be inconsistent with other information.

It's clearly inconsistent! The encoding declaration has the same semantic. I agree that there is a mechanism to *resolve* the ambiguity, but the inconsistency is clearly there.

--
Chris Lilley mailto:chris@w3.org
Chair, W3C SVG Working Group
Member, W3C Technical Architecture Group

* Chris Lilley wrote:
> BH> The W3C Markup Validator considers resources such as
> BH> Content-Type: image/svg+xml;charset=iso-8859-1
> BH> <?xml version='1.0' encoding='utf-8'?>
> BH> ...
> BH> ISO-8859-1 encoded.
>
> Yes, there are two inconsistent pieces of metadata and the markup validator correctly applies the rules to determine which to use.
>
> BH> What makes you think there is inconsistent metadata here?
>
> The encoding is declared in two places, and they are different. One is the charset parameter, and one is the xml encoding declaration.

Why does the charset parameter in the example above declare an encoding?

From http://www.w3.org/TR/2004/WD-SVG12-20041027/mimereg.html it should be clear that there is no charset parameter and it thus cannot have such semantics, so there is no inconsistency here.

On Monday, November 1, 2004, 10:41:36 PM, Bjoern wrote:

BH> Why does the charset parameter in the example above declare an encoding?

BH> From http://www.w3.org/TR/2004/WD-SVG12-20041027/mimereg.html it should
BH> be clear that there is no charset parameter and it thus cannot have such
BH> semantics, so there is no inconsistency here.

You had not stated your example very carefully. I thought we were still talking about what the validator implements.

--
Chris Lilley mailto:chris@w3.org
Chair, W3C SVG Working Group
Member, W3C Technical Architecture Group

* Chris Lilley wrote:
> BH> Why does the charset parameter in the example above declare an encoding?
>
> BH> From http://www.w3.org/TR/2004/WD-SVG12-20041027/mimereg.html it should
> BH> be clear that there is no charset parameter and it thus cannot have such
> BH> semantics, so there is no inconsistency here.
>
> You had not stated your example very carefully. I thought we were still talking about what the validator implements.

We are. You said the W3C Markup Validator does not need to be changed in order to comply with the proposed image/svg+xml registration, which however states that

  [...]
  The encoding of an SVG document is determined by the XML encoding declaration. This has identical semantics to the application/xml media type in the case where the charset parameter is omitted, as specified in RFC3023 sections 8.9, 8.10 and 8.11.
  [...]

which is inconsistent with the behavior of the W3C Markup Validator.

From the proposed registration it seems very clear that the behavior of the W3C Markup Validator is non-conforming; you say it is nevertheless conforming, which makes no sense to me.

Chris Lilley wrote:
> On the contrary! The +xml convention clearly indicates, for an unknown media type, that it is xml; thus, that an XML processor should be used; which will correctly determine the encoding from the xml encoding declaration or lack thereof.

I think the concern was about what happens when someone sends the following HTTP header:

  Content-Type: image/svg+xml; charset=iso-8859-1

combined with an XML document that has no encoding declaration (so defaulting to UTF-8).

Now per the type registration for "image/svg+xml", the above Content-Type header is invalid, right? So what's a UA to do? What encoding to use? Using UTF-8 means hardcoding knowledge about the fact that image/svg+xml, unlike most other character-based types used today, doesn't have a charset parameter.

> No, they would not. RFC 3023 already allows the charset to be omitted, and gives rules to follow for this case. SVG follows those rules, as the registration document makes plain.

The problems arise when there IS a charset parameter. I don't think anyone ever claimed there is a problem when the charset parameter is omitted.

> In general, a representation provider SHOULD NOT specify the character encoding for XML data in protocol headers since the data is self-describing

Given that this is not a MUST NOT, people will continue to do this in some cases (particularly as some web servers automatically tack on a "charset" parameter to Content-Type headers).

-Boris

On Mon, 1 Nov 2004 20:03:17 +0100 Chris Lilley <chris@w3.org> wrote:
> I was able to discuss this with Murata-san in Tokyo at SVG Open, and he agreed that the +xml convention, plus the deprecation of text/xml and associated charset handling weirdness of required us-ascii fallback, allows consistent handling

No, I did not agree to drop the charset parameter. I did agree to deprecate text/xml. I do believe that image/svg+xml should allow the charset parameter.

Cheers,
--
MURATA Makoto <murata@hokkaido.email.ne.jp>

On Tuesday, November 2, 2004, 4:45:40 PM, MURATA wrote:

MM> On Mon, 1 Nov 2004 20:03:17 +0100
MM> Chris Lilley <chris@w3.org> wrote:
> I was able to discuss this with Murata-san in Tokyo at SVG Open, and he agreed that the +xml convention, plus the deprecation of text/xml and associated charset handling weirdness of required us-ascii fallback, allows consistent handling

MM> No, I did not agree to drop the charset parameter.

What I said was that you agreed that there was consistent handling in the case where there was no charset parameter. Do you recall discussing that over lunch? Jon was there, too.

We also discussed consistent handling of encoding among all media types; you had earlier stated that this was particularly needed for simple transcoders that did not know enough about xml not to break it.

At that meeting, I suggested (and you agreed) that the presence of the +xml convention allowed such transcoders to reliably detect all xml media types whether the precise media type was recognized or not. I further suggested (and you seemed to agree) that two types of processing were desirable:

a) (minimally) xml-aware transcoders - transcode, and update the xml encoding declaration so that it is consistent
b) xml-unaware transcoders - not break xml well-formedness, not transcode xml

and a third type of processing was undesirable:

c) xml-unaware transcoders that take well-formed documents and make them malformed.

Transcoders already do something similar; for example, they do not attempt to transcode the bytes of JPEG images.

MM> I did agree to deprecate text/xml.

Yes, that part is already in the next draft (and the other text/* xml types).

MM> I do believe that image/svg+xml should allow the charset parameter.

So we still have some discussion to do.

--
Chris Lilley mailto:chris@w3.org
Chair, W3C SVG Working Group
Member, W3C Technical Architecture Group
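Case (a) above, a minimally xml-aware transcoder, might look roughly like the following Python sketch. It is an illustration under assumptions, not anything proposed in the thread: the names `transcode_xml`, `naive_transcode` and `is_well_formed` are hypothetical, the source encoding is taken as already known, and the regular expression handles only a simple XML declaration at the very start of the document.

```python
import re
import xml.etree.ElementTree as ET

# Leading XML declaration: version, optional encoding, then anything else (e.g. standalone).
_DECL = re.compile(
    r"""^<\?xml\s+version\s*=\s*(["'])(?P<version>[0-9.]+)\1"""
    r"""(?:\s+encoding\s*=\s*(["'])[A-Za-z][A-Za-z0-9._-]*\3)?"""
    r"""(?P<rest>[^?]*)\?>"""
)

def transcode_xml(data: bytes, source: str, target: str) -> bytes:
    """Case (a): transcode and rewrite the declaration so it names the new encoding."""
    text = data.decode(source)
    m = _DECL.match(text)
    version = m.group("version") if m else "1.0"
    rest = m.group("rest") if m else ""
    body = text[m.end():] if m else text
    decl = '<?xml version="%s" encoding="%s"%s?>' % (version, target, rest)
    return (decl + body).encode(target)

def naive_transcode(data: bytes, source: str, target: str) -> bytes:
    """Case (c): transcode the bytes but leave the declaration untouched."""
    return data.decode(source).encode(target)

def is_well_formed(data: bytes) -> bool:
    """True if the bytes parse as XML using only in-document encoding information."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False
```

Given a UTF-8 document containing a non-ASCII character, `transcode_xml(doc, "utf-8", "iso-8859-1")` yields bytes that still parse, whereas `naive_transcode` with the same arguments produces the malformed result described in (c).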

participants (6)
- Bjoern Hoehrmann
- Boris Zbarsky
- Chris Lilley
- Doug Schepers
- Jim Ley
- MURATA Makoto