
Please review the Media Type registration template described below. It is available in HTML [1] or in plain text form [2] and relates to the SVG specification [3]. Registration of this Media Type in the standards tree is requested, in conformance with [4] and [5].
[1] http://www.w3.org/TR/SVG12/mimereg.html [2] http://www.w3.org/TR/SVG12/mimereg.html,text [3] http://www.w3.org/TR/SVG12/ [4] http://www.ietf.org/internet-drafts/draft-freed-media-type-reg-01.txt [5] http://www.w3.org/2002/06/registering-mediatype
Registration of Media Type image/svg+xml
MIME media type name: image
MIME subtype name: svg+xml
Required parameters: None.
Optional parameters: None
The encoding of an SVG document is determined by the XML encoding declaration. This has identical semantics to the application/xml media type in the case where the charset parameter is omitted, as specified in [6]RFC3023 sections 8.9, 8.10 and 8.11.
[6] http://www.w3.org/TR/SVG12/references
Encoding considerations: Same as for application/xml. See [7]RFC3023, section 3.2.
[7] http://www.w3.org/TR/SVG12/references
Restrictions on usage: None
Security considerations: As with other XML types and as noted in [8]RFC3023 section 10, repeated expansion of maliciously constructed XML entities can be used to consume large amounts of memory, which may cause XML processors in constrained environments to fail.
[8] http://www.w3.org/TR/SVG12/references
SVG documents may be transmitted in compressed form using gzip compression. For systems which employ MIME-like mechanisms, such as HTTP, this is indicated by the Content-Transfer-Encoding header; for systems which do not, such as direct filesystem access, this is indicated by the filename extension and by the Macintosh File Type Codes. In addition, gzip compressed content is readily recognised by the initial byte sequence as described in [9]RFC1952 section 2.3.1.
[9] http://www.w3.org/TR/SVG12/references
Several SVG elements may cause arbitrary URIs to be referenced. In this case, the security issues of [10]RFC2396, section 7, should be considered.
[10] http://www.w3.org/TR/SVG12/references
In common with HTML, SVG documents may reference external media such as images, audio, video, style sheets, and scripting languages. Scripting languages are executable content. In this case, the security considerations in the Media Type registrations for those formats apply.
In addition, because of the extensibility features for SVG and of XML in general, it is possible that "image/svg+xml" may describe content that has security implications beyond those described here. However, if the processor follows only the normative semantics of this specification, this content will be outside the SVG namespace and will be ignored. Only in the case where the processor recognizes and processes the additional content, or where further processing of that content is dispatched to other processors, would security issues potentially arise. And in that case, they would fall outside the domain of this registration document.
Interoperability considerations: This specification describes processing semantics that dictate behavior that must be followed when dealing with, among other things, unrecognized elements and attributes, both in the SVG namespace and in other namespaces.
Because SVG is extensible, conformant "image/svg+xml" processors must expect that content received is well-formed XML, but it cannot be guaranteed that the content is valid to a particular DTD or Schema or that the processor will recognize all of the elements and attributes in the document.
SVG has a published Test Suite and associated implementation report showing which implementations passed which tests at the time of the report. This information is periodically updated as new tests are added or as implementations improve.
Published specification: This media type registration is extracted from Appendix G of the SVG 1.2 specification.
Additional information:
Person & email address to contact for further information: Dean Jackson, (dean@w3.org).
Intended usage: COMMON
Author/Change controller: The SVG specification is a work product of the World Wide Web Consortium's SVG Working Group. The W3C has change control over these specifications.
-- Chris Lilley mailto:chris@w3.org Chair, W3C SVG Working Group Member, W3C Technical Architecture Group
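The gzip signature mentioned in the security considerations can be checked with a few lines of code. The following is a minimal Python sketch, not part of the registration: it assumes the whole payload is already in memory, and the helper names `looks_gzipped` and `read_svg_bytes` are hypothetical. It relies only on the fixed header bytes of RFC 1952 section 2.3.1 (ID1 = 0x1f, ID2 = 0x8b) and the standard-library `gzip` module. (In HTTP practice, gzip compression of a response body is normally signalled with the Content-Encoding header.)

```python
import gzip

# RFC 1952, section 2.3.1: a gzip member begins with ID1 = 0x1f, ID2 = 0x8b.
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(data: bytes) -> bool:
    """Return True if the payload starts with the gzip magic bytes."""
    return data[:2] == GZIP_MAGIC


def read_svg_bytes(data: bytes) -> bytes:
    """Return uncompressed SVG bytes, decompressing only when the gzip signature is present.

    Hypothetical helper: it sniffs the content itself instead of trusting a
    filename extension or a transport header.
    """
    if looks_gzipped(data):
        return gzip.decompress(data)
    return data


plain = b'<svg xmlns="http://www.w3.org/2000/svg"/>'
packed = gzip.compress(plain)
print(looks_gzipped(plain), looks_gzipped(packed))  # False True
print(read_svg_bytes(packed) == plain)              # True
```

Sniffing the first two bytes is cheap and works the same way for files on disk and for bodies received over the network.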

Sorry this is late. Here are some comments on this registration:
- Some of the ideas discussed at http://eikenes.alvestrand.no/pipermail/ietf-types/2004-July/000298.html might help improve the specification.
- The text below was produced by using an HTML-to-text conversion. Ideally, it should be possible to use the registration template directly as plain text, i.e. the most important URIs should appear in plain text.
- On the other hand, there is no need to provide URIs for RFCs. Something like "... [6]RFC3023 ..." [6] http://www.w3.org/TR/SVG12/references in an IETF context, is quite confusing.
- "Published specification: This media type registration is extracted from Appendix G of the SVG 1.2 specification." This sounds like it's the wrong way round. "Published specification" doesn't mean the specification of the registration template, but the specification of the format. This is in many ways the most important part of the MIME registration. It tells a reader "if you get something of type image/svg+xml, here is the spec that tells you how to interpret that media type". Saying that the media type registration itself is in Appendix G is additional information worth mentioning, but much less important.
- Last but not least, I agree with comments already made in this venue that this media type should allow the 'charset' parameter as defined in RFC 3023. As argued in detail at http://lists.w3.org/Archives/Public/www-tag/2004Nov/0034.html, I do not see any way to justify removing the 'charset' parameter based on 'good practice' advice in the Web Architecture document (http://www.w3.org/TR/2004/PR-webarch-20041105/#no-charset). (Boris Zbarsky made essentially the same argument, but much shorter.) Continuing to allow the 'charset' parameter (or in this case, putting it back in) will make it easier to use generic tools, both on the producer side (databases, java servlets, ...) and on the client side (xml editors, validators, ...), which is one of the main advantages of +xml.
Regards, Martin.
At 01:22 04/11/03, Chris Lilley wrote:
Please review the Media Type registration template described below. It is available in HTML [1] or in plain text form [2] and relates to the SVG specification [3]. Registration of this Media Type in the standards tree is requested, in conformance with [4] and [5].
[1] http://www.w3.org/TR/SVG12/mimereg.html [2] http://www.w3.org/TR/SVG12/mimereg.html,text [3] http://www.w3.org/TR/SVG12/ [4] http://www.ietf.org/internet-drafts/draft-freed-media-type-reg-01.txt [5] http://www.w3.org/2002/06/registering-mediatype
Registration of Media Type image/svg+xml
MIME media type name: image
MIME subtype name: svg+xml
Required parameters: None.
Optional parameters: None
The encoding of an SVG document is determined by the XML encoding declaration. This has identical semantics to the application/xml media type in the case where the charset parameter is omitted, as specified in [6]RFC3023 sections 8.9, 8.10 and 8.11.
[6] http://www.w3.org/TR/SVG12/references
Encoding considerations: Same as for application/xml. See [7]RFC3023, section 3.2.
[7] http://www.w3.org/TR/SVG12/references
Restrictions on usage: None
Security considerations: As with other XML types and as noted in [8]RFC3023 section 10, repeated expansion of maliciously constructed XML entities can be used to consume large amounts of memory, which may cause XML processors in constrained environments to fail.
[8] http://www.w3.org/TR/SVG12/references
SVG documents may be transmitted in compressed form using gzip compression. For systems which employ MIME-like mechanisms, such as HTTP, this is indicated by the Content-Transfer-Encoding header; for systems which do not, such as direct filesystem access, this is indicated by the filename extension and by the Macintosh File Type Codes. In addition, gzip compressed content is readily recognised by the initial byte sequence as described in [9]RFC1952 section 2.3.1.
[9] http://www.w3.org/TR/SVG12/references
Several SVG elements may cause arbitrary URIs to be referenced. In this case, the security issues of [10]RFC2396, section 7, should be considered.
[10] http://www.w3.org/TR/SVG12/references
In common with HTML, SVG documents may reference external media such as images, audio, video, style sheets, and scripting languages. Scripting languages are executable content. In this case, the security considerations in the Media Type registrations for those formats apply.
In addition, because of the extensibility features for SVG and of XML in general, it is possible that "image/svg+xml" may describe content that has security implications beyond those described here. However, if the processor follows only the normative semantics of this specification, this content will be outside the SVG namespace and will be ignored. Only in the case where the processor recognizes and processes the additional content, or where further processing of that content is dispatched to other processors, would security issues potentially arise. And in that case, they would fall outside the domain of this registration document.
Interoperability considerations: This specification describes processing semantics that dictate behavior that must be followed when dealing with, among other things, unrecognized elements and attributes, both in the SVG namespace and in other namespaces.
Because SVG is extensible, conformant "image/svg+xml" processors must expect that content received is well-formed XML, but it cannot be guaranteed that the content is valid to a particular DTD or Schema or that the processor will recognize all of the elements and attributes in the document.
SVG has a published Test Suite and associated implementation report showing which implementations passed which tests at the time of the report. This information is periodically updated as new tests are added or as implementations improve.
Published specification: This media type registration is extracted from Appendix G of the SVG 1.2 specification.
Additional information:
Person & email address to contact for further information: Dean Jackson, (dean@w3.org).
Intended usage: COMMON
Author/Change controller: The SVG specification is a work product of the World Wide Web Consortium's SVG Working Group. The W3C has change control over these specifications.
-- Chris Lilley mailto:chris@w3.org Chair, W3C SVG Working Group Member, W3C Technical Architecture Group
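To see why the entity-expansion risk cited from RFC3023 section 10 matters, it helps to do the arithmetic. The Python sketch below is illustrative only: the helper names `nested_entity_dtd` and `expanded_length` are hypothetical, and it models the classic nested-entity pattern rather than the behaviour of any particular parser. It builds a small set of entity declarations in which each entity references the previous one several times, then computes the fully expanded size without ever feeding the text to a parser.

```python
def nested_entity_dtd(levels: int, refs_per_level: int, base: str = "lol") -> str:
    """Build declarations e0..e<levels>; e<k> holds refs_per_level references to e<k-1>."""
    lines = [f'<!ENTITY e0 "{base}">']
    for k in range(1, levels + 1):
        body = f"&e{k - 1};" * refs_per_level
        lines.append(f'<!ENTITY e{k} "{body}">')
    return "\n".join(lines)


def expanded_length(levels: int, refs_per_level: int, base: str = "lol") -> int:
    """Characters produced by fully expanding the top entity: len(base) * refs_per_level ** levels."""
    return len(base) * refs_per_level ** levels


dtd = nested_entity_dtd(9, 10)
print(len(dtd))               # 522 characters of declarations...
print(expanded_length(9, 10)) # ...expand to 3000000000 characters
```

The input grows linearly with the number of levels while the output grows exponentially, which is why processors in constrained environments typically cap entity expansion.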

On Friday, November 19, 2004, 1:08:27 AM, Martin wrote:
MD> Sorry this is late. Here are some comments on this registration:
MD> - Some of the ideas discussed at http://eikenes.alvestrand.no/pipermail/ietf-types/2004-July/000298.html might help improve the specification.
Thanks, I will take a look.
MD> - The text below was produced by using an HTML-to-text conversion. Ideally, it should be possible to use the registration template directly as plain text, i.e. the most important URIs should appear in plain text.
That's certainly a possibility. Or the conversion could be trimmed by hand. I was trying to ensure that exactly the same text was in the W3C appendix and the registration.
MD> - On the other hand, there is no need to provide URIs for RFCs. Something like "... [6]RFC3023 ..." [6] http://www.w3.org/TR/SVG12/references in an IETF context, is quite confusing.
Yes.
MD> - "Published specification: This media type registration is extracted from Appendix G of the SVG 1.2 specification." This sounds like it's the wrong way round. "Published specification" doesn't mean the specification of the registration template, but the specification of the format.
Yes, this could have been better worded. I was trying to say "the published specification is SVG 1.2, of which this registration forms Appendix G."
MD> - Last but not least, I agree with comments already made in this venue that this media type should allow the 'charset' parameter as defined in RFC 3023. As argued in detail at http://lists.w3.org/Archives/Public/www-tag/2004Nov/0034.html, I do not see any way to justify removing the 'charset' parameter
In that case, perhaps you could examine http://lists.w3.org/Archives/Public/www-tag/2004Oct/0016.html and argue against the points made there, saying why the approved TAG findings and the Architecture of the World Wide Web are incorrect.
It's much more productive to fix RFC3023 to not be in conflict with Web Architecture, which (as co-editor) I am involved in doing.
MD> Continuing to allow the 'charset' parameter (or in this case, putting it back in)
Neither of those is correct. It is not a case of putting it back in, since it was never there; it is not a case of continuing to allow it, since it is not there.
MD> will make it easier to use generic tools (both on the producer side (databases, java servlets,...) and on the client side (xml editors, validators,...), which is one of the main advantages of +xml.
Actually no. The sole use case for the charset, given that in the revision of 3023 it is required to be the same as what the xml encoding declaration says, is for *non* xml processors. XML processors will know how to read and interpret the xml encoding declaration.
Optional parameters: None
The encoding of an SVG document is determined by the XML encoding declaration. This has identical semantics to the application/xml media type in the case where the charset parameter is omitted, as specified in [6]RFC3023 sections 8.9, 8.10 and 8.11.
The cases stated there are entirely adequate for robust, interoperable encoding declaration and are widely, indeed ubiquitously, implemented. They can confidently be generated by all authoring tools without knowledge of the precise server configuration. Your proposal to add a redundant charset parameter merely complicates server setup, requires authoring tools to be specially configured to understand server setup, and requires xml instances to be rewritten when saved to local disk. All this to benefit *non xml aware* processors, which are going to save to local disk anyway; and if they do, and use the charset parameter, they still have to understand enough of the xml syntax to reliably rewrite the encoding declaration. It also results in xml that is not well formed sitting on the server, making it much harder to do server side processing. In conclusion, your proposed addition increases complexity, decreases interoperability, is contrary to existing practice, and provides no benefit. I thus find it a less than compelling addition.
-- Chris Lilley mailto:chris@w3.org Chair, W3C SVG Working Group Member, W3C Technical Architecture Group
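The claim that XML processors can find the encoding from the document itself can be illustrated with a simplified sketch of the detection that applies when no charset parameter is present: byte-order mark first, then the encoding declaration, otherwise UTF-8. This Python sketch uses a hypothetical `sniff_xml_encoding` helper and covers only those common cases; it is not the full XML 1.0 Appendix F procedure (it ignores, for example, UTF-16 without a BOM, UTF-32 and EBCDIC).

```python
import re

# Encoding pseudo-attribute of an XML declaration at the very start of the entity.
_ENCODING_DECL = re.compile(
    rb"""^<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML entity that arrived without a charset parameter.

    Simplified: byte-order mark first, then the encoding declaration, else UTF-8.
    """
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"  # Python's utf-16 codec reads and strips the BOM itself
    match = _ENCODING_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"  # XML default when there is no BOM and no declaration


print(sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><svg/>'))  # iso-8859-1
print(sniff_xml_encoding(b'<?xml version="1.0"?><svg/>'))                         # utf-8
```

Because the answer depends only on the bytes of the document, the same file gives the same result whether it is read from a server's disk or received over the network.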

* Chris Lilley wrote:
It's much more productive to fix RFC3023 to not be in conflict with Web Architecture, which (as co-editor) I am involved in doing.
If your goal is to feel good when ignoring reality then that may be so; if you are more concerned about interoperability of running code, which is slightly more common in IETF discussions, then maybe not. Your proposals so far render all applications for which RFC3023 would be relevant non-compliant regardless of whether they implement only some part of RFC3023, the entire specification or implement it not at all. That's not going to help interoperability much. Some of your proposals even seem out of scope of RFC3023bis as they in essence make it a fatal error as defined in XML 1.0 if higher-level protocol information affects the detection of the character encoding of the XML entity, which contradicts XML 1.0. It rather seems you want to change the XML specifications, in which case you should talk to the W3C XML Core Working Group. Of course, this assumes that I actually understand your proposals; none of them has really been clear yet. You sometimes give the impression that the requirements you propose to add to RFC3023bis just render some content non-compliant without any actual effect on conforming implementations; that would be even worse, as it would have no effect in practice other than complicating theory even more. In fact, in that case one would even wonder what you are actually trying to achieve, as RFC 3023 already states
Processors generating XML MIME entities MUST NOT label conflicting charset information between the MIME Content-Type and the XML declaration.
Actually no. The sole use case for the charset, given that in the revision of 3023 it is required to be the same as what the xml encoding declaration says, is for *non* xml processors. XML processors will know how to read and interpret the xml encoding declaration.
It is not really acceptable to make the image/svg+xml registration dependent on a possible revision of some other document, especially if it seems unlikely that such a revision will pass IETF/IESG review, which seems to be the case so far.
Your proposal to add a redundant charset parameter merely complicates server setup, requires authoring tools to be specially configured to understand server setup, and requires xml instances to be rewritten when saved to local disk. All this to benefit *non xml aware* processors, which are going to save to local disk anyway; and if they do, and use the charset parameter, they still have to understand enough of the xml syntax to reliably rewrite the encoding declaration. It also results in xml that is not well formed sitting on the server, making it much harder to do server side processing.
The point Martin and others are making is that image/svg+xml should not be different from all other XML MIME types. None of the arguments you cite is actually relevant to that case; they rather discuss why having a charset parameter in the first place is a bad idea for some formats. If your point is that image/svg+xml should be better than all the other types, do not use the +xml convention, as that convention implies a charset parameter with well-defined processing semantics. Note that even with your proposed changes to RFC3023, image/svg+xml would still be different from all other types if it does not define the semantics of the charset parameter for the type.

On Wednesday, November 24, 2004, 7:32:59 AM, Bjoern wrote: BH> * Chris Lilley wrote:
It's much more productive to fix RFC3023 to not be in conflict with Web Architecture, which (as co-editor) I am involved in doing.
BH> If your goal is to feel good when ignoring reality then that may be so, if you are more concerned about interoperability of running code, which is slightly more common in IETF discussions, then maybe not.
Bjoern, that seems uncalled for. Let's try to be civil here. I don't describe your proposals as fleeing from reality; please do me the same courtesy. And my reasoning in all of this is, of course, based on existing implementations and interoperability.
BH> Your proposals so far render all applications for which RFC3023 would be relevant non-compliant regardless of whether they implement only some part of RFC3023, the entire specification or implement it not at all.
That is incorrect. You seem to miss that once RFC3023 is updated to ensure that the encoding and the charset are the same, the *sole* use of a charset parameter is for non-xml applications.
BH> Some of your proposals even seem out of scope of RFC3023bis as they in essence make it a fatal error as defined in XML 1.0 if higher-level protocol information affects the detection of the character encoding of the XML entity which contradicts XML 1.0. It rather seems you want to change the XML specifications in which case you should talk to the W3C XML Core Working Group.
No, you mis-state my position. Incidentally, since you bring up the XML Core WG, do you realise that the text I cited in the Architecture document was written by Tim Bray, co-editor of the XML specification?
BH> Of course, assuming that I actually understand your proposals,
I suspect that you do not, which would explain both the content and the tone of your comments.
BH> none of them has really been clear yet, you sometimes give the impression that the requirements you propose to add to RFC3023bis just render some content non-compliant without any actual effect on conforming implementations, that would then be even worse as it would have no effect in practice other than complicating theory even more. In fact, in that case one would even wonder what you are actually trying to achieve, as RFC 3023 already states
BH> Processors generating XML MIME entities MUST NOT label conflicting charset information between the MIME Content-Type and the XML declaration.
And thus, what use is the redundant declaration? Note that currently there is great variability in what processors do when reading and saving content that has conflicting declarations. Removing the source of the conflict brings us onto safer ground to the area where all implementations behave the same. Surely you can see that this is good and results in more robust, interoperable behavior? I am frankly puzzled by your insistence on retaining the woolly, non-interoperable area while simultaneously claiming that I am unrealistic and do not care for interoperability.
-- Chris Lilley mailto:chris@w3.org Chair, W3C SVG Working Group Member, W3C Technical Architecture Group

* Chris Lilley wrote:
BH> Your proposals so far render all applications for which RFC3023 would BH> be relevant non-compliant regardless of whether they implement only BH> some part of RFC3023, the entire specification or implement it not at BH> all.
That is incorrect. You seem to miss that once RFC3023 is updated to ensure that the encoding and the charset are the same, the *sole* use of a charset parameter is for non-xml applications.
The use of the charset parameter so far does not seem actually relevant to this discussion. If I understand correctly that you propose to make
Content-Type: application/xml;charset=iso-8859-1
<?xml version="1.0" encoding="utf-8"?>
... iso-8859-1 content ...
non-conforming, I would expect that your proposal includes a requirement for implementations to detect this error and probably also a requirement to reject such resources. Both requirements would render implementations of RFC3023 non-conforming since none of them do this. So it would seem your proposal does not include such requirements; is that correct? In case this is correct, what is the point of disallowing this and yet specifying how implementations must process such content (i.e., consider the resource above ISO-8859-1 encoded)? Or does your proposal also include changes to how implementations must determine the character encoding? It would seem it does not.
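The mismatch in the example above (a charset parameter that disagrees with the encoding declaration) is easy to detect mechanically. Below is a minimal Python sketch; the function names are hypothetical, the standard-library `email.message.Message` is used only to parse the Content-Type parameters, and aliases such as latin-1 and iso-8859-1 are normalised with `codecs.lookup`. It deliberately flags only an explicit disagreement between the two labels; it does not decide which label should win.

```python
import codecs
import re
from email.message import Message
from typing import Optional

_ENCODING_DECL = re.compile(
    rb"""^<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def _normalise(label: str) -> str:
    """Map encoding aliases (e.g. latin-1, ISO-8859-1) to one canonical codec name."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        return label.lower()  # unknown label: compare it case-insensitively


def charset_param(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def declared_encoding(body: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, or None if there is none."""
    match = _ENCODING_DECL.match(body)
    return match.group(1).decode("ascii") if match else None


def labels_conflict(content_type: str, body: bytes) -> bool:
    """True when both labels are present and name different encodings."""
    charset, declared = charset_param(content_type), declared_encoding(body)
    if charset is None or declared is None:
        return False
    return _normalise(charset) != _normalise(declared)


body = b'<?xml version="1.0" encoding="utf-8"?><doc>Bj\xf6rn</doc>'
print(labels_conflict("application/xml;charset=iso-8859-1", body))  # True
```

Note that a document with no encoding declaration at all is never flagged by this sketch, even when the charset parameter names something other than the UTF-8 default.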
And thus, what use is the redundant declaration? Note that currently there is great variability in what processors do when reading and saving content that has conflicting declarations. Removing the source of the conflict brings us onto safer ground to the area where all implementations behave the same. Surely you can see that this is good and results in more robust, interoperable behavior?
What do you consider the source of the conflict here? As far as I can tell that would be the charset parameter, and as long as implementations are required to consider it to detect the character encoding, the source of conflict is not removed. Maybe your point is that, if RFC3023bis has suitable conformance rules, it seems reasonable to expect that people will stop using the charset parameter in ways that create potential problems? Maybe you can draft an improved proposal to change RFC3023 and post it to the relevant lists? That would certainly help the discussion.

On Wednesday, November 24, 2004, 3:18:50 PM, Bjoern wrote: BH> * Chris Lilley wrote:
BH> Your proposals so far render all applications for which RFC3023 would BH> be relevant non-compliant regardless of whether they implement only BH> some part of RFC3023, the entire specification or implement it not at BH> all.
That is incorrect. You seem to miss that once RFC3023 is updated to ensure that the encoding and the charset are the same, the *sole* use of a charset parameter is for non-xml applications.
BH> The use of the charset parameter so far does not seem actually relevant to this discussion. If I understand correctly that you propose to make
BH> Content-Type: application/xml;charset=iso-8859-1
BH> <?xml version="1.0" encoding="utf-8"?>
BH> ... iso-8859-1 content ...
BH> non-conforming,
As you yourself pointed out, per RFC3023
Processors generating XML MIME entities MUST NOT label conflicting charset information between the MIME Content-Type and the XML declaration.
such content is already non conforming.
BH> I would expect that your proposal includes a requirement for implementations to detect this error and probably also a requirement to reject such resources.
Since it's already non-conforming, we just have to deal with the non-conforming situation. My solution further discourages such content and is consistent with the architectural principle that silent recovery from error is harmful: http://www.w3.org/TR/webarch/#no-silent-recovery
In terms of dealing with such content if it still occurs, the XML well formedness rules already handle that in an entirely satisfactory manner and nothing further need be added. These are already well implemented and highly interoperable.
BH> Both requirements would render implementations of RFC3023 non-conforming since none of them do this.
That's right, they don't raise an error. They either silently recover by considering the charset authoritative or they silently recover by considering the encoding declaration authoritative. Both interpretations are observed in running code. In addition, when saving to disk some (a few) alter the inconsistent xml encoding declaration to make it correct; others (most) do not, thus depositing a non-well-formed instance in local storage. And you are OK with that messy, non-interoperable state of affairs?
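The two kinds of silent recovery described above can be contrasted on the earlier example (charset=iso-8859-1, declaration utf-8, ISO-8859-1 bytes). In this Python sketch the policy functions `charset_wins` and `declaration_wins` are hypothetical stand-ins for the two behaviours reported in running code, not models of any particular product. One policy yields a readable document; the other hits bytes that are invalid in UTF-8, which an XML processor would have to report as a well-formedness error.

```python
from typing import Callable, Optional

Policy = Callable[[bytes, Optional[str], Optional[str]], str]


def charset_wins(body: bytes, charset: Optional[str], declared: Optional[str]) -> str:
    """Policy A: the Content-Type charset overrides the in-document declaration."""
    return body.decode(charset or declared or "utf-8")


def declaration_wins(body: bytes, charset: Optional[str], declared: Optional[str]) -> str:
    """Policy B: only the declaration (or the UTF-8 default) counts; charset is ignored."""
    return body.decode(declared or "utf-8")


def outcome(policy: Policy, body: bytes, charset: Optional[str], declared: Optional[str]) -> str:
    """Run a policy and describe a decoding failure instead of raising it."""
    try:
        return policy(body, charset, declared)
    except UnicodeDecodeError as err:
        return f"error at byte {err.start}: {err.reason}"


# ISO-8859-1 bytes, labelled charset=iso-8859-1 but declared as utf-8.
body = b'<?xml version="1.0" encoding="utf-8"?><!--Bj\xf6rn-->'
for policy in (charset_wins, declaration_wins):
    print(policy.__name__, "->", outcome(policy, body, "iso-8859-1", "utf-8"))
```

The same message therefore produces two different results depending only on which label the receiving software trusts.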
And thus, what use is the redundant declaration? Note that currently there is great variability in what processors do when reading and saving content that has conflicting declarations. Removing the source of the conflict brings us onto safer ground to the area where all implementations behave the same. Surely you can see that this is good and results in more robust, interoperable behavior?
BH> What do you consider the source of the conflict here? As far as I can tell that would be the charset parameter and as long as implementations are required to consider it to detect the character encoding the source of conflict is not removed.
This is doubtless why RFC 3023 says that people should not generate content that has this conflict, and why the TAG says the same thing: not to generate it unless it is known to be correct. However, implementation experience since RFC 3023 was issued shows that people do it anyway, and it's a significant problem, so it needs to be fixed. Pretending there is no problem does not help the situation in any way.
BH> Maybe your point is that, if RFC3023bis has suitable conformance rules, it seems reasonable to expect that people will stop using the charset parameter in ways that create potential problems?
Right. That would include discouraging the use of an optional and inconsistently implemented charset parameter for new registrations. Existing registrations that use it we can do nothing about, although I would like to see more testing of what processors do in the case of inconsistent, non-conforming content.
BH> Maybe you can draft an improved proposal to change RFC3023 and post it to the relevant lists? That would certainly help the discussion.
Excellent suggestion; I will do exactly that (which is, of course, why I took the action from the TAG to become co-editor. It's much more effective for the TAG to say something is wrong, and help fix it, than merely to say it is wrong).
-- Chris Lilley mailto:chris@w3.org Chair, W3C SVG Working Group Member, W3C Technical Architecture Group

* Chris Lilley wrote:
As you yourself pointed out, per RFC3023
Processors generating XML MIME entities MUST NOT label conflicting charset information between the MIME Content-Type and the XML declaration.
such content is already non conforming.
It does not actually apply to content...
In terms of dealing with such content if it still occurs, the XML well formedness rules already handle that in an entirely satisfactory manner and nothing further need be added. These are already well implemented and highly interoperable.
Consider a *UTF-8 encoded* document
Content-Type: application/xml;charset=iso-8859-1
<?xml version="1.0"?>
...
<!--Björn-->
...
With no BOM and using only US-ASCII characters for the rest of the document, with your proposal, which of the following behaviors of implementations would be considered conforming?
a) it fails to process the document due to RFC3023bis/XML 1.0 errors
b) it considers the comment to include "BjÃ¶rn"
c) it considers the comment to include "Björn"
If none of these behaviors would be conforming, what would be conforming instead? What would be the answers for your proposal if application/xml is replaced by the following types and proper content as defined above:
* application/xhtml+xml (with no update to RFC3236)
* image/svg+xml (as you propose it)
For application/xml / application/xhtml+xml this would currently be b), as the document includes 0xC3 0xB6 and the encoding is determined to be ISO-8859-1, which means the sequence above represents "Ã¶".
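The byte-level reasoning behind option b) can be checked directly. This short Python sketch (the variable names are only illustrative) encodes the name as UTF-8 and then decodes the same bytes as ISO-8859-1, as a processor would if it believed the charset parameter, and finally decodes them with the XML default of UTF-8.

```python
name = "Björn"

utf8_bytes = name.encode("utf-8")
print(utf8_bytes)                  # b'Bj\xc3\xb6rn': ö is the two bytes 0xC3 0xB6

# Believing charset=iso-8859-1 maps each byte to one character: 0xC3 -> Ã, 0xB6 -> ¶.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)                     # BjÃ¶rn

# Using the XML default (UTF-8, since there is no BOM and no encoding declaration).
print(utf8_bytes.decode("utf-8"))  # Björn
```

ISO-8859-1 assigns a character to every byte value, so the misreading never raises an error; it silently produces different text.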

On Wednesday, November 24, 2004, 4:41:48 PM, Bjoern wrote: BH> * Chris Lilley wrote:
As you yourself pointed out, per RFC3023
Processors generating XML MIME entities MUST NOT label conflicting charset information between the MIME Content-Type and the XML declaration.
such content is already non conforming.
BH> It does not actually apply to content...
Yes, that is an ambiguity that needs to be cleared up. It says things that generate the content must not do that; if that were true, then there would be no such content. Since there is, it's worth splitting this into two:
- Conformance for XML generators
- Conformance for XML messages (headers plus bodies)
In terms of dealing with such content if it still occurs, the XML well formedness rules already handle that in an entirely satisfactory manner and nothing further need be added. These are already well implemented and highly interoperable.
BH> Consider a *UTF-8 encoded* document
BH> Content-Type: application/xml;charset=iso-8859-1
Since that isn't image/svg+xml, it has a charset parameter, although the processor that generated it is non-conforming to the existing RFC 3023. But let's press on into how to detect or resolve the error.
BH> <?xml version="1.0"?>
BH> ...
BH> <!--Björn-->
BH> ...
BH> With no BOM and using only US-ASCII characters for the rest of the document,
Cleverly constructed example. If the processor believes the charset, the processor will think the comment says Björn. However, as soon as you save it, your name is mis-spelled. I'm sure you would not like that, Björn. So in this case, although the processor that generated it is non-conforming, the content is not non-conforming (but it should be), and the processor that receives it has two possibilities:
a) it can add the missing encoding declaration when processing and when saving to disk (note that, if the xml happened to be digitally signed and in canonical XML form, this would break the signature; see RFC 3741)
b) it can note that a required encoding declaration is not present, and throw a well-formedness error.
Note that both of these choices will break some content and both of these choices are licensed by the relevant specifications. There is thus non-interoperability. Note further that, in the case where the charset parameter is not present, there is 100% interoperability, no breakage, all in conformance with the existing clauses in RFC 3023 which 3023bis will retain, since they are proven by implementation experience with running code to be highly robust and interoperable. So, let's take the other case, which is more interesting. Consider an *8859-1 encoded* document
Content-Type: application/xml;charset=UTF-8
<?xml version="1.0"?>
...
<!--Björn-->
...
With your proposal, would the well formedness error (bytes occur that cannot occur in UTF-8) be silently recovered from if the HTTP header overrides it, even for an XML processor, while it would continue to fail in other cases (such as server side processing)?

BH> with your proposal, which of the following behaviors of
BH> implementations would be considered conforming?

(see above for discussion of b and c)

BH> a) it fails to process the document due to RFC3023bis/XML 1.0 errors

That would be the safest course. Consider if the non-ASCII character was a euro or some other currency symbol, if the document was an invoice, and it was being processed by an accounting system, not by a human being. Accounting systems do not have the luxury of a human to look at the invoice, go to View...Character Encoding and try various possibilities until it seems to look right, then save the document, edit the local copy and fix up the encoding declaration.

BH> b) it considers the comment to include "BjÃ¶rn"
BH> c) it considers the comment to include "Björn"

BH> * application/xhtml+xml (with no update to RFC3236)

That is an existing type and has an existing charset parameter. Applications are thus allowed to use it, with all the complications and breakage that this entails as described above.

BH> * image/svg+xml (as you propose it)

There is no charset parameter. Processors that generate one and messages that contain one are in error.

BH> For application/xml / application/xhtml+xml this would currently be b)
BH> as the document includes 0xC3 0xB6 and the encoding is determined to be
BH> ISO-8859-1 which means the sequence above represents "Ã¶".

It would sometimes be b) and sometimes c) depending on the particular software and whether it's reading from disk on the server or over the net. I frankly can't understand how you consider this lack of interoperability to be a desirable thing.

--
Chris Lilley                    mailto:chris@w3.org
Chair, W3C SVG Working Group
Member, W3C Technical Architecture Group
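The second case can be made concrete with a short Python sketch (an illustration, not code from the thread). The document is stored as ISO-8859-1, so "ö" is the single byte 0xF6, which can never appear in UTF-8. `decodes_cleanly` and `decode_with_recovery` are hypothetical helpers; Python's `strict` and `replace` codec error handlers stand in, respectively, for an XML processor that reports the fatal error and one that silently recovers.

```python
"""Sketch of the second case: ISO-8859-1 bytes labelled charset=UTF-8."""

ORIGINAL_TEXT = '<?xml version="1.0"?>\n<doc><!--Björn--></doc>\n'

# The document as actually stored: ISO-8859-1, so "ö" is the single byte 0xF6.
body = ORIGINAL_TEXT.encode("iso-8859-1")


def decodes_cleanly(data: bytes, charset: str) -> bool:
    """Return False if the bytes are illegal in `charset`.

    XML 1.0 (section 4.3.3) makes such bytes a fatal error, so a conforming
    XML processor must stop rather than guess.
    """
    try:
        data.decode(charset, errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def decode_with_recovery(data: bytes, charset: str) -> str:
    """Hypothetical 'silent recovery': replace illegal bytes with U+FFFD."""
    return data.decode(charset, errors="replace")


if __name__ == "__main__":
    print(decodes_cleanly(body, "utf-8"))       # False: 0xF6 never occurs in UTF-8
    print(decodes_cleanly(body, "iso-8859-1"))  # True
    print(repr(decode_with_recovery(body, "utf-8")))
```

Under the strict handler the UTF-8 label is rejected, while the replacement handler quietly turns "ö" into U+FFFD, which is the kind of silent recovery the question above asks about.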

* Chris Lilley wrote:
> BH> Consider a *UTF-8 encoded* document
> BH>
> BH> Content-Type: application/xml;charset=iso-8859-1
> BH>
> BH> <?xml version="1.0"?>
> BH> ...
> BH> <!--Björn-->
> BH> ...
> BH>
> BH> With no BOM and using only US-ASCII characters for the rest of the
> BH> document,

> So in this case, although the processor that generated it is non conforming, the content is not non conforming (but it should be) and the processor that receives it has two possibilities:
I've actually asked to get a better understanding of how you intend to change RFC3023, yet I am afraid you did not really say what happens with the document above if RFC3023bis gets approved with your changes. I would appreciate knowing just a, b, c, or something else for the various cases.
> b) it can note that a required encoding declaration is not present, and throw a well formedness error.
It actually can't: 0xC3 0xB6 is a legal sequence in both UTF-8 and ISO-8859-1, so it would need to know that I meant to have "Björn" in the comment, which it cannot know.
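This point can be checked directly. In the minimal Python sketch below (illustrative only; the variable names are not from the thread), the UTF-8 bytes of "Björn" decode without error under both charsets, because Python's `iso-8859-1` codec maps every byte from 0x00 to 0xFF to a character. The last step assumes a processor that trusted the header and later saved the text as UTF-8, which produces the misspelling described earlier in the thread.

```python
COMMENT_TEXT = "Björn"

# What the author wrote: UTF-8 encodes "ö" (U+00F6) as the two bytes 0xC3 0xB6.
raw = COMMENT_TEXT.encode("utf-8")

# Strict decoding raises on illegal bytes, yet neither call fails here:
# 0xC3 0xB6 is a valid UTF-8 sequence and also two valid ISO-8859-1 characters.
read_as_utf8 = raw.decode("utf-8", errors="strict")
read_as_latin1 = raw.decode("iso-8859-1", errors="strict")

# A processor that trusted charset=iso-8859-1 and later saves the text as
# UTF-8 encodes the misread characters again ("double encoding").
saved_as_utf8 = read_as_latin1.encode("utf-8")

if __name__ == "__main__":
    print(raw)             # b'Bj\xc3\xb6rn'
    print(read_as_utf8)    # Björn
    print(read_as_latin1)  # BjÃ¶rn
    print(saved_as_utf8)   # b'Bj\xc3\x83\xc2\xb6rn'
```

Since no decoding error occurs under either label, a receiver has no error to report; it simply obtains different text depending on which label it trusts.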
> Consider an *8859-1 encoded* document
>
> Content-Type: application/xml;charset=UTF-8
>
> <?xml version="1.0"?>
> ...
> <!--Björn-->
> ...
>
> With your proposal, would the well formedness error (bytes occur that cannot occur in UTF-8) be silently recovered from if the HTTP header overrides it, even for an XML processor, while it would continue to fail in other cases (such as server side processing)?
I do not really think I've made a proposal to change RFC3023 other than that the differences between text/xml and application/xml be removed to properly reflect running code. I can only tell you what XML 1.0 and RFC 3023 require in these cases, but you know that already.
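The differences between text/xml and application/xml lie in what RFC 3023 says to do when the charset parameter is omitted: use us-ascii for text/xml (section 3.1), versus follow XML 1.0 section 4.3.3 for application/xml (section 3.2). A simplified, hypothetical Python sketch of that decision order is below. It only recognizes UTF-8/UTF-16 byte order marks and an ASCII-compatible encoding declaration, and its treatment of image/svg+xml (ignoring any charset parameter, since the proposed registration defines none) is an assumption, not something the thread specifies.

```python
import re
from typing import Optional

# ASCII-compatible encoding declaration at the very start of the entity.
_ENCODING_DECL = re.compile(
    rb'^<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(body: bytes) -> str:
    """Simplified XML 1.0 autodetection: BOM, then declaration, else UTF-8."""
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if body.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"
    match = _ENCODING_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"  # XML default with no BOM and no declaration


def effective_encoding(media_type: str, charset: Optional[str], body: bytes) -> str:
    """Pick the encoding a receiver would use (hypothetical, simplified)."""
    media_type = media_type.lower()
    if media_type == "image/svg+xml":
        # Assumption: the proposed registration defines no charset parameter,
        # so this sketch ignores one and relies on the document itself.
        return sniff_xml_encoding(body)
    if charset is not None:
        # RFC 3023: a charset parameter, when present, is authoritative.
        return charset.lower()
    if media_type == "text/xml":
        # RFC 3023 section 3.1: no charset parameter means us-ascii.
        return "us-ascii"
    # RFC 3023 section 3.2 (application/xml): follow XML 1.0 section 4.3.3.
    return sniff_xml_encoding(body)
```

For a body with no BOM and no encoding declaration, `effective_encoding("application/xml", "iso-8859-1", body)` returns `"iso-8859-1"`, while `effective_encoding("image/svg+xml", "iso-8859-1", body)` returns `"utf-8"`; that contrast is what the two messages are arguing about.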
> It would sometimes be b) and sometimes c) depending on the particular software and whether it's reading from disk on the server or over the net. I frankly can't understand how you consider this lack of interoperability to be a desirable thing.
I am in fact most interested to learn how you think this can be improved upon, which is why I asked you about the impact of your proposal on the various cases I've mentioned. What applications currently do does not really help me to get a better understanding of that.
participants (3)
- Bjoern Hoehrmann
- Chris Lilley
- Martin Duerst