RE: Media Type "text/csv": new draft (-02) and Last Call

At 15:19 23/03/05 +0000, clyde.ingram@edl.uk.eds.com wrote:
Please clarify whether the trailing commas that your Excel export generates are there to mark the end of the last field, or to mark the start of a last field which currently has no value.
I'm not sure how to tell the difference in an Excel spreadsheet. In the case where this arose for me, I had created a spreadsheet with varying numbers of values in different rows, and many of the rows were output by Excel with *multiple* trailing commas. Some rows were generated without any trailing commas. My point would be that if this happens with reasonable data then it must be permitted. Whether it's interpreted as a field terminator or as the start of a field with no value is, I think, moot.

#g --
Graham,
-----Original Message-----
From: Graham Klyne [mailto:GK-lists@ninebynine.org]
Sent: Wednesday, March 23, 2005 9:55 AM
To: Yakov Shafranovich; clyde.ingram@edl.uk.eds.com
Cc: ietf-types@alvestrand.no
Subject: Re: Media Type "text/csv": new draft (-02) and Last Call
At 01:14 23/03/05 -0500, Yakov Shafranovich wrote:
Clyde,
Thanks for pointing this out. I personally think that instead of making the header record mandatory, which is something that most CSV applications do not have, I would rather take the comma out of the end of the record and have the last field end with a CRLF instead of an optional COMMA. Do you think that is a plausible solution?
No. Some of the Excel data I process has trailing commas. This must be allowed.
I also don't think it's necessary to say anything (other than maybe as a comment) about any special status for the first line: such use is accommodated quite reasonably within the basic CSV format.
For example, having such a line when exporting Excel as CSV depends entirely upon how the user constructs the original spreadsheet. Column headings are common, but not mandatory. In some cases, there may be a more complex heading structure -- this is an application issue, not a dataset format issue, and as such does not belong in the dataset format specification.
#g -- ------------
Please clarify whether the trailing commas that your Excel export generates are there to mark the end of the last field, or to mark the start of a last field which currently has no value.
To take a concrete example, I would expect a CSV of sibling relationships in a mythical family to look like this, assuming the siblings are one brother (Bart) and 2 sisters (Lisa & Maggie):
child,sisters,brothers<CR-LF>
Bart,Lisa & Maggie,<CR-LF>
Lisa,Maggie,Bart<CR-LF>
Maggie,Lisa,Bart<CR-LF>
where the trailing comma for the record of child=Bart signifies that the "brothers" field is null, so that Bart has no brothers. In my view this is a logical conclusion, and in fact stripping that one trailing comma would be an error, as that record would only have 2 fields, not 3.
Would you, however, expect the CSV file to use comma as a field-terminator, rather than a field-separator, as follows?:
child,sisters,brothers,<CR-LF>
Bart,Lisa & Maggie,,<CR-LF>
Lisa,Maggie,Bart,<CR-LF>
Maggie,Lisa,Bart,<CR-LF>
Note that parsers that split data records on unprotected comma would detect one field too many in this latter case.
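To make the field-count point concrete, here is a minimal sketch using Python's standard csv module, which treats comma as a separator. The constant names and the field_counts helper are made up for this illustration; the data is the two sibling examples above with <CR-LF> written as "\r\n".

```python
import csv
import io

# The two sibling examples above; names are hypothetical, chosen for this sketch.
SEPARATOR_STYLE = (
    "child,sisters,brothers\r\n"
    "Bart,Lisa & Maggie,\r\n"
    "Lisa,Maggie,Bart\r\n"
    "Maggie,Lisa,Bart\r\n"
)
TERMINATOR_STYLE = (
    "child,sisters,brothers,\r\n"
    "Bart,Lisa & Maggie,,\r\n"
    "Lisa,Maggie,Bart,\r\n"
    "Maggie,Lisa,Bart,\r\n"
)


def field_counts(text):
    """Return the number of fields the csv module sees in each record."""
    # newline="" leaves the CRLF line endings for the csv reader to handle.
    reader = csv.reader(io.StringIO(text, newline=""))
    return [len(row) for row in reader]


print(field_counts(SEPARATOR_STYLE))   # three fields per record
print(field_counts(TERMINATOR_STYLE))  # four fields per record
```

In the first file Bart's trailing comma yields an empty third field; in the second, every record gains an extra empty field at the end.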
In a Comma SEPARATED Value file format, can you configure Excel to use comma as a SEPARATOR between values, rather than a TERMINATOR (at the end of values)?
Regarding your remarks on the header record being "an application issue, not a dataset format issue, and as such does not belong in the dataset format specification": XML, ASN.1, and other (application-independent) data interchange formats, explicitly tag individual fields so that their type is unambiguously defined within a context. In contrast, CSV conveys no tags per field in a data record. Hence, to help with application-independent data interchange, the CSV format should convey field titles in a header record.
Here is an example of lack of application-independence: if my application sends yours this CSV file:
,Bart,Lisa & Maggie<CR-LF>
Bart,Lisa,Maggie<CR-LF>
Bart,Maggie,Lisa<CR-LF>
and your application depends on the assumption that the fields are the sequence:
child sisters brothers
then your application will mis-interpret the data. But if my application precedes this with a header record, like so:
brothers,child,sisters<CR-LF>
,Bart,Lisa & Maggie<CR-LF>
Bart,Lisa,Maggie<CR-LF>
Bart,Maggie,Lisa<CR-LF>
then your application can maintain independence from the change by my application, because the CSV file conveys the corresponding new field sequence (the "brothers" column has moved from last to first, ahead of "child" and "sisters").
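As a rough sketch of what that independence could look like on the receiving side, the following uses csv.DictReader from Python's standard library, which takes its field names from the first record. The constant names and the siblings_by_child helper are hypothetical, and the sketch assumes the header record is always present.

```python
import csv
import io

# Same data in the original and the reordered column sequence.
# Names are hypothetical, chosen for this sketch.
ORIGINAL_ORDER = (
    "child,sisters,brothers\r\n"
    "Bart,Lisa & Maggie,\r\n"
    "Lisa,Maggie,Bart\r\n"
    "Maggie,Lisa,Bart\r\n"
)
REORDERED = (
    "brothers,child,sisters\r\n"
    ",Bart,Lisa & Maggie\r\n"
    "Bart,Lisa,Maggie\r\n"
    "Bart,Maggie,Lisa\r\n"
)


def siblings_by_child(text):
    """Map each child to (sisters, brothers), looking fields up by title."""
    # DictReader reads the header record and keys every row by field title,
    # so the column order in the file does not matter to the caller.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return {row["child"]: (row["sisters"], row["brothers"]) for row in reader}


# Both orderings produce the same mapping.
print(siblings_by_child(ORIGINAL_ORDER) == siblings_by_child(REORDERED))
```

Without the header record, a receiver that assumed the sequence child, sisters, brothers would read the reordered file as a child with an empty name.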
Regards, Clyde
------------ Graham Klyne For email: http://www.ninebynine.org/#Contact