RE: Media Type "text/csv": new draft (-02) and Last Call

29 Mar 2005

      At 15:19 23/03/05 +0000, clyde.ingram@edl.uk.eds.com wrote:
...
Please clarify whether the trailing commas that your Excel export 
generates are there to mark the end of the last field, or to mark the 
start of a last field which currently has no value.
I'm not sure how to tell the difference in an Excel spreadsheet.

In the case where this arose for me, I had created a spreadsheet with 
varying numbers of values in different rows, and many of the rows were 
output by Excel with *multiple* trailing commas.  Some rows were generated 
without any trailing commas.  My point would be that if this happens with 
reasonable data then it must be permitted.  Whether it's interpreted as a 
field terminator or as the start of a field with no value is, I think, moot.
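
A minimal sketch of the two readings, assuming Python's standard csv module. The sample rows are hypothetical (they are not the actual Excel export discussed here), and strip_trailing_empty is an illustrative helper, not a library function.

```python
import csv
import io

# Hypothetical export: rows padded with differing numbers of trailing commas.
SAMPLE = "a,b,c\r\nd,,\r\ne,f,\r\ng\r\n"


def read_rows(text):
    """Parse CSV text, keeping every field exactly as the csv module reports it."""
    return list(csv.reader(io.StringIO(text, newline="")))


def strip_trailing_empty(row):
    """Illustrative helper: drop empty fields at the end of a row."""
    trimmed = list(row)
    while trimmed and trimmed[-1] == "":
        trimmed.pop()
    return trimmed


rows = read_rows(SAMPLE)
# Reading 1: each trailing comma starts a field that has no value.
padded = rows
# Reading 2: trailing commas carry no data and can be discarded.
trimmed = [strip_trailing_empty(r) for r in rows]
```

Either reading recovers the same non-empty values; they differ only in how many empty fields are reported at the end of each row.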

#g
--
...
Graham,
-----Original Message-----
From: Graham Klyne 
[mailto:GK-lists@ninebynine.org]
Sent: Wednesday, March 23, 2005 9:55 AM
To: Yakov Shafranovich; clyde.ingram@edl.uk.eds.com
Cc: ietf-types@alvestrand.no
Subject: Re: Media Type "text/csv": new draft (-02) and Last Call
At 01:14 23/03/05 -0500, Yakov Shafranovich wrote:
...
Clyde,
Thanks for pointing this out. I personally think that instead of making
the header record mandatory which is something that most CSV applications
do not have, I would rather take the comma out of the end of the record
and have the last field end with a CRLF instead of an optional COMMA. Do
you think that is a plausible solution?
No.  Some of the Excel data I process has trailing commas.  This must be
allowed.
I also don't think it's necessary to say anything (other than maybe as a
comment) about any special status for the first line:  such use is
accommodated quite reasonably within the basic CSV format.
For example, having such a line when exporting Excel as CSV depends
entirely upon how the user constructs the original spreadsheet.  Column
headings are common, but not mandatory.  In some cases, there may be a more
complex heading structure -- this is an application issue, not a dataset
format issue, and as such does not belong in the dataset format 
specification.
#g
--
------------
Please clarify whether the trailing commas that your Excel export 
generates are there to mark the end of the last field, or to mark the 
start of a last field which currently has no value.
To take a concrete example, I would expect a CSV of sibling relationships 
in a mythical family to look like this, assuming the siblings are one 
brother (Bart) and 2 sisters (Lisa & Maggie):
    child,sisters,brothers<CR-LF>
    Bart,Lisa & Maggie,<CR-LF>
    Lisa,Maggie,Bart<CR-LF>
    Maggie,Lisa,Bart<CR-LF>
where the trailing comma for the record of child=Bart signifies that the 
"brothers" field is null, so that Bart has no brothers.  In my view this 
is a logical conclusion, and in fact stripping that one trailing comma 
would be an error, as that record would only have 2 fields, not 3.
Would you, however, expect the CSV file to use comma as a 
field-terminator, rather than a field-separator, as follows?
    child,sisters,brothers,<CR-LF>
    Bart,Lisa & Maggie,,<CR-LF>
    Lisa,Maggie,Bart,<CR-LF>
    Maggie,Lisa,Bart,<CR-LF>
Note that parsers that split data records on unprotected commas would 
detect one field too many in this latter case.
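
A minimal sketch of that field-count difference, assuming Python's standard csv module as the separator-based parser. The constant and function names are illustrative; the data is the sibling example from this message, once in separator style and once in terminator style.

```python
import csv
import io

# Comma used as a field SEPARATOR: Bart's trailing comma is an empty third field.
SEPARATOR_STYLE = (
    "child,sisters,brothers\r\n"
    "Bart,Lisa & Maggie,\r\n"
    "Lisa,Maggie,Bart\r\n"
    "Maggie,Lisa,Bart\r\n"
)

# Comma used as a field TERMINATOR: every field, including the last, ends in a comma.
TERMINATOR_STYLE = (
    "child,sisters,brothers,\r\n"
    "Bart,Lisa & Maggie,,\r\n"
    "Lisa,Maggie,Bart,\r\n"
    "Maggie,Lisa,Bart,\r\n"
)


def field_counts(text):
    """Return the number of fields a separator-based parser sees in each record."""
    return [len(row) for row in csv.reader(io.StringIO(text, newline=""))]


separator_counts = field_counts(SEPARATOR_STYLE)
terminator_counts = field_counts(TERMINATOR_STYLE)
```

With these inputs the separator-style file yields three fields per record, while the terminator-style file yields four, the last of which is always empty.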
In a Comma SEPARATED Value file format, can you configure Excel to use 
comma as a SEPARATOR between values, rather than a TERMINATOR (at the end 
of values)?
Regarding your remarks on the header record being "an application issue, 
not a dataset format issue, and as such does not belong in the dataset 
format specification": XML, ASN.1, and other (application-independent) 
data interchange formats explicitly tag individual fields so that their 
type is unambiguously defined within a context.  In contrast, CSV conveys 
no tags per field in a data record.  Hence, to help with 
application-independent data interchange, the CSV format should convey 
field titles in a header record.
Here is an example of lack of application-independence: if my application 
sends yours this CSV file:
    ,Bart,Lisa & Maggie<CR-LF>
    Bart,Lisa,Maggie<CR-LF>
    Bart,Maggie,Lisa<CR-LF>
and your application depends on the assumption that the fields are the 
sequence:
    child
    sisters
    brothers
then your application will mis-interpret the data.
But if my application precedes this with a header record, like so:
    brothers,child,sisters<CR-LF>
    ,Bart,Lisa & Maggie<CR-LF>
    Bart,Lisa,Maggie<CR-LF>
    Bart,Maggie,Lisa<CR-LF>
then your application can maintain independence from the change by my 
application, because the CSV file conveys the corresponding new field 
sequence (the "brothers" column has moved ahead of "child" and "sisters").
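
A minimal sketch of a header-driven reader, assuming Python's standard csv.DictReader and assuming both applications agree on the field titles "child", "sisters" and "brothers". The function name siblings_by_child is illustrative.

```python
import csv
import io

# The same records in the two column orders discussed above.
ORIGINAL_ORDER = (
    "child,sisters,brothers\r\n"
    "Bart,Lisa & Maggie,\r\n"
    "Lisa,Maggie,Bart\r\n"
    "Maggie,Lisa,Bart\r\n"
)
REORDERED = (
    "brothers,child,sisters\r\n"
    ",Bart,Lisa & Maggie\r\n"
    "Bart,Lisa,Maggie\r\n"
    "Bart,Maggie,Lisa\r\n"
)


def siblings_by_child(text):
    """Look fields up by header title, so column position does not matter."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return {row["child"]: (row["sisters"], row["brothers"]) for row in reader}


from_original = siblings_by_child(ORIGINAL_ORDER)
from_reordered = siblings_by_child(REORDERED)
```

Both inputs produce the same mapping, so the receiving application is unaffected by the change in column order. This only works if the header record is present and its titles are agreed in advance.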
Regards,
Clyde
------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact
