Media Type "text/csv": new draft (-02) and Last Call

In section "2.Definition of the CSV format", items 3 & 4 state:
3. There maybe an optional header line appearing as the first line of the file with the same format as normal record lines. This header will contain names corresponding to the fields in the file and will usually contain the same number of fields as the records in the rest of the file. For example:
field_name,field_name,field_name CRLF
aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF
4. Within the header and each record there may be one or more fields, delimited by commas. The last field in the record may or may not be followed by a comma. For example:
aaa,bbb,ccc
Why would you permit the last field in the record to be followed by a comma? If a CSV record comprises:
aaa,,ccc,ddd,,CRLF
does it have 6 fields or 5? If comma is a field separator only, there are 6 fields:
1. aaa 2. <null> 3. ccc 4. ddd 5. <null> 6. <null>
But if the comma is also a mandatory terminator for the last field (effectively the record separator becomes comma-CRLF), then there are 5 fields.
In my view, permitting the last field to end with comma leads to ambiguity, and prevents an application from checking that an exact number of fields is present. The only way to guarantee the exact number of fields is then to count the fields in the header. But then your item 3 allows the header record to be omitted.
Would it not be safer to make the header record mandatory?
Thank-you and regards, Clyde Ingram
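
The two readings of the record above can be made concrete with a minimal Python sketch. It is an illustration only, not part of the draft: the function names are hypothetical, and it assumes unquoted fields with the CRLF already stripped.

```python
def count_fields_separator(record: str) -> int:
    """Count fields when every comma is purely a field separator."""
    return len(record.split(","))


def count_fields_optional_trailing(record: str) -> int:
    """Count fields when one trailing comma, if present, merely
    terminates the last field (the reading item 4 permits)."""
    if record.endswith(","):
        record = record[:-1]  # drop the single terminating comma
    return len(record.split(","))


record = "aaa,,ccc,ddd,,"
print(count_fields_separator(record))          # 6
print(count_fields_optional_trailing(record))  # 5
```

The same bytes yield two different field counts, which is why a reader cannot validate the number of fields without an outside reference such as a header line.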

Clyde,
Thanks for pointing this out. I personally think that instead of making the header record mandatory which is something that most CSV applications do not have, I would rather take the comma out of the end of the record and have the last field end with a CRLF instead of an optional COMMA. Do you think that is a plausible solution?
Yakov
clyde.ingram@edl.uk.eds.com wrote:
In section "2.Definition of the CSV format", items 3 & 4 state:
3. There maybe an optional header line appearing as the first line of the file with the same format as normal record lines. This header will contain names corresponding to the fields in the file and will usually contain the same number of fields as the records in the rest of the file. For example:
field_name,field_name,field_name CRLF
aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF
4. Within the header and each record there may be one or more fields, delimited by commas. The last field in the record may or may not be followed by a comma. For example:
aaa,bbb,ccc
Why would you permit the last field in the record to be followed by a comma? If a CSV record comprises:
aaa,,ccc,ddd,,CRLF
does it have 6 fields or 5? If comma is a field separator only, there are 6 fields:
1. aaa 2. <null> 3. ccc 4. ddd 5. <null> 6. <null>
But if the comma is also a mandatory terminator for the last field (effectively the record separator becomes comma-CRLF), then there are 5 fields.
In my view, permitting the last field to end with comma leads to ambiguity, and prevents an application from checking that an exact number of fields is present. The only way to guarantee the exact number of fields is then to count the fields in the header. But then your item 3 allows the header record to be omitted.
Would it not be safer to make the header record mandatory?
Thank-you and regards, Clyde Ingram

At 01:14 23/03/05 -0500, Yakov Shafranovich wrote:
Clyde,
Thanks for pointing this out. I personally think that instead of making the header record mandatory which is something that most CSV applications do not have, I would rather take the comma out of the end of the record and have the last field end with a CRLF instead of an optional COMMA. Do you think that is a plausible solution?
No. Some of the Excel data I process has trailing commas. This must be allowed.
I also don't think it's necessary to say anything (other than maybe as a comment) about any special status for the first line: such use is accommodated quite reasonably within the basic CSV format. For example, having such a line when exporting Excel as CSV depends entirely upon how the user constructs the original spreadsheet. Column headings are common, but not mandatory. In some cases, there may be a more complex heading structure -- this is an application issue, not a dataset format issue, and as such does not belong in the dataset format specification.
#g
--
Yakov
clyde.ingram@edl.uk.eds.com wrote:
In section "2.Definition of the CSV format", items 3 & 4 state:
3. There maybe an optional header line appearing as the first line of the file with the same format as normal record lines. This header will contain names corresponding to the fields in the file and will usually contain the same number of fields as the records in the rest of the file. For example:
field_name,field_name,field_name CRLF
aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF
4. Within the header and each record there may be one or more fields, delimited by commas. The last field in the record may or may not be followed by a comma. For example:
aaa,bbb,ccc
Why would you permit the last field in the record to be followed by a comma? If a CSV record comprises:
aaa,,ccc,ddd,,CRLF
does it have 6 fields or 5? If comma is a field separator only, there are 6 fields:
1. aaa 2. <null> 3. ccc 4. ddd 5. <null> 6. <null>
But if the comma is also a mandatory terminator for the last field (effectively the record separator becomes comma-CRLF), then there are 5 fields.
In my view, permitting the last field to end with comma leads to ambiguity, and prevents an application from checking that an exact number of fields is present. The only way to guarantee the exact number of fields is then to count the fields in the header. But then your item 3 allows the header record to be omitted.
Would it not be safer to make the header record mandatory?
Thank-you and regards, Clyde Ingram
------------
Graham Klyne
For email: http://www.ninebynine.org/#Contact

On Wednesday, March 23, 2005, 7:14:24 AM, Yakov wrote:
YS> Clyde,
YS> Thanks for pointing this out. I personally think that instead of making
YS> the header record mandatory which is something that most CSV
YS> applications do not have, I would rather take the comma out of the end
YS> of the record and have the last field end with a CRLF instead of an
YS> optional COMMA. Do you think that is a plausible solution?
It's a much better solution, because it allows missing values in the last field to be unambiguously signalled.
23,45,,67,91
clearly has a missing value in the third field, but if
23,45,63,67,91
23,45,63,67,91,
are equivalent then there is an ambiguity.
Is there a survey of common use of CSV files? How many of them would be conformant?
YS> Yakov
YS> clyde.ingram@edl.uk.eds.com wrote:
In section "2.Definition of the CSV format", items 3 & 4 state:
3. There maybe an optional header line appearing as the first line of the file with the same format as normal record lines. This header will contain names corresponding to the fields in the file and will usually contain the same number of fields as the records in the rest of the file. For example:
field_name,field_name,field_name CRLF
aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF
4. Within the header and each record there may be one or more fields, delimited by commas. The last field in the record may or may not be followed by a comma. For example:
aaa,bbb,ccc
Why would you permit the last field in the record to be followed by a comma? If a CSV record comprises:
aaa,,ccc,ddd,,CRLF
does it have 6 fields or 5? If comma is a field separator only, there are 6 fields:
1. aaa 2. <null> 3. ccc 4. ddd 5. <null> 6. <null>
But if the comma is also a mandatory terminator for the last field (effectively the record separator becomes comma-CRLF), then there are 5 fields.
In my view, permitting the last field to end with comma leads to ambiguity, and prevents an application from checking that an exact number of fields is present. The only way to guarantee the exact number of fields is then to count the fields in the header. But then your item 3 allows the header record to be omitted.
Would it not be safer to make the header record mandatory?
Thank-you and regards, Clyde Ingram
--
Chris Lilley mailto:chris@w3.org
Chair, W3C SVG Working Group
W3C Graphics Activity Lead
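
The ambiguity in the last-field case can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference parser: `possible_readings` is a hypothetical helper, fields are assumed to be unquoted, and the line ending is assumed to be already removed.

```python
def possible_readings(record: str) -> list:
    """Return every field list a reader could justify when a
    trailing comma is optional (hypothetical helper, no quoting)."""
    # Reading 1: every comma separates two fields.
    readings = [record.split(",")]
    if record.endswith(","):
        # Reading 2: the final comma only terminates the last field.
        readings.append(record[:-1].split(","))
    return readings


for line in ("23,45,,67,91", "23,45,63,67,91", "23,45,63,67,91,"):
    print(line, "->", possible_readings(line))
```

Only the record ending in a comma produces two candidate readings: five values, or five values followed by a missing sixth. A missing value in an interior field, such as the third field of the first record, has a single reading either way.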

Chris Lilley wrote:
Is there a survey of common use of csv files? How many of them would be conformant?
Unfortunately, there is no single definition or survey of CSV files. My document really concentrates on the MIME type, with the definition included as an attempt at arriving at a tight common definition.
Yakov
participants (4)
- Chris Lilley
- clyde.ingram@edl.uk.eds.com
- Graham Klyne
- Yakov Shafranovich