Comma-separated value ("CSV") format is a common semi-standard
text-based format in which fields are delimited by commas.
Spreadsheets and databases are often able to export data in some
variant of it. The intention is to read tables in
the version of the format spoken by MS Excel amongst other applications,
though the documentation
on which it was based was not obtained directly from Microsoft.
The rules for data which it understands are as follows:
- Each row must have the same number of comma-separated fields.
- Whitespace (space or tab) adjacent to a comma is ignored.
- Adjacent commas, or a comma at the start or end of a line
(whitespace apart), indicate a null field.
- Lines are terminated by any sequence of carriage-return or newline
characters ('\r' or '\n')
(a corollary of this is that blank lines are ignored).
- Cells may be enclosed in double quotes; quoted values may contain
linebreaks (or any other character); a double quote character within
a quoted value is represented by two adjacent double quotes.
- The first line may be a header line containing column names
rather than a row of data. Exactly the same syntactic rules are
followed for such a row as for data rows.
Note that you cannot use a "#" character
(or anything else) to introduce "comment" lines.
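The rules above can be sketched as a small parser. This is only an illustration and not the handler's actual implementation: the function name `parse_csv` is hypothetical, and it makes a few assumptions the text does not settle, namely that quoted cells are kept verbatim (so a quoted empty cell stays an empty string rather than a null), that lines containing only whitespace count as blank, and that stray characters after a closing quote are simply appended.

```python
def parse_csv(text):
    """Split CSV text into rows of cells, following the rules listed above.

    Null fields are returned as None. Illustrative sketch only.
    """
    rows, row, chars = [], [], []
    quoted = False       # the current cell was enclosed in double quotes
    in_quotes = False    # we are between an opening and a closing quote
    has_content = False  # the current line holds something other than whitespace
    text += "\n"         # make sure the final line is flushed
    i = 0
    while i < len(text):
        c = text[i]
        if in_quotes:
            if c != '"':
                chars.append(c)          # any character, even a line break
            elif text[i + 1] == '"':
                chars.append('"')        # two adjacent quotes -> one quote
                i += 1
            else:
                in_quotes = False        # closing quote
        elif c == '"' and not quoted and not "".join(chars).strip(" \t"):
            # Opening quote; whitespace seen before it is discarded.
            chars, quoted, in_quotes, has_content = [], True, True, True
        elif c in ",\r\n":
            if c == "," or has_content:
                value = "".join(chars)
                if not quoted:
                    # Trim whitespace next to commas; empty cell -> null.
                    value = value.strip(" \t") or None
                row.append(value)
            chars, quoted = [], False
            if c == ",":
                has_content = True
            else:
                if has_content:          # blank lines are ignored
                    rows.append(row)
                row, has_content = [], False
        elif not (quoted and c in " \t"):  # skip whitespace after a closing quote
            chars.append(c)
            has_content = has_content or c not in " \t"
        i += 1
    if in_quotes:
        raise ValueError("unterminated quoted value")
    if len({len(r) for r in rows}) > 1:
        raise ValueError("rows have differing numbers of fields")
    return rows
```

Because a "#" character has no special meaning, a line starting with "#" would come back from this sketch as an ordinary row, not as a comment.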
Because the CSV format contains no metadata beyond column names,
the handler is forced to guess the datatype of the values in each column.
It does this by reading the whole file through once and guessing
on the basis of what it has seen. This has two disadvantages:
- Sometimes it guesses a different type than what you want
(e.g. 32-bit integer rather than 64-bit integer)
- It's slow to read.
This means that CSV is not generally recommended if you can
use another format instead.
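To see why both problems arise, here is a minimal sketch of type guessing for a single column. The candidate types, their order and the name `guess_column_type` are assumptions chosen for illustration; the handler's real rules may differ. The point is that every cell in the column has to be inspected before a type can be chosen, and that the narrowest type that fits wins.

```python
import re

# Plain-ASCII integer and floating point syntax (an assumption of this sketch).
INT_RE = re.compile(r"[+-]?[0-9]+")
FLOAT_RE = re.compile(r"[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][+-]?[0-9]+)?")


def guess_column_type(cells):
    """Guess a type name for one column from its cell strings (None = null)."""
    values = [c for c in cells if c is not None]
    if not values:
        return "string"
    if all(v.lower() in ("true", "false") for v in values):
        return "boolean"
    if all(INT_RE.fullmatch(v) for v in values):
        numbers = [int(v) for v in values]
        if all(-2**31 <= n < 2**31 for n in numbers):
            return "int32"      # narrowest integer type that fits
        if all(-2**63 <= n < 2**63 for n in numbers):
            return "int64"
        # Integers too large for 64 bits fall through to floating point.
    if all(FLOAT_RE.fullmatch(v) for v in values):
        return "float64"
    return "string"
```

In this sketch a column of small identifiers is reported as "int32" even if it is later to be matched against a 64-bit column, which is the kind of mismatch described above.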
If you're stuck with a large CSV file that's misbehaving or slow
to use, one possibility is to turn it into an ECSV file
by adding some header lines by hand.
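As a rough sketch of what such header lines might look like, the helper below builds a comma-delimited ECSV header from a list of column names and datatypes. The function name `ecsv_header` is hypothetical, the layout follows the ECSV 1.0 convention of a YAML block in "#"-prefixed lines as far as I understand it, and column names are assumed to be simple enough not to need YAML quoting; check the ECSV format description before relying on it.

```python
def ecsv_header(columns, version="1.0"):
    """Build ECSV header lines for (name, datatype) pairs, comma-delimited.

    Illustrative only: assumes column names need no YAML quoting.
    """
    lines = [
        f"# %ECSV {version}",
        "# ---",
        "# delimiter: ','",
        "# datatype:",
    ]
    for name, datatype in columns:
        lines.append(f"# - {{name: {name}, datatype: {datatype}}}")
    return "\n".join(lines) + "\n"
```

The returned text would be placed in front of the existing CSV content, whose first line must then be the row of column names. Declaring a column as, say, `int64` in this header removes the need to guess its type.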
This format cannot be automatically identified by its content,
so in general it is necessary to specify that a table is in
CSV format when reading it.
However, if the input file has the extension ".csv" (case insensitive),
an attempt will be made to read it using this format.
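The extension rule amounts to a case-insensitive comparison, which might be sketched as follows. The function name `looks_like_csv` is hypothetical, and the sketch only mirrors the rule as stated here; it says nothing about how names with further suffixes (for instance compressed files) are treated.

```python
import os


def looks_like_csv(filename):
    """Return True if the file name ends in ".csv", ignoring case."""
    extension = os.path.splitext(filename)[1]
    return extension.lower() == ".csv"
```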
An example looks like this:
RECNO,SPECIES,NAME,LEGS,HEIGHT,MAMMAL
1,pig,Pigling Bland,4,0.8,true
2,cow,Daisy,4,2.0,true
3,goldfish,Dobbin,,0.05,false
4,ant,,6,0.001,false
5,ant,,6,0.001,false
6,queen ant,Ma'am,6,0.002,false
7,human,Mark,2,1.8,true
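For a quick check outside this software, the example can be read with Python's standard `csv` module, as in the sketch below. Note the assumptions: the `csv` module does not trim whitespace around commas, which happens not to matter for this file, and mapping empty cells to `None` is this sketch's way of representing null fields. The helper name `read_table` is hypothetical.

```python
import csv
import io

EXAMPLE = """\
RECNO,SPECIES,NAME,LEGS,HEIGHT,MAMMAL
1,pig,Pigling Bland,4,0.8,true
2,cow,Daisy,4,2.0,true
3,goldfish,Dobbin,,0.05,false
4,ant,,6,0.001,false
5,ant,,6,0.001,false
6,queen ant,Ma'am,6,0.002,false
7,human,Mark,2,1.8,true
"""


def read_table(text):
    """Return (column names, list of row dicts); empty cells become None."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)            # first line holds the column names
    rows = [
        {name: (cell or None) for name, cell in zip(header, cells)}
        for cells in reader
        if cells                     # the csv module yields [] for blank lines
    ]
    return header, rows


header, rows = read_table(EXAMPLE)
```

Here the goldfish row has a null LEGS value and both plain ant rows have a null NAME, while every value, including the numbers, is still a string until a type has been chosen for its column.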
See also ECSV as a format which is
similar and capable of storing more metadata.
STILTS - Starlink Tables Infrastructure Library Tool Set
Starlink User Note 256
STILTS web page:
http://www.starlink.ac.uk/stilts/
Author email:
m.b.taylor@bristol.ac.uk
Mailing list:
topcat-user@jiscmail.ac.uk