Next: Comma-Separated Values
Up: Supplied Input Handlers
Previous: Feather

3.6.10 ASCII

In many cases tables are stored in some sort of unstructured plain text format, with cells separated by spaces or some other delimiters. There is a wide variety of such formats depending on what delimiters are used, how columns are identified, whether blank values are permitted and so on. It is impossible to cope with them all, but the ASCII handler attempts to make a good guess about how to interpret a given ASCII file as a table, which in many cases is successful. In particular, if you just have columns of numbers separated by something that looks like spaces, you should be just fine.

Here are the detailed rules for how the ASCII-format tables are interpreted:

If the list of rules above looks frightening, don't worry: in many cases the handler ought to make sense of a table without you having to read the small print. Here is an example of a suitable ASCII-format table:

    #
    # Here is a list of some animals.
    #
    # RECNO  SPECIES         NAME         LEGS   HEIGHT/m
      1      pig             "Pigling Bland"  4  0.8
      2      cow             Daisy        4      2
      3      goldfish        Dobbin       ""     0.05
      4      ant             ""           6      0.001
      5      ant             ""           6      0.001
      6      ant             ''           6      0.001
      7      "queen ant"     'Ma\'am'     6      2e-3
      8      human           "Mark"       2      1.8

In this case it will identify the following columns:

    Name       Type
    ----       ----
    RECNO      Short
    SPECIES    String
    NAME       String
    LEGS       Short
    HEIGHT/m   Float

It will also use the text "Here is a list of some animals" as the Description parameter of the table. Without any of the comment lines, it would still interpret the table, but the columns would be given the names col1..col5.
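
To make the interpretation of the example concrete, here is a minimal Python sketch of a reader for tables of this shape. It is not the AsciiTableBuilder implementation, and the names `split_fields`, `guess_type` and `read_table` are hypothetical. It assumes only what the example shows: whitespace-separated fields, single or double quotes with backslash escapes, leading "#" lines as metadata, and a simplified Short / Float / String preference order. The real handler tries more types and handles more cases.

```python
import re

# A field is a double-quoted run, a single-quoted run, or a run of non-whitespace.
# Inside quotes, a backslash makes the following character literal.
FIELD = re.compile(r'''"((?:\\.|[^"\\])*)"|'((?:\\.|[^'\\])*)'|(\S+)''')

SAMPLE = r"""
#
# Here is a list of some animals.
#
# RECNO  SPECIES         NAME         LEGS   HEIGHT/m
  1      pig             "Pigling Bland"  4  0.8
  2      cow             Daisy        4      2
  3      goldfish        Dobbin       ""     0.05
  4      ant             ""           6      0.001
  5      ant             ""           6      0.001
  6      ant             ''           6      0.001
  7      "queen ant"     'Ma\'am'     6      2e-3
  8      human           "Mark"       2      1.8
"""


def split_fields(line):
    """Split one line into fields, honouring quotes and backslash escapes."""
    fields = []
    for m in FIELD.finditer(line):
        text = m.group(m.lastindex)
        if m.lastindex != 3:  # quoted field: drop the escaping backslashes
            text = re.sub(r"\\(.)", r"\1", text)
        fields.append(text)
    return fields


def guess_type(values):
    """Pick the first of Short, Float, String that fits every non-blank value."""
    present = [v for v in values if v != ""]
    if all(re.fullmatch(r"[+-]?\d+", v) and -32768 <= int(v) <= 32767
           for v in present):
        return "Short"
    try:
        for v in present:
            float(v)
        return "Float"
    except ValueError:
        return "String"


def read_table(text):
    """Return (column names, column types, description, rows) for a text table."""
    comments, rows = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip empty lines
        if stripped.startswith("#"):
            if not rows:  # only comments before the data are kept as metadata
                comments.append(stripped[1:].strip())
            continue
        rows.append(split_fields(stripped))
    ncol = len(rows[0])
    names = [f"col{i + 1}" for i in range(ncol)]  # fallback column names
    # Assumption: the last leading comment names the columns if its field count fits.
    if comments and len(split_fields(comments[-1])) == ncol:
        names = split_fields(comments.pop())
    description = " ".join(c for c in comments if c)
    types = [guess_type([row[i] for row in rows]) for i in range(ncol)]
    return names, types, description, rows


names, types, description, rows = read_table(SAMPLE)
for name, type_ in zip(names, types):
    print(f"{name:10} {type_}")
```

Run on the animal table, this sketch should print the same five column names and types as listed above. Fed the same data with the comment lines removed, it falls back to the names col1..col5.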

The handler behaviour may be modified by specifying one or more comma-separated name=value configuration options in parentheses after the handler name, e.g. "ascii(maxSample=5000)". The following options are available:

maxSample = <int>
Controls how many rows of the input file are sampled to determine column datatypes. When reading ASCII files, since no type information is present in the input file, the handler has to look at the column data to see what type of value appears to be present in each column, before even starting to read the data in. By default it goes through the whole table when doing this, which can be time-consuming for large tables. If this value is set, it limits the number of rows that are sampled in this data characterisation pass, which can reduce read time substantially. However, if values near the end of the table differ in apparent type from those near the start, it can also result in getting the datatypes wrong.
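
The trade-off described above can be illustrated with a small Python sketch. It is not the STIL code: `parse_handler_spec` and `guess_type` are hypothetical helpers, the type names are simplified to integer / float / string, and the option string is split on the assumption that it follows the "name(key=value,...)" form shown earlier.

```python
import re


def parse_handler_spec(spec):
    """Split 'name(key=value,...)' into the handler name and a dict of options."""
    m = re.fullmatch(r"\s*([^()\s]+)\s*(?:\((.*)\))?\s*", spec)
    if m is None:
        raise ValueError(f"bad handler specification: {spec!r}")
    options = {}
    for item in (m.group(2) or "").split(","):
        item = item.strip()
        if item:
            key, _, value = item.partition("=")
            options[key.strip()] = value.strip()
    return m.group(1), options


def guess_type(values, max_sample=None):
    """Guess a column type from all values, or from the first max_sample only."""
    sample = values if max_sample is None else values[:max_sample]
    present = [v for v in sample if v != ""]
    if all(re.fullmatch(r"[+-]?\d+", v) for v in present):
        return "integer"
    try:
        for v in present:
            float(v)
        return "float"
    except ValueError:
        return "string"


# A column whose last value differs in apparent type from the earlier ones.
column = ["1", "2", "3", "4", "5.5"]

handler, options = parse_handler_spec("ascii(maxSample=3)")
limit = int(options["maxSample"])

full_guess = guess_type(column)            # every row sampled
sampled_guess = guess_type(column, limit)  # only the first 3 rows sampled
print(handler, options)
print("all rows:", full_guess)
print("first", limit, "rows:", sampled_guess)
```

Sampling every row classifies the column as float, whereas sampling only the first three rows classifies it as integer, which is the kind of wrong datatype the option description warns about. A limit is therefore safest when the first rows are known to be representative of the whole table.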

This format cannot be automatically identified by its content, so in general it is necessary to specify that a table is in ASCII format when reading it. However, if the input file has the extension ".txt" (case insensitive) an attempt will be made to read it using this format.

The handler class for this format is AsciiTableBuilder.


STIL - Starlink Tables Infrastructure Library
Starlink User Note 252
STIL web page: http://www.starlink.ac.uk/stil/
Author email: m.b.taylor@bristol.ac.uk