Generic Table Resource Input

3.2 Generic Table Resource Input

This section describes the usual way of reading a table or tables from an external resource such as a file, URL or DataSource, and converting it into a StarTable object whose data and metadata you can examine as described in Section 2. These resources have in common that their data can be read more than once; this is necessary in general because, depending on the data format and intended use, more than one pass may be required to provide the table data. Reading a table in this way may or may not require local resources such as memory or disk, depending on how the handler works; see Section 4 for information on how to influence such resource usage.

The main class used to read a table in this way is StarTableFactory. The job of this class is to keep track of which input handlers are registered and to use one of them to read data from an input stream and turn it into one or more StarTables. The basic rule is that you use one of the StarTableFactory's makeStarTable or makeStarTables methods to turn what you've got (e.g. String, URL, DataSource) into a StarTable or a TableSequence (which represents a collection of StarTables) and away you go. If no StarTable can be created (for instance because the file named doesn't exist, or because it is not in any of the supported formats) then some sort of IOException or TableFormatException will be thrown. Note that if the byte stream from the target resource is compressed in one of the supported formats (gzip, bzip2, Unix compress) it will be uncompressed automatically (the work for this is done by the DataSource class).
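The idea behind automatic decompression can be illustrated with a small standalone sketch. It is not the DataSource implementation: the class and method names (GzipSniffSketch, openPossiblyGzipped) are hypothetical, and it handles only gzip, since that is the only one of the three compression formats supported by the standard Java library. It peeks at the first two bytes of the stream and decompresses only if they match the gzip magic number.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative sketch only; names are hypothetical and not part of STIL.
final class GzipSniffSketch {

    /**
     * Returns a stream of uncompressed bytes, wrapping the input in a
     * decompressor only if it starts with the gzip magic number 0x1f 0x8b.
     */
    static InputStream openPossiblyGzipped( InputStream in ) throws IOException {
        BufferedInputStream bin = new BufferedInputStream( in );
        bin.mark( 2 );            // remember the start of the stream
        int b0 = bin.read();
        int b1 = bin.read();
        bin.reset();              // rewind so no bytes are lost
        boolean isGzip = b0 == 0x1f && b1 == 0x8b;
        return isGzip ? new GZIPInputStream( bin ) : bin;
    }

    /** Helper used by the demonstration: gzip-compresses a byte array. */
    static byte[] gzip( byte[] data ) throws IOException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try ( GZIPOutputStream gz = new GZIPOutputStream( bout ) ) {
            gz.write( data );
        }
        return bout.toByteArray();
    }

    public static void main( String[] args ) throws IOException {
        byte[] plain = "a,b\n1,2\n".getBytes( StandardCharsets.UTF_8 );
        // The same reading code works for both the raw and the gzipped bytes.
        for ( byte[] raw : new byte[][] { plain, gzip( plain ) } ) {
            try ( InputStream in =
                      openPossiblyGzipped( new ByteArrayInputStream( raw ) ) ) {
                System.out.print( new String( in.readAllBytes(),
                                              StandardCharsets.UTF_8 ) );
            }
        }
    }
}
```

The caller reads from the returned stream in the same way whether or not the source was compressed, which is the behaviour described above for table input.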

There are two distinct modes in which StarTableFactory can work: automatic format detection and named format.

In automatic format detection mode, the type of data contained in an input stream is determined by looking at it. What actually happens is that the factory hands the stream to each of the handlers in its default handler list in turn, and the first one that recognises the format (usually based on looking at the first few bytes) attempts to make a table from it. If this fails, a handler may be identified by looking at the file name, if available (e.g. a filename or URL ending in ".csv" will be tried as a CSV file). In this mode, you only need to specify the table location, like this:

    public StarTable loadTable( File file ) throws IOException {
        return new StarTableFactory().makeStarTable( file.toString() );
    }

This mode is available for formats such as FITS, VOTable, ECSV, PDS4, Parquet, MRT, Feather and CDF that can be easily recognised, but is not reliable for text-based formats such as comma-separated values without recognisable filenames. You can access and modify the list of auto-detecting handlers using the getDefaultBuilders method. By default it contains only handlers for VOTable, CDF, FITS-like formats, ECSV, PDS4, Parquet, MRT, Feather and GBIN.
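The two-step detection strategy described above (content first, then filename) can be sketched in standalone form as follows. This is an illustration under stated assumptions, not the StarTableFactory implementation: the class, enum and method names are hypothetical, and only three well-known magic numbers are checked (the "SIMPLE  =" card that starts a FITS file, the "PAR1" marker that starts a Parquet file, and the "# %ECSV" line that starts an ECSV file).

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Illustrative sketch only; names are hypothetical and not part of STIL.
final class FormatSniffSketch {

    /** Hypothetical format labels, not STIL handler names. */
    enum Format { FITS, PARQUET, ECSV, CSV, UNKNOWN }

    /**
     * Guesses a format from the first few bytes of the data and,
     * if that fails, from the resource name (which may be null).
     */
    static Format guessFormat( byte[] head, String name ) {
        // Step 1: content-based detection using leading magic bytes.
        if ( startsWith( head, "SIMPLE  =" ) ) return Format.FITS;
        if ( startsWith( head, "PAR1" ) ) return Format.PARQUET;
        if ( startsWith( head, "# %ECSV" ) ) return Format.ECSV;
        // Step 2: fall back to the filename extension, if a name is available.
        if ( name != null
             && name.toLowerCase( Locale.ROOT ).endsWith( ".csv" ) ) {
            return Format.CSV;
        }
        return Format.UNKNOWN;
    }

    /** Tests whether the data begins with the ASCII bytes of the given text. */
    static boolean startsWith( byte[] data, String magic ) {
        byte[] m = magic.getBytes( StandardCharsets.US_ASCII );
        if ( data.length < m.length ) return false;
        for ( int i = 0; i < m.length; i++ ) {
            if ( data[ i ] != m[ i ] ) return false;
        }
        return true;
    }

    public static void main( String[] args ) {
        byte[] fits = "SIMPLE  =                    T"
                     .getBytes( StandardCharsets.US_ASCII );
        byte[] text = "a,b\n1,2\n".getBytes( StandardCharsets.US_ASCII );
        System.out.println( guessFormat( fits, "data.bin" ) );   // FITS
        System.out.println( guessFormat( text, "table.csv" ) );  // CSV
        System.out.println( guessFormat( text, null ) );         // UNKNOWN
    }
}
```

The last call shows why plain text formats are unreliable in this mode: without a recognisable filename, comma-separated content offers nothing distinctive to match on.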

In named format mode, you have to specify the name of the format as well as the table location. This can be solicited from the user if it's not known at build time; the known format names can be got from the getKnownFormats method. The list of format handlers that can be used in this way can be accessed or modified using the getKnownBuilders method; it usually contains all the ones in the default handler list, but doesn't have to. Table construction in named format mode might look like this:

    public StarTable loadFitsTable( File file ) throws IOException {
        return new StarTableFactory().makeStarTable( file.toString(), "fits" );
    }

This mode also offers the possibility of configuring input handler options as part of the handler name.

If the table format is known at build time, you can alternatively use the makeStarTable method of the appropriate format-specific TableBuilder. For instance you could replace the above example with this:


    public StarTable loadFitsTable( File file ) throws IOException {
        return new FitsTableBuilder()
              .makeStarTable( DataSource.makeDataSource( file.toString() ),
                              false, StoragePolicy.getDefaultPolicy() );
    }

This slightly more obscure method offers more configurability but has much the same effect; it may be slightly more efficient and may offer somewhat more definite error messages in case of failure. The various supplied TableBuilders (format-specific input handlers) are listed in Section 3.6.

The javadocs detail variations on these calls. If you want to ensure that the table you get provides random access (see Section 2.3), you should do something like this:

    public StarTable loadRandomTable( File file ) throws IOException {
        StarTableFactory factory = new StarTableFactory();
        factory.setRequireRandom( true );
        StarTable table = factory.makeStarTable( file.toString() );
        return table;
    }

Setting the requireRandom flag on the factory ensures that any table returned from its makeStarTable methods returns true from its isRandom method. (Note that prior to STIL version 2.1 this flag only provided a hint to the factory that random tables were wanted; now it is enforced.)

STIL - Starlink Tables Infrastructure Library
Starlink User Note 252
STIL web page: http://www.starlink.ac.uk/stil/
Author email: m.b.taylor@bristol.ac.uk