Starlink User Note
252
Mark Taylor
October 2004
$Id: sun252.xml,v 1.20 2004/10/21 10:17:59 mbt Exp $
STIL is a set of Java class libraries which allow input, manipulation and output of tabular data and metadata. Among its key features are support for many tabular formats (including VOTable, FITS, text-based formats and SQL databases) and support for dealing with very large tables in limited memory.
As well as an abstract and format-independent definition of what constitutes a table, and an extensible framework for "pull-model" table processing, it provides a number of format-specific handlers which know how to serialize/deserialize tables. The framework for interaction between the core table manipulation facilities and the format-specific handlers is open and pluggable, so that handlers for new formats can easily be added, programmatically or at run-time.
The VOTable handling in particular is provided by classes which perform efficient XML parsing and can read and write VOTables in any of the defined formats (TABLEDATA, BINARY or FITS); at time of writing it is believed to be the only library to offer this. It may be used on its own for VOTable I/O without much reference to the format-independent parts of the library.
STIL is a set of class libraries for the input, output and manipulation of tables. It has been developed for use with astronomical tables, though it could be used for any kind of tabular data. It has no "native" external table format. What it has is a model of what a table looks like, a set of java classes for manipulating such tables, an extensible framework for table I/O, and a number of format-specific I/O handlers for dealing with several known table formats.
This document is a programmers' overview of the abilities of the STIL libraries, including some tutorial explanation and example code. Some parts of it may also be useful background reading for users of applications built on STIL. Exhaustive descriptions of all the classes and methods are not given here; that information can be found in the javadocs, which should be read in conjunction with this document if you are actually using these libraries. Much of the information here is repeated in the javadocs. The hypertext version of this document links to the relevant places in the javadocs where appropriate. The latest released version of this document in several formats can be found at http://www.starlink.ac.uk/stil/.
In words, STIL's idea of what constitutes a table is something which has the following:
StarTable
interface,
which is described in the next section.
It maps quite closely, though not exactly, onto the table model
embodied in the VOTable definition, which
itself owes a certain amount to FITS tables. This is not coincidence.
The most fundamental type in the STIL package is
uk.ac.starlink.table.StarTable
;
any time you are using a table, you will use an object which
implements this interface.
A few items of the table metadata
(name,
URL)
are available directly as values from the StarTable
interface.
A general parameter mechanism is provided for storing other items,
for instance user-defined ones.
The getParameters
method returns a list of
DescribedValue
objects which contain a scalar or array value and some metadata describing it
(name, units,
Unified Content Descriptor).
This list can be read or altered as required.
The StarTable
interface also contains
the methods
getColumnCount
and getRowCount
to determine the shape of the table. Note however that for tables
with sequential-only access, it may not be possible to ascertain
the number of rows - in this case getRowCount
will return -1.
Random-access tables (see Section 2.3) will always return
a positive row count.
Each column in a StarTable
is assumed to contain the
same sort of thing. More specifically, for each table column there is
a ColumnInfo
object
associated with each column which holds metadata describing
the values contained in that column
(the value associated with that column for each row in the table).
A ColumnInfo
contains information about the
name,
units,
UCD,
class
etc of a column, as well as a mechanism for storing additional
('auxiliary') user-defined metadata.
It also provides methods for rendering the values in the column
under various circumstances.
The class associated with a column, obtained from the
getContentClass
method,
is of particular importance. Every object in the column described
by that metadata should be an instance of the
Class
that
getContentClass
returns
(or of one of its subtypes), or null
.
There is nothing in the tables infrastructure which can enforce
this, but a table which doesn't follow this rule is considered broken,
and application code is within its rights to behave unpredictably
in this case.
Such a broken table might result from a bug in the I/O handler used
to obtain the table in the first place, or a badly formed table that
it has read, or a bug in one of the wrapper classes upstream from
the table instance being used.
Because of the extensible nature of the infrastructure,
such bugs are not necessarily STIL's fault.
Any (non-primitive) class can be used
but most table I/O handlers can only
cope with certain types of value - typically the primitive wrapper classes
(numeric ones like
Integer
,
Double
and
Boolean
) and
String
s, so these
are the most important ones to deal with. The contents of a table
cell must always (as far as the access methods are concerned) be
an Object
or
null
, so primitive values cannot be used directly.
The general rule for primitive-like (numeric or boolean) values
is that a scalar should be represented by the appropriate wrapper class
(Integer
,
Float
,
Boolean
etc)
and an array by an array of primitives
(int[]
,
float[]
,
boolean[]
etc).
Non-primitive-like objects
(of which String
is the most important example)
should be represented by their own class (for scalars) or an array
of their own class (for arrays).
Note that it is not recommended to use multidimensional arrays
(i.e. arrays of arrays like int[][]
);
a 1-dimensional Java array should be used, and information about
the dimensionality should be stored in the ColumnInfo
's
shape attribute.
Thus to store a 3x2 array of integers, a 6-element array of type
int[]
would be used, and the ColumnInfo
's
getShape
method would return a two-element array (3,2)
.
The actual data values in a table are considered to be a sequence
of rows, each containing one value for each of the table's columns.
As explained above, each such value
is an Object
, and information about its class
(as well as semantic metadata) is available from the column's
ColumnInfo
object.
StarTable
s come in two flavours,
random-access and sequential-only;
you can tell which one a given table is by using its
isRandom
method, and how its data can be accessed is determined by this.
In either case, most of the data access methods are declared to
throw an IOException
to signal any data access error.
It is always possible to access a table's data sequentially,
that is starting with the first row and reading forward a row at
a time to the last row; it may or may not be possible to tell in advance
(using getRowCount
)
how many rows there are.
To perform sequential access, use the
getRowSequence
method to get a
RowSequence
object,
which is an iterator
over the rows in the table.
The RowSequence
's
next
method moves forward a row without returning any data;
to obtain the data use either
getCell
or getRow
;
the relative efficiencies of these depend on the implementation, but
in general if you want all or nearly all of the values in a row
it is a good idea to use getRow
, if you just want one
or two use getCell
.
You cannot move the iterator backwards.
When obtained, a RowSequence
is positioned before the
first row in the table,
so (unlike an Iterator
)
it is necessary to call next
before the first row is accessed.
Here is an example of how to sum the values in one of the numeric columns
of a table. Since only one value is required from each row,
getCell
is used:
double sumColumn( StarTable table, int icol ) throws IOException { // Check that the column contains values that can be cast to Number. ColumnInfo colInfo = table.getColumnInfo( icol ); Class colClass = colInfo.getContentClass(); if ( ! Number.class.isAssignableFrom( colClass ) ) { throw new IllegalArgumentException( "Column not numeric" ); } // Iterate over rows accumulating the total. double sum = 0.0; RowSequence rseq = table.getRowSequence(); while ( rseq.hasNext() ) { rseq.next(); Number value = (Number) rseq.getCell( icol ); sum += value.doubleValue(); } rseq.close(); return sum; }The next example prints out every cell value. Since it needs all the values in each cell, it uses
getRow
:
void writeTable( StarTable table ) throws IOException { int nCol = table.getColumnCount(); RowSequence rseq = table.getRowSequence(); while ( rseq.hasNext() ) { rseq.next(); Object[] row = rseq.getRow(); for ( int icol = 0; icol < nCol; icol++ ) { System.out.print( row[ icol ] + "\t" ); } System.out.println(); } rseq.close(); }In this case a tidier representation of the values might be given by replacing the
print
call with:
System.out.print( table.getColumnInfo( icol ) .formatValue( row[ icol ], 20 ) + "\t" );
If a table's
isRandom
method returns true, then it is possible to access the cells of a
table in any order. This is done using the
getCell
or getRow
methods directly on the table itself (not on a RowSequence
).
Similar comments about
whether to use getCell
or getRow
apply as
in the previous section.
If an attempt is made to call these random access methods
on a non-random table
(one for which isRandom()
returns false
),
an
UnsupportedOperationException
will be thrown.
What do you do if you have a sequential-only table and you
need to do random access on it?
The Tables.randomTable
utility method
takes any table and returns one which is guaranteed to provide random access.
If the original one is random, it just returns it unchanged,
otherwise it returns a table which contains the same data as the
submitted one, but for which isRandom
is guaranteed to
return true.
It effectively does this by taking out a RowSequence
and
reading all the data sequentially into some kind of (memory- or disk-based)
data structure which can provide random access, returning a new
StarTable object based on that data structure.
Clearly, this might be an expensive process. For this reason if you have an application in which random access will be required at various points, it is usually a good idea to ensure you have a random-access table at the application's top level, and for general-purpose utility methods to require random-access tables (throwing an exception if they get a sequential-only one). The alternative practice of utility methods converting argument tables to random-access when they are called might result in this expensive process happening multiple times.
Exactly what kind of random-access structure
Tables.randomTable
uses to store the data
(the two basic alternatives are in memory or on disk) is determined
by the default
uk.ac.starlink.table.StoragePolicy
;
this can be set from outside the program by defining the
startable.storage
system property, as explained in
Appendix A.4.
If you know you want to store the data in a particular way however,
you can use one of the predefined storage policy instances.
Here is how you can randomise a table by stashing its data in a
temporary disk file:
table = StoragePolicy.PREFER_DISK.randomTable( table );You might want to do this if you know that its data size is likely to exceed available RAM. Note however that writing temporary disk files may be disabled in certain security contexts (e.g. running as an applet or an unsigned WebStart application).
Table input handlers use StoragePolicy
objects too
if they need to cache bulk data from a table. These are passed
to their
makeStarTable
directly or by a StarTableFactory
If you don't set a policy directly on the factory, this too will use
the default one.
The table input and output facilities of STIL are handled
by format-specific input and output handlers;
supplied with the package are, amongst others, a VOTable input
handler and output handler, and this means that STIL can read
and write tables in VOTable format. An input handler is
an object which can turn an external resource into a
StarTable
object, and an output handler is one
which can take a StarTable
object and store it
externally in some way.
These handlers are independent components of the system, and
so new ones can be written, allowing all the STIL features
to be used on new table formats without having to make any changes
to the core classes of the library.
There are two ways of using these handlers.
You can either use them directly to read in/write out a table
using a particular format, or you can use the generic I/O facilities
which know about several of these handlers and select an appropriate
one at run time. The generic reader class is
StarTableFactory
which knows about input handlers implementing the
TableBuilder
interface,
and the generic writer class is
StarTableOutput
which knows about output handlers implementing the
StarTableWriter
interface.
The generic approach is more flexible in a multi-format
environment (your program will work whether you point it at a
VOTable, FITS file or SQL query) and is generally easier to use,
but if you know what format you're going to be dealing with
you may have more control over format-specific options using the
handler directly.
The following sections describe in more detail the generic input and output facilities, followed by descriptions of each of the format-specific I/O handlers which are supplied with the package. There is an additional section (Section 3.6) which deals with table reading and writing using an SQL database.
STIL can deal with externally-stored tables in a number of different
formats. It does this using a set of handlers each of which knows
about turning an external table into a java
StarTable
object
or turning a StarTable
object into an external table.
Such an "external table" will typically be a file on a local disk,
but might also be a URL pointing to a file on a remote host,
or an SQL query on a remote database, or something else.
The core I/O framework of STIL itself does not know about any table formats, but it knows how to talk to format-specific input or output handlers. A number of these (VOTable, FITS, ASCII and others, described in the following subsections) are supplied as part of the STIL package, so for dealing with tables in these formats you don't need to do any extra work. However, the fact that these are treated in a standard way means that it is possible to add new format-specific handlers and the rest of the library will work with tables in that format just the same as with the supplied formats.
If you have a table format which is unsupported by STIL as it stands, you can do one or both of the following:
TableBuilder
interface to take a stream of data and return a
StarTable
object.
Install it in a
StarTableFactory
,
either programmatically using the
getDefaultBuilders
or
getKnownBuilders
methods,
or by setting the
startable.readers
system property.
This factory will then be able to pick up tables in this format
as well as other known formats.
Such a TableBuilder
can also be used directly to read tables by code which knows
that it's dealing with data in that particular format.
StarTableWriter
interface to take a StarTable
and write it to a given
destination.
Install it in a
StarTableOutput
either programmatically using the
setHandlers
method or by setting the
startable.writers
system property.
This StarTableOutput
will be then be able to write tables in this format as well as others.
Such a StarTableWriter
can also be used directly to write tables by code which wants to write
data in that particular format.
Because setting the
startable.readers
/startable.writers
system properties can be done by the user at runtime, an application
using STIL can be reconfigured to work with new table formats without
having to rebuild either STIL or the application in question.
This document does not currently offer a tutorial on writing new table I/O handlers; refer to the javadocs for the relevant classes.
Obtaining a table from a generic-format external source is done using a
StarTableFactory
.
The job of this class is to keep track of which input handlers are
registered and to use one of them to read data from an input stream and
turn it into a StarTable
.
The basic rule is that you use one of the StarTableFactory
's
makeStarTable
methods to turn what you've got
(e.g. String, URL, DataSource
)
into a StarTable, and away you go.
If no StarTable can be created (for instance because the file named doesn't
exist, or because it is not in any of the supported formats)
then some sort of
IOException
/TableFormatException
will be thrown.
Note that if the target stream is compressed in one of the supported
formats (gzip, bzip2, Unix compress) it will be
uncompressed automatically
(the work for this is done by the
DataSource
class).
There are two distinct modes in which StarTableFactory
can work: automatic format detection and named format.
In automatic format detection mode, the type of data contained in
an input stream is determined by looking at it. What actually happens
is that the factory hands the stream to each of the handlers in its
default handler list
in turn, and the first one that recognises the format (usually based on
looking at the first few bytes) attempts to make a table from it.
The filename is not used in any way to guess the format.
This works well for formats such as FITS and VOTable that can be
easily recognised, but less well for text-based formats such as
comma-separated values.
You can access and modify the default handler list using the
getDefaultBuilders
method.
In this mode, you only need to specify the table location, like this:
public StarTable loadTable( File file ) throws IOException { return new StarTableFactory().makeStarTable( file.toString() ); }Currently, only FITS and VOTable formats are auto-detected.
In named format mode, you have to specify the name of the format as
well as the table location. This can be solicited from the user if it's
not known at build time; the known format names can be got from the
getKnownFormats
method.
The list of format handlers
that can be used in this way can be accessed or
modified using the
getKnownBuilders
method; it usually contains all
the ones in the default handler list, but doesn't have to.
Table construction in named format mode might look like this:
public StarTable loadFitsTable( File file ) throws IOException { return new StarTableFactory().makeStarTable( file.toString(), "fits" ); }
The javadocs detail variations on these calls.
If you need to influence how bulk data might be cached, you can set
the factory's StoragePolicy
(see Section 2.3.3).
If you want to ensure that the table
you get provides random access (see Section 2.3),
you should do something like this:
public StarTable loadRandomTable( File file ) throws IOException { StarTableFactory factory = new StarTableFactory(); factory.setWantRandom( true ); StarTable table = factory.makeStarTable( file.toString() ); return Tables.randomTable( table ); }Setting the
wantRandom
flag on the factory gives handlers
an opportunity to return different StarTable
implementations
which are efficient for different patterns of access, but it is only
a hint, so the Tables.randomTable
method must be called to
ensure random access (see Section 2.3.3).
Generic serialization of tables to external storage is done using a
StarTableOutput
object.
This has a similar job to the StarTableFactory
described in the previous section;
it mediates between code which wants to output a table and a
set of format-specific output handler objects.
The writeStarTable
method
is used to write out a StarTable
object.
When invoking this method, you specify the location to which
you want to output the table (usually, but not necessarily, a filename)
and a string specifying the format you would like to write in.
This is usually a short string like "fits" associated with one of
the registered output handlers - a list of known formats can be
got using the
getKnownFormats
method.
Use is straightforward:
void writeTableAsFITS( StarTable table, File file ) throws IOException { new StarTableOutput().writeStarTable( table, file.toString(), "fits" ); }If, as in this example, you know what format you want to write the table in, you could equally use the relevant
StarTableWriter
object directly
(in this case a FitsTableWriter
).
The table input handlers supplied with STIL are listed in
this section, along with notes on any peculiarities they have in
turning a string into a StarTable
.
By default, any of these can be used in a StarTableFactory
's
named format mode, but only some of them in automatic format detection mode.
In most cases the string supplied to name the table that
StarTableFactory
should read
is a filename or a URL, referencing a plain or compressed copy of
the stream from which the file is available.
In some cases an additional specifier can be given after a '#'
character to give additional information about where in that stream
the table is located.
This table summarises which input handlers are available, what format strings they use, and whether they are tried in automatic format detection mode.
Format string Format Read Automatic Format Mode? ------------- ----------- ---------------------- FITS-plus FITS with VOTable metadata in HDU#0 yes FITS FITS BINTABLE or ASCII extension yes VOTable VOTable 1.0/1.1 yes ASCII Whitespace-separated ASCII yes CSV Comma-separated values no WDC World Data Centre format no
The FitsTableBuilder
class
can read FITS binary (BINTABLE) and ASCII (TABLE) table extensions.
Unless told otherwise, the first table extension in the named FITS
file will be used. If the name supplied to the StarTableFactory
ends in a # sign followed by a number however, it means that the
requested table is in the indicated extension of a multi-extension FITS file.
Hence 'spec23.fits#3' refers to the 3rd extension (4th HDU) in the
file spec23.fits. The suffix '#0' is never used in this context for
a legal FITS file, since the primary HDU cannot contain a table.
If the table is stored in a FITS binary table extension in a file
on local disk in uncompressed form, then the file will be mapped
rather than read when the StarTable
is constructed,
which means that constructing the StarTable is very fast.
If you want to inhibit this behaviour, you can refer to the file as a URL,
for instance using the designation 'file:spec23.fits' rather
than 'spec23.fits';
this fools the handler into thinking that the file cannot be mapped.
Currently, binary tables are read rather more efficiently than ASCII ones.
The FitsPlusTableBuilder
handler also reads a variant of the FITS format - see Section 3.5.2.
The VOTableBuilder
class
reads VOTables; it should handle any table
which conforms to the VOTable 1.0 or 1.1 specifications (as well as quite
a few which violate them).
In particular, it can deal with
a DATA element whose content is encoded using any of the formats
TABLEDATA, BINARY or FITS.
While VOTable documents often contain a single table, the format
describes a hierarchical structure which can contain zero or more
TABLE elements.
By default, the StarTableFactory
will find the first one in the document for you, which in the
(common) case that the document contains only one table is just
what you want.
If you're after one of the others, identify it with a zero-based
index number after a '#' sign at the end of the table designation.
So if the following document is called 'cats.xml':
<VOTABLE> <RESOURCE> <TABLE name="Star Catalogue"> ... </TABLE> <TABLE name="Galaxy Catalogue"> ... </TABLE> </RESOURCE> </VOTABLE>then 'cats.xml' or 'cats.xml#0' refers to the "Star Catalogue" and 'cats.xml#1' refers to the "Galaxy Catalogue".
Much more detailed information about the VOTable I/O facilities, which can be used independently of the generic I/O described in this section, is given in Section 6.
In many cases tables are stored in some sort of unstructured plain
text format, with cells separated by spaces or some other delimiters.
The AsciiTableBuilder
class attempts to read these and interpret what's there
in sensible ways, but since there are so
many possibilities of different delimiters and formats for exactly
how values are specified, it won't always succeed.
Here are the rules for how the ASCII-format table handler reads tables:
Boolean
,
Short
Integer
,
Long
,
Float
,
Double
,
String
If the list of rules above looks frightening, don't worry, in many cases it ought to make sense of a table without you having to read the small print. Here is an example of a suitable ASCII-format table:
# # Here is a list of some animals. # # RECNO SPECIES NAME LEGS HEIGHT/m 1 pig "Pigling Bland" 4 0.8 2 cow Daisy 4 2 3 goldfish Dobbin "" 0.05 4 ant "" 6 0.001 5 ant "" 6 0.001 6 ant '' 6 0.001 7 "queen ant" 'Ma\'am' 6 2e-3 8 human "Mark" 2 1.8In this case it will identify the following columns:
Name Type ---- ---- RECNO Integer SPECIES String NAME String LEGS Integer HEIGHT/m FloatIt will also use the text "
Here is a list of some animals
"
as the Description parameter of the table.
Without any of the comment lines, it would still interpret the table,
but the columns would be given the names col1
..col5
.
If you understand the format of your files but they don't exactly
match the criteria above, the best thing is probably to write a
simple free-standing program or script which will convert them
into the format described here.
You may find Perl, awk or sed suitable languages for this sort of thing.
Alternatively, you could write a new input handler as explained in
Section 3.1 - you may find it easiest to subclass
the uk.ac.starlink.table.formats.StreamStarTable
class
in this case.
The CsvTableBuilder
handler can read data in the semi-standard CSV format.
The intention is that it understands the version of that format
spoken by MS Excel amongst others,
though the documentation on which it is based
was not obtained directly from Microsoft.
The rules for data which it understands are as follows:
Some support is provided for files produced by the World Data Centre for Solar Terrestrial Physics. The format itself apparently has no name, but files in this format look something like the following:
Column formats and units - (Fixed format columns which are single space seperated.) ------------------------ Datetime (YYYY mm dd HHMMSS) %4d %2d %2d %6d - %1s aa index - 3-HOURLY (Provisional) %3d nT 2000 01 01 000000 67 2000 01 01 030000 32 ...
The handler class
WDCTableBuilder
is experimental;
it was reverse-engineered from looking at a couple of
data files in the target format, and may not be very robust.
The table output handlers supplied with STIL are listed in this
section, along with any peculiarities they have in writing a
StarTable
to a destination given by a string
(usually a filename).
As described in Section 3.3, a StarTableOutput
will under normal circumstances permit output of a table in any
of these formats. Which format is used is determined by the
"format" string passed to
StarTableOutput.writeStarTable
as indicated in the following table; if a null format string
is supplied, the name of the destination string may be used
to select a format (e.g. a destination ending ".fits" will, unless
otherwise specified, result in writing FITS format).
Format string Format written Associated file extension ------------- -------------- ------------------------- jdbc SQL database fits FITS binary table .fits, .fit, .fts votable-tabledata TABLEDATA-format VOTable .xml, .vot votable-binary-inline Inline BINARY-format VOTable votable-binary-href External BINARY-format VOTable votable-fits-inline Inline FITS-format VOTable votable-fits-href External FITS-format VOTable text Human-readable plain text .txt ascii Machine-readable text csv Comma-separated value .csv html Standalone HTML document .html, .htm html-element HTML TABLE element latex LaTeX tabular environment .tex latex-document LaTeX freestanding document mirage Mirage input formatMore detail on all these formats is given in the following sections.
In some cases, more control can be exercised over the exact
output format by using the format-specific table writers themselves
(these are listed in the following sections),
since they may offer additional configuration methods.
The only advantage of using a StarTableOutput
to mediate
between them is to make it easy to switch between output formats,
especially if this is being done by the user at runtime.
The FITS handler,
FitsTableWriter
,
will output a two-HDU FITS file; the first
(primary) HDU has no interesting content, and the second one
(the first extension) is of type BINTABLE.
To write the FITS header for the table extension, certain things
need to be known which may not be available from the StarTable
object being written; in particular the number of rows and the
size of any variable-sized arrays (including variable-length strings)
in the table. This may necessitate two passes through the data to
do the write.
StarTableOutput
will write in FITS format if a
format string "fits" is used, or the format string is null and
the destination string ends in ".fits".
A variant form of FITS file is written by the
FitsPlusTableWriter
handler.
This is the same as the basic form,
except that the primary HDU (HDU#0)
contains a 1-d array of characters which form the text of a DATA-less
VOTable. The FITS table in the first extension (HDU#1) is understood to
contain the data. The point of this is that the VOTable can contain
all the rich metadata about the table, but the bulk data are in a form
which can be read efficiently. Crucially, the resulting FITS file
is a perfectly good FITS table on its own, so non-VOTable-aware readers
can read it in just the usual way, though of course they do not
benefit from the additional metadata stored in the VOTable header.
While the normal VOTable/FITS encoding has some of these advantages, it is inconvenient in that either (for in-line data) the FITS file is base64-encoded and so hard to read efficiently, in particular for random access or (for referenced data) the table is split across two files.
The VOTable handler,
VOTableWriter
,
can write VOTables in a variety of flavours
(see Section 6.1). In all cases, a StarTableOutput
will write a well-formed VOTable document with a single RESOURCE element
holding a single TABLE element. The different output formats
(TABLEDATA/FITS/BINARY, inline/href) are determined by configuration
options on the handler instance. The default handler writes to
inline TABLEDATA format.
The href-type formats write a (short) XML file and a FITS or binary file with a similar name into the same directory, holding the metadata and bulk data respectively. The reference from the one to the other is a relative URL, so if one is moved, they both should be.
For more control over writing VOTables, consult Section 6.3.
The AsciiTableWriter
class writes to a simple text format which is intended to be machine readable
(and fairly human readable as well).
It can be read in by the ASCII input handler, and is described in more
detail in Section 3.4.3.
The CsvTableWriter
class writes to the semi-standard CSV format, including an initial line
containing column names.
It can be read by the CSV input handler, and is described in more
detail in Section 3.4.4.
The TextTableWriter
class writes to a simple text-based format which is designed to
be read by humans. According to configuration, this may or may not
output table parameters as name:value pairs at before the table data
themselves.
Here is an example of a short table written in this format:
+-------+----------+--------+------+--------+--------+ | index | Species | Name | Legs | Height | Mammal | +-------+----------+--------+------+--------+--------+ | 1 | pig | Bland | 4 | 0.8 | true | | 2 | cow | Daisy | 4 | 2.0 | true | | 3 | goldfish | Dobbin | 0 | 0.05 | false | | 4 | ant | | 6 | 0.0010 | false | | 5 | ant | | 6 | 0.0010 | false | | 6 | human | Mark | 2 | 1.9 | true | +-------+----------+--------+------+--------+--------+
The HTMLTableWriter
class writes tables as HTML 3.2
TABLE elements.
According to configuration this may be a freestanding HTML document
or the TABLE element on its own (suitable for incorporation into larger
HTML documents).
The LatexTableWriter
class writes tables as LaTeX tabular
environments,
either on their own or wrapped in a LaTeX document.
For obvious reasons, this isn't too suitable for tables with very
many columns.
Mirage
is a powerful standalone tool developed at Bell Labs for
interactive analysis
of multidimensional data. It uses its own file format for input.
The MirageTableWriter
class can write tables in this format.
With appropriate configuration, STIL can read and write
tables from a relational database such as
MySQL.
You can obtain a StarTable
which is the result
of a given SQL query on a database table,
or store a StarTable
as a new table in an existing
database.
Note that this does not allow you to work on the database 'live'.
The classes that control these operations mostly live in the
uk.ac.starlink.table.jdbc
package.
If a username and/or password is required for use of the table,
and this is not specified in the query URL,
StarTableFactory
will arrange to prompt for it.
By default this prompt is to standard output (expecting a response
on standard input), but some other mechanism, for instance a
graphical one, can be used by modifying the factory's
JDBCHandler.
For more information on GUI-friendly use of SQL databases,
see Section 4.3.
Java/STIL does not come with the facility to use any particular
SQL database "out of the box"; some additional configuration must
be done before it can work.
This is standard JDBC practice,
as explained in the documentation of the
java.sql.DriverManager
class.
In short, what you need to do is define the
"jdbc.drivers
" system property
to include the name(s) of the JDBC driver(s) which you wish
to use. For instance to enable use of MySQL with the Connector/J
database you might start up java with a command line like this:
java -classpath /my/jars/mysql-connector-java-3.0.8-stable-bin.jar:myapp.jar -Djdbc.drivers=com.mysql.jdbc.Driver my.path.MyApplicationOne gotcha to note is that an invocation like this will not work if you are using '
java -jar
' to invoke your application;
if the -jar
flag is used then any class path set on
the command line or in the CLASSPATH environment variable or elsewhere
is completely ignored. This is a consequence of Java's security model.
For both the reader and the writer described below, the string
passed to specify the database query/table may or may not require
additional authentication before the read/write can be carried out.
The general rule is that an attempt will be made to connect with
the database without asking the user for authentication,
but if this fails the user
will be queried for username and password, following which a second
attempt will be made. If username/password has already been
solicited, this will be used on subsequent connection attempts.
How the user is queried (e.g. whether it's done graphically or
on the command line) is controlled by the
JDBCHandler
's
JDBCAuthenticator
object,
which can be set by application code if required.
If generic I/O is being used, you can use the
get/setJDBCHandler
methods of the
StarTableFactory
or
StarTableOutput
being used.
To the author's knowledge, STIL has so far been used with the following RDBMSs and drivers:
You can view the result of an SQL query on a relational database as
a table. This can be done either by passing the query string directly to
a JDBCHandler
or
by passing it to the generic
StarTableFactory.makeStarTable
method (any string starting 'jdbc:' in the latter case is assumed to
be an SQL query string).
The form of this query string is as follows:
jdbc:<driver-specific-url>#<sql-query>The exact form is dependent on the JDBC driver which is installed. Here is an example for MySQL:
jdbc:mysql://localhost/astro1?user=mbt#SELECT ra, dec FROM swaa WHERE vmag<18If the username and/or password are required for the query but are not specified in the query string, they will be prompted for.
Note that the StarTable does not represent the JDBC table itself,
but a query on table. You can get a StarTable representing the
whole JDBC table with a query like SELECT * from table-name
,
but this may be expensive for large tables.
You can write out a StarTable
as a new table in
an SQL-compatible RDBMS. Note this will require appropriate access
privileges and may overwrite any existing table of the same name.
The general form of the string which specifies the destination of
the table being written is:
jdbc:<driver-specific-url>#<new-table-name>
Here is an example for MySQL with Connector/J:
jdbc:mysql://localhost/astro1?user=mbt#newtabwhich would write a new table called "newtab" in the MySQL database "astro1" on the local host with the access privileges of user mbt.
STIL provides a number of facilities to make life easier if you
are writing table-aware applications with a graphical user interface.
Most of these live in the
uk.ac.starlink.table.gui
package.
From a user's point of view dragging is done by clicking down a mouse button on some visual component (the "drag source") and moving the mouse until it is over a second component (the "drop target") at which point the button is released. The semantics of this are defined by the application, but it usually signals that the dragged object (in this case a table) has been moved or copied from the drag source to the drop target; it's an intuitive and user-friendly way to offer transfer of an object from one place (application window) to another. STIL's generic I/O classes provide methods to make drag and drop of tables very straightforward.
Dragging and dropping are handled separately but in either
case, you will need to construct a new
javax.swing.TransferHandler
object (subclassing TransferHandler
itself and overriding
some methods as below) and install it on the Swing
JComponent
which is to
do be the drag source/drop target using its
setTransferHandler
method.
To allow a Swing component to accept tables that are dropped onto it,
implement TransferHandler
's
canImport
and
importData
methods like this:
class TableDragTransferHandler extends TransferHandler { StarTableFactory factory = new StarTableFactory(); public boolean canImport( JComponent comp, DataFlavor[] flavors ) { return factory.canImport( flavors ); } public boolean importData( JComponent comp, Transferable dropped ) { try { StarTable table = factory.makeStarTable( dropped ); processDroppedTable( table ); return true; } catch ( IOException e ) { e.printStackTrace(); return false; } } }Then any time a table is dropped on that window, your
processDroppedTable
method will be called on it.
To allow tables to be dragged off of a component, implement the
createTransferable
method like this:
class TableDropTransferHandler extends TransferHandler { StarTableOutput writer = new StarTableOutput(); protected Transferable createTransferable( JComponent comp ) { StarTable table = getMyTable(); return writer.transferStarTable( table ); } }(you may want to override
getSourceActions
and
getVisualRepresentation
as well.
For some Swing components
(see the
Swing Data Transfer documentation for a list),
this is all that is required.
For others, you will need to arrange to recognise the drag gesture
and trigger the TransferHandler
's
exportAsDrag
method as well;
you can use a DragListener
for this or see its source code for an example of how to do it.
Because of the way that Swing's Drag and Drop facilities work, this is not restricted to transferring tables between windows in the same application; if you incorporate one or other of these capabilities into your application, it will be able to exchange tables with any other application that does the same, even if it's running in a different Java Virtual Machine or on a different host - it just needs to have windows open on the same display device. TOPCAT is an example; you can drag tables off of or onto the Table List in the Control Window.
Some graphical components exist to make it easier to load or save tables.
They are effectively table-friendly alternatives to using a
JFileChooser
.
StarTableChooser
JFileChooser
,
but it handles turning selected items into a StarTable
for you.
A format selector is available - this can either be set to "auto"
for automatic format detection mode, or to one of the named available
formats.
StarTableNodeChooser
StarTableSaver
StarTableOutput
knows about.
As explained in Section 3.6,
tables can be read from and written to SQL databases using the
JDBC framework. Since quite a lot of information has to be
specified to indicate the details of the table source/destination
(driver name, server host, database name, table name, user authentication
information...) in most cases this requires rather user-unfriendly
URLs to be entered.
For graphical applications, special dialogue components are supplied
which makes this much easier for the user. These contain
one input field per piece of information, so that the user does not need
to remember or understand the JDBC-driver-specific URL.
There are two of these components:
SQLReadDialog
for reading
tables and
SQLWriteDialog
for writing them.
The uk.ac.starlink.table
package provides
many generic facilities for table processing.
The most straightforward one to use is the
RowListStarTable
, described in the
next subsection, which gives you a
StarTable
whose data are stored in memory, so
you can set and get cells or rows somewhat like a tabular version of an
ArrayList
.
For more flexible and efficient table processing, you may want to look at the later subsections below, which make use of "pull-model" processing.
If all you want to do is to read tables in or write them out however, you may not need to read the information in this section at all.
If you want to store tabular data in memory,
possibly to output it using STIL's output facilities,
the easiest way to do it is to use a
RowListStarTable
object.
You construct it with information about the kind of value which
will be in each column, and then populate it with data by adding rows.
Normal read/write access is provided via a number of methods,
so you can insert and delete rows, set and get table cells,
and so on.
The following code creates and populates a table containing some information about some astronomical objects:
// Set up information about the columns. ColumnInfo[] colInfos = new ColumnInfo[ 3 ]; colInfos[ 0 ] = new ColumnInfo( "Name", String.class, "Object name" ); colInfos[ 1 ] = new ColumnInfo( "RA", Double.class, "Right Ascension" ); colInfos[ 2 ] = new ColumnInfo( "Dec", Double.class, "Declination" ); // Construct a new, empty table with these columns. RowListStarTable astro = new RowListStarTable( colInfos ); // Populate the rows of the table with actual data. astro.addRow( new Object[] { "Owl nebula", new Double( 168.63 ), new Double( 55.03 ) } ); astro.addRow( new Object[] { "Whirlpool galaxy", new Double( 202.43 ), new Double( 47.22 ) } ); astro.addRow( new Object[] { "M108", new Double( 167.83 ), new Double( 55.68 ) } );
The RowListStarTable
described in the previous section
is adequate for many table processing purposes, but since it controls
how storage is done (in a List
of rows) it imposes a
number of restrictions - an obvious one is that all the data have
to fit in memory at once.
A number of other classes are provided for more flexible table handling,
which make heavy use of the "pull-model" of processing, in which
the work of turning one table to another is not done at the time
such a transformation is specified, but only when the transformed table data
are actually required, for instance to write out to disk as a new
table file or to display in a GUI component such as a JTable
.
One big advantage of this is that calculations which are never used
never need to be done. Another is that in many cases it means you
can process large tables without having to allocate large amounts of
memory. For multi-step processes, it is also often faster.
The central idea to get used to is that of a "wrapper" table.
This is a table which wraps itself round another one (its "base" table),
using calls to the base table to provide the basic data/metadata
but making some some modifications before it returns it to the caller.
Tables can be wrapped around each other many layers deep like an onion.
This is rather like the way that
java.io.FilterInputStream
s work.
Although they don't have to, most wrapper table classes inherit
from WrapperStarTable
.
This is a no-op wrapper, which simply delegates all its calls to
the base table.
Its subclasses generally leave most of the methods alone, but
override those which relate to the behaviour they want to change.
Here is an example of a very simple wrapper table, which simply
capitalizes its base table's name:
class CapitalizeStarTable extends WrapperStarTable { public CapitalizeStarTable( StarTable baseTable ) { super( baseTable ); } public String getName() { return getBaseTable().getName().toUpperCase(); } }As you can see, this has a constructor which passes the base table to the
WrapperStarTable
constructor itself, which takes the base table as an argument.
Wrapper tables which do any meaningful wrapping will have
a constructor which takes a table, though they may take additional
arguments as well.
More often it is the data part which is modified and the metadata which is
left the same - some examples of this are given in Section 5.4.
Some wrapper tables wrap more than one table,
for instance joining two base tables
to produce a third one which draws data and/or metadata from both
(e.g. ConcatStarTable
,
JoinStarTable
).
The idea of wrappers is used on some components other than
StarTable
s themselves: there are
WrapperRowSequence
s and
WrapperColumn
s as well.
These can be useful in implementing wrapper tables.
Working with wrappers can often be more efficient than,
for instance, doing a calculation
which goes through all the rows of a table calculating new values
and storing them in a RowListStarTable
.
If you familiarise yourself with the set of wrapper tables supplied
by STIL, hopefully you will often find there are ones there which
you can use or adapt to do much of the work for you.
Here is a list of some of the wrapper classes provided, with brief descriptions:
ColumnPermutedStarTable
RowPermutedStarTable
RowSubsetStarTable
RandomWrapperStarTable
ProgressBarStarTable
JProgressBar
,
so the user can monitor progress in processing a table.
ProgressLineStarTable
ProgressBarStarTable
, but controls an animated
line of text on the terminal for command-line applications.
JoinStarTable
ConcatStarTable
This section gives a few examples of how STIL's wrapper classes can be used or adapted to perform useful table processing. If you follow what's going on here, you should be able to write table processing classes which fit in well with the existing STIL infrastructure.
This example shows how you can wrap a table to provide a sorted
view of it. It subclasses
RowPermutedStarTable
,
which is a wrapper that presents its base table with the rows in a
different order.
class SortedStarTable extends RowPermutedStarTable { // Constructs a new table from a base table, sorted on a given column. SortedStarTable( StarTable baseTable, int sortCol ) throws IOException { // Call the superclass constructor - this will throw an exception // if baseTable does not have random access. super( baseTable ); assert baseTable.isRandom(); // Check that the column we are being asked to sort on has // a defined sort order. Class clazz = baseTable.getColumnInfo( sortCol ).getContentClass(); if ( ! Comparable.class.isAssignableFrom( clazz ) ) { throw new IllegalArgumentException( clazz + " not Comparable" ); } // Fill an array with objects which contain both the index of each // row, and the object in the selected column in that row. int nrow = (int) getRowCount(); RowKey[] keys = new RowKey[ nrow ]; for ( int irow = 0; irow < nrow; irow++ ) { Object value = baseTable.getCell( irow, sortCol ); keys[ irow ] = new RowKey( (Comparable) value, irow ); } // Sort the array on the values of the objects in the column; // the row indices will get sorted into the right order too. Arrays.sort( keys ); // Read out the values of the row indices into a permutation array. long[] rowMap = new long[ nrow ]; for ( int irow = 0; irow < nrow; irow++ ) { rowMap[ irow ] = keys[ irow ].index_; } // Finally set the row permutation map of this table to the one // we have just worked out. setRowMap( rowMap ); } // Defines a class (just a structure really) which can hold // a row index and a value (from our selected column). class RowKey implements Comparable { Comparable value_; int index_; RowKey( Comparable value, int index ) { value_ = value; index_ = index; } public int compareTo( Object o ) { RowKey other = (RowKey) o; return this.value_.compareTo( other.value_ ); } } }
Suppose you have three arrays representing a set of points on the plane,
giving an index number and an x and y coordinate, and you would like
to manipulate them as a StarTable. One way is to use the
ColumnStarTable
class,
which gives you a table of a specified number of rows but initially
no columns, to which you can add data a column at a time.
Each added column is an instance of
ColumnData
;
the ArrayColumn
class
provides a convenient implementation which wraps an array of objects
or primitives (one element per row).
StarTable makeTable( int[] index, double[] x, double[] y ) { int nRow = index.length; ColumnStarTable table = ColumnStarTable.makeTableWithRows( nRow ); table.addColumn( ArrayColumn.makeColumn( "Index", index ) ); table.addColumn( ArrayColumn.makeColumn( "x", x ) ); table.addColumn( ArrayColumn.makeColumn( "y", y ) ); return table; }
A more general way to approach this is to write a new implementation of
StarTable
;
this is like what happens in Swing if you write your own
TableModel
to provide data for a
JTable
.
In order to do this you will usually want to subclass one of the existing
implementations, probably
AbstractStarTable
,
RandomStarTable
or
WrapperStarTable
.
Here is how it can be done:
class PointsStarTable extends RandomStarTable { // Define the metadata object for each of the columns. ColumnInfo[] colInfos_ = new ColumnInfo[] { new ColumnInfo( "Index", Integer.class, "point index" ), new ColumnInfo( "X", Double.class, "x co-ordinate" ), new ColumnInfo( "Y", Double.class, "y co-ordinate" ), }; // Member variables are arrays holding the actual data. int[] index_; double[] x_; double[] y_; long nRow_; public PointsStarTable( int[] index, double[] x, double[] y ) { index_ = index; x_ = x; y_ = y; nRow_ = (long) index_.length; } public int getColumnCount() { return 3; } public long getRowCount() { return nRow_; } public ColumnInfo getColumnInfo( int icol ) { return colInfos_[ icol ]; } public Object getCell( long lrow, int icol ) { int irow = checkedLongToInt( lrow ); switch ( icol ) { case 0: return new Integer( index_[ irow ] ); case 1: return new Double( x_[ irow ] ); case 2: return new Double( y_[ irow ] ); default: throw new IllegalArgumentException(); } } }In this case it is only necessary to implement the
getCell
method;
RandomStarTable
implements the other data access methods
(getRow
, getRowSequence
) in terms of this.
In this example we will append to a table a new column in which each cell contains the sum of all the other numeric cells in that row.
First, we define a wrapper table class which contains only a single
column, the one which we want to add.
We subclass AbstractStarTable
,
implementing its abstract methods as well as the getCell
method which may be required if the base table is random-access.
class SumColumnStarTable extends AbstractStarTable { StarTable baseTable_; ColumnInfo colInfo0_ = new ColumnInfo( "Sum", Double.class, "Sum of other columns" ); // Constructs a new summation table from a base table. SumColumnStarTable( StarTable baseTable ) { baseTable_ = baseTable; } // Has a single column. public int getColumnCount() { return 1; } // The single column is the sum of the other columns. public ColumnInfo getColumnInfo( int icol ) { if ( icol != 0 ) throw new IllegalArgumentException(); return colInfo0_; } // Has the same number of rows as the base table. public long getRowCount() { return baseTable_.getRowCount(); } // Provides random access iff the base table does. public boolean isRandom() { return baseTable_.isRandom(); } // Get the row from the base table, and sum elements to produce value. public Object getCell( long irow, int icol ) throws IOException { if ( icol != 0 ) throw new IllegalArgumentException(); return calculateSum( baseTable_.getRow( irow ) ); } // Use a WrapperRowSequence based on the base table's RowSequence. // Wrapping a RowSequence is quite like wrapping the table itself; // we just need to override the methods which require new behaviour. public RowSequence getRowSequence() throws IOException { final RowSequence baseSeq = baseTable_.getRowSequence(); return new WrapperRowSequence( baseSeq ) { public Object getCell( int icol ) throws IOException { if ( icol != 0 ) throw new IllegalArgumentException(); return calculateSum( baseSeq.getRow() ); } public Object[] getRow() throws IOException { return new Object[] { getCell( 0 ) }; } }; } // This method does the arithmetic work, summing all the numeric // columns in a row (array of cell value objects) and returning // a Double. Double calculateSum( Object[] row ) { double sum = 0.0; for ( int icol = 0; icol < row.length; icol++ ) { Object value = row[ icol ]; if ( value instanceof Number ) { sum += ((Number) value).doubleValue(); } } return new Double( sum ); } }We could use this class on its own if we just wanted a 1-column table containing summed values. The following snippet however combines an instance of this class with the table that it is summing from, resulting in an n+1 column table in which the last column is the sum of the others:
StarTable getCombinedTable( StarTable inTable ) { StarTable[] tableSet = new StarTable[ 2 ]; tableSet[ 0 ] = inTable; tableSet[ 1 ] = new SumColumnStarTable( inTable ); StarTable combinedTable = new JoinStarTable( tableSet ); return combinedTable; }
Some fairly sophisticated classes for performing table joins
(by matching values of columns between tables) are
available in the uk.ac.starlink.table.join
package.
These work and are used in
TOPCAT,
but are not described further in this document,
and they are subject to changes in future releases.
Read the javadocs for the uk.ac.starlink.table.join
package,
or watch this space, or contact the author if you are keen to use this
functionality.
VOTable is an XML-based format for storage and transmission of tabular data, endorsed by the International Virtual Observatory Alliance, who make available the schema (http://www.ivoa.net/xml/VOTable/v1.1) and documentation (http://www.ivoa.net/Documents/latest/VOT.html). The current version of STIL provides full support for versions 1.0 and 1.1 of the format.
As with the other handlers tabular data can be read from and written to VOTable documents using the generic facilities described in Section 3. However if you know you're going to be dealing with VOTables the VOTable-specific parts of the library can be used on their own; this may be more convenient and it also allows access to some features specific to VOTables.
The VOTable functionality is provided in the package
uk.ac.starlink.votable
.
It has the following features:
The actual table data (the cell contents, as opposed to metadata such as column names and characteristics) in a VOTable are stored in a TABLE's DATA element. The VOTable standard allows it to be stored in a number of ways; It may be present as XML elements in a TABLEDATA element, or as binary data in one of two formats, BINARY or FITS; if binary the data may either be available externally from a given URL or present in a STREAM element encoded as character data using the Base64 scheme (Base64 is defined in RFC2045).
To summarise, the possible formats are:
<!-- TABLEDATA format, inline --> <DATA> <TABLEDATA> <TR> <TD>1.0</TD> <TD>first</TD> </TR> <TR> <TD>2.0</TD> <TD>second</TD> </TR> <TR> <TD>3.0</TD> <TD>third</TD> </TR> </TABLEDATA> </DATA> <!-- BINARY format, inline --> <DATA> <BINARY> <STREAM encoding='base64'> P4AAAAAAAAVmaXJzdEAAAAAAAAAGc2Vjb25kQEAAAAAAAAV0aGlyZA== </STREAM> </BINARY> </DATA> <!-- BINARY format, to external file --> <DATA> <BINARY> <STREAM href="file:/home/mbt/BINARY.data"/> </BINARY> </DATA>External files may also be compressed using gzip. The FITS ones look pretty much like the binary ones, though in the case of an externally referenced FITS file, the file in the URL is a fully functioning FITS file with (at least) one BINTABLE extension.
In the case of FITS data the VOTable standard leaves it up to the application how to resolve differences between metadata in the FITS stream and in the VOTable which references it. For a legal VOTable document STIL behaves as if it uses the metadata from the VOTable and ignores any in FITS headers, but if they are inconsistent to the extent that the FIELD elements and FITS headers describe different kinds of data, results may be unpredictable.
At the time of writing, most VOTables in the wild are written in TABLEDATA format. This has the advantage that it is human-readable, and it's easy to write and read using standard XML tools. However, it is not a very suitable format for large tables because of the high overheads of processing time and storage/bandwidth, especially for numerical data. For efficient transport of large tables therefore, one of the binary formats is recommended.
STIL can read and write VOTables in any of these formats. In the case of reading, you just need to point the library at a document or TABLE element and it will work out what format the table data are stored in and decode them accordingly - the user doesn't need to know whether it's TABLEDATA or external gzipped FITS or whatever. In the case of writing, you can choose which format is used.
STIL offers a number of options for reading a VOTable document,
described below. In all cases they provide you with a way of
obtaining the table data (contents of the cells) without having to
know how these were encoded. The API defines the contents of a cell
only as an Object
; the actual type will be determined by
the datatype
and arraysize
attributes of the
column.
In general, scalars are represented by the corresponding primitive
wrapper class (Integer
, Double
, Boolean
etc), and arrays are represented by an array of primitives
of the corresponding type (int[]
, double[]
,
boolean[]
).
Arrays are only ever one-dimensional - information about any
multidimensional shape they may have is supplied separately
(use the getShape
method on the corresponding
ColumnInfo
).
There are a couple of exceptions to this: arrays with
datatype="char"
or "unicodeChar"
are represented by String objects
since that is almost always what is intended
(n-dimensional arrays of char
are treated as if they were
(n-1)-dimensional arrays of Strings),
and unsignedByte
types are represented as if they were
short
s, since in Java bytes are always signed.
Complex values are represented as if they were an array of the corresponding
type but with an extra dimension of size two (the most rapidly varying).
The following table summarises how all VOTable datatypes are represented:
datatype Class for scalar Class for arraysize>1 -------- ---------------- --------------------- boolean Boolean boolean[] bit boolean[] boolean[] unsignedByte Short short[] short Short short[] int Integer int[] long Long long[] char Char String or String[] unicodeChar Char String or String[] float Float float[] double Double double[] floatComplex float[] float[] doubleComplex double[] double[]
The application program does not, however,
have to investigate the values of the
datatype
and arraysize
attributes to work
out what kinds of objects it is going to find as values of cells
in a table. Each column of the table object that STIL gives you
can report the class of object which will be found in it.
In most cases, you will receive a
StarTable
object which contains
the table metadata. To find the class of objects in the fourth column,
you can do this:
Class clazz = starTable.getColumnInfo(3).getContentClass();Every value obtained from a cell in that column can be cast to the class
clazz
(though note such a value might be
null
).
Useful tip: for generic processing it is often handy to cast
numeric scalar cell contents to type
Number
.
The simplest way to read a VOTable is to use
the generic table reading method described
in Section 3.2, in which you just submit the URL or
filename of a document to a
StarTableFactory
,
and get back a StarTable
object.
If you're after one of several TABLE elements in a document,
you can specify this by giving its number as the URL's
fragment ID (the bit after the '#' sign).
The following code would give you StarTable
s
read from the first and fourth TABLE elements in the file "tabledoc.xml":
StarTableFactory factory = new StarTableFactory(); StarTable tableA = factory.makeStarTable( "tabledoc.xml", "votable" ); StarTable tableB = factory.makeStarTable( "tabledoc.xml#3", "votable" );
All the data and metadata from the TABLE in the
VOTable document are available from the resulting StarTable
object,
as table parameters, ColumnInfo
s
or the data themselves.
If you are just trying to extract the data and metadata from a
single TABLE element somewhere in a VOTable document, this
procedure is probably all you need.
The parameters
of the table which is obtained are taken from PARAM and INFO elements.
PARAMs or INFOs in the TABLE element itself or in its RESOURCE parent
are considererd to apply to the table.
The values of these can be obtained using the
getParameters
method.
If you are interested in the structure of the VOTable document
in more detail than the table items that can be extracted from it,
you will need to obtain a hierarchical structure based on the XML.
The standard way of doing this for any XML document in Java is to
obtain a DOM (Document Object Model) based on the XML -
this is an API defined by the
W3C
representing a tree-like structure of elements and attributes which can be
navigated by using methods like
getFirstChild
and
getParentNode
.
STIL provides you with a DOM which can be viewed exactly like a standard one (it implements the DOM API) but has some special features.
VOElement
class
(which itself implements the DOM
Element
interface).
This provides a few convenience methods such as
getChildrenByName
which can be useful but don't
do anything that you couldn't do with the Element
interface
alone.
VOElement
which provide
methods specific to their rĂ´le in a VOTable document.
For instance every GROUP element in the tree is represented
by a GroupElement
;
this class has a method
getFields
which returns all the FIELD elements associated with that group
(this method examines its FIELDref children and locates their
FIELD elements elsewhere in the DOM).
The various specific element types are not considered in detail here -
see the javadocs for the subclasses of
VOElement
.
TableElement
.
A TableElement
can provide the table data
stored within it; to access these data you don't need to know whether
it is stored in TABLEDATA, FITS or BINARY form etc.
getElementById
method.
To acquire this DOM you will use a
VOElementFactory
,
usually feeding a File
, URL
or
InputStream
to one of its makeVOElement
methods.
The bulk data-less DOM mentioned above is possible because
the VOElementFactory
processes the XML document using SAX,
building a DOM as it goes along, but when
it gets to the bulk data-bearing elements it interprets their data
on the fly and stores it in a
form which can be accessed efficiently later
rather than inserting the elements into the DOM.
SAX (Simple API for XML, defined in the
org.xml.sax
package)
is an event driven processing model which, unlike DOM,
does not imply memory usage that scales with the size of the document.
In this way any but the weirdest VOTable documents
can be turned into a DOM of very modest size.
This means you can have all the benefits of a DOM
(full access to the hierarchical structure) without the disadvantages
usually associated with DOM-based VOTable processing
(potentially huge memory footprint).
Of course in order to be accessed later,
the data extracted from a stream of TR elements or from
the inline content of a STREAM element has to get stored somewhere.
Where it gets put is determined by the
StoragePolicy
in effect for the VOElementFactory
.
You would normally set this to
PREFER_MEMORY
or PREFER_DISK
to cache the data in memory or a temporary file, but you can set it
to DISCARD
if you're only interested in the document structure and don't want
the data at all.
If for some reason you want to work with a full DOM containing
the TABLEDATA or STREAM children, you can parse the document to
produce a DOM Document
or Element
as usual (e.g. using a
DocumentBuilder
)
and feed that to one of the the
VOElementFactory
's
makeVOElement
methods instead.
Having obtained your DOM,
the easiest way to access the data of a TABLE element is to locate
the relevant TableElement
in the tree
and turn it into a StarTable
using the VOStarTable
adapter class.
You can interrogate the resulting object for its data and metadata as
described in Section 2.
This StarTable
may or may not provide random access
(isRandom
may or may not return true), according to how the data were obtained.
If it's a binary stream from a remote URL it may only be possible to
read rows from start to finish a row at a time, but if it was in
TABLEDATA form it will be possible to access cells in any order.
If you need random access for a table and you don't have it
(or don't know if you do) then use the methods described in
Section 2.3.3.
It is possible to access the table data directly (without
making it into a StarTable
) by using the
getData
method of the TableElement
, but in this case you need
to work a bit harder to extract some of the data and metadata in
useful forms. See the TabularData
documentation for details.
One point to note about VOElementFactory
's parsing is that
it is not normally restricted to elements named in the VOTable
standard, so a document which does not conform to the standard can
still be processed as a VOTable if parts of it contain VOTable-like
structures.
Here is an example of using this approach to read the structure of a, possibly complex, VOTable document. This program locates the third TABLE child of the first RESOURCE element and prints out its column titles and table data.
void printThirdTable( File votFile ) throws IOException, SAXException { // Create a tree of VOElements from the given XML file. VOElement top = new VOElementFactory().makeVOElement( votFile ); // Find the first RESOURCE element using standard DOM methods. NodeList resources = top.getElementsByTagName( "RESOURCE" ); Element resource = (Element) resources.item( 0 ); // Locate the third TABLE child of this resource using one of the // VOElement convenience methods. VOElement vResource = (VOElement) resource; VOElement[] tables = vResource.getChildrenByName( "TABLE" ); TableElement tableEl = (TableElement) tables[ 2 ]; // Turn it into a StarTable so we can access its data. StarTable starTable = new VOStarTable( tableEl ); // Write out the column name for each of its columns. int nCol = starTable.getColumnCount(); for ( int iCol = 0; iCol < nCol; iCol++ ) { String colName = starTable.getColumnInfo( iCol ).getName(); System.out.print( colName + "\t" ); } System.out.println(); // Iterate through its data rows, printing out each element. for ( RowSequence rSeq = starTable.getRowSequence(); rSeq.hasNext(); ) { rSeq.next(); Object[] row = rSeq.getRow(); for ( int iCol = 0; iCol < nCol; iCol++ ) { System.out.print( row[ iCol ] + "\t" ); } System.out.println(); } }
Versions of STIL prior to V2.0 worked somewhat differently to this - they produced a tree structure representing the VOTable document which resembled, but wasn't, a DOM (it didn't implement the W3C DOM API). The current approach is more powerful and in some cases less fiddly to use.
If you only need one-shot access to the data in a single TABLE element,
you can use instead the
streamStarTable
method of VOTableBuilder
, which effectively turns a
stream of bytes containing a VOTable document into a stream of
events representing a table's metadata and data. You define how these events
are processed by writing an implementation of the
TableSink
interface.
The data are obtained using SAX parsing and not stored,
so it should be fast and have a very small memory footprint.
Since it bails out as soon as it has transmitted the table it's after,
it may even be able to pull table data out of a stream which is not valid XML.
The following code streams a table and prints out the name of the first column and the average of its values (assumed numerical):
// Set up a class to handle table processing callback events. class ColumnReader implements TableSink { private long count_; // number of rows so far private double sum_; // running total of values from first column double average_; // first column average String title_; // first column name // Handle metadata by printing out the first column name. public void acceptMetadata( StarTable meta ) { title_ = meta.getColumnInfo( 0 ).getName(); } // Handle a row by updating running totals. public void acceptRow( Object[] row ) { sum_ += ((Number) row[ 0 ]).doubleValue(); count_++; } // At end-of-table event calculate the average. public void endRows() { average_ = sum_ / count_; } }; // Streams the named file to the sink we have defined, getting the data // from the first TABLE element in the file. public void summarizeFirstColumn( URL votLocation ) throws IOException { ColumnReader reader = new ColumnReader(); InputStream in = votLocation.openStream(); new VOTableBuilder().streamStarTable( in, reader, "0" ); in.close(); System.out.println( "Column name: " + reader.title_ ); System.out.println( "Column average: " + reader.average_ ); }
Parameters are obtained in streamed access from PARAM and INFO elements in the same way as described in Section 6.2.1.
To write a VOTable using STIL you have to prepare a
StarTable
object which defines the output table's metadata and data.
The uk.ac.starlink.table
package
provides a rich set of facilities for creating and modifying these,
as described in Section 5
(see Section 5.4.2 for an example of how to
turn a set of arrays into a StarTable
).
In general the FIELD arraysize
and datatype
attributes are determined
from column classes using the same mappings described in
Section 6.2.
A range of facilities for writing StarTable
s out as
VOTables is offered, allowing control over the data format and the structure
of the resulting document.
Depending on your application, you may wish to provide the option of output to tables in a range of different formats including VOTable. This can be easily done using the generic output facilities described in Section 3.3.
The simplest way to output a table in VOTable format is to
use a VOTableWriter
,
which will output a VOTable document with the simplest structure capable
of holding a TABLE element, namely:
<VOTABLE version='1.0'> <RESOURCE> <TABLE> <!-- .. FIELD elements here --> <DATA> <!-- table data here --> </DATA> </TABLE> </RESOURCE> </VOTABLE>The writer can be configured/constructed to write its output in any of the formats described in Section 6.1 (TABLEDATA, inline FITS etc) by using its
setDataFormat
and
setInline
methods.
In the case of streamed output which is not inline,
the streamed (BINARY or FITS) data will be written to a
with a name similar to that of the main XML output file.
Assuming that you have your StarTable
ready to output,
here is how you could write it out in two of the possible formats:
void outputAllFormats( StarTable table ) throws IOException { // Obtain a writer for inline TABLEDATA output. VOTableWriter voWriter = new VOTableWriter( DataFormat.TABLEDATA, true ); // Use it to write the table to a named file. voWriter.writeStarTable( table, "tabledata-inline.xml" ); // Modify the writer's characteristics to use it for referenced FITS output. voWriter.setDataFormat( DataFormat.FITS ); voWriter.setInline( false ); // Use it to write the table to a different named file. // The writer will choose a name like "fits-href-data.fits" for the // actual FITS file referenced in the XML. voWriter.writeStarTable( table, "fits-href.xml" ); }
You may wish for more flexibility, such as the possibility
to write a VOTable document with a more complicated
structure than a simple VOTABLE/RESOURCE/TABLE one,
or to have more control over the output destination for referenced STREAM data.
In this case you can use the
VOSerializer
class which handles only the output of TABLE elements themselves
(the hard part), leaving you free to embed these in whatever XML
superstructure you wish.
Once you have obtained your VOSerializer
by specifying
the table it will serialize and the data format it will use,
you should invoke its
writeFields
method followed by
either
writeInlineDataElement
or
writeHrefDataElement
.
For inline output, the output should be sent to the same stream
to which the XML itself is written. In the latter case however,
you can decide where the streamed data go, allowing possibilities such
as sending them to a separate file in a location of your choosing,
creating a new MIME attachment to a message, or sending it down
a separate channel to a client. In this case you will need to
ensure that the href associated with it (written into the STREAM element's
href
attribute) will direct a reader to the right place.
Here is an example of how you could write two inline tables in TABLEDATA format in the same RESOURCE element:
void writeTables( StarTable t1, StarTable t2 ) throws IOException { BufferedWriter out = new BufferedWriter( new OutputStreamWriter( System.out ) ); out.write( "<VOTABLE version='1.1'>\n" ); out.write( "<RESOURCE>\n" ); out.write( "<DESCRIPTION>Two tables</DESCRIPTION>\n" ); out.write( "<TABLE>\n" ); VOSerializer ser1 = VOSerializer.makeSerializer( DataFormat.TABLEDATA, t1 ); ser1.writeFields( out ); ser1.writeInlineDataElement( out ); out.write( "</TABLE>\n" ); out.write( "<TABLE>\n" ); VOSerializer ser2 = VOSerializer.makeSerializer( DataFormat.TABLEDATA, t2 ); ser2.writeFields( out ); ser2.writeInlineDataElement( out ); out.write( "</TABLE>\n" ); out.write( "</RESOURCE>\n" ); out.write( "</VOTABLE>\n" ); }and here is how you could write a table with its data streamed to a binary file with a given name (rather than the automatically chosen one selected by
VOTableWriter
):
void writeTable( StarTable table, File binaryFile ) throws IOException { BufferedWriter out = new BufferedWriter( new OutputStreamWriter( System.out ) ); out.write( "<VOTABLE version='1.1'>\n" ); out.write( "<RESOURCE>\n" ); out.write( "<TABLE>\n" ); VOSerializer ser = VOSerializer.makeSerializer( DataFormat.BINARY, table ); ser.writeFields( out ); DataOutputStream binOut = new DataOutputStream( new FileOutputStream( binaryFile ) ); ser.writeHrefDataElement( out, "file:" + binaryFile, binOut ); binOut.close(); out.write( "</TABLE>\n" ); out.write( "<RESOURCE>\n" ); out.write( "<VOTABLE>\n" ); }
My thanks are due to a number of people who have contributed help to me in writing this document and the STIL software, including:
STIL is written in Java by Sun Microsysystems Inc. and contains code from the following non-Starlink libraries:
This section contains a list of system properties which influence the behaviour of STIL. You don't have to set any of these; the relevant components will use reasonable defaults if they are undefined. Note that in certain security contexts it may not be possible to access system properties; in this case STIL will silently ignore any such settings.
The jdbc.drivers
property is a standard JDBC property
which names JDBC driver classes that can be used to talk to SQL databases.
See Section 3.6.1 for more details.
java.io.tmpdir
is a standard Java system property which
is used by the "disk
" storage policy.
It determines where temporary files stored by the
PREFER_DISK
("disk
")
storage policy will be written (see Appendix A.4).
The default value is typically "/tmp
" on Unix-like platforms.
The startable.readers
property provides additional
input handlers which StarTableFactory
can use for loading tables in named format mode.
Its value is a colon-separated list of class names, where each
element of the list must be the name of a class on the classpath
which implements
the TableBuilder
interface
and has a no-arg constructor.
When a new StarTableFactory
is constructed, an instance of each such named class is created and
added to its known handler list. Users of the library can therefore
read tables in the format that the new handler understands by giving
its format name when doing the load.
The startable.storage
property sets the initial
value of the default storage policy, which influences where bulk
table data will be cached.
The recognised values are:
memory
: table data will normally be stored in memory
(PREFER_MEMORY
)disk
: table data will normally be stored in
temporary disk files
(PREFER_DISK
)discard
: table data will normally be thrown away,
leaving only metadata
(DISCARD
)memory
".
You may also give the name of a subclass of
StoragePolicy
which has a no-arg
constructor, in which case an instance of this class will be used
as the default policy.
See Section 2.3.3 and the
StoragePolicy
javadocs for
more discussion of storage policies.
The startable.writers
property provides additional
output handlers which StarTableOutput
can use for writing tables.
Its value is a colon-separated list of class names, where each
element of the list must be the name of a class on the classpath
which implements the
StarTableWriter
interface and
has a no-arg constructor.
When a new StarTableOutput
is constructed, an instance of each such named class is created
and added to its handler list. Users of the library can therefore
write tables in the format that the new handler knows how to
write to by giving its format name when performing the write.
A couple of applications using the STIL library currently exist, as listed below. More will be made available in the future, either bundled with STIL or in separate application packages.
Tablecopy copies a table from any of the (input-) supported formats into any of the (output-) supported ones. This is pretty trivial, since all the hard work is done using the generic I/O facilities described in Section 3.
The application is the main
method of
TableCopy
, though it might get
moved in future releases.
Invoking it with the "-help
" flag will print a usage message.
Assuming STIL is on your classpath:
% java uk.ac.starlink.table.TableCopy -help Usage: TableCopy [-disk] [-debug] [-h[elp]] [-v[erbose]] [-ifmt <in-format>] [-ofmt <out-format>] <in-table> <out-table> Auto-detected in-formats: fits-plus fits votable Known in-formats: fits-plus fits votable ascii csv wdc Known out-formats: jdbc fits fits-plus fits-basic votable-tabledata votable-binary-inline votable-fits-href votable-binary-href votable-fits-inline text ascii csv html html-element latex latex-document mirage
The flags and arguments have the following meaning:
According to how you have downloaded STIL you may alternatively
be able to invoke it using the "tablecopy
" script.
Here are some examples of use:
tablecopy stars.fits stars.xml
tablecopy -ofmt text http://remote.host/data/vizier.xml.gz#4 -
tablecopy -disk -ifmt csv -ofmt ascii spec.csv.gz -
java -Djdbc.drivers=com.mysql.jdbc.Driver -classpath stil.jar:mysql-connector-java-3.0.6-stable-bin.jar uk.ac.starlink.table.TableCopy -ofmt fits "jdbc:mysql://localhost/astro1#SELECT ra, dec, Imag, Kmag FROM dqc" wfslist.fit
TOPCAT (Tool for OPerations on Catalogues And Tables) is a graphical application for interactive manipulation of tables, written by the same author as STIL. All its table I/O and processing is built on STIL.
STIL is released under the terms of the GNU General Public License. It has been developed and tested under Sun's Java J2SE1.4.1, but is believed to run under other 1.4 or 1.5/5.0 versions of the J2SE. It is not currently source-compatible with Java 5.0 however (because of the expansion of the DOM interface to cover DOM level 3), so if you wish to compile it yourself you will need a 1.4 version of the JDK.
An attempt is made to keep backwardly-incompatible changes to the public API of this library to a minimum. However, rewrites and improvements may to lead to API-level incompatibilities in some cases, as described below. The author would be happy to advise people who have used previous versions and want help adapting their code to the current STIL release.
RowListStarTable
.JoinStarTable
can now
deduplicate column names.ConcatStarTable
permits adding the rows of one table after the rows of another.RowSequence
interface
has been modified; a new
close
method has been introduced, and the old
advance()
and getRowIndex()
methods have been
withdrawn (these latter were not very useful and in some cases
problematic to implement).
setName()
and setURL()
have been added to the StarTable
interface.
StoragePolicy
class was
introduced, which allows you to influence whether cached table data are
stored in memory or on disk.
This has led to backwardly-incompatible changes to public interfaces and
classes:
makeStarTable
now takes a new StoragePolicy
argument, and
VOElementFactory
's methods
are now instance methods rather than static ones.
StarTableFactory
class's
makeStarTable
methods now come in two flavours - with and
without a format name. This corresponds to two table reading modes:
named format mode and automatic format detection mode.
In named format mode you specify the format of the table you are
trying to read and in automatic format detection mode you rely
on the factory to work it out
using magic numbers. Although automatic detection works well
for VOTable and FITS, it's poor for text-based formats like
ASCII and CSV.
This has resulted in addition of some new two-argument
makeStarTable
methods and the withdrawal of the
getBuilders
method in favour of two new methods
getDefaultBuilders
and
getKnownBuilders
(similarly for setter methods) which deal with the handlers
used in automatic detection mode and the ones available for
named format mode respectively.
Note that the ASCII table format is not automatically detected,
so to use ASCII tables you now have to specify the format explicitly.
VOElement
class
has been rewritten and now implements the DOM
Element
interface.
This means that the hierarchical structure which you can navigate
to obtain information about the VOTable document and extract
table data actually is a DOM rather than just being sat
on top of one. You can therefore now use it just as a normal
DOM tree (making use of the methods defined in the
org.w3c.dom
interface,
interoperating with third-party components which require a DOM).
This has had a number of additional benefits and consequences:
VOElementFactory
class now has instance methods rather than static methods.
StoragePolicy.DISCARD
into a VOElementFactory
it is now possible to obtain a data-less
(structure only, hence minimal resource) VOTable DOM.
TableSink
methods
now throw exceptions.
StarTableFactory
's list for automatic format detection,
but CSV-format tables can be loaded using named format mode.
The format is intended to match the (widely-used) variety used by
Microsoft Excel amongst others (with optional column names).
basic-fits
").
StoragePolicy
).
Short
/Float
types in preference to Integer
/Double
if the input data make this appropriate.
d
or D
as an exponent letter
as well as e
or E
.
uk.ac.starlink.table.join
.
These work and have full javadocs, but
they are still experimental, subject to substantial change in future
releases, and not documented properly in this document.
NULL_VALUE_INFO
)
Nulls in FITS and VOTable/FITS tables are now preserved correctly
on output.
TCOMMx
headers.
EmptyStarTable
added.