Starlink User Note
256
Mark Taylor
29 April 2005
$Id: sun256.xml,v 1.3 2005/04/29 15:57:27 mbt Exp $
STILTS is a set of command-line tools for processing tabular data. It has been designed for, but is not restricted to, use on astronomical data such as source catalogues. It contains both generic (format-independent) table processing tools and tools for processing VOTable documents. Facilities offered include format conversion, format validation, column manipulation, row selection, sorting, statistical calculations and metadata display. Calculations on cell data can be performed using a powerful and extensible expression language.
The package is written in pure Java and based on STIL, the Starlink Tables Infrastructure Library. This gives it high portability, support for many data formats (including FITS, VOTable, text-based formats and SQL databases), extensibility and scalability. Where possible the tools are written to accept streamed data so the size of tables which can be processed is not limited by available memory.
STILTS is available under the GNU General Public Licence. This document describes the initial, beta, release of the package.
STILTS provides a number of command-line applications which can be used for manipulating tabular data. Conceptually it sits between, and uses many of the same classes as, the packages STIL, which is a set of Java APIs providing table-related functionality, and TOPCAT, which is a graphical application providing the user with an interactive platform for exploring one or more tables. This document is mostly self-contained - it covers some of the same ground as the STIL and TOPCAT user documents (SUN/252 and SUN/253 respectively).
Currently, this package consists of four commands:
tcopy
:
Table Format Converter
tpipe
:
Generic Table Pipeline Utility
votcopy
:
VOTable Encoding Translator
votlint
:
VOTable Validity Checker
tpipe
.
There are many ways you might want to use these tools; here are a few possibilities:
This release of STILTS is beta software. While all the functions are believed to be working, the interface (command line arguments etc) may undergo changes in the future. All user comments are most welcome.
There are two ways of invoking the commands in this package.
If you're using a Unix-like operating system and have downloaded
the package in a form which includes the shell scripts,
the easiest way is to use the scripts tpipe
,
votlint
etc. These will either be in the same location
as the stilts.jar
file or in starjava/bin
The form of invocation in this case is:
<command-name> <java-args> <application-args>
(the java command should be on your path).
A simple example would be:
votcopy -f binary t1.xml t2.xml
For convenience you can mix up java-args and application-args - the script will untangle them and reorder them properly. If you use the
-classpath
argument or have a CLASSPATH environment variable set,
then classpath elements thus specified will be added to the classpath
required to run the command.
The examples in the
command descriptions below use this form for convenience.
If you don't have a Unix-like shell available however, or if you don't have the STILTS shell scripts, you will need to invoke Java directly with the appropriate classes on your classpath. The general form of an invocation command in this case will depend on your system, but will probably look like:
java <java-args> -classpath <command-classpath> <command-classname> <application-args>
The example above in this case would look something like:
java -classpath some/where/stilts.jar uk.ac.starlink.ttools.VotCopy -f binary t1.xml t2.xml
More detail is given on the parts of these command lines in the following subsections.
The classpath is the list of places that Java looks to find
the bits of compiled code that it uses to run an application.
Depending on how you have done your installation the core STILTS
classes could be in various places, but they are probably in a
file with one of the names
stilts.jar
,
topcat.jar
,
topcat-lite.jar
or
topcat-full.jar
.
The full pathname of one of these files can therefore be used as
your classpath. In some cases these files are self-contained and
in some cases they reference other jar files in the filesystem -
this means that they may or may not continue to work if you
move them from their original location.
Under certain circumstances the tools might need additional classes, for instance:
In most cases it is not necessary to specify any additional arguments to the Java runtime, but it can be useful in certain circumstances. The two main kinds of options you might want to specify directly to Java are these:
-Dname=value
.
So for instance to ensure that temporary files are written to
the /home/scratch
directory, you could use the flag
-Djava.io.tmpdir=/home/scratch
OutOfMemoryError
then this has
proved too small for the job in hand. You can increase the
heap memory with the -Xmx
flag. To set the heap
memory size to 256 megabytes, use the flag
-Xmx256M
(don't forget the 'M' for megabyte). You will probably find performance is poor if you specify a heap size larger than the physical memory of the machine you're running on.
Note however that encouraging STILTS to use disk files
rather than memory for temporary storage is often a
better idea than boosting the heap memory -
this is done by specifying the -disk
flag on most of the
tools, or possibly -Dstartable.storage=disk
.
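To check what heap limit and temporary directory a given set of Java flags actually produced, a few lines of standard Java are enough. The sketch below is not part of STILTS; the class name RuntimeSettings is made up for illustration, and it relies only on the standard Runtime.maxMemory() and System.getProperty() calls.

```java
final class RuntimeSettings {

    /** Maximum heap the JVM will attempt to use, in megabytes (set with -Xmx). */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Directory used for temporary files (set with -Djava.io.tmpdir=...). */
    static String tempDirectory() {
        return System.getProperty("java.io.tmpdir");
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMegabytes());
        System.out.println("Temp dir:      " + tempDirectory());
    }
}
```

Running it once with and once without flags such as -Xmx256M or -Djava.io.tmpdir=/home/scratch shows whether the flags reached the Java runtime.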
You can specify other options to Java such as tuning and profiling flags etc, but if you want to do that sort of thing you probably don't need me to tell you about it.
Each command in the package is defined at the top level with a
single class in the namespace uk.ac.starlink.ttools
.
For instance tpipe
is defined by the class
uk.ac.starlink.ttools.TablePipe
,
and if you don't have the tpipe
script you can
run it using
java -classpath some/where/stilts.jar uk.ac.starlink.ttools.TablePipe
The classname of each command is listed with its description in Appendix A.
The arguments for each application are listed with the command descriptions elsewhere in this document, but some of them are common to most or all of the commands:
-h[elp]
-v[erbose]
-disk
OutOfMemoryError
s.
This flag is in most cases equivalent to specifying the system
property -Dstartable.storage=disk
.
-debug
System properties are a way of getting information into the
Java runtime - they are a bit like environment variables.
There are two ways to set them when using STILTS: either
on the command line using arguments of the form
-Dname=value
(see Section 2.2)
or in a file in your home directory called
.starjava.properties
, in the form of a
name=value
line.
Thus submitting the flag
-Dvotable.strict=true
on the command line is equivalent to having the following in your
.starjava.properties
file:
# Force strict interpretation of the VOTable standard.
votable.strict=true
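The name=value file format shown above is the one understood by the standard java.util.Properties class. The following sketch shows how such text can be read and combined with properties set on the command line. It is an illustration only: the class PropertiesSketch is hypothetical, and the rule that a -D flag takes precedence over the file is an assumption, not something this document states about STILTS.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class PropertiesSketch {

    /** Parses name=value lines; lines starting with '#' are comments. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Assumed precedence: a -Dname=value flag wins over the file entry. */
    static String lookup(Properties fileProps, String name) {
        String fromCommandLine = System.getProperty(name);
        return fromCommandLine != null ? fromCommandLine
                                       : fileProps.getProperty(name);
    }

    public static void main(String[] args) {
        Properties props = parse(
            "# Force strict interpretation of the VOTable standard.\n"
          + "votable.strict=true\n");
        System.out.println("votable.strict = " + lookup(props, "votable.strict"));
    }
}
```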
The following system properties have special significance to STILTS:
java.io.tmpdir
-disk
flag has been
specified (see Section 2.4).
jdbc.drivers
jel.classes
startable.readers
startable.storage
disk
" has basically the same effect as
supplying the "-disk
" argument on the command line
(see Section 2.4).
startable.writers
votable.strict
true
for strict enforcement of the VOTable standard
when parsing VOTables. This prevents the parser from working round
certain common errors, such as missing arraysize
attributes on FIELD
or PARAM
elements with datatype="char"
.
False by default.
This section describes additional configuration which must be done to allow the commands to access SQL-compatible relational databases for reading or writing tables. If you don't need to talk to SQL-type databases, you can ignore the rest of this section. The steps described here are the standard ones for configuring JDBC (which sort-of stands for Java Database Connectivity), described in more detail on Sun's JDBC web page.
To use STILTS with SQL-compatible databases you must:
jdbc.drivers
system property to the name of the
driver class as described in Section 2.5
These steps are all standard for use of the JDBC system. See SUN/252 for information about JDBC drivers known to work with STIL.
Here is an example of using tcopy to write the results of an SQL query on a table in a MySQL database as a VOTable:
tcopy -classpath /usr/local/jars/mysql-connector-java.jar \
      -Djdbc.drivers=com.mysql.jdbc.Driver \
      "jdbc:mysql://localhost/db1#SELECT id, ra, dec FROM gsc WHERE mag < 9" \
      -ofmt votable gsc.vot
or invoking Java directly:
java -classpath stilts.jar:/usr/local/jars/mysql-connector-java.jar \
     -Djdbc.drivers=com.mysql.jdbc.Driver \
     uk.ac.starlink.ttools.TableCopy \
     "jdbc:mysql://localhost/db1#SELECT id, ra, dec FROM gsc WHERE mag < 9" \
     -ofmt votable gsc.vot
In the latter case you have to exercise some care to get the arguments in the right order (see Section 2).
Alternatively, you can set some of this up beforehand to make the invocation easier. If you set your CLASSPATH environment variable to include the driver jar file (and the STILTS classes if you're invoking Java directly rather than using the scripts), and if you put the line
jdbc.drivers=com.mysql.jdbc.Driver
in the
.starjava.properties
file in your home directory,
then you could avoid having to give the -classpath
and
-Djdbc.drivers
flags respectively.
The generic commands in STILTS
(currently tpipe
and
tcopy
)
have no native format for table storage; they can process
data in a number of formats equally well.
STIL has its own model of what a table
consists of, which is basically:
The formats the package knows about are dependent on the input and output handlers currently installed. The ones installed by default are listed in the following subsections. More may be added in the future, and it is possible to install new ones at runtime - see the STIL documentation for details.
Some of the tools in this package ask you to specify the format
of input tables using the -f
or -ifmt
flag.
The following list gives the values usually allowed for this
(matching is case-insensitive):
fits
votable
TABLE
element is used,
but this can be altered
by supplying the 0-based index after a '#
' sign,
so "table.xml#4" means the fifth TABLE
element in the document.
ascii
csv
wdc
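The '#' index convention described for the votable entry above can be sketched in Java as follows. This is only an illustration: the class TableLocation is a hypothetical name, not part of STIL or STILTS, and treating a missing index as 0 (the first TABLE element) is an assumption.

```java
final class TableLocation {
    final String document;   // file name or URL without the '#' suffix
    final int tableIndex;    // 0-based index of the TABLE element

    private TableLocation(String document, int tableIndex) {
        this.document = document;
        this.tableIndex = tableIndex;
    }

    /** Splits "table.xml#4" into the document "table.xml" and the index 4. */
    static TableLocation parse(String location) {
        int hash = location.lastIndexOf('#');
        if (hash >= 0) {
            try {
                int index = Integer.parseInt(location.substring(hash + 1));
                if (index >= 0) {
                    return new TableLocation(location.substring(0, hash), index);
                }
            } catch (NumberFormatException e) {
                // Suffix is not a number: treat the whole string as the document.
            }
        }
        return new TableLocation(location, 0);
    }

    public static void main(String[] args) {
        TableLocation loc = parse("table.xml#4");
        // Index 4 is 0-based, so it selects the fifth TABLE element.
        System.out.println(loc.document + " -> TABLE number " + (loc.tableIndex + 1));
    }
}
```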
In some cases (when using VOTable or FITS format tables) the tools can detect the table format automatically, and no explicit specification is necessary. If this isn't the case and you omit the format specification, the tool will fail with a suitable error message. It is always safe to specify the format explicitly, and may lead to more helpful error messages in the case that the table can't be read correctly.
Some of the tools ask you to specify the format of output tables
using the -ofmt
flag.
The following list gives the values usually allowed for this;
in some cases as you can see there are several variants of a given format.
You can abbreviate these names, and the first match in the list below
will be used, so for instance specifying votable
is equivalent
to specifying votable-tabledata
and fits
is equivalent to fits-plus
.
Matching is case-insensitive.
fits-plus
fits-basic
votable-tabledata
votable-binary-inline
STREAM
element
votable-binary-href
votable-fits-href
votable-fits-inline
STREAM
element
ascii
text
csv
html
TABLE
element
html-element
TABLE
element
latex
tabular
environment
latex-document
tabular
environment
mirage
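The abbreviation rule for output format names (first match in the list, case-insensitive) can be sketched as below. The sketch assumes that "abbreviate" means giving a leading part of the name; the class OutputFormatMatcher is hypothetical and is not how STILTS itself is necessarily implemented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class OutputFormatMatcher {

    /** Output format names, in the order listed in this section. */
    static final List<String> FORMATS = Arrays.asList(
        "fits-plus", "fits-basic",
        "votable-tabledata", "votable-binary-inline", "votable-binary-href",
        "votable-fits-href", "votable-fits-inline",
        "ascii", "text", "csv", "html", "html-element",
        "latex", "latex-document", "mirage");

    /** Returns the first format whose name starts with the given text, or null. */
    static String resolve(String name) {
        String wanted = name.toLowerCase(Locale.ROOT);
        for (String format : FORMATS) {
            if (format.startsWith(wanted)) {
                return format;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("votable resolves to " + resolve("votable"));
        System.out.println("fits resolves to " + resolve("fits"));
    }
}
```

Because the list is searched in order, placing fits-plus before fits-basic is what makes the bare name fits select fits-plus.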
In some cases the tools may guess what output format you want by looking at the extension of the output filename you have specified.
The tpipe
command allows you to use algebraic
expressions when making row selections and defining new synthetic
columns. In both cases you are defining an expression which
has a value in each row as a function of the values in the existing
columns in that row.
This is a powerful feature which permits you to manipulate and select
table data in very flexible ways.
The syntax for entering these expressions is explained in this section.
What you write are actually expressions in the Java language, which are compiled into Java bytecode before evaluation. However, this does not mean that you need to be a Java programmer to write them. The syntax is pretty similar to C, but even if you've never programmed in C most simple things, and many complicated ones, are quite intuitive.
The following explanation gives some guidance and examples for writing these expressions. Unfortunately a complete tutorial on writing Java is beyond the scope of this document, but it should provide enough information for even a novice to write useful expressions.
The expressions that you can write are basically any function
of all the column values which apply
to a given row; the function result can then define
the per-row value of a new column (tpipe -addcol
)
or a selection flag (tpipe -select
).
If the built-in operators and functions are not sufficient,
or it's unwieldy to express your function in one line of code,
it is possible to add new functions by writing your own classes -
see Section 4.6.3.
Note: if Java is running in an environment with certain security restrictions (a security manager which does not permit creation of custom class loaders) then algebraic expressions won't work at all. It's not particularly likely that security restrictions will be in place if you are running from the command line though.
To create a useful expression which can be evaluated for each row in a table, you will have to refer to cells in different columns of that row. You can do this in two ways:
There is a special column whose name is "Index" and whose ID is "$0". The value of this is the same as the row number (the first row is 1).
The value of the variables so referenced will be a primitive
(boolean, byte, short, char, int, long, float, double) if the
column contains one of the corresponding types. Otherwise it will
be an Object of the type held by the column, for instance a String.
In practice this means: you can write the name of a column, and it will
evaluate to the numeric (or string) value that that column contains
in each row. You can then use this in normal algebraic expressions
such as "B_MAG - U_MAG
" as you'd expect.
When no special steps are taken, if a null value (blank cell) is encountered in evaluating an expression (usually because one of the columns it relies on has a null value in the row in question) then the result of the expression is also null.
It is possible to exercise more control than this, but it
requires a little bit of care,
because the expressions work in terms of primitive values
(numeric or boolean ones) which don't in general have a defined null
value. The name "null
"
in expressions gives you the java null
reference, but this cannot be matched against a primitive value
or used as the return value of a primitive expression.
For most purposes, the following two tips should enable you to work with null values:
NULL_
"
(use upper case) to the column name or $ID. This
will yield a boolean value which is true if the column contains
a blank, and false otherwise.
NULL
"
(upper case). To return a null value from a non-numeric expression
(e.g. a String column) use the name "null
" (lower case).
Null values are often used in conjunction with the conditional
operator, "? :
"; the expression
test ? tval : fval
returns the value
tval
if the boolean expression test
evaluates true, or fval
if test
evaluates false.
So for instance the following expression:
Vmag == -99 ? NULL : Vmag
can be used to define a new column which has the same value as the
Vmag
column for most values, but if Vmag
has the "magic" value -99 the new column will contain a blank.
The opposite trick (substituting a blank value with a magic one) can
be done like this:
NULL_Vmag ? -99 : Vmag
Some more examples are given in Section 4.5.
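For readers who know Java, the two tricks above correspond roughly to the following plain Java code, in which a blank cell is modelled as a null Double. NULL and the NULL_ prefix are features of the expression language, not of Java itself, so this is an analogy rather than what the tool compiles; the class NullHandling is hypothetical.

```java
final class NullHandling {

    /** Like "Vmag == -99 ? NULL : Vmag": the magic value becomes a blank. */
    static Double magicToNull(double vmag, double magic) {
        return vmag == magic ? null : Double.valueOf(vmag);
    }

    /** Like "NULL_Vmag ? -99 : Vmag": a blank becomes the magic value. */
    static double nullToMagic(Double vmag, double magic) {
        return vmag == null ? magic : vmag.doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(magicToNull(-99, -99));
        System.out.println(nullToMagic(null, -99));
    }
}
```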
The operators are pretty much the same as in the C language. The common ones are:
+
(add)
-
(subtract)
*
(multiply)
/
(divide)
%
(modulus)
!
(not)
&&
(and)
||
(or)
^
(exclusive-or)
==
(numeric identity)
!=
(numeric non-identity)
<
(less than)
>
(greater than)
<=
(less than or equal)
>=
(greater than or equal)
(byte)
(numeric -> signed byte)
(short)
(numeric -> 2-byte integer)
(int)
(numeric -> 4-byte integer)
(long)
(numeric -> 8-byte integer)
(float)
(numeric -> 4-byte floating point)
(double)
(numeric -> 8-byte floating point)
+
(string concatenation)
[]
(array dereferencing)
?:
(conditional switch)
instanceof
(class membership)
Many functions are available for use within your expressions, covering standard mathematical and trigonometric functions, arithmetic utility functions, type conversions, and some more specialised astronomical ones, as well as providing actions to take when a point is activated. You can use them in just the way you'd expect, by using the function name (unlike column names, this is case-sensitive) followed by comma-separated arguments in brackets, so
max(IMAG,JMAG)
will give you the larger of the values in the columns IMAG and JMAG, and so on.
The functions available for use by default are listed by class in the following subsections with their arguments and short descriptions.
String manipulation and query functions.
concat( s1, s2 )
s1+s2, but blank values can sometimes appear as the string "null" if you do it like that.
s1 (String): first string
s2 (String): second string
concat( s1, s2, s3 )
s1+s2+s3, but blank values can sometimes appear as the string "null" if you do it like that.
s1 (String): first string
s2 (String): second string
s3 (String): third string
concat( s1, s2, s3, s4 )
s1+s2+s3+s4, but blank values can sometimes appear as the string "null" if you do it like that.
s1 (String): first string
s2 (String): second string
s3 (String): third string
s4 (String): fourth string
equals( s1, s2 )
s1==s2, which can (for technical reasons) return false even if the strings are the same.
s1 (String): first string
s2 (String): second string
equalsIgnoreCase( s1, s2 )
s1 (String): first string
s2 (String): second string
startsWith( whole, start )
whole (String): the string to test
start (String): the sequence that may appear at the start of whole
endsWith( whole, end )
whole (String): the string to test
end (String): the sequence that may appear at the end of whole
contains( whole, sub )
whole (String): the string to test
sub (String): the sequence that may appear within whole
length( str )
str (String): string
matches( str, regex )
str (String): string to test
regex (String): regular expression string
matchGroup( str, regex )
str (String): string to match against
regex (String): regular expression containing a grouped section
replaceFirst( str, regex, replacement )
str (String): string to manipulate
regex (String): regular expression to match in str
replacement (String): replacement string
replaceAll( str, regex, replacement )
str (String): string to manipulate
regex (String): regular expression to match in str
replacement (String): replacement string
substring( str, startIndex )
str (String): the input string
startIndex (integer): the beginning index, inclusive
substring( str, startIndex, endIndex )
The substring begins at startIndex and continues to the character at index endIndex-1. Thus the length of the substring is endIndex-startIndex.
str (String): the input string
startIndex (integer): the beginning index, inclusive
endIndex (integer): the end index, exclusive
toUpperCase( str )
str (String): input string
toLowerCase( str )
str (String): input string
trim( str )
str (String): input string
padWithZeros( value, ndigit )
value (long integer): numeric value to pad
ndigit (integer): the number of digits in the resulting string
Standard mathematical and trigonometric functions.
E
PI
sin( theta )
theta (floating point): an angle, in radians.
cos( theta )
theta (floating point): an angle, in radians.
tan( theta )
theta (floating point): an angle, in radians.
asin( x )
x (floating point): the value whose arc sine is to be returned.
acos( x )
x (floating point): the value whose arc cosine is to be returned.
atan( x )
x (floating point): the value whose arc tangent is to be returned.
exp( x )
x (floating point): the exponent to raise e to.
log10( x )
x (floating point): argument
ln( x )
x (floating point): argument
sqrt( x )
x (floating point): a value.
atan2( y, x )
Converts rectangular coordinates (x,y) to polar (r,theta). This method computes the phase theta by computing an arc tangent of y/x in the range of -pi to pi.
y (floating point): the ordinate coordinate
x (floating point): the abscissa coordinate
pow( a, b )
a (floating point): the base.
b (floating point): the exponent.
Functions for angle transformations and manipulations. In particular, methods for translating between radians and HH:MM:SS.S or DDD:MM:SS.S type sexagesimal representations are provided.
DEGREE
ARC_MINUTE
ARC_SECOND
radiansToDms( rad )
rad (floating point): angle in radians
radiansToDms( rad, secFig )
rad (floating point): angle in radians
secFig (integer): number of decimal places in the seconds field
radiansToHms( rad )
rad (floating point): angle in radians
radiansToHms( rad, secFig )
rad (floating point): angle in radians
secFig (integer): number of decimal places in the seconds field
dmsToRadians( dms )
dm[s], or some others. Additional spaces and leading +/- are permitted.
dms (String): formatted DMS string
hmsToRadians( hms )
hm[s], or some others. Additional spaces and leading +/- are permitted.
hms (String): formatted HMS string
dmsToRadians( deg, min, sec )
In conversions of this type, one has to be careful to get the sign right in converting angles which are between 0 and -1 degrees. This routine uses the sign bit of the deg argument, taking care to distinguish between +0 and -0 (their internal representations are different for floating point values). It is illegal for the min or sec arguments to be negative.
deg (floating point): degrees part of angle
min (floating point): minutes part of angle
sec (floating point): seconds part of angle
hmsToRadians( hour, min, sec )
In conversions of this type, one has to be careful to get the sign right in converting angles which are between 0 and -1 hours. This routine uses the sign bit of the hour argument, taking care to distinguish between +0 and -0 (their internal representations are different for floating point values).
hour (floating point): hours part of angle
min (floating point): minutes part of angle
sec (floating point): seconds part of angle
skyDistance( ra1, dec1, ra2, dec2 )
ra1 (floating point): right ascension of point 1 in radians
dec1 (floating point): declination of point 1 in radians
ra2 (floating point): right ascension of point 2 in radians
dec2 (floating point): declination of point 2 in radians
skyDistanceDegrees( ra1, dec1, ra2, dec2 )
ra1 (floating point): right ascension of point 1 in degrees
dec1 (floating point): declination of point 1 in degrees
ra2 (floating point): right ascension of point 2 in degrees
dec2 (floating point): declination of point 2 in degrees
hoursToRadians( hours )
hours (floating point): angle in hours
degreesToRadians( deg )
deg (floating point): angle in degrees
radiansToDegrees( rad )
rad (floating point): angle in radians
raFK4toFK5( raFK4, decFK4 )
raFK4 (floating point): right ascension in B1950.0 FK4 system (radians)
decFK4 (floating point): declination in B1950.0 FK4 system (radians)
decFK4toFK5( raFK4, decFK4 )
raFK4 (floating point): right ascension in B1950.0 FK4 system (radians)
decFK4 (floating point): declination in B1950.0 FK4 system (radians)
raFK5toFK4( raFK5, decFK5 )
raFK5 (floating point): right ascension in J2000.0 FK5 system (radians)
decFK5 (floating point): declination in J2000.0 FK5 system (radians)
decFK5toFK4( raFK5, decFK5 )
raFK5 (floating point): right ascension in J2000.0 FK5 system (radians)
decFK5 (floating point): declination in J2000.0 FK5 system (radians)
raFK4toFK5( raFK4, decFK4, bepoch )
The bepoch parameter is the epoch at which the position in the FK4 frame was determined.
raFK4 (floating point): right ascension in B1950.0 FK4 system (radians)
decFK4 (floating point): declination in B1950.0 FK4 system (radians)
bepoch (floating point): Besselian epoch
decFK4toFK5( raFK4, decFK4, bepoch )
The bepoch parameter is the epoch at which the position in the FK4 frame was determined.
raFK4 (floating point): right ascension in B1950.0 FK4 system (radians)
decFK4 (floating point): declination in B1950.0 FK4 system (radians)
bepoch (floating point): Besselian epoch
raFK5toFK4( raFK5, decFK5, bepoch )
raFK5 (floating point): right ascension in J2000.0 FK5 system (radians)
decFK5 (floating point): declination in J2000.0 FK5 system (radians)
bepoch (floating point): Besselian epoch
decFK5toFK4( raFK5, decFK5, bepoch )
raFK5 (floating point): right ascension in J2000.0 FK5 system (radians)
decFK5 (floating point): declination in J2000.0 FK5 system (radians)
bepoch (floating point): Besselian epoch
Functions for converting between strings and numeric values.
toString( value )
value (floating point): numeric value
parseByte( str )
str (String): string containing numeric representation
parseShort( str )
str (String): string containing numeric representation
parseInt( str )
str (String): string containing numeric representation
parseLong( str )
str (String): string containing numeric representation
parseFloat( str )
str (String): string containing numeric representation
parseDouble( str )
str (String): string containing numeric representation
toByte( value )
value (floating point): numeric value for conversion
toShort( value )
value (floating point): numeric value for conversion
toInteger( value )
value (floating point): numeric value for conversion
toLong( value )
value (floating point): numeric value for conversion
toFloat( value )
value (floating point): numeric value for conversion
toDouble( value )
value (floating point): numeric value for conversion
Standard arithmetic functions including things like rounding, sign manipulation, and maximum/minimum functions.
roundUp( x )
x (floating point): a value.
roundDown( x )
x (floating point): a value
round( x )
x (floating point): a floating point value.
abs( x )
x (integer): the argument whose absolute value is to be determined
abs( x )
x (floating point): the argument whose absolute value is to be determined
max( a, b )
a (integer): an argument.
b (integer): another argument.
max( a, b )
a (floating point): an argument.
b (floating point): another argument.
min( a, b )
a (integer): an argument.
b (integer): another argument.
min( a, b )
a (floating point): an argument.
b (floating point): another argument.
Here are some examples for defining new columns;
the expressions below could appear as the <expr>
in a
tpipe
-addcol
or -sortexpr
filter specifier.
(first + second) * 0.5
sqrt(variance)
radiansToDegrees(DEC_radians)
degreesToRadians(RA_degrees)
parseInt($12)
parseDouble(ident)
toString(index)
toShort(obs_type)
toDouble(range)
or
(short) obs_type
(double) range
hmsToRadians(RA1950)
dmsToRadians(decDeg,decMin,decSec)
radiansToDms($3)
radiansToHms(RA,2)
min(1000, max(value, 0))
jmag == 9999 ? NULL : jmag
NULL_jmag ? 9999 : jmag
psfCounts[2]
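The three-argument dmsToRadians(deg, min, sec) used in the examples above is documented as taking its sign from the sign bit of deg. A minimal Java sketch of that idea follows. It was written for illustration and is not the actual uk.ac.starlink.ttools.func source; the class name DmsSketch is hypothetical.

```java
final class DmsSketch {

    /** Converts degrees, minutes and seconds to radians, honouring -0 degrees. */
    static double dmsToRadians(double deg, double min, double sec) {
        if (min < 0 || sec < 0) {
            throw new IllegalArgumentException("min and sec must not be negative");
        }
        // The raw bit pattern is negative as a long exactly when the IEEE
        // sign bit is set, so -0.0 counts as negative here even though
        // the comparison (-0.0 < 0) is false.
        boolean negative = Double.doubleToRawLongBits(deg) < 0;
        double magnitude = Math.abs(deg) + min / 60.0 + sec / 3600.0;
        return Math.toRadians(negative ? -magnitude : magnitude);
    }

    public static void main(String[] args) {
        System.out.println("-0d 30m 0s in radians: " + dmsToRadians(-0.0, 30, 0));
        System.out.println("+0d 30m 0s in radians: " + dmsToRadians(0.0, 30, 0));
    }
}
```

Testing deg < 0 instead of the sign bit would turn -0 degrees 30 minutes into a positive angle, which is the pitfall the function description warns about.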
Here are some examples for selecting rows; the expressions below could appear as the <expr> in a
tpipe
-select
filter specifier.
RA > 100 && RA < 120 && Dec > 75 && Dec < 85
$2*$2 + $3*$3 < 1
skyDistance(ra0,dec0,degreesToRadians(RA),degreesToRadians(DEC))<15*ARC_MINUTE
index <= 100
(though you could use tpipe -head 100 instead)
index % 10 == 0
(though you could use tpipe -every 10 instead)
equals(SECTOR, "ZZ9 Plural Z Alpha")
equalsIgnoreCase(SECTOR, "zz9 plural z alpha")
startsWith(SECTOR, "ZZ")
contains(ph_qual, "U")
matches(SECTOR, "[XYZ] Alpha")
! NULL_ellipticity
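The skyDistance selection above compares an angular separation in radians with 15*ARC_MINUTE. One common way to compute such a separation is the haversine formula, sketched below in Java. This document does not say which formula STILTS uses, so the class SkySketch, its method body and the ARC_MINUTE definition (one sixtieth of a degree, in radians) should all be read as assumptions.

```java
final class SkySketch {

    /** Assumed meaning of ARC_MINUTE: one arcminute expressed in radians. */
    static final double ARC_MINUTE = Math.toRadians(1.0 / 60.0);

    /** Great-circle separation of two sky positions; all angles in radians. */
    static double skyDistance(double ra1, double dec1, double ra2, double dec2) {
        double sinHalfDec = Math.sin(0.5 * (dec2 - dec1));
        double sinHalfRa = Math.sin(0.5 * (ra2 - ra1));
        double h = sinHalfDec * sinHalfDec
                 + Math.cos(dec1) * Math.cos(dec2) * sinHalfRa * sinHalfRa;
        // Guard against rounding pushing h fractionally above 1.
        return 2.0 * Math.asin(Math.sqrt(Math.min(1.0, h)));
    }

    /** Row test equivalent to skyDistance(...) < 15*ARC_MINUTE. */
    static boolean within15Arcmin(double ra0, double dec0, double ra, double dec) {
        return skyDistance(ra0, dec0, ra, dec) < 15 * ARC_MINUTE;
    }

    public static void main(String[] args) {
        double tenArcmin = Math.toRadians(10.0 / 60.0);
        System.out.println("Selected: " + within15Arcmin(0, 0, 0, tenArcmin));
    }
}
```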
This section contains some notes on getting the most out of the algebraic expressions facility. If you're not a Java programmer, some of the following may be a bit daunting - read on at your own risk!
This note provides a bit more detail for Java programmers on what is going on here; it describes how the use of functions in STILTS algebraic expressions relates to normal Java code.
The expressions which you write are compiled to Java bytecode
when you enter them (if there is a 'compilation error' it will be
reported straight away). The functions listed in the previous subsections
are all the public static
methods of the classes which
are made available by default. The classes listed are all in the
package uk.ac.starlink.ttools.func
.
However, the public static methods are all imported into an anonymous
namespace for bytecode compilation, so that you write
sqrt(x)
and not Maths.sqrt(x)
.
The same happens to other classes that are imported (which can be
in any package or none) - their public
static methods all go into the anonymous namespace. Thus, method
name clashes are a possibility.
This cleverness is all made possible by the rather wonderful JEL.
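To see how name clashes can arise when the public static methods of several classes share one namespace, the following sketch uses standard Java reflection to list method names and the classes that define them. It does not use JEL and is not how STILTS performs the import; the class StaticNamespace is hypothetical, and it compares names only, ignoring argument types.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class StaticNamespace {

    /** Maps each public static method name to the classes that declare it. */
    static Map<String, Set<String>> collect(Class<?>... classes) {
        Map<String, Set<String>> owners = new TreeMap<>();
        for (Class<?> clazz : classes) {
            for (Method method : clazz.getMethods()) {   // public methods only
                if (Modifier.isStatic(method.getModifiers())) {
                    owners.computeIfAbsent(method.getName(), k -> new TreeSet<>())
                          .add(clazz.getName());
                }
            }
        }
        return owners;
    }

    /** Names provided by more than one of the given classes. */
    static Set<String> clashes(Class<?>... classes) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : collect(classes).entrySet()) {
            if (e.getValue().size() > 1) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(clashes(Math.class, StrictMath.class));
    }
}
```

Running a check of this kind over your own extension classes before adding them via jel.classes would show which function names they share with each other.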
There is another category of functions which can be used apart from those listed in Section 4.4. These are called, in Java/object-oriented parlance, "instance methods" and represent functions that can be executed on an object.
It is possible to invoke any of an object's public
instance methods on that object
(though not on primitive values - numeric and boolean ones).
The syntax is that you place a "." followed by the method invocation
after the object you want to invoke the method on,
hence NAME.substring(3)
instead of substring(NAME,3)
.
If you know what you're doing, feel free to go ahead and do this.
However, most of the instance methods you're likely to want to use
have equivalents in the normal functions listed in the previous section,
so unless you're a Java programmer or feeling adventurous,
you may be best off ignoring this feature.
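The relationship between the two call styles can be illustrated in plain Java. In the sketch below, static wrapper functions simply delegate to the corresponding String instance methods; this mirrors the idea described above but is not the real uk.ac.starlink.ttools.func code, and the class name and its null handling are assumptions.

```java
final class InstanceVersusStatic {

    /** Function style: substring(NAME,3). Delegates to NAME.substring(3). */
    static String substring(String str, int startIndex) {
        return str.substring(startIndex);
    }

    /** Function style: equals(s1,s2). Compares content, unlike s1==s2. */
    static boolean equals(String s1, String s2) {
        return s1 == null ? s2 == null : s1.equals(s2);
    }

    public static void main(String[] args) {
        String name = "NGC1234";
        System.out.println(substring(name, 3));   // function style
        System.out.println(name.substring(3));    // instance-method style
        // Two distinct String objects with the same characters:
        String a = new String("ZZ9");
        String b = "ZZ9";
        System.out.println("a == b: " + (a == b) + ", equals(a,b): " + equals(a, b));
    }
}
```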
The functions provided by default for use with algebraic expressions, while powerful, may not provide all the operations you need. For this reason, it is possible to write your own extensions to the expression language. In this way you can specify arbitrarily complicated functions. Note however that this will only allow you to define new columns or subsets where each cell is a function only of the other cells in the same row - it will not allow values in one row to be functions of values in another.
In order to do this, you have to write and compile a (probably short) program in the Java language. A full discussion of how to go about this is beyond the scope of this document, so if you are new to Java and/or programming you may need to find a friendly local programmer to assist (or mail the author or Starlink's QUICK service). The following explanation is aimed at Java programmers, but may not be incomprehensible to non-specialists.
The steps you need to follow are:
jel.classes
system property (colon-separated if there are several)
as described in Section 2.5
Any public static methods defined in the classes thus specified will then be available for use. They should be defined to take and return the relevant primitive or Object types for the function required. For instance a class written as follows would define a three-value average:
public class AuxFuncs {
    public static double average3( double x, double y, double z ) {
        return ( x + y + z ) / 3.0;
    }
}
and the command
tpipe -addcol AVERAGE 'average3($1,$2,$3)'
would add a new column called AVERAGE giving the average of the first three existing columns. Exactly how you would build this is dependent on your system, but it might involve doing something like the following:
Write a file AuxFuncs.java containing the above code
Compile it using javac AuxFuncs.java
Run tpipe using the flags "tpipe -Djel.classes=AuxFuncs -classpath ."
This is STILTS, Starlink Tables Infrastructure Library Tool Set. It is a collection of non-graphical utilities for general purpose table and VOTable manipulation developed by Starlink.
Releases to date have been as follows:
This appendix provides the reference documentation for the commands in the package. For each one a description of its purpose, a list of its command-line arguments, and some examples are given.
tcopy
is a table copying tool.
It simply copies a table from one place to another, but since
you can specify the input and output formats as desired, it works
as a converter from any of the supported
input formats
to any of the supported
output formats.
tcopy
is designed as a drop-in replacement for the old
tablecopy
(uk.ac.starlink.table.TableCopy
)
tool which was supplied with STIL and TOPCAT - it has the same
arguments and behaviour as tablecopy
,
but is implemented on top of tpipe
and will in some cases be more efficient.
The basic usage of tcopy
is
tcopy [-ifmt <in-format>] [-ofmt <out-format>] [<other-flags>] <in-table> [<out-table>]
If you don't have the Unix scripts installed, invoke it as described in Section 2 using the classname
uk.ac.starlink.ttools.TableCopy
.
The most important arguments are as follows:
-ifmt <in-format>
<in-table>
automatically,
but this cannot always be done correctly,
in which case the program will exit with an error
explaining which formats were attempted.
-ofmt <out-format>
<in-table>
-
",
the input table will be read from standard input.
In this case the input format must be given explicitly using the
-ifmt
flag.
<out-table>
-
", the output table will be written to standard output.
In this case the output format must be given explicitly using the
-ofmt
flag.
The following generic flags can also be used:
-h[elp]
-debug
-disk
OutOfMemoryError
s.
This flag is in most cases equivalent to specifying the system
property -Dstartable.storage=disk
.
-v[erbose]
Here are some examples of tcopy
in use:
tcopy stars.fits stars.xml
stars.xml
filename is examined to make a guess at the kind of output to write:
the .xml
ending is taken to mean a TABLEDATA-encoded
VOTable.
tcopy -ifmt fits stars.fits -ofmt votable
tcopy -ofmt text http://remote.host/data/vizer.xml.gz#4 -
#4
at the end of the URL
indicates that the data from the fifth TABLE
element
in the remote document are to be used. The gzip compression of
the table is taken care of automatically.
tcopy -ifmt csv -ofmt latex spec.csv
tcopy -classpath /usr/local/jars/pg73jdbc3.jar \
      -Djdbc.drivers=org.postgresql.Driver \
      "jdbc:postgresql://localhost/imsim#SELECT ra, dec, Imag, Kmag FROM dqc" \
      -ofmt fits wfslist.cat
jdbc.drivers
system property.
If a username or password is required, it will be prompted for on the
command line.
As you can see, using SQL from Java is a bit fiddly,
and there are other ways to perform this
setup than on the command line - see Section 2.6.
tpipe
is the main tool in this package
for general purpose manipulation of tables.
It is extremely flexible, and can do the following things
amongst others:
Using tpipe
is more like using a Unix pipeline than
using a single Unix command. You give
an input specifier which determines the table to be operated on,
some filter specifiers which determine the operations which will
be performed on it,
and an output specifier which determines what should happen to the
processed data.
The table is streamed through the processing filters in the order
in which you've given them, and the processed data are eventually
sent to the destination, which may be an output table file or
some other operation like displaying it in TOPCAT or calculating
the per-column statistics. In most cases
the processing is passed through the pipeline a row at a time,
meaning that the amount of memory required is small
(though in some cases, for instance sorting, this is not possible).
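The row-at-a-time behaviour described above can be pictured with Java's lazy streams. The following is only a sketch of the idea, not how tpipe is implemented: the class PipelineSketch and its Row type are hypothetical, and the two steps merely imitate what a -select filter followed by a -head filter would do.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustrative only: models a tpipe-style pipeline with lazy Java streams. */
class PipelineSketch {

    /** Hypothetical row type with two columns; not a STILTS class. */
    static final class Row {
        final int objType;
        final double z;

        Row(int objType, double z) {
            this.objType = objType;
            this.z = z;
        }
    }

    /** Passes rows through a select step and then a head step, one row at a time. */
    static List<Row> run(Stream<Row> input, int nrows) {
        return input
            .filter(r -> r.objType == 3 && r.z > 0.15)  // like -select "OBJTYPE == 3 && Z > 0.15"
            .limit(nrows)                               // like -head <nrows>
            .collect(Collectors.toList());              // stands in for the output stage
    }

    public static void main(String[] args) {
        // An endless input: only as many rows as the head step needs are ever generated.
        Stream<Row> rows = Stream.iterate(0, i -> i + 1)
                                 .map(i -> new Row(i % 4, i / 100.0));
        for (Row r : run(rows, 2)) {
            System.out.println(r.objType + " " + r.z);
        }
    }
}
```

Because each step only asks for the next row when it needs one, memory use does not grow with the size of the input. A sorting step breaks this pattern, since no row can be emitted until every row has been seen.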
Although a similar effect could be achieved by stringing together
several single-operation tpipe
invocations in an
actual Unix pipe,
with the data flowing between the commands in byte streams using
one or other of the supported table formats, the way tpipe
works makes it much more efficient to do all the work within one
invocation.
The basic form of a tpipe
invocation is therefore
tpipe <input-specifier> <filter-specifiers> <mode-specifier> <other-flags>
(though in fact these elements can appear in any order and they are all optional). If you don't have the Unix scripts installed, invoke it as described in Section 2 using the classname
uk.ac.starlink.ttools.TablePipe
.
The different sets of arguments are described in the following subsections.
There are many different flags, some with supplementary arguments,
which may look daunting. However, if you make an error in specifying
them, tpipe
will try to print a message which explains
what has gone wrong, so with a little bit of trial and error it
should be possible to make it do what you want.
The input specifier determines the input table on which the processing will be performed. It has the form:
[-ifmt <in-format> [-stream]] <in-table>
which is interpreted as follows:
-ifmt <in-format>
<in-format>
is the name of one of the input formats
described in Section 3.1.
If the -ifmt
flag is not used, auto format-detection
is used (OK for FITS and VOTables). For other formats, such as CSV
or ASCII, you must name a format using this flag.
If you give an unknown format (e.g. -ifmt help
) a list
of the formats that are known will be printed.
-stream
-ifmt
flag in this case.
Depending on the required operations and processing mode, this may fail
(sometimes you need to read the input file more than once) - if so
specifying -cache
near the start of the filter specifiers
may help.
It is not normally necessary to specify this flag - in most cases
the data will be streamed automatically if that is the best thing
to do.
<in-table>
-
"
to specify standard input
(in which case -stream
is implicit).
The filter specifiers each specify a processing step which is performed on a table, transforming an input table to an output one. You can have any combination of them, and they are used in the order that they are given on the command line. They are like filter-type commands in a Unix pipeline. Some of them have additional optional or mandatory arguments.
-select <expr>
<expr>
evaluates to true.
<expr>
is an expression using the syntax described
in Section 4 with a boolean-type value.
-sort [-down] [-nullsfirst] <colid-list>
<colid-list>
. <colid-list>
is a space-separated list of column identifiers
(names, $IDs or numbers, where 1 is the first column).
One or more columns may be specified: sorting is done on the
values in the first-specified field, but if they are equal the
tie is resolved by looking at the second-specified field, and so on.
If the -down
flag is used,
the sort order is descending instead
of ascending. If the -nullsfirst
flag is used, blank
entries are considered to come at the start of the collation sequence
instead of the end.
-sortexpr <expr>
<expr>
is described in Section 4.
Its value must be of a type that it makes sense to sort, for instance
numeric.
-every <step>
<step>
'th row in the result,
starting with the first row.
-head <nrows>
<nrows>
rows of the table.
-tail <nrows>
<nrows>
rows of the table.
-addcol [-after <col-id> | -before <col-id>]
<col-name> <expr>
<col-name>
defined by the algebraic expression
<expr>
.
Expression syntax is described in Section 4. By default
the new column appears after the last column of the table, but you
can position it using either the -after
or
-before
flags. In either
case, a <col-id>
is either the column's name
(if it is syntactically a Java identifier),
or its number (the first column is 1), or its $ID
($1
is the first column).
-keepcols <colid-list>
<colid-list>
, in that order.
<colid-list>
is space-separated.
col-id
is either the column's name
(if it is syntactically a Java identifier) or its number
(the first column is 1) or its $ID
($1
is the first column).
-delcols <colid-list>
<colid-list>
is a space-separated list of
identifiers which are either a column's name
(if it is syntactically a Java identifier) or its number
(the first column is 1) or its $ID
($1
is the first column).
-explode
-cache
-progress
Specifying -verbose
has the effect of inserting a
-progress
flag at the start of the pipeline,
so you can see how much progress has been made through the initial
input table.
By putting a -progress
at different points in the pipeline you can monitor how
far different stages of the processing have progressed.
If you insert more than one -progress
however,
output to the terminal is going to get quite messy.
-random
-sequential
If no filter specifiers are given, the input table will be sent directly to its destination without any modifications.
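The ordering rules given for -sort (several sort keys with tie-breaking, optional descending order, and a choice of where blank entries fall) can be expressed compactly as a Java Comparator. This is a sketch of those rules as described above, not the code tpipe uses. The class SortSketch is hypothetical, rows are simplified to arrays of Double, columns are indexed from 0, and it is an assumption of the sketch that the placement of blanks is unaffected by the sort direction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: mimics the ordering rules described for -sort. */
class SortSketch {

    /**
     * Builds a row comparator over 0-based column indices.
     * Earlier columns take precedence; later ones only break ties.
     */
    static Comparator<Double[]> rowComparator(int[] cols, boolean down, boolean nullsFirst) {
        // Descending order when 'down' is set, like the -down flag.
        Comparator<Double> values = down ? Comparator.<Double>reverseOrder()
                                         : Comparator.<Double>naturalOrder();
        // Assumption: blanks go last unless 'nullsFirst' is set, whatever the direction.
        Comparator<Double> cell = nullsFirst ? Comparator.<Double>nullsFirst(values)
                                             : Comparator.<Double>nullsLast(values);
        Comparator<Double[]> rows = (a, b) -> 0;
        for (int col : cols) {
            rows = rows.thenComparing(row -> row[col], cell);
        }
        return rows;
    }

    /** Returns a sorted copy of the table, leaving the input untouched. */
    static List<Double[]> sort(List<Double[]> table, int[] cols, boolean down, boolean nullsFirst) {
        List<Double[]> copy = new ArrayList<>(table);
        copy.sort(rowComparator(cols, down, nullsFirst));
        return copy;
    }

    public static void main(String[] args) {
        List<Double[]> table = Arrays.asList(
            new Double[] {1.0, 5.0},
            new Double[] {1.0, 2.0},
            new Double[] {null, 9.0},
            new Double[] {0.5, 7.0});
        for (Double[] row : sort(table, new int[] {0, 1}, false, false)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Note that a sort of this kind needs the whole table before it can emit its first row, which is why sorting is one of the steps that cannot be streamed.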
The mode specifier determines what happens to the processed table when it reaches the output end of the pipeline. Only one of the following should be specified:
-write [-ofmt <out-format>] [-o <out-table>]
-o
flag is specified,
output is <out-table>
;
otherwise, or if <out-table>
has the special
value "-
", it's streamed to standard output.
The output format is named using the -ofmt
flag
(see Section 3.2); if not supplied,
an attempt is made to guess the format from the destination name.
If neither -o
nor -ofmt
is specified,
it's written in formatted text format to standard output
(equivalent to -o - -ofmt text
).
If you give an unknown output format (e.g. -ofmt help
)
then a list of all known formats will be printed.
-tosql <jdbc-url> [-user <username>]
[-password <password>]
-Djdbc.drivers
set as usual
(see Section 2.6).
You can specify your SQL connection username and password or not -
you will be prompted on the terminal if they are required.
-stats
-meta
-count
-topcat
If no mode specifier is given, -write
is assumed.
This means that with no mode specifier, the processed table is written to
standard output in text
format.
The following generic flags can also be issued:
-h[elp]
-v[erbose]
-disk
OutOfMemoryError
s.
This flag is in most cases equivalent to specifying the system
property -Dstartable.storage=disk
.
-debug
Here are some examples of tpipe
in use with explanations
of what's going on. For simplicity these examples assume that you have the
tpipe
script installed and are using a Unix-like shell;
see Section 2 for an explanation of how to invoke the command
if you just have the Java classes.
The examples are arranged with one step (input, filter or destination) per line, to make it easier to see what's going on.
tpipe cat.fits
-write
is assumed,
and output is to standard output in text
format.
tpipe -head 5 cat.fits.gz
tpipe -ifmt csv xxx.csv \
      -keepcols "index ra dec" \
      -write -ofmt fits xxx.fits
cat tab1.vot | tpipe - -addcol IV_SUM "(IMAG+VMAG)" \
                       -addcol IV_DIFF "(IMAG-VMAG)" \
                       -delcols "IMAG VMAG" \
                       -write -ofmt votable -o - \
             > tab2.vot
tpipe
command in this case.
The processing steps first add a column representing the sum,
then add a column representing the difference, then delete the
original columns.
tpipe 2dfgrs_ngp.fits \
      -disk \
      -keepcols "SEQNUM AREA ECCENT" \
      -sort -down AREA \
      -head 20
-disk
flag is supplied, which means that
temporary disk files rather than memory
will be used for caching table data.
tpipe http://archive.org/data/survey.vot -meta
tpipe survey.fits -select "skyDistance(hmsToRadians(RA),dmsToRadians(DEC), \
                                       0.6457,-0.1190) < 5 * ARC_MINUTE" \
                  -count
skyDistance
function is an expression which
calculates the distance between the position specified in a row
(as given by its RA and DEC columns) and a given point on the sky
(RA=0.6457 radians, DEC=-0.1190 radians).
Since skyDistance
's arguments and return value are in
radians, some conversions are required: the RA and DEC columns
are sexagesimal strings which are converted using the
hmsToRadians
and dmsToRadians
functions
respectively. The result is compared to a multiple of the
ARC_MINUTE
constant, which is the size of an arcminute
in radians. Any rows of the input table for which this comparison
is true are included in the output.
The functions and constants used here are described in detail
in Section 4.4.3.
tpipe -ifmt ascii survey.txt \
      -select "OBJTYPE == 3 && Z > 0.15" \
      -keepcols "IMAG JMAG KMAG" \
      -stats
tpipe -classpath lib/drivers/mysql-connector-java.jar \
      -Djdbc.drivers=com.mysql.jdbc.Driver \
      x.fits \
      -explode \
      -tosql jdbc:mysql://localhost/ASTRO1#TABLEX -user mbt
jdbc.drivers
system property is set to the JDBC driver
class name. The output will be written as a new table TABLEX
in the MySQL database called ASTRO1 on a MySQL server on the
local host, using the privileges of MySQL user mbt
.
If a password is required, it will be prompted for on the terminal
(the -password
flag could be used to specify it instead).
Any existing table in ASTRO1 with the name TABLEX is overwritten.
The only processing done here is by the -explode
flag,
which takes any columns which have fixed-size array values and
replaces them in the output with multiple scalar columns.
tpipe USNOB.FITS -every 1000000 -stats
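To make the arithmetic in the skyDistance example above concrete, here is a small Java sketch of the same calculation. It is not the STILTS implementation: the class SkySketch is hypothetical, its methods are simplified stand-ins that borrow the names used in the example, the angular separation uses the standard haversine formula, and the sexagesimal parser accepts only colon-separated strings. The functions actually provided are those described in Section 4.4.3.

```java
/** Illustrative only: simplified stand-ins for the functions used in the example. */
class SkySketch {

    /** Size of one arcminute in radians. */
    static final double ARC_MINUTE = Math.PI / (180.0 * 60.0);

    /** Angular separation in radians between two sky positions given in radians (haversine formula). */
    static double skyDistance(double ra1, double dec1, double ra2, double dec2) {
        double sd = Math.sin((dec2 - dec1) / 2.0);
        double sr = Math.sin((ra2 - ra1) / 2.0);
        double h = sd * sd + Math.cos(dec1) * Math.cos(dec2) * sr * sr;
        return 2.0 * Math.asin(Math.min(1.0, Math.sqrt(h)));
    }

    /** Parses a colon-separated sexagesimal string such as "-06:49:00" into a signed decimal value. */
    private static double sexagesimal(String text) {
        String t = text.trim();
        boolean negative = t.startsWith("-");
        if (negative || t.startsWith("+")) {
            t = t.substring(1);
        }
        double value = 0.0;
        double scale = 1.0;
        for (String part : t.split(":")) {
            value += Double.parseDouble(part) / scale;
            scale *= 60.0;
        }
        return negative ? -value : value;
    }

    /** Hours:minutes:seconds to radians (24 hours make a full circle). */
    static double hmsToRadians(String hms) {
        return sexagesimal(hms) * Math.PI / 12.0;
    }

    /** Degrees:minutes:seconds to radians. */
    static double dmsToRadians(String dms) {
        return sexagesimal(dms) * Math.PI / 180.0;
    }

    public static void main(String[] args) {
        // A made-up row position close to the reference point used in the example.
        double d = skyDistance(hmsToRadians("02:28:00"), dmsToRadians("-06:49:00"),
                               0.6457, -0.1190);
        System.out.println("separation in arcminutes: " + d / ARC_MINUTE);
        System.out.println("selected: " + (d < 5 * ARC_MINUTE));
    }
}
```

The final comparison plays the role of the -select expression: a row is kept only when its separation from the reference point is below five arcminutes.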
The VOTable standard provides for three basic encodings
of the actual data within each table: TABLEDATA, BINARY and FITS.
TABLEDATA is a pure-XML encoding, which is easy for humans
to read and write.
However, it is verbose and not very efficient for transmission
and processing,
for which reason the more compact BINARY format has been defined.
FITS format shares the advantages of BINARY, but is more likely to
be used where a VOTable is providing metadata 'decoration' for
an existing FITS table.
In addition, the BINARY and FITS encodings may carry their data
either inline
(as base64-encoded text content of a STREAM
element)
or externally
(referenced by a STREAM
element's href
attribute).
These different formats have their different advantages and disadvantages. Since, to some extent, programmers are humans too, much existing VOTable software deals in TABLEDATA format even though it may not be the most efficient way to proceed. Conversely, you might wish to examine the contents of a BINARY-encoded table without use of any software more specialised than a text editor. So there are times when it is desirable to convert from one of these encodings to another.
votcopy
is a tool which translates between these
encodings while
making a minimum of other changes to the VOTable document.
The processing may result in some changes to lexical details
such as whitespace in start tags, but the element structure is not
modified. Unlike tpipe
it does not impose
STIL's model of what constitutes a table on the data between
reading it in and writing it out, so subtleties dependent on
the exact structure of the VOTable document will not be mangled.
The only important changes should be the contents of
DATA
elements in the document.
The basic usage of votcopy
is
votcopy [<flags>] [<in-file> [<out-file>]]
If you don't have the Unix scripts installed, invoke it as described in Section 2 using the classname
uk.ac.starlink.ttools.VotCopy
.
If <out-file>
is omitted the result is written to
standard output, and if <in-file>
is also omitted
the document to be copied is read from standard input.
<in-file>
may be a filename or URL, and may
represent a VOTable compressed using one of the supported
compression formats (gzip, Unix compress and bzip2).
The flags, which may be given in any order, are as follows:
-f[ormat] tabledata|binary|fits|none
none
is selected, then the tables will
be data-less (contain no DATA
element), leaving only
the document structure. Data-less tables are legal VOTable elements.
-href
STREAM
elements
will contain their data inline or externally.
If -href
is not specified, the output document will
be self-contained, with STREAM
data inline as base64-encoded
characters. If -href
is specified, then for each
table in the document the binary data will be written to a separate
file and referenced by a href
attribute on the
corresponding STREAM
element.
The names of these files are usually determined by the name of the
main output file; but see also the -base
flag.
-base <name>
-href
flag is specified. Normally these are given
names based on the name of the output file.
But if -base <name>
is given, then these will be given a name based on <name>
.
The -base
flag is compulsory if -href
is
given and no output file is specified (output is to standard out),
since in this case there is no default base name to use.
-cache
votcopy
without the
-cache
flag when it is required,
an error message will tell you so.
-disk
-cache
is
specified, and only required for large tables.
Equivalent to setting the system property
-Dstartable.storage=disk
.
-encode <xml-encoding>
-debug
-h[elp]
Normal use of votcopy
is pretty straightforward.
We give here a couple of examples of its input and output.
Here is an example VOTable document, cat.vot
:
<VOTABLE>
  <RESOURCE>
    <TABLE name="Authors">
      <FIELD name="AuthorName" datatype="char" arraysize="*"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>Charles Messier</TD></TR>
          <TR><TD>Mark Taylor</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
    <RESOURCE>
      <COOSYS equinox="J2000.0" epoch="J2000.0" system="eq_FK4"/>
      <TABLE name="Messier Objects">
        <FIELD name="Identifier" datatype="char" arraysize="10"/>
        <FIELD name="RA" datatype="double" units="degrees"/>
        <FIELD name="Dec" datatype="double" units="degrees"/>
        <DATA>
          <TABLEDATA>
            <TR> <TD>M51</TD> <TD>202.43</TD> <TD>47.22</TD> </TR>
            <TR> <TD>M97</TD> <TD>168.63</TD> <TD>55.03</TD> </TR>
          </TABLEDATA>
        </DATA>
      </TABLE>
    </RESOURCE>
  </RESOURCE>
</VOTABLE>
Note that it contains more structure than just a flat table: there are two
TABLE
elements,
the RESOURCE
element of the second one being nested
in the RESOURCE
of the first.
Processing this document using a generic table tool such as
tpipe
or tcopy
would lose this structure.
To convert the data encoding to BINARY format, we simply execute
votcopy -f binary cat.vot
and the output is
<?xml version="1.0"?>
<VOTABLE>
  <RESOURCE>
    <TABLE name="Authors">
      <FIELD name="AuthorName" datatype="char" arraysize="*"/>
      <DATA>
        <BINARY>
          <STREAM encoding='base64'>
AAAAD0NoYXJsZXMgTWVzc2llcgAAAAtNYXJrIFRheWxvcg==
          </STREAM>
        </BINARY>
      </DATA>
    </TABLE>
    <RESOURCE>
      <COOSYS equinox="J2000.0" epoch="J2000.0" system="eq_FK4"/>
      <TABLE name="Messier Objects">
        <FIELD name="Identifier" datatype="char" arraysize="10"/>
        <FIELD name="RA" datatype="double" units="degrees"/>
        <FIELD name="Dec" datatype="double" units="degrees"/>
        <DATA>
          <BINARY>
            <STREAM encoding='base64'>
TTUxAAAAAAAAAEBpTcKPXCj2QEecKPXCj1xNOTcAAAAAAAAAQGUUKPXCj1xAS4PX
Cj1wpA==
            </STREAM>
          </BINARY>
        </DATA>
      </TABLE>
    </RESOURCE>
  </RESOURCE>
</VOTABLE>
Note that both tables have been translated to BINARY format. The basic structure of the document is unchanged: the only differences are within the
DATA
elements. If we ran
votcopy -f tabledata
on either this output or the original input then the output would be identical (apart perhaps from whitespace) to the input table, since the data are originally in TABLEDATA format.
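For readers curious about what the base64 text in the BINARY output contains, the following Java sketch decodes the first STREAM shown above. It is an illustration rather than part of STILTS, and the class BinaryStreamSketch is hypothetical. It relies on the VOTable BINARY convention that each cell of a variable-length character column (arraysize="*") is stored as a 4-byte big-endian element count followed by that many bytes, and it handles only a table consisting of a single such column.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

/** Illustrative only: decodes a BINARY stream holding one variable-length char column. */
class BinaryStreamSketch {

    /** Returns the cell values, one per row, from base64-encoded STREAM content. */
    static List<String> decodeStrings(String base64) {
        // The MIME decoder ignores line breaks and other whitespace inside the STREAM text.
        byte[] raw = Base64.getMimeDecoder().decode(base64);
        List<String> cells = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            while (in.available() > 0) {
                int length = in.readInt();          // 4-byte big-endian element count
                byte[] chars = new byte[length];
                in.readFully(chars);                // the characters themselves
                cells.add(new String(chars, StandardCharsets.US_ASCII));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cells;
    }

    public static void main(String[] args) {
        // STREAM content of the "Authors" table from the votcopy output above.
        String stream = "AAAAD0NoYXJsZXMgTWVzc2llcgAAAAtNYXJrIFRheWxvcg==";
        for (String cell : decodeStrings(stream)) {
            System.out.println(cell);
        }
    }
}
```

Run on the "Authors" stream this recovers the two names from the original TABLEDATA rows. The second table mixes a fixed-length string with two doubles, so it would need a decoder that follows its FIELD declarations.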
To generate a VOTable document with the data in external files,
the -href
flag is used. We will output in FITS format
this time. Executing:
votcopy -f fits -href cat.vot fcat.vot
results in the error message:
Can't stream, table requires multiple reads for metadata
Try -cache option
For technical reasons (FITS output requires the input tables to be read in two passes to assess the number of rows) this can't be done in a single stream, which is how
votcopy
usually works.
So we follow the offered advice and use the -cache
flag:
votcopy -f fits -href cat.vot fcat.vot -cache
which writes the following to the file
fcat.vot
:
...
<DATA>
<FITS>
<STREAM href="fcat-1.fits"/>
</FITS>
</DATA>
...
<DATA>
<FITS>
<STREAM href="fcat-2.fits"/>
</FITS>
</DATA>
...
(the unchanged parts of the document have been skipped here for brevity). The actual data are written in two additional files in the same directory as the output file,
fcat-1.fits
and
fcat-2.fits
. These filenames are based on the
main output filename, but can be altered using the -base
flag if required. Note this has also given you FITS binary table
versions of all the tables in the input VOTable document, which can be
operated on by normal FITS-aware software quite separately from the VOTable
if required.
The VOTable standard, while not hugely complicated, has a number of subtleties and it's not difficult to produce VOTable documents which violate it in various ways. In fact it's probably true to say that most VOTable documents out there are not strictly legal. In some cases the errors are small and a parser is likely to process the document without noticing the trouble. In other cases, the errors are so serious that it's hard for any software to make sense of it. In many cases in between, different software will react in different ways, in the worst case appearing to parse a VOTable but in fact understanding the wrong data.
votlint
is a program which can check a VOTable document
and spot places where it does not conform to the VOTable standard,
or places which look like they may not mean what the author intended.
It is meant for use in two main scenarios:
Validating a VOTable document against the VOTable schema or DTD
of course goes a long way towards checking a VOTable document for errors
(though it's clear that many VOTable authors don't even go this far),
but it by no means does the whole job, simply because the schema/DTD
specification languages don't have the facilities
to understand the data structure
of a VOTable document. For instance the VOTable schema
will allow any plain text content in a TD
element, but whether
this makes sense in a VOTable depends on the datatype
attribute of the corresponding FIELD
element. There are many
other examples.
votlint
tackles this by parsing the VOTable document
in a way which understands its structure and assessing the content
as critically as it can. For any incorrect or questionable content
it finds, it will output a short message describing the problem
and giving its location in the document. What you do with this
information is then up to you.
Using votlint
is very straightforward.
If a non-flag argument is given it is
assumed to be the location (filename or URL) of a VOTable document.
Otherwise, the document will be read from standard input.
Error and warning messages will be written on standard error.
Each message is prefixed with the location at which the error was
found (if possible the line and column are shown, though this is
dependent on your JVM's default XML parser).
The processing is SAX-based, so arbitrarily long tables can
be processed without heavy memory use.
votlint
can't guarantee to pick up every possible
error in a VOTable document, but it ought to pick up many of the
most serious errors that are typically made in authoring VOTables.
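As an indication of how structure-aware checks of this kind can be made in a single SAX pass, here is a minimal Java sketch that counts FIELD elements, TD elements per row, and rows per table. It is not votlint's code: the class LintSketch and the wording of its messages are hypothetical, and it performs only two of the many checks described in this section.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative only: two simple structure checks done in one SAX pass. */
class LintSketch extends DefaultHandler {

    final List<String> messages = new ArrayList<>();
    private int fieldCount;   // FIELDs declared in the current TABLE
    private int tdCount;      // TDs seen in the current TR
    private int rowCount;     // TRs seen in the current TABLE
    private String nrows;     // nrows attribute of the current TABLE, if any

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        switch (qName) {
            case "TABLE": fieldCount = 0; rowCount = 0; nrows = atts.getValue("nrows"); break;
            case "FIELD": fieldCount++; break;
            case "TR":    tdCount = 0; rowCount++; break;
            case "TD":    tdCount++; break;
            default:      break;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("TR".equals(qName) && tdCount != fieldCount) {
            messages.add("Wrong number of TDs in row (expecting " + fieldCount
                         + " found " + tdCount + ")");
        } else if ("TABLE".equals(qName) && nrows != null
                   && Integer.parseInt(nrows) != rowCount) {
            messages.add("Row count (" + rowCount + ") not equal to nrows attribute ("
                         + nrows + ")");
        }
    }

    /** Parses the document text and returns any messages; only counters are held in memory. */
    static List<String> check(String xml) {
        LintSketch handler = new LintSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return handler.messages;
    }

    public static void main(String[] args) {
        String xml = "<VOTABLE><RESOURCE><TABLE nrows='2'>"
                   + "<FIELD name='RA' datatype='double'/><FIELD name='Dec' datatype='double'/>"
                   + "<DATA><TABLEDATA><TR><TD>344.48</TD><TD>-29.618</TD><TD>extra</TD></TR>"
                   + "</TABLEDATA></DATA></TABLE></RESOURCE></VOTABLE>";
        for (String message : check(xml)) {
            System.out.println(message);
        }
    }
}
```

Since the handler keeps nothing but a few counters between callbacks, its memory use is independent of the number of rows, which is the property that SAX-based processing provides.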
The basic usage of votlint
is
votlint [<flags>] [<in-file>]
If you don't have the Unix scripts installed, invoke it as described in Section 2 using the classname
uk.ac.starlink.ttools.VotLint
.
If <in-file>
is omitted then the document to be checked
is read from standard input.
<in-file>
may be a filename or URL, and may
represent a VOTable compressed using one of the supported
compression formats (gzip, Unix compress and bzip2).
The flags, which may be given in any order, are as follows:
-novalid
votlint
's own checks on the
submitted document, it is validated against an appropriate version
of the VOTable DTD which picks up such things as the existence
of unknown elements and attributes, elements in the wrong place,
and so on. Sometimes however, particularly when XML namespaces are
involved, the validator can get confused and may produce a lot
of spurious errors.
Specifying the -novalid
flag prevents this validation
step so that only votlint
's checks are performed.
In this case a few, but by no means all, violations of the VOTable
standard concerning document structure will be picked up.
-version <vers>
<vers>
can be
1.0
or 1.1
.
The version may be noted within the document using the
version
attribute of the document's
VOTABLE
element; if it is and it conflicts with the
version specified using this flag, a warning is issued.
-h[elp]
-debug
Votlint checks that the XML input is well-formed, and, unless the
-novalid
flag is supplied, that it validates against the
1.0 or 1.1 (as appropriate) DTD. Although VOTable 1.1 is properly
defined against an XML Schema rather than a DTD, in conjunction with
the other checks done, the DTD validation turns out to be pretty comprehensive.
Some of the DTD validity checks are also done by
votlint
internally, so that some validity-type
errors may give rise to more than one warning.
In general, the program errs on the side of verbosity.
In addition to these checks, the following checks are carried out, and lead to ERROR reports if violations are found:
TD contents incompatible with FIELD declared datatype/arraysize attributes
FIELD metadata
PARAM values incompatible with declared datatype/arraysize
arraysize declarations
TD elements with the wrong number of elements
PARAM values with the wrong number of elements
nrows attribute on TABLE element different from the number of rows actually in the table
VOTABLE version attribute is unknown
ref attributes without matching ID elements elsewhere in the document
ID on multiple elements.
Additionally, the following conditions, which are not actually forbidden by the VOTable standard, will generate WARNING reports. Some of these may result from harmless constructions, but it is wise at least to take a look at the input which caused them:
TD elements in row of TABLEDATA table
TABLE with no FIELD elements
FIELD or PARAM elements with datatype of either char or unicodeChar and undeclared arraysize - this is a common error which can result in ignoring all but the first character in TD elements from a column
ref attributes which reference other elements by ID where the reference makes no, or questionable, sense (e.g. FIELDref references FIELD in a different table)
FIELDs) with the same name attributes
Here is a brief example of running votlint
against
a (very short) imperfect VOTable document. If the document looks like
this:
<VOTABLE version="1.1">
<RESOURCE>
<TABLE nrows="2">
<FIELD name="Identifier" datatype="char"/>
<FIELD name="RA" datatype="double"/>
<FIELD name="Dec" datatype="double"/>
<DESCRIPTION>A very small table</DESCRIPTION>
<DATA>
<TABLEDATA>
<TR>
<TD>Fomalhaut</TD>
<TD>344.48</TD>
<TD>-29.618</TD>
<TD>HD 216956</TD>
</TR>
</TABLEDATA>
</DATA>
</TABLE>
</RESOURCE>
</VOTABLE>
then the output of a
votlint
run looks like this:
INFO (l.4): No arraysize for character, FIELD implies single character
ERROR (l.7): Element "TABLE" does not allow "DESCRIPTION" here.
WARNING (l.11): Characters after first in char scalar ignored (missing arraysize?)
WARNING (l.15): Wrong number of TDs in row (expecting 3 found 4)
ERROR (l.18): Row count (1) not equal to nrows attribute (2)
Note the warning at line 11 has resulted from the same error as the one at line 4 - because the
FIELD
element has no
arraysize
attribute, arraysize="1"
(single character) is assumed,
while the author almost certainly intended arraysize="*"
(unknown length string).
By examining these warnings you can see what needs to be done to fix this table up. Here is what it should look like:
<VOTABLE version="1.1">
<RESOURCE>
<TABLE nrows="1">                                  <!-- change row count -->
<DESCRIPTION>A very small table</DESCRIPTION>      <!-- move DESCRIPTION -->
<FIELD name="Identifier" datatype="char" arraysize="*"/>  <!-- add arraysize -->
<FIELD name="RA" datatype="double"/>
<FIELD name="Dec" datatype="double"/>
<DATA>
<TABLEDATA>
<TR>
<TD>Fomalhaut</TD>
<TD>344.48</TD>
<TD>-29.618</TD>
</TR>                                              <!-- remove extra TD -->
</TABLEDATA>
</DATA>
</TABLE>
</RESOURCE>
</VOTABLE>
When fed this version,
votlint
gives no warnings.