Starlink User Note
256
Mark Taylor
7 October 2005
$Id: sun256.xml,v 1.50 2006/05/10 17:51:19 mbt Exp $
stilts command
calc: Calculator
tcat: Table Concatenater
tcopy: Table Format Converter
tcube: N-dimensional Histogram Calculator
tmatch2: Pair Crossmatcher
tpipe: Generic Table Pipeline Utility
votcopy: VOTable Encoding Translator
votlint: VOTable Validity Checker
STILTS is a set of command-line tools for processing tabular data. It has been designed for, but is not restricted to, use on astronomical data such as source catalogues. It contains both generic (format-independent) table processing tools and tools for processing VOTable documents. Facilities offered include format conversion, format validation, column calculation and rearrangement, row selection, sorting, crossmatching, statistical calculations and metadata display. Calculations on cell data can be performed using a powerful and extensible expression language.
The package is written in pure Java and based on STIL, the Starlink Tables Infrastructure Library. This gives it high portability, support for many data formats (including FITS, VOTable, text-based formats and SQL databases), extensibility and scalability. Where possible the tools are written to accept streamed data so the size of tables which can be processed is not limited by available memory. As well as the tutorial and reference information in this document, detailed on-line help is available from the tools themselves.
STILTS is available under the GNU General Public Licence.
STILTS provides a number of command-line applications which can be used for manipulating tabular data. Conceptually it sits between, and uses many of the same classes as, the packages STIL, which is a set of Java APIs providing table-related functionality, and TOPCAT, which is a graphical application providing the user with an interactive platform for exploring one or more tables. This document is mostly self-contained - it covers some of the same ground as the STIL and TOPCAT user documents (SUN/252 and SUN/253 respectively).
Currently, this package consists of the following commands for generic table manipulation:
tcat: Table Concatenater
tcopy: Table Format Converter
tcube: N-dimensional Histogram Calculator
tmatch2: Pair Crossmatcher
tpipe: Generic Table Pipeline Utility
calc: Calculator
There are many ways you might want to use these tools; here are a few possibilities:
stilts command
All the functions available in this package can be used from a single command, which is usually referred to in this document simply as "stilts". Depending on how you have installed the package, you may just type "stilts", or something like
java -jar some/path/stilts.jar
or
java -classpath topcat-lite.jar uk.ac.starlink.ttools.Stilts
or something else - this is covered in detail in Section 3.
In general, the form of a command is
stilts <stilts-flags> <task-name> <task-args>
The forms of the parts of this command are described in the following subsections, and details of each of the available tasks along with their arguments are listed in the command reference at the end of this document. Some of the commands are highly configurable and have a variety of parameters to define their operation. In many cases however, it's not complicated to use them. For instance, to convert the data in a FITS table to VOTable format you might write:
stilts tcopy cat.fits cat.vot
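The three-part command form can be illustrated with a small Python sketch. This is not STILTS code: the helper name split_command is hypothetical, and it assumes the only flags that can precede the task name are the generic ones listed in this document (it ignores any Java flags).

```python
# Hypothetical helper, for illustration only: split the words that follow
# "stilts" into <stilts-flags>, <task-name> and <task-args>.
STILTS_FLAGS = {"-help", "-version", "-verbose", "-disk",
                "-debug", "-prompt", "-batch"}

def split_command(argv):
    """Return (stilts_flags, task_name, task_args)."""
    flags = []
    i = 0
    # Generic flags come first, before the task name.
    while i < len(argv) and argv[i] in STILTS_FLAGS:
        flags.append(argv[i])
        i += 1
    # The first word that is not a generic flag is taken as the task name.
    task_name = argv[i] if i < len(argv) else None
    task_args = argv[i + 1:]
    return flags, task_name, task_args

flags, task, args = split_command(["-verbose", "tcopy", "cat.fits", "cat.vot"])
print(flags, task, args)
```

Everything after the task name is handed to the task unchanged, which is why the generic flags have to come before it.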
Some flags are common to all the tasks in the STILTS package, and these are specified after the stilts invocation itself and before the task name. They generally have the same effect regardless of which task is running. These generic flags are as follows:
-help
Prints a usage message for the stilts command itself and exits. The message contains a listing of all the known tasks.
-version
-verbose
-disk
Has basically the same effect as specifying the system property -Dstartable.storage=disk.
-debug
-prompt
If you supply the -prompt flag, then you will be prompted for every parameter you have not explicitly specified to give you an opportunity to enter a value other than the default.
-batch
If you supply the -batch flag, then you won't be prompted at all.
If you are submitting an error report, please include the result of running stilts -version and the output of the troublesome command with the -debug flag specified.
The <task-name> part of the command line is the name of one of the tasks listed in Appendix A - currently the available tasks are:
calc
tcat
tcopy
tcube
tmatch2
tpipe
votcopy
votlint
The <task-args> part of the command line is a list of parameter assignments, each giving the value of one of the named parameters belonging to the task which is specified in the <task-name> part. The general form of each parameter assignment is
<param-name>=<param-value>
If you want to set the parameter to the null value, which is legal for some but not all parameters, use the special string "null". In some cases you can optionally leave out the <param-name> part of the assignment (i.e. the parameter is positionally determined); this is indicated in the task's usage description if the parameter is described like [<param-name>=]<param-value> rather than <param-name>=<param-value>.
If the <param-value> contains spaces or other special characters, then in most cases, such as from the Unix shell, you will have to quote it somehow. How this is done depends on your platform, but usually surrounding the whole value in single quotes will do the trick.
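The following Python sketch illustrates the two ideas above: how a quoted value reaches the task as a single argument, and how named and positional assignments might be bound to parameter names. It is an illustration under stated assumptions rather than the STILTS implementation. The function bind_args is hypothetical, shlex stands in for a Unix shell, and the parameter names are those of tcopy as given in its usage message.

```python
import shlex

def bind_args(task_args, known, positional):
    """Map task arguments to parameter names (illustrative only).

    known      -- set of parameter names the task accepts
    positional -- names, in order, whose "<param-name>=" part may be omitted
    """
    values = {}
    remaining = list(positional)
    for arg in task_args:
        name, sep, value = arg.partition("=")
        if sep and name in known:
            # Explicit <param-name>=<param-value> assignment.
            values[name] = value
            if name in remaining:
                remaining.remove(name)
        else:
            # No recognised name: use the next unfilled positional parameter.
            if not remaining:
                raise ValueError("unexpected argument: " + arg)
            values[remaining.pop(0)] = arg
    return values

# The shell (imitated here by shlex) strips the single quotes, so the
# quoted value arrives as one argument that contains spaces.
words = shlex.split("""stilts tpipe cmd='delcols "1 2 3"'""")
print(words[2:])

params = bind_args(["cat.fits", "cat.vot", "ifmt=fits"],
                   known={"ifmt", "ofmt", "in", "out"},
                   positional=["in", "out"])
print(params)
```

Treating a word as a named assignment only when the part before "=" is a known parameter name is a design choice of this sketch; it lets positional values that happen to contain "=" pass through untouched.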
Tasks may have many parameters, and you don't have to set all of them explicitly on the command line. For a parameter which you don't set, two things can happen. In many cases, it will default to some sensible value. Sometimes however, you may be prompted for the value to use. In the latter case, a line like this will be written to the terminal:
matcher - Name of matching algorithm [sky]:
This is prompting you for the value of the parameter named matcher. "Name of matching algorithm" is a short description of what that parameter does. "sky" is the default value (if there is no default, no value will appear in square brackets).
At this point you can do one of four things:
Enter a value. The special value "null" means the null value, which is legal for some, but not all parameters. If the value you enter is not legal, you will see an error message and you will be invited to try again.
Enter "help" or a question mark "?". This will output a message giving a detailed description of the parameter and prompt you again.
stilts command itself (see Section 2.1).
If you supply the -prompt flag, then you will be prompted for every parameter you have not explicitly set. If you supply -batch on the other hand, you won't be prompted for any parameters (and if you fail to set any without legal default values, the task will fail).
If you want to see the actual values of the parameters for a task as it runs, including prompted values and defaulted ones which you haven't specified explicitly, you can use the -verbose flag after the stilts command:
% stilts -verbose tcopy cat.fits cat.vot ifmt=fits
INFO: tcopy in=cat.fits out=cat.vot ifmt=fits ofmt=(auto)
Extensive help is available from stilts itself about each task and its parameters, as described in the next section.
As well as the command descriptions in this document (especially the reference section Appendix A) you can get help for STILTS usage from the command itself. Typing
stilts -help
results in this output:
Usage:
   stilts [-help] [-version] [-verbose] [-disk] [-debug] [-prompt] [-batch]
          <task-name> <task-args>
   stilts <task-name> help[=<param-name>]
   Known tasks:
      calc
      tcat
      tcopy
      tcube
      tmatch2
      tpipe
      votcopy
      votlint
For help on the individual tasks, including their parameter lists, you can supply the word help after the task name, so for instance
stilts tcopy help
results in
Usage: tcopy ifmt=<in-format> ofmt=<out-format> [in=]<in-table> [out=]<out-table>
Finally, you can get help on any of the parameters of a task by writing help=<param-name>, like this:
stilts tcopy help=in
gives
Help for parameter IN in task TCOPY
-----------------------------------
   Name:
      in
   Usage:
      [in=]<in-table>
   Summary:
      Location of input table
   Description:
      The location of the input table. This is usually a filename or URL,
      and may point to a file compressed in one of the supported
      compression formats (Unix compress, gzip or bzip2). If it is omitted,
      or equal to the special value "-", the input table will be read from
      standard input. In this case the input format must be given
      explicitly using the ifmt parameter.
In some cases, as described in Section 2.3, you will be prompted for the value of a parameter with a line something like this:
matcher - Name of matching algorithm [sky]:
In this case, if you enter "help" or a question mark, then the parameter help entry will be printed to the screen, and the prompt will be repeated.
For more detailed descriptions of the tasks, which include explanatory comments and examples as well as the information above, see the full task descriptions in the Command Reference.
There are a number of ways of invoking the stilts command, depending on how you have installed the package. If you're using a Unix-like operating system, the easiest way is to use the stilts script. If you have a full starjava installation it is in the starjava/bin directory. Otherwise you can download it separately from wherever you got your STILTS installation in the first place, or find it at the top of the stilts.jar or topcat-*.jar that contains your STILTS installation, so do something like
unzip stilts.jar stilts
chmod +x stilts
to extract it (if you don't have unzip, try jar xvf stilts.jar stilts). stilts is a simple shell script which just invokes java with the right classpath and the supplied arguments.
To run using the stilts script, first make sure that both the java executable and the stilts script itself are on your path, and that the stilts.jar or topcat-*.jar jar file is in the same directory as stilts. Then the form of invocation is:
stilts <java-flags> <stilts-flags> <task-name> <task-args>
A simple example would be:
stilts votcopy format=binary t1.xml t2.xml
In this case, as often, there are no <java-flags> or <stilts-flags>. If you use the -classpath argument or have a CLASSPATH environment variable set, then classpath elements thus specified will be added to the classpath required to run the command. The examples in the command descriptions below use this form for convenience.
If you don't have a Unix-like shell available however, you will need to invoke Java directly with the appropriate classes on your classpath. If you have the file stilts.jar, in most cases you can just write:
java <java-flags> -jar stilts.jar <stilts-flags> <task-name> <task-args>
which in practice would look something like
java -jar /some/where/stilts.jar votcopy format=binary t1.xml t2.xml
In the most general case, Java's -jar flag might be no good, for one of the following reasons:
The classes are in something other than the stilts.jar file (such as topcat-full.jar)
In such cases the form of invocation is:
java <java-flags> -classpath <class-path> uk.ac.starlink.ttools.Stilts <stilts-flags> <task-name> <task-args>
The example above in this case would look something like:
java -classpath /some/where/topcat-full.jar uk.ac.starlink.ttools.Stilts votcopy format=binary t1.xml t2.xml
The <stilts-flags>, <task-name> and <task-args> parts of these invocations are explained in Section 2, and the <class-path> and <java-flags> parts are explained in the following subsections.
The classpath is the list of places that Java looks to find the bits of compiled code that it uses to run an application. Depending on how you have done your installation the core STILTS classes could be in various places, but they are probably in a file with one of the names stilts.jar, topcat-lite.jar or topcat-full.jar. The full pathname of one of these files can therefore be used as your classpath. In some cases these files are self-contained and in some cases they reference other jar files in the filesystem - this means that they may or may not continue to work if you move them from their original location.
Under certain circumstances the tools might need additional classes, for instance:
In most cases it is not necessary to specify any additional arguments to the Java runtime, but it can be useful in certain circumstances. The two main kinds of options you might want to specify directly to Java are these:
System properties can be set using flags of the form -Dname=value. So for instance to ensure that temporary files are written to the /home/scratch directory, you could use the flag
-Djava.io.tmpdir=/home/scratch
The heap memory available to Java can be set using the -Xmx flag. To set the heap memory size to 256 megabytes, use the flag
-Xmx256M
(don't forget the 'M' for megabyte). You will probably find performance is dreadful if you specify a heap size larger than the physical memory of the machine you're running on.
Note however that encouraging STILTS to use disk files rather than memory for temporary storage is often a better idea than boosting the heap memory - this is done by specifying the -disk flag (stilts -disk <task-name> ...), or possibly setting the system property -Dstartable.storage=disk (see Section 2.1).
You can specify other options to Java such as tuning and profiling flags etc, but if you want to do that sort of thing you probably don't need me to tell you about it.
System properties are a way of getting information into the Java runtime - they are a bit like environment variables. There are two ways to set them when using STILTS: either on the command line using arguments of the form -Dname=value (see Section 3.2) or in a file in your home directory called .starjava.properties, in the form of a name=value line. Thus submitting the flag
-Dvotable.strict=true
on the command line is equivalent to having the following in your .starjava.properties file:
# Force strict interpretation of the VOTable standard.
votable.strict=true
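The equivalence between a properties file entry and a -D flag can be sketched in Python as follows. The two function names are hypothetical, and the parser handles only the simple name=value and # comment forms shown above, not the full Java properties syntax.

```python
def parse_properties(text):
    """Parse simple name=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            props[name.strip()] = value.strip()
    return props

def as_java_flags(props):
    """Render each property as the equivalent -Dname=value flag."""
    return ["-D%s=%s" % (name, value) for name, value in props.items()]

example = """
# Force strict interpretation of the VOTable standard.
votable.strict=true
"""
print(as_java_flags(parse_properties(example)))
```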
The following system properties have special significance to STILTS:
java.io.tmpdir
The directory to which temporary files are written, for instance if the -disk flag has been specified (see Section 2.1).
jdbc.drivers
jel.classes
mark.workaround
Works around bugs in the mark()/reset() methods of some java InputStream classes. These are rather common, including in Sun's J2SE system libraries. Use this if you are seeing errors that say something like "Resetting to invalid mark". Currently defaults to "false".
startable.readers
startable.storage
Setting this to "disk" has basically the same effect as supplying the "-disk" argument on the command line (see Section 2.1).
startable.writers
votable.strict
Set true for strict enforcement of the VOTable standard when parsing VOTables. This prevents the parser from working round certain common errors, such as missing arraysize attributes on FIELD or PARAM elements with datatype="char". False by default.
This section describes additional configuration which must be done to allow the commands to access SQL-compatible relational databases for reading or writing tables. If you don't need to talk to SQL-type databases, you can ignore the rest of this section. The steps described here are the standard ones for configuring JDBC (which sort-of stands for Java Database Connectivity), described in more detail on Sun's JDBC web page.
To use STILTS with SQL-compatible databases you must:
Set the jdbc.drivers system property to the name of the driver class as described in Section 3.3
These steps are all standard for use of the JDBC system. See SUN/252 for information about JDBC drivers known to work with STIL (the short story is that at least MySQL and PostgreSQL will work).
Here is an example of using tcopy to write the results of an SQL query on a table in a MySQL database as a VOTable:
stilts -classpath /usr/local/jars/mysql-connector-java.jar \
       -Djdbc.drivers=com.mysql.jdbc.Driver \
       tcopy \
       "jdbc:mysql://localhost/db1#SELECT id, ra, dec FROM gsc WHERE mag < 9" \
       ofmt=votable gsc.vot
or invoking Java directly:
java -classpath stilts.jar:/usr/local/jars/mysql-connector-java.jar \
     -Djdbc.drivers=com.mysql.jdbc.Driver \
     uk.ac.starlink.ttools.TableCopy \
     "jdbc:mysql://localhost/db1#SELECT id, ra, dec FROM gsc WHERE mag < 9" \
     ofmt=votable gsc.vot
You have to exercise some care to get the arguments in the right order here - see Section 3.
Alternatively, you can set some of this up beforehand to make the invocation easier. If you set your CLASSPATH environment variable to include the driver jar file (and the STILTS classes if you're invoking Java directly rather than using the scripts), and if you put the line
jdbc.drivers=com.mysql.jdbc.Driver
in the .starjava.properties file in your home directory, then you could avoid having to give the -classpath and -Djdbc.drivers flags respectively.
The generic table commands in STILTS (currently tpipe, tcopy, tcat, tcube and tmatch2) have no native format for table storage; they can process data in a number of formats equally well. STIL has its own model of what a table consists of, which is basically:
The formats the package knows about are dependent on the input and output handlers currently installed. The ones installed by default are listed in the following subsections. More may be added in the future, and it is possible to install new ones at runtime - see the STIL documentation for details.
Some of the tools in this package ask you to specify the format of input tables using the ifmt parameter. The following list gives the values usually allowed for this (matching is case-insensitive):
fits
votable
By default the first TABLE element is used, but this can be altered by supplying the 0-based index after a '#' sign, so "table.xml#4" means the fifth TABLE element in the document.
ascii
csv
ipac
wdc
In some cases (when using VOTable or FITS format tables) the tools can detect the table format automatically, and no explicit specification is necessary. If this isn't the case and you omit the format specification, the tool will fail with a suitable error message. It is always safe to specify the format explicitly; this will be slightly more efficient, and may lead to more helpful error messages in the case that the table can't be read correctly.
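As a small illustration of the 0-based '#' index used for VOTable input, the Python sketch below splits a location such as "table.xml#4" into a file name and a TABLE index. The function name is hypothetical, and treating a missing index as 0 (the first TABLE element) is an assumption of the sketch.

```python
def split_votable_location(location):
    """Split 'table.xml#4' into ('table.xml', 4).

    A location with no '#<index>' suffix is given index 0 here, which is
    an assumption of this sketch.
    """
    name, sep, index = location.rpartition("#")
    if sep and index.isdigit():
        return name, int(index)
    return location, 0

name, index = split_votable_location("table.xml#4")
# The index is 0-based, so index 4 refers to the fifth TABLE element.
print(name, "-> TABLE element number", index + 1)
```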
Some of the tools ask you to specify the format of output tables using the ofmt parameter. The following list gives the values usually allowed for this; in some cases as you can see there are several variants of a given format. You can abbreviate these names, and the first match in the list below will be used, so for instance specifying votable is equivalent to specifying votable-tabledata and fits is equivalent to fits-plus. Matching is case-insensitive.
fits-plus
fits-basic for most purposes)
fits-basic
votable-tabledata
votable-binary-inline
STREAM element
votable-binary-href
votable-fits-href
votable-fits-inline
STREAM element
ascii
text
csv
csv-noheader
html
TABLE element
html-element
TABLE element
latex
tabular environment
latex-document
tabular environment
mirage
In some cases the tools may guess what output format you want by looking at the extension of the output filename you have specified.
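The "first match in the list" rule for abbreviated output format names can be sketched in Python as below. This is an illustration only: the function name is hypothetical, and it assumes that an abbreviation means a case-insensitive prefix of the full name, which the text above implies but does not spell out.

```python
# Output format names in the order they are listed in this section.
OUTPUT_FORMATS = [
    "fits-plus", "fits-basic",
    "votable-tabledata", "votable-binary-inline", "votable-binary-href",
    "votable-fits-href", "votable-fits-inline",
    "ascii", "text", "csv", "csv-noheader",
    "html", "html-element", "latex", "latex-document", "mirage",
]

def resolve_format(name):
    """Return the first listed format that the given name abbreviates."""
    wanted = name.lower()
    for fmt in OUTPUT_FORMATS:
        if fmt.startswith(wanted):
            return fmt
    raise ValueError("unknown output format: " + name)

print(resolve_format("votable"))
print(resolve_format("fits"))
```

Because the list is searched in order, the position of each variant matters: "fits" resolves to fits-plus only because that name comes before fits-basic.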
Several of the tasks available in STILTS take one or more input tables, do something or other with them, and produce an output table. This is a pretty obvious way to go about things, and in the most straightforward case that's exactly what happens: you name one or more input tables, specify the processing parameters, and name an output table; the task then reads the input tables from disk, does the processing and writes the output table to disk.
However, many of the tasks in STILTS allow you to do pre-processing of the input tables before the main job, post-processing of the output table after the main job, and to decide what happens to the final tabular result, without any intermediate storage of the data. Examples of the kind of pre-processing you might want to do are to rearrange the columns so that they have the right units for the main task, or replace 'magic' values such as -999 with genuine blank values; the kind of post-processing you might want to do is to sort the rows in the output table or delete some of the columns you're not interested in. As for the destination of the final table, you might want to write it to disk, but equally you might not want to store it anywhere, but only be interested in counting the number of rows, or seeing the minima/maxima of a few of the columns, or you might want to send it straight to TOPCAT or some other table viewing application for interactive analysis.
Clearly, you could achieve the same effect by using multiple applications: preprocess your original input tables to write intermediate files on disk, run the main processing application which reads those files from disk and writes a new output file, run another application to postprocess the output file and write a new final output file, and finally do something with this such as counting the rows in it or viewing it in TOPCAT. However, by doing it all within a single task instead, no intermediate results have to be stored, and the whole sequence can be very much more efficient. You can think of this (if it helps) like a Unix pipeline, except what is being streamed from the start to the end of the pipe is not bytes, but table metadata and data. In most cases, the table data is streamed through the pipeline a row at a time, meaning that the amount of memory required is small (though in some cases, for instance row sorting and crossmatching, this is not possible).
Tasks which allow this pre/post-processing, or "filtering", have parameters with names like "cmd" which you use to specify processing steps. Tasks with multiple input tables (tmatch2, tcat) have parameters called icmd1, icmd2, ... for preprocessing the different input tables and ocmd for postprocessing the output table. tpipe does nothing except filtering, so there is no distinction between pre- and post-processing, and its filter parameter is just called cmd. tpipe additionally has a script parameter which allows you to use a text file to write the commands in, to prevent the command line getting too long. In both cases there is a parameter called omode which defines the "output mode", that is, what happens to the post-processed output table that comes out of the end of the pipeline.
Section 5.1 lists the processing steps available, and explains how to use them, Section 5.2 and Section 5.3 describe the syntax used in some of these filter commands for specifying columns, and Section 5.4 describes the available output modes. See the examples in the command reference, and particularly the tpipe examples, for some examples putting all this together.
This section lists the filter commands which can be used for table pipeline processing, in conjunction with cmd- or script-type parameters. You can string as many of these together as you like. On the command line, you can repeat the cmd (or icmd1, or ocmd...) parameter multiple times, or use one cmd parameter and separate different filter specifiers with semicolons (";"). The effect is the same.
It's important to note that each command in the sequence of processing steps acts on the table at that point in the sequence. Thus
stilts tpipe cmd='delcols 1; delcols 1; delcols 1'
has the same effect as
stilts tpipe cmd='delcols "1 2 3"'
since in the first case the columns are shifted left after each one is deleted, so the table seen by each step has one fewer column than the one before. Note also the use of quotes in the latter of the examples above, which is necessary so that the <colid-list> of the delcols command is interpreted as one argument not three separate words.
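The equivalence of the two delcols pipelines can be checked with a small Python model. This is a sketch of the semantics only, not STILTS code: the function names and the column names are hypothetical, columns are identified by 1-based index, and only the delcols step is modelled.

```python
def delcols(columns, colid_list):
    """Delete the columns at the given 1-based indices from a list of names."""
    doomed = {int(token) - 1 for token in colid_list.split()}
    return [c for i, c in enumerate(columns) if i not in doomed]

def run_pipeline(columns, cmd):
    """Apply ';'-separated steps in order, each to the result of the last."""
    for step in cmd.split(";"):
        name, _, arg = step.strip().partition(" ")
        if name != "delcols":
            raise ValueError("only delcols is sketched here")
        # Strip the quotes that group the <colid-list> into one argument.
        columns = delcols(columns, arg.strip().strip('"'))
    return columns

table = ["id", "ra", "dec", "mag", "flag"]
print(run_pipeline(table, "delcols 1; delcols 1; delcols 1"))
print(run_pipeline(table, 'delcols "1 2 3"'))
```

In the first pipeline each step sees the table left by the previous one, so index 1 refers to a different column each time.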
The syntax of some of these arguments is described elsewhere in this document:
<col-id>: see Section 5.2
<colid-list>: see Section 5.3
<expr>: see Section 7
addcol [-after <col-id> | -before <col-id>] [-units <units>] [-ucd <ucd>] [-desc <description>] <col-name> <expr>
Adds a new column called <col-name> defined by the algebraic expression <expr>. By default the new column appears after the last column of the table, but you can position it either before or after a specified column using the -before or -after flags respectively. The -units, -ucd and -desc flags can be used to define metadata values for the new column.
addskycoords [-epoch <expr>] [-inunit deg|rad|sex] [-outunit deg|rad|sex] <insys> <outsys> <col-id1> <col-id2> <col-name1> <col-name2>
The <col-id> arguments give identifiers for the two input coordinate columns in the coordinate system named by <insys>, and the <col-name> arguments name the two new columns, which will be in the coordinate system named by <outsys>. The <insys> and <outsys> coordinate system specifiers are one of
fk5: FK5 J2000.0 (Right Ascension, Declination)
fk4: FK4 B1950.0 (Right Ascension, Declination)
galactic: IAU 1958 Galactic (Longitude, Latitude)
supergalactic: de Vaucouleurs Supergalactic (Longitude, Latitude)
ecliptic: Ecliptic (Longitude, Latitude)
The -inunit and -outunit flags may be used to indicate the units of the existing coordinates and the units for the new coordinates respectively; use one of degrees, radians or sexagesimal (may be abbreviated), otherwise degrees will be assumed. For sexagesimal, the two corresponding columns must be string-valued in forms like hh:mm:ss.s and dd:mm:ss.s respectively. For certain conversions, the value specified by the -epoch flag is of significance. Where significant its value defaults to 2000.0.
assert <expr>
If <expr> does not evaluate true for any row of the table, execution terminates with an error. As long as no error occurs, the output table is identical to the input one. The exception generated by an assertion violation is of class uk.ac.starlink.ttools.filter.AssertException although that is not usually obvious if you are running from the shell in the usual way.
badval <bad-val> <colid-list>
For each column specified in <colid-list> any occurrence of the value <bad-val> is replaced by a blank entry.
cache
check
colmeta [-name <name>] [-units <units>] [-ucd <ucd>] [-desc <descrip>] <colid-list>
The metadata of the column(s) identified by <colid-list> can be set by using some or all of the listed flags. Typically, <colid-list> will simply be the name of a single column.
delcols <colid-list>
every <step>
Includes only every <step>'th row in the result, starting with the first row.
explodeall
explodecols <colid-list>
Each column specified by <colid-list> must have a fixed-length array type, though not all the arrays need to have the same number of elements.
head <nrows>
Includes only the first <nrows> rows of the table.
keepcols <colid-list>
Keeps only the columns specified by <colid-list>, in that order. The same column may be listed more than once, in which case it will appear in the output table more than once.
meta [<item> ...]
By default the output table contains columns for the items Index, Name, Class, Shape, Units, Description and UCD, as well as any table-specific column metadata items that the table contains. The output may be customised however by supplying one or more <item> headings. These may be selected from the list Index, Name, Class, Shape, Units, Description, UCD and UCD_desc, as well as any table-specific metadata. It is not an error to specify an item for which no metadata exists in any of the columns.
Any table parameters of the input table are propagated to the output one.
progress
random
replacecol [-name <name>] [-units <units>] [-ucd <ucd>] [-desc <descrip>] <col-id> <expr>
Replaces the content of the column <col-id> with the value of the expression <expr>. You can specify the metadata for the new column using the -name, -units, -ucd and -desc flags; any of these items which you do not specify will take its value from the column being replaced. You can reference the replaced column in the expression, so for example "replacecol pixsize pixsize*2" just multiplies the values in column pixsize by 2.
replaceval <old-val> <new-val> <colid-list>
For each column specified in <colid-list> any instance of <old-val> is replaced by <new-val>. The value string 'null' can be used for either <old-val> or <new-val> to indicate a blank value.
select <expr>
Includes in the output table only those rows for which the expression <expr> evaluates to true. <expr> must be an expression which evaluates to a boolean value (true/false).
sequential
sort [-down] [-nullsfirst] <key-list>
Sorts the table according to the sort key expressions which appear, as separate (space-separated) words, in <key-list>; sorting is done on the first expression first, but if that results in a tie then the second one is used, and so on. Each expression must evaluate to a type that it makes sense to sort, for instance numeric. If the -down flag is used, the sort order is descending rather than ascending. Blank entries are usually considered to come at the end of the collation sequence, but if the -nullsfirst flag is given then they are considered to come at the start instead.
sorthead [-tail] [-down] [-nullsfirst] <nrows> <key-list>
Sorts the table and retains only the <nrows> rows at the head of the resulting sorted table. The sort key expressions appear, as separate (space-separated) words, in <key-list>; sorting is done on the first expression first, but if that results in a tie then the second one is used, and so on. If the -tail flag is used, then the last <nrows> rows rather than the first ones are retained. If the -down flag is used the sort order is descending rather than ascending. Blank entries are usually considered to come at the end of the collation sequence, but if the -nullsfirst flag is given then they are considered to come at the start instead. Each expression must evaluate to a type that it makes sense to sort, for instance numeric. This filter is functionally equivalent to using sort followed by head, but it can be done in one pass and is usually cheaper on memory and faster, as long as <nrows> is significantly lower than the size of the table.
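The equivalence between sorthead and sort followed by head can be illustrated in Python. The sketch uses the standard heapq.nsmallest function as a stand-in for a one-pass selection that keeps only <nrows> rows in memory. It says nothing about how STILTS implements the filter, the row data are made up, and blank values and the -tail, -down and -nullsfirst flags are not modelled.

```python
import heapq

def sort_then_head(rows, nrows, key):
    """Full sort of every row, then keep the first nrows."""
    return sorted(rows, key=key)[:nrows]

def sorthead(rows, nrows, key):
    """Single pass over the rows, retaining only nrows candidates at a time."""
    return heapq.nsmallest(nrows, rows, key=key)

# Hypothetical table rows with an id and a magnitude column.
mags = [12.1, 8.4, 15.0, 9.9, 8.4, 11.3]
rows = [{"id": i, "mag": m} for i, m in enumerate(mags)]

def by_mag(row):
    return row["mag"]

print(sorthead(rows, 3, by_mag))
```

The saving comes from never holding a fully sorted copy of the table, which is why it only pays off when <nrows> is much smaller than the table.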
stats [<item> ...]
By default the output table contains columns for the items Name, Mean, StDev, Minimum, Maximum and NGood. The output may be customised however by supplying one or more <item> headings. These may be selected from the list NGood, NBad, Mean, StDev, Variance, Skew, Kurtosis, Minimum, Maximum, Sum, MinPos, MaxPos, Cardinality, Median, Quartile1, Quartile2 and Quartile3, or have the form "Q.nn" to represent the quantile corresponding to the proportion 0.nn; for instance Q.5 is an alias for Median, and Q.25 for Quartile1.
Any parameters of the input table are propagated to the output one.
Note that quantile calculations (including median and quartiles) can be expensive on memory. If you want to calculate quantiles for large tables, it may be wise to reduce the number of columns to only those you need the quantiles for earlier in the pipeline. No interpolation is performed when calculating quantiles.
tablename <name>
tail <nrows>
Includes only the last <nrows> rows of the table.
uniq [-count] [<colid-list>]
If the <colid-list> parameter is given then only the values in the specified columns must be equal in order for the row to be removed. If the -count flag is given, then an additional column with the name DupCount will be prepended to the table giving a count of the number of duplicated input rows represented by each output row. A unique row has a DupCount value of 1.
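The effect of uniq with the -count flag can be sketched in Python as below. The sketch makes two assumptions that the description above does not state: that, as with the Unix uniq command, only adjacent duplicate rows are collapsed, and that the first row of each run is the one retained. The function name and the row data are hypothetical.

```python
from itertools import groupby

def uniq_count(rows, key_columns=None):
    """Collapse runs of adjacent equal rows, prepending a DupCount value.

    rows        -- list of dicts mapping column name to value
    key_columns -- columns compared for equality; all columns if None
    """
    def key(row):
        cols = key_columns if key_columns is not None else list(row)
        return tuple(row[c] for c in cols)

    out = []
    for _, group in groupby(rows, key=key):
        group = list(group)
        # DupCount goes first, followed by the columns of the retained row.
        kept = {"DupCount": len(group)}
        kept.update(group[0])
        out.append(kept)
    return out

rows = [
    {"id": 1, "band": "J"},
    {"id": 1, "band": "J"},
    {"id": 2, "band": "J"},
    {"id": 1, "band": "J"},
]
print(uniq_count(rows))
print(uniq_count(rows, ["band"]))
```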
If an argument is specified in the help text for a command with the symbol <col-id> it means you must give a string which identifies one of the existing columns in a table. There are three ways you can specify a column in this context:
Column name, provided it does not start with a minus character ('-'). It is usually matched case insensitively. If multiple columns have the same name, the first one that matches is selected.
Tip: if counting which column has which index is giving you a headache, running tpipe with omode=meta or omode=stats on the table may help.
If an argument is specified in the help text for a command
with the symbol <colid-list>
it means you
must give a string which identifies a list of zero, one or more
of the existing columns in a table.
The string you specify is a separated into separate tokens by
whitespace, which means that you will normally
have to surround it in single or double quotes to ensure
that it is treated as a single argument and not several of them.
Each token in the colid-list
string may be one of
the following:
-
').
It is usually matched case insensitively. If multiple
columns have the same name, the first one that matches is selected.
*
'
which matches any sequence of characters. To match an unknown
sequence at the start or end of the string an asterisk must be
given explicitly. Other than that, matching is usually
case insensitive. The order of the expanded list is the same
as the order in which the columns appear in the table.
Thus "col*
" will match columns named
col1
, Column2
and COL_1024
,
but not decOld
.
"*MAG*
" will match columns named
magnitude
, ABS_MAG_U
and JMAG
.
"*
" on its own
expands to a list of all the columns of the table in order.
Specifying a list which contains a given column more than once is not usually an error, but what effect it has depends on the function you are executing.
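The wildcard rule described above can be sketched in Java by translating the pattern into a case-insensitive regular expression. This is an illustration of the matching rule only, not the code used by the tools; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of case-insensitive column-name wildcard matching. */
class ColumnGlobSketch {

    /** Returns true if columnName matches glob, where '*' matches any sequence. */
    static boolean matches(String glob, String columnName) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else {
                // Quote everything else so it is matched literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE)
                      .matcher(columnName)
                      .matches();
    }

    public static void main(String[] args) {
        String[] names = {"col1", "Column2", "COL_1024", "decOld"};
        for (String name : names) {
            System.out.println(name + ": " + matches("col*", name));
        }
    }
}
```

Since the whole name must match, a pattern without a leading asterisk only matches names which begin with the given text, which is why decOld is not matched by "col*".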
This section lists the output modes which can be used as
the value of the omode
parameter of
tpipe
and other commands.
Typically, having produced a result table by pipeline processing
an input one, you will write it out by specifying
omode=out
(or not using the omode
parameter at all -
out
is the default). However, you can do other things
such as calculate statistics, display metadata, etc. In some of
these cases, additional parameters are required. The different
output modes are listed below.
mode=cgi
out
mode
but a short CGI header giving the MIME Content-Type
is prepended to the output.
Additional parameters for this output mode are:
ofmt = <out-format>
[Default: votable]
mode=count
mode=discard
assert
filter.
mode=meta
meta
filter in Section 5.1
for more flexible output of table metadata.
mode=out
Additional parameters for this output mode are:
out = <out-table>
[Default: -]
ofmt = <out-format>
(auto)
"
(the default),
then the output filename will be
examined to try to guess what sort of file is required
usually by looking at the extension.
If it's not obvious from the filename what output format is
intended, an error will result.
[Default: (auto)]
mode=plastic
Additional parameters for this output mode are:
transport = string|file
string
:
VOTable serialized as a string and passed as a call parameter
(ivo://votech.org/votable/load
).
Not suitable for very large files.
file
:
VOTable written to a temporary file and the filename passed as
a call parameter
(ivo://votech.org/votable/loadFromURL
).
The file ought to be deleted once it has been loaded.
Not suitable for inter-machine communication.
null
) then a decision will
be taken based on the apparent size of the table.
client = <app-name>
mode=stats
mode=topcat
mode=tosql
-Djdbc.drivers
set as usual
(see Section 3.4).
Additional parameters for this output mode are:
protocol = <jdbc-protocol>
mysql
,
and for PostgreSQL's driver it is postgresql
.
For other drivers, you may have to consult the driver
documentation.
host = <value>
[Default: localhost]
database = <db-name>
newtable = <table-name>
user = <username>
[Default: mbt]
password = <passwd>
STILTS offers flexible and efficient facilities for crossmatching tables. Crossmatching is identifying different rows, which may be in the same or different tables, that refer to the same item. In an astronomical context such an item is usually, though not necessarily, an astronomical source or object. This operation corresponds to what in database terminology is called a join.
There are various complexities to specifying such a match. In the first place you have to define what is the condition that must be satisfied for two rows to be considered matching. In the second place you must decide what happens if, for a given row, more than one match can be found. Finally, you have to decide what to do having worked out what the matched rows are; the result will generally be presented as a new output table, but there are various choices about what columns and rows it will consist of. Some of these issues are discussed in this section, and others in the reference sections on the tools themselves in Appendix A.
Matching can in general be a computationally intensive process.
The algorithm used by STILTS, except in pathological cases,
scales as O(N log(N)) or thereabouts,
where N is the total number of rows in all the tables being matched.
No preparation (such as sorting) is required on the tables prior to
invoking the matching operation.
It is reasonably fast; for instance an RA, Dec positional match
of two 10^5-row catalogues takes of the order of 60 seconds
on current (2005 laptop) hardware. Attempting matches with large tables can
lead to running out of memory; the calculation just mentioned required a
Java heap size of around 200 MB (-Xmx200M
).
In the current release of STILTS the only crossmatching task is
tmatch2
which finds matches
between pairs of tables. In future versions however facilities for
finding matches within the same table, and in more than two tables,
will be introduced.
Determining whether one row represents the same item as another is done by comparing the values in certain of their columns to see if they are the same or similar. The most common astronomical case is to say that two rows match if their celestial coordinates (right ascension and declination) are within a given small radius of each other on the sky. There are other possibilities; for instance the coordinates to compare may be in a Cartesian space, or have a higher (or lower) dimensionality than two, or the match may be exact rather than within an error radius.
To determine the matching criteria, you set the values of the
following parameters of tmatch2
:
matcher
params
values*
tmatch2
you must specify both
values1
and values2
.
For example, suppose we wish to locate objects in two tables which are
within 3 arcseconds of each other on the sky. One table has columns
RA and DEC which give coordinates in degrees, and the other has columns
RArad and DECrad which give coordinates in radians. These are the
arguments which would be used to tell tmatch2
what the match
criteria are:
matcher=sky params=3 values1='RA DEC' values2='radiansToDegrees(RArad) radiansToDegrees(DECrad)'
It is clearly important that corresponding values are comparable (in the same units) between the tables being matched, and in geometrically sensitive cases such as matching on the sky, it's important that they are the units expected by the matcher as well. To determine what those units are, either consult the roster below, or run the following command:
stilts tmatch2 help=matcher
which will tell you about all the known matchers and their associated
params
and values*
parameters.
Here is a list of all the basic matcher
types and the
requirements of their associated params
and
values*
parameters. The units of the required values
are given where significant.
matcher=sky values*='<ra/degrees> <dec/degrees>' params='<max-error/arcsec>'
ra
, dec
positions are within
max-error
arcseconds of each other along a great circle.
matcher=skyerr values*='<ra/degrees> <dec/degrees> <error/arcsec>' params='<max-error/arcsec>'
ra
, dec
positions is smaller than
both the fixed max-error
value
and the sum of the two per-row error
values.
If either of the error
values is blank,
then any separation up to max-error
is considered a match.
According to these rules, you might decide to set max-error
to an arbitrarily large number so that only the sum of error
s
will determine the actual match criteria.
However please don't do this, since max-error
also functions as a tuning parameter for the matching algorithm,
and ought to be reasonably close to the actual maximum acceptable
separation.
matcher=sky3d values*='<ra/degrees> <dec/degrees> <distance>' params='<error/Units of distance>'
ra
, dec
and distance
as spherical polar coordinates, where distance
is the
distance from the observer along the line of sight.
Rows are considered to match when their positions in this space are
within error
units of each other.
The units of error
are the same as those of
distance
.
matcher=exact values*='<matched-value>'
matched-value
columns are exactly the same.
These values can be strings, numbers, or anything else.
A blank value never matches, not even with another blank one.
Since the params
parameter holds no values,
it does not have to be specified.
matcher=1d values*='<x>' params='<error>'
x
column
values differ by no more than error
.
matcher=2d values*='<x> <y>' params='<error>'
x
,y
) positions reckoned using
Pythagoras is less than error
.
matcher=Nd values*='<x> <y> ...' params='<error>'
matcher=2d
,
but specify matcher=3d
or whatever and
the corresponding number of entries in the values*
parameters.
matcher=2d_anisotropic values*='<x> <y>' params='<error-in-x> <error-in-y>'
x
,y
)
positions fall within an error ellipse with radii
error-in-x
,error-in-y
of each other.
This kind of match will typically be used for non-'spatial' spaces,
for instance (magnitude,redshift) space, in which the metrics along
different axes are not related to each other.
matcher=Nd_anisotropic values*='<x> <y> ...' params='<error-in-x> <error-in-y> ...'
matcher=2d_anisotropic
,
but specify matcher=3d_anisotropic
or whatever
and the corresponding number of entries in the values*
and params
parameters.
+
" character. The values*
parameters
of the combined matcher should then hold the concatenation of the
values*
entries of the constituent matchers, and the
same for the params
parameter.
So for instance the following can be used:
matcher=sky+1d values*='<ra/degrees> <dec/degrees> <x>' params='<max-error/arcsec> <error>'
ra
, dec
positions are within
max-error
arcseconds of each other along a great circle
(as for matcher=sky
)
and
their x
values differ by no more than error
(as for matcher=1d
).
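The match criteria in the roster above can be sketched in Java as simple boolean tests on a pair of rows. This is an illustration under stated assumptions, not the algorithm used by tmatch2: all names are hypothetical, the haversine formula is one standard way of computing a great-circle separation, a blank error is represented by NaN, and whether each boundary is inclusive is a guess.

```java
/** Hypothetical sketch of some pairwise match criteria. */
class MatchCriteriaSketch {

    /** Great-circle separation in arcseconds; inputs in degrees (haversine). */
    static double separationArcsec(double ra1, double dec1,
                                   double ra2, double dec2) {
        double d1 = Math.toRadians(dec1);
        double d2 = Math.toRadians(dec2);
        double sinDec = Math.sin((d2 - d1) / 2);
        double sinRa = Math.sin(Math.toRadians(ra2 - ra1) / 2);
        double h = sinDec * sinDec
                 + Math.cos(d1) * Math.cos(d2) * sinRa * sinRa;
        return Math.toDegrees(2 * Math.asin(Math.min(1, Math.sqrt(h)))) * 3600;
    }

    /** As for matcher=sky: within maxError arcseconds on the sky. */
    static boolean skyMatch(double ra1, double dec1, double ra2, double dec2,
                            double maxError) {
        return separationArcsec(ra1, dec1, ra2, dec2) <= maxError;
    }

    /** As for matcher=skyerr: per-row errors in arcsec, NaN meaning blank. */
    static boolean skyErrMatch(double ra1, double dec1, double err1,
                               double ra2, double dec2, double err2,
                               double maxError) {
        boolean blank = Double.isNaN(err1) || Double.isNaN(err2);
        double limit = blank ? maxError : Math.min(maxError, err1 + err2);
        return separationArcsec(ra1, dec1, ra2, dec2) <= limit;
    }

    /** As for matcher=2d_anisotropic: inside an ellipse with radii ex, ey. */
    static boolean anisotropic2dMatch(double x1, double y1,
                                      double x2, double y2,
                                      double ex, double ey) {
        double dx = (x2 - x1) / ex;
        double dy = (y2 - y1) / ey;
        return dx * dx + dy * dy <= 1;
    }

    /** As for matcher=sky+1d: both constituent criteria must hold. */
    static boolean skyPlus1dMatch(double ra1, double dec1, double x1,
                                  double ra2, double dec2, double x2,
                                  double maxError, double error) {
        return skyMatch(ra1, dec1, ra2, dec2, maxError)
            && Math.abs(x2 - x1) <= error;
    }

    public static void main(String[] args) {
        // Two positions 2 arcseconds apart in declination, tested at 3 arcsec.
        System.out.println(skyMatch(10, 20, 10, 20 + 2.0 / 3600, 3.0));
    }
}
```

Testing every pair of rows in this way would scale as the product of the table sizes, so a real matcher has to avoid comparing most pairs at all. That is one way to see why the max-error value matters for performance as well as for the result.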
The tpipe
command allows you to use algebraic
expressions when making row selections and defining new synthetic
columns. They can also be used in defining the quantities to
match against in tmatch2
.
In each case you are defining an expression which
has a value in each row as a function of the values in the existing
columns in that row.
This is a powerful feature which permits you to manipulate and select
table data in very flexible ways.
The syntax for entering these expressions is explained in this section.
What you write are actually expressions in the Java language, which are compiled into Java bytecode before evaluation. However, this does not mean that you need to be a Java programmer to write them. The syntax is pretty similar to C, but even if you've never programmed in C most simple things, and many complicated ones, are quite intuitive.
The following explanation gives some guidance and examples for writing these expressions. Unfortunately a complete tutorial on writing Java is beyond the scope of this document, but it should provide enough information for even a novice to write useful expressions.
The expressions that you can write are basically any function
of all the column values which apply
to a given row; the function result can then be used in
one of tpipe
's commands,
e.g. to define the per-row value of a new column
(addcol
, replacecol
),
make a row selection
(select
),
and some other places.
If the built-in operators and functions are not sufficient,
or it's unwieldy to express your function in one line of code,
it is possible to add new functions by writing your own classes -
see Section 7.6.3.
Note that since these algebraic expressions often contain spaces, you may need to enclose them in single or double quotes so that they don't get confused with other parts of the command string.
Note: if Java is running in an environment with certain security restrictions (a security manager which does not permit creation of custom class loaders) then algebraic expressions won't work at all. It's not particularly likely that security restrictions will be in place if you are running from the command line though.
To create a useful expression which can be evaluated for each row in a table, you will have to refer to cells in different columns of that row. You can do this in two ways:
There is a special column whose name is "Index" and whose ID is "$0". The value of this is the same as the row number (the first row is 1).
The value of the variables so referenced will be a primitive
(boolean, byte, short, char, int, long, float, double) if the
column contains one of the corresponding types. Otherwise it will
be an Object of the type held by the column, for instance a String.
In practice this means: you can write the name of a column, and it will
evaluate to the numeric (or string) value that that column contains
in each row. You can then use this in normal algebraic expressions
such as "B_MAG - U_MAG
" as you'd expect.
When no special steps are taken, if a null value (blank cell) is encountered in evaluating an expression (usually because one of the columns it relies on has a null value in the row in question) then the result of the expression is also null.
It is possible to exercise more control than this, but it
requires a little bit of care,
because the expressions work in terms of primitive values
(numeric or boolean ones) which don't in general have a defined null
value. The name "null
"
in expressions gives you the java null
reference, but this cannot be matched against a primitive value
or used as the return value of a primitive expression.
For most purposes, the following two tips should enable you to work with null values:
NULL_
"
(use upper case) to the column name or $ID. This
will yield a boolean value which is true if the column contains
a blank, and false otherwise.
NULL
"
(upper case). To return a null value from a non-numeric expression
(e.g. a String column) use the name "null
" (lower case).
Null values are often used in conjunction with the conditional
operator, "? :
"; the expression
test ? tval : fval
returns the value
tval
if the boolean expression test
evaluates true, or fval
if test
evaluates false.
So for instance the following expression:
Vmag == -99 ? NULL : Vmag
can be used to define a new column which has the same value as the
Vmag
column for most values, but if Vmag
has the "magic" value -99 the new column will contain a blank.
The opposite trick (substituting a blank value with a magic one) can
be done like this:
NULL_Vmag ? -99 : Vmag
Some more examples are given in Section 7.5.
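For Java programmers, the two tricks above correspond roughly to the following plain Java, in which a blank is represented by a null Double reference. This is only an analogy: the class and method names are hypothetical, and NULL and NULL_Vmag are features of the expression language rather than of Java itself.

```java
/** Hypothetical sketch: magic values versus blanks, using boxed Doubles. */
class NullSketch {

    /** Like "Vmag == -99 ? NULL : Vmag": the magic value becomes a blank. */
    static Double magicToBlank(double vmag) {
        return vmag == -99 ? null : Double.valueOf(vmag);
    }

    /** Like "NULL_Vmag ? -99 : Vmag": a blank becomes the magic value. */
    static double blankToMagic(Double vmag) {
        return vmag == null ? -99 : vmag.doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(magicToBlank(-99));
        System.out.println(magicToBlank(12.5));
        System.out.println(blankToMagic(null));
    }
}
```

The first method has to return the object type Double rather than the primitive double, because a primitive has no value which can stand for a blank; this is the difficulty with primitives described above.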
The operators are pretty much the same as in the C language. The common ones are:
+
(add)
-
(subtract)
*
(multiply)
/
(divide)
%
(modulus)
!
(not)
&&
(and)
||
(or)
^
(exclusive-or)
==
(numeric identity)
!=
(numeric non-identity)
<
(less than)
>
(greater than)
<=
(less than or equal)
>=
(greater than or equal)
(byte)
(numeric -> signed byte)
(short)
(numeric -> 2-byte integer)
(int)
(numeric -> 4-byte integer)
(long)
(numeric -> 8-byte integer)
(float)
(numeric -> 4-byte floating point)
(double)
(numeric -> 8-byte floating point)
+
(string concatenation)
[]
(array dereferencing)
?:
(conditional switch)
instanceof
(class membership)
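A few of these operators behave in ways that may surprise people who are used to other notations. The following Java sketch, with hypothetical class and method names, shows that ^ is exclusive-or rather than "raise to the power", that an (int) cast truncates towards zero, and that dividing one integer by another discards the remainder.

```java
/** Hypothetical sketch of some operator behaviour in Java expressions. */
class OperatorSketch {

    /** ^ on booleans: true if exactly one of the arguments is true. */
    static boolean exactlyOne(boolean a, boolean b) {
        return a ^ b;
    }

    /** ^ on integers is bitwise exclusive-or, not exponentiation. */
    static int xor(int a, int b) {
        return a ^ b;
    }

    /** An (int) cast discards the fractional part (rounds towards zero). */
    static int truncate(double x) {
        return (int) x;
    }

    /** % gives the remainder, so this picks out every n'th index. */
    static boolean isMultiple(long index, int n) {
        return index % n == 0;
    }

    /** Integer division discards the remainder; cast first to keep it. */
    static double ratio(int a, int b) {
        return (double) a / b;
    }

    public static void main(String[] args) {
        System.out.println("2 ^ 3       = " + xor(2, 3));
        System.out.println("pow(2, 3)   = " + Math.pow(2, 3));
        System.out.println("(int) 3.9   = " + truncate(3.9));
        System.out.println("7 / 2       = " + (7 / 2));
        System.out.println("ratio(7, 2) = " + ratio(7, 2));
    }
}
```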
Many functions are available for use within your expressions, covering standard mathematical and trigonometric functions, arithmetic utility functions, type conversions, and some more specialised astronomical ones. You can use them in just the way you'd expect, by using the function name (unlike column names, this is case-sensitive) followed by comma-separated arguments in brackets, so
max(IMAG,JMAG)
will give you the larger of the values in the columns IMAG and JMAG, and so on.
The functions available for use by default are listed by class in the following subsections with their arguments and short descriptions.
Functions for conversion of time values between various forms. The forms used are
yyyy-mm-ddThh:mm:ss.s
, where the T
is a literal character (a space character may be used instead).
Based on UTC.
Therefore midday on the 25th of October 2004 is
2004-10-25T12:00:00
in ISO 8601 format,
53303.5 as an MJD value,
2004.81588 as a Julian Epoch and
2004.81726 as a Besselian Epoch.
Currently this implementation cannot be relied upon to better than a millisecond.
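The example values quoted above can be checked with a short Java sketch. It is not the implementation used by the functions below: the class and method names are hypothetical, only the strict yyyy-mm-ddThh:mm:ss form is parsed, and the epoch formulae are the conventional ones (Julian epoch 2000.0 at MJD 51544.5 with a year of 365.25 days; Besselian epoch 1900.0 at MJD 15019.81352 with a year of 365.242198781 days).

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

/** Hypothetical sketch relating ISO 8601, MJD, Julian and Besselian epochs. */
class EpochSketch {

    /** MJD of the Unix epoch, 1970-01-01T00:00:00 UTC. */
    static final double MJD_UNIX_EPOCH = 40587.0;

    /** Converts a strict ISO 8601 date-time, taken to be UTC, to MJD. */
    static double mjdFromIso(String iso) {
        LocalDateTime t = LocalDateTime.parse(iso);
        double seconds = t.toEpochSecond(ZoneOffset.UTC) + t.getNano() * 1e-9;
        return MJD_UNIX_EPOCH + seconds / 86400.0;
    }

    /** Julian epoch: years of 365.25 days, with J2000.0 at MJD 51544.5. */
    static double julianEpoch(double mjd) {
        return 2000.0 + (mjd - 51544.5) / 365.25;
    }

    /** Besselian epoch: tropical years, with B1900.0 at MJD 15019.81352. */
    static double besselianEpoch(double mjd) {
        return 1900.0 + (mjd - 15019.81352) / 365.242198781;
    }

    public static void main(String[] args) {
        double mjd = mjdFromIso("2004-10-25T12:00:00");
        System.out.println("MJD             = " + mjd);
        System.out.println("Julian epoch    = " + julianEpoch(mjd));
        System.out.println("Besselian epoch = " + besselianEpoch(mjd));
    }
}
```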
isoToMjd( isoDate )
isoDate
argument is
yyyy-mm-ddThh:mm:ss.s
, though some deviations
from this form are permitted:
the 'T' which separates date from time can be replaced by a space
a 'Z' (which indicates UTC) may be appended to the time
"1994-12-21T14:18:23.2", "1968-01-14", and "2112-05-25 16:45Z".
isoDate
(String): date in ISO 8601 format
isoDate
dateToMjd( year, month, day, hour, min, sec )
year
(integer): year ADmonth
(integer): index of month; January is 1, December is 12day
(integer): day of month (the first day is 1)hour
(integer): hour (0-23)min
(integer): minute (0-59)sec
(floating point): second (0<=sec<60)dateToMjd( year, month, day )
year
(integer): year ADmonth
(integer): index of month; January is 1, December is 12day
(integer): day of month (the first day is 1)mjdToIso( mjd )
yyyy-mm-ddThh:mm:ss
.mjd
(floating point): modified Julian datemjd
mjdToDate( mjd )
yyyy-mm-dd
.mjd
(floating point): modified Julian datemjd
mjdToTime( mjd )
hh:mm:ss
.mjd
(floating point): modified Julian datemjd
formatMjd( mjd, format )
java.text.SimpleDateFormat
class.
The default output corresponds to the string
"yyyy-MM-dd'T'HH:mm:ss
"mjd
(floating point): modified Julian dateformat
(String): formatting pattern
mjd
mjdToJulian( mjd )
mjd
(floating point): modified Julian datejulianToMjd( julianEpoch )
julianEpoch
(floating point): Julian epochmjdToBesselian( mjd )
mjd
(floating point): modified Julian datebesselianToMjd( besselianEpoch )
besselianEpoch
(floating point): Besselian epochunixMillisToMjd( unixMillis )
unixMillis
(long integer): milliseconds since the Unix epochmjdToUnixMillis( mjd )
mjd
(floating point): modified Julian date
String manipulation and query functions.
concat( s1, s2 )
s1+s2
, but blank values can sometimes appear as
the string "null
" if you do it like that.s1
(String): first strings2
(String): second strings1
followed by s2
concat( s1, s2, s3 )
s1+s2+s3
, but blank values can sometimes appear as
the string "null
" if you do it like that.s1
(String): first strings2
(String): second strings3
(String): third strings1
followed by s2
followed by s3
concat( s1, s2, s3, s4 )
s1+s2+s3+s4
,
but blank values can sometimes appear as
the string "null
" if you do it like that.s1
(String): first strings2
(String): second strings3
(String): third strings4
(String): fourth strings1
followed by s2
followed by s3
followed by s4
equals( s1, s2 )
s1==s2
,
which can (for technical reasons) return false even if the
strings are the same.s1
(String): first strings2
(String): second stringequalsIgnoreCase( s1, s2 )
s1
(String): first strings2
(String): second stringstartsWith( whole, start )
whole
(String): the string to teststart
(String): the sequence that may appear at the start of
whole
whole
are
the same as start
endsWith( whole, end )
whole
(String): the string to testend
(String): the sequence that may appear at the end of
whole
whole
are
the same as end
contains( whole, sub )
whole
(String): the string to testsub
(String): the sequence that may appear within whole
sub
appears within
whole
length( str )
str
(String): stringstr
matches( str, regex )
str
(String): string to testregex
(String): regular expression stringregex
matches str
anywherematchGroup( str, regex )
str
(String): string to match againstregex
(String): regular expression containing a grouped sectionregex
didn't match str
)replaceFirst( str, regex, replacement )
str
(String): string to manipulateregex
(String): regular expression to match in str
replacement
(String): replacement stringstr
, but with the first match (if any) of
regex
replaced by replacement
replaceAll( str, regex, replacement )
str
(String): string to manipulateregex
(String): regular expression to match in str
replacement
(String): replacement stringstr
, but with all matches of
regex
replaced by replacement
substring( str, startIndex )
str
(String): the input stringstartIndex
(integer): the beginning index, inclusivestr
, omitting the first
startIndex
characterssubstring( str, startIndex, endIndex )
startIndex
and continues to the character at index endIndex-1
Thus the length of the substring is endIndex-startIndex
.str
(String): the input stringstartIndex
(integer): the beginning index, inclusiveendIndex
(integer): the end index, exclusive
str
toUpperCase( str )
str
(String): input stringstr
toLowerCase( str )
str
(String): input stringstr
trim( str )
str
(String): input stringpadWithZeros( value, ndigit )
value
(long integer): numeric value to padndigit
(integer): the number of digits in the resulting stringvalue
with
at least ndigit
characters
Standard mathematical and trigonometric functions.
E
PI
RANDOM
sin( theta )
theta
(floating point): an angle, in radians.cos( theta )
theta
(floating point): an angle, in radians.tan( theta )
theta
(floating point): an angle, in radians.asin( x )
x
(floating point): the value whose arc sine is to be returned.acos( x )
x
(floating point): the value whose arc cosine is to be returned.atan( x )
x
(floating point): the value whose arc tangent is to be returned.exp( x )
x
(floating point): the exponent to raise e to.log10( x )
x
(floating point): argumentln( x )
x
(floating point): argumentsqrt( x )
x
(floating point): a value.x
.
If the argument is NaN or less than zero, the result is NaN.
atan2( y, x )
x
,y
)
to polar (r
,theta
).
This method computes the phase
theta
by computing an arc tangent
of y/x
in the range of -pi to pi.y
(floating point): the ordinate coordinatex
(floating point): the abscissa coordinatetheta
component (radians) of the point
(r
,theta
)
in polar coordinates that corresponds to the point
(x
,y
) in Cartesian coordinates.pow( a, b )
a
(floating point): the base.b
(floating point): the exponent.
a^b.
Functions for formatting numeric values.
formatDecimal( value, dp )
value
(floating point): value to formatdp
(integer): number of decimal places (digits after the decimal point)
formatDecimal( value, format )
format
string is as defined by Java's
java.text.DecimalFormat
class.value
(floating point): value to formatformat
(String): format specifier
Functions for angle transformations and manipulations. In particular, methods for translating between radians and HH:MM:SS.S or DDD:MM:SS.S type sexagesimal representations are provided.
DEGREE
HOUR
ARC_MINUTE
ARC_SECOND
radiansToDms( rad )
rad
(floating point): angle in radiansrad
radiansToDms( rad, secFig )
rad
(floating point): angle in radianssecFig
(integer): number of decimal places in the seconds fieldrad
radiansToHms( rad )
rad
(floating point): angle in radiansrad
radiansToHms( rad, secFig )
rad
(floating point): angle in radianssecFig
(integer): number of decimal places in the seconds fieldrad
dmsToRadians( dms )
dm[s]
, or some others.
Additional spaces and leading +/- are permitted.dms
(String): formatted DMS stringdms
hmsToRadians( hms )
hm[s]
, or some others.
Additional spaces and leading +/- are permitted.hms
(String): formatted HMS stringhms
dmsToRadians( deg, min, sec )
In conversions of this type, one has to be careful to get the
sign right in converting angles which are between 0 and -1 degrees.
This routine uses the sign bit of the deg
argument,
taking care to distinguish between +0 and -0 (their internal
representations are different for floating point values).
It is illegal for the min
or sec
arguments
to be negative.
deg
(floating point): degrees part of anglemin
(floating point): minutes part of anglesec
(floating point): seconds part of anglehmsToRadians( hour, min, sec )
In conversions of this type, one has to be careful to get the
sign right in converting angles which are between 0 and -1 hours.
This routine uses the sign bit of the hour
argument,
taking care to distinguish between +0 and -0 (their internal
representations are different for floating point values).
hour
(floating point): hours part of angle
min
(floating point): minutes part of anglesec
(floating point): seconds part of angleskyDistance( ra1, dec1, ra2, dec2 )
ra1
(floating point): right ascension of point 1 in radiansdec1
(floating point): declination of point 1 in radiansra2
(floating point): right ascension of point 2 in radiansdec2
(floating point): declination of point 2 in radiansskyDistanceDegrees( ra1, dec1, ra2, dec2 )
ra1
(floating point): right ascension of point 1 in degreesdec1
(floating point): declination of point 1 in degreesra2
(floating point): right ascension of point 2 in degreesdec2
(floating point): declination of point 2 in degreeshoursToRadians( hours )
hours
(floating point): angle in hoursdegreesToRadians( deg )
deg
(floating point): angle in degreesradiansToDegrees( rad )
rad
(floating point): angle in radiansraFK4toFK5( raFK4, decFK4 )
raFK4
(floating point): right ascension in B1950.0 FK4 system (radians)decFK4
(floating point): declination in B1950.0 FK4 system (radians)decFK4toFK5( raFK4, decFK4 )
raFK4
(floating point): right ascension in B1950.0 FK4 system (radians)decFK4
(floating point): declination in B1950.0 FK4 system (radians)raFK5toFK4( raFK5, decFK5 )
raFK5
(floating point): right ascension in J2000.0 FK5 system (radians)decFK5
(floating point): declination in J2000.0 FK5 system (radians)decFK5toFK4( raFK5, decFK5 )
raFK5
(floating point): right ascension in J2000.0 FK5 system (radians)decFK5
(floating point): declination in J2000.0 FK5 system (radians)raFK4toFK5( raFK4, decFK4, bepoch )
bepoch
parameter is the epoch at which the position in
the FK4 frame was determined.raFK4
(floating point): right ascension in B1950.0 FK4 system (radians)decFK4
(floating point): declination in B1950.0 FK4 system (radians)bepoch
(floating point): Besselian epochdecFK4toFK5( raFK4, decFK4, bepoch )
bepoch
parameter is the epoch at which the position in
the FK4 frame was determined.raFK4
(floating point): right ascension in B1950.0 FK4 system (radians)decFK4
(floating point): declination in B1950.0 FK4 system (radians)bepoch
(floating point): Besselian epochraFK5toFK4( raFK5, decFK5, bepoch )
raFK5
(floating point): right ascension in J2000.0 FK5 system (radians)decFK5
(floating point): declination in J2000.0 FK5 system (radians)bepoch
(floating point): Besselian epochdecFK5toFK4( raFK5, decFK5, bepoch )
raFK5
(floating point): right ascension in J2000.0 FK5 system (radians)decFK5
(floating point): declination in J2000.0 FK5 system (radians)bepoch
(floating point): Besselian epoch
Functions for converting between strings and numeric values.
toString( value )
value
(floating point): numeric valuevalue
parseByte( str )
str
(String): string containing numeric representationstr
parseShort( str )
str
(String): string containing numeric representationstr
parseInt( str )
str
(String): string containing numeric representationstr
parseLong( str )
str
(String): string containing numeric representationstr
parseFloat( str )
str
(String): string containing numeric representationstr
parseDouble( str )
str
(String): string containing numeric representationstr
toByte( value )
value
(floating point): numeric value for conversionvalue
converted to type bytetoShort( value )
value
(floating point): numeric value for conversionvalue
converted to type shorttoInteger( value )
value
(floating point): numeric value for conversionvalue
converted to type inttoLong( value )
value
(floating point): numeric value for conversionvalue
converted to type longtoFloat( value )
value
(floating point): numeric value for conversionvalue
converted to type floattoDouble( value )
value
(floating point): numeric value for conversionvalue
converted to type double
Standard arithmetic functions including things like rounding, sign manipulation, and maximum/minimum functions.
roundUp( x )
x
(floating point): a value.x
rounded uproundDown( x )
x
(floating point): a valuex
rounded downround( x )
x
(floating point): a floating point value.x
rounded to the nearest integerroundDecimal( x, dp )
float
(32-bit floating point value),
so this is only suitable for relatively low-precision values.
It's intended for truncating the number of apparent significant
figures represented by a value which you know has been obtained
by combining other values of limited precision.
For more control, see the functions in the Formats
class.x
(floating point): a floating point valuedp
(integer): number of decimal places (digits after the decimal point)
to retainx
but with a
limited apparent precisionabs( x )
x
(integer): the argument whose absolute value is to be determinedabs( x )
x
(floating point): the argument whose absolute value is to be determinedmax( a, b )
a
(integer): an argument.b
(integer): another argument.a
and b
.max( a, b )
a
(floating point): an argument.b
(floating point): another argument.a
and b
.min( a, b )
a
(integer): an argument.b
(integer): another argument.a
and b
.min( a, b )
a
(floating point): an argument.b
(floating point): another argument.a
and b
.
Here are some examples for defining new columns; the expressions below could appear as the <expr> in a tpipe addcol or sortexpr command.
(first + second) * 0.5
sqrt(variance)
radiansToDegrees(DEC_radians)
degreesToRadians(RA_degrees)
parseInt($12)
parseDouble(ident)
toString(index)
toShort(obs_type)
toDouble(range)
or
(short) obs_type
(double) range
hmsToRadians(RA1950)
dmsToRadians(decDeg,decMin,decSec)
radiansToDms($3)
radiansToHms(RA,2)
min(1000, max(value, 0))
jmag == 9999 ? NULL : jmag
NULL_jmag ? 9999 : jmag
psfCounts[2]
Here are some examples of boolean expressions for selecting rows (they could appear in a tpipe select command):
RA > 100 && RA < 120 && Dec > 75 && Dec < 85
$2*$2 + $3*$3 < 1
skyDistance(ra0,dec0,degreesToRadians(RA),degreesToRadians(DEC))<15*ARC_MINUTE
index <= 100
(though you could use tpipe cmd='head 100' instead)
index % 10 == 0
(though you could use tpipe cmd='every 10' instead)
equals(SECTOR, "ZZ9 Plural Z Alpha")
equalsIgnoreCase(SECTOR, "zz9 plural z alpha")
startsWith(SECTOR, "ZZ")
contains(ph_qual, "U")
matches(SECTOR, "[XYZ] Alpha")
! NULL_ellipticity
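The dmsToRadians(decDeg,decMin,decSec) example above depends on the sign of the degrees argument being handled correctly for angles between 0 and -1 degrees, as noted in the description of that function. The following Java sketch shows one way to do this using the sign bit; it is an illustration under that assumption rather than the library's own code, and the class and method names are hypothetical.

```java
/** Hypothetical sketch of sign handling in a sexagesimal conversion. */
class SexagesimalSketch {

    /**
     * Converts degrees, minutes and seconds to radians.  The sign is taken
     * from the sign bit of deg, so that -0.0 gives a negative angle.
     */
    static double dmsToRadiansSketch(double deg, double min, double sec) {
        if (min < 0 || sec < 0) {
            throw new IllegalArgumentException("min and sec must not be negative");
        }
        // copySign reads the sign bit, so it distinguishes -0.0 from +0.0,
        // which a test like (deg < 0) cannot do.
        double sign = Math.copySign(1.0, deg);
        double degrees = Math.abs(deg) + min / 60.0 + sec / 3600.0;
        return sign * Math.toRadians(degrees);
    }

    public static void main(String[] args) {
        System.out.println(dmsToRadiansSketch(-0.0, 30, 0));
        System.out.println(dmsToRadiansSketch(0.0, 30, 0));
    }
}
```

Note that an integer column cannot hold a negative zero, so this only works if the degrees value reaches the function as a floating point number with its sign intact.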
This section contains some notes on getting the most out of the algebraic expressions facility. If you're not a Java programmer, some of the following may be a bit daunting - read on at your own risk!
This note provides a bit more detail for Java programmers on what is going on here; it describes how the use of functions in STILTS algebraic expressions relates to normal Java code.
The expressions which you write are compiled to Java bytecode
when you enter them (if there is a 'compilation error' it will be
reported straight away). The functions listed in the previous subsections
are all the public static
methods of the classes which
are made available by default. The classes listed are all in the
package uk.ac.starlink.ttools.func
.
However, the public static methods are all imported into an anonymous
namespace for bytecode compilation, so that you write
sqrt(x)
and not Maths.sqrt(x)
.
The same happens to other classes that are imported (which can be
in any package or none) - their public
static methods all go into the anonymous namespace. Thus, method
name clashes are a possibility.
This cleverness is all made possible by the rather wonderful JEL.
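To see how name clashes can arise, the following Java sketch uses reflection to gather the public static methods of several classes into one flat namespace and records which classes supply each name. It is only an illustration and says nothing about how JEL itself resolves names; the class and method names are hypothetical, and two standard library classes stand in for the function classes.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: pooling public static methods into one namespace. */
class FlatNamespaceSketch {

    /** Maps each public static method name to the classes which define it. */
    static Map<String, Set<String>> collect(Class<?>... classes) {
        Map<String, Set<String>> owners = new TreeMap<>();
        for (Class<?> c : classes) {
            // getMethods() returns only public methods.
            for (Method m : c.getMethods()) {
                if (Modifier.isStatic(m.getModifiers())) {
                    owners.computeIfAbsent(m.getName(), k -> new TreeSet<>())
                          .add(c.getName());
                }
            }
        }
        return owners;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> owners = collect(Math.class, StrictMath.class);
        // A name supplied by more than one class is a potential clash.
        System.out.println("sqrt is supplied by " + owners.get("sqrt"));
    }
}
```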
There is another category of functions which can be used apart from those listed in Section 7.4. These are called, in Java/object-oriented parlance, "instance methods" and represent functions that can be executed on an object.
It is possible to invoke any public
instance method on any object
(though not on primitive values - numeric and boolean ones).
The syntax is that you place a "." followed by the method invocation
after the object you want to invoke the method on,
hence NAME.substring(3)
instead of substring(NAME,3)
.
If you know what you're doing, feel free to go ahead and do this.
However, most of the instance methods you're likely to want to use
have equivalents in the normal functions listed in the previous section,
so unless you're a Java programmer or feeling adventurous,
you may be best off ignoring this feature.
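The difference between the two styles can be sketched in plain Java as follows. The static substring method here is a hypothetical stand-in for a function of the same name, written on the assumption that such a function might pass a blank value through; an instance method cannot do that, since calling it on a null reference fails. The sketch also shows why == is unreliable for comparing strings.

```java
/** Hypothetical sketch: instance methods versus static functions. */
class InstanceMethodSketch {

    /** Instance-method style, like NAME.substring(3); fails if name is null. */
    static String viaInstanceMethod(String name) {
        return name.substring(3);
    }

    /** Function style, like substring(NAME,3); this sketch tolerates null. */
    static String substring(String str, int startIndex) {
        return str == null ? null : str.substring(startIndex);
    }

    /** == compares object references, not the characters of the strings. */
    static boolean sameReference(String s1, String s2) {
        return s1 == s2;
    }

    public static void main(String[] args) {
        String a = new String("ZZ9");
        String b = new String("ZZ9");
        System.out.println(viaInstanceMethod("NGC1234"));
        System.out.println(substring(null, 3));
        System.out.println(sameReference(a, b) + " " + a.equals(b));
    }
}
```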
The functions provided by default for use with algebraic expressions, while powerful, may not provide all the operations you need. For this reason, it is possible to write your own extensions to the expression language. In this way you can specify arbitrarily complicated functions. Note however that this will only allow you to define new columns or subsets where each cell is a function only of the other cells in the same row - it will not allow values in one row to be functions of values in another.
In order to do this, you have to write and compile a (probably short) program in the Java language. A full discussion of how to go about this is beyond the scope of this document, so if you are new to Java and/or programming you may need to find a friendly local programmer to assist (or mail the author). The following explanation is aimed at Java programmers, but may not be incomprehensible to non-specialists.
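The steps you need to follow are:
1. Write a Java class containing one or more public static methods representing the function(s) required
2. Compile this class
3. Make this class available on the application's classpath at runtime
4. Specify the class's name to the application, as the value of the jel.classes system property (colon-separated if there are several) as described in Section 3.3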
Any public static methods defined in the classes thus specified will then be available for use. They should be defined to take and return the relevant primitive or Object types for the function required. For instance a class written as follows would define a three-value average:
    public class AuxFuncs {
        public static double average3( double x, double y, double z ) {
            return ( x + y + z ) / 3.0;
        }
    }
and the command
    stilts tpipe cmd='addcol AVERAGE "average3($1,$2,$3)"'
would add a new column called AVERAGE giving the average of the first three existing columns. Exactly how you would build this is dependent on your system, but it might involve doing something like the following:
1. Write a file called AuxFuncs.java containing the above code
2. Compile it using a command like "javac AuxFuncs.java"
3. Run tpipe using the flags "stilts -classpath . -Djel.classes=AuxFuncs tpipe"

This appendix provides the reference documentation for the commands in the package. For each one a description of its purpose, a list of its command-line arguments, and some examples are given.
calc
: Calculator
calc
is a very simple utility for evaluating expressions.
It uses the same expression evaluator as is used in tpipe
and the other generic table tasks for things like creating new columns,
so it can be used as a quick test to see what expressions work,
or in order to evaluate expressions using the various algebraic
functions documented in Section 7.4.
Since no table is involved, you can't refer to column names in
the expressions.
It takes one parameter, the expression to evaluate, and writes the
result to the screen.
The usage of calc
is
    stilts <stilts-flags> calc [expression=]<expr>
If you don't have the stilts script installed,
write "java -jar stilts.jar" instead of "stilts" - see Section 3.
The available <stilts-flags>
are listed
in Section 2.1.
Parameter values are assigned on the command line as explained in Section 2.3. They are as follows:
expression = <expr>
Here are some examples of using calc
:
stilts calc 1+2
stilts calc 'isoToMjd("2005-12-25T00:00:00")'
tcat
: Table Concatenater
tcat
is a tool for concatenating tables one after the other.
If you have two tables T1 and T2 which contain similar columns, and you
want to treat them as a single table, you can use tcat
to produce a new table whose metadata (row headings etc) comes from T1
and whose data consists of all the rows of T1 followed by all the rows
of T2. This will only work if the columns of the two tables to be
joined have the same or compatible types in the same order;
if they do not, you must use the icmd
parameters to
preprocess the input tables so that the column sequences are compatible.
In the current release of STILTS, tcat
is rather
rudimentary: you can only join two tables at once, you must arrange
for the columns to be in the right order, and you may end up with
an unhelpful error if the columns in matching positions are not of
compatible types. Behaviour will be improved in a future release.
The usage of tcat
is
    stilts <stilts-flags> tcat ifmt1=<in-format> ifmt2=<in-format> icmd1=<cmds> icmd2=<cmds> ocmd=<cmds> omode=<out-mode> <mode-args> out=<out-table> ofmt=<out-format> [in1=]<table1> [in2=]<table2>
If you don't have the stilts script installed,
write "java -jar stilts.jar" instead of "stilts" - see Section 3.
The available <stilts-flags>
are listed
in Section 2.1.
Parameter values are assigned on the command line as explained in Section 2.3. They are as follows:
icmd1 = <cmds>
icmd2 = <cmds>
ifmt1 = <in-format>
(auto)
(the default),
then an attempt will be
made to detect the format of the table automatically.
This cannot always be done correctly however, in which case
the program will exit with an error explaining which
formats were attempted.
[Default: (auto)]
ifmt2 = <in-format>
(auto)
(the default),
then an attempt will be
made to detect the format of the table automatically.
This cannot always be done correctly however, in which case
the program will exit with an error explaining which
formats were attempted.
[Default: (auto)]
in1 = <table1>
ifmt1
parameter.
in2 = <table2>
ifmt2
parameter.
ocmd = <cmds>
ofmt = <out-format>
(auto)
"
(the default),
then the output filename will be
examined to try to guess what sort of file is required
usually by looking at the extension.
If it's not obvious from the filename what output format is
intended, an error will result.
This parameter must only be given if
omode
has its default value of "out
".
[Default: (auto)]
omode = <out-mode> <mode-args>
out
, which means that
the result will be written as a new table to disk or elsewhere,
as determined by the out
and ofmt
parameters.
However, there are other possibilities, which correspond
to uses to which a table can be put other than outputting it,
such as displaying metadata, calculating statistics,
or populating a table in an SQL database.
For some values of this parameter, additional parameters
(<mode-args>
)
are required to determine the exact behaviour.
Possible values are out
, meta
, stats
, count
, cgi
, discard
, topcat
, plastic
and tosql
.
Use the help=omode
flag
or see Section 5.4 for more information.
[Default: out]
out = <out-table>
omode
has its default value of "out
".
[Default: -]
Here are some examples of tcat
:
stilts tcat obs1.fits obs2.fits out=combined.fits
stilts tcat ifmt1=ascii in1=obs1.txt ifmt2=ascii in2=obs2.txt omode=stats
    stilts tcat in1=survey.vot.gz ifmt2=csv in2=more_data.csv icmd1='addskycoords fk5 galactic RA2000 DEC2000 GLON GLAT' \
                icmd1='keepcols "OBJ_ID GLON GLAT"' \
                icmd2='keepcols "ident gal_long gal_lat"' \
                omode=topcat
ifmt1
parameter is required since
VOTables can be detected automatically), and the other is a
comma-separated-values file (for which the ifmt2=csv
parameter must be given).
In the second place, the column structure of the two tables may be
quite different. By pre-processing the two tables using the
icmd1
& icmd2
parameters, we produce
in each case an input table which consists of three columns of
compatible types and meanings: an integer identifier and floating point
galactic longitude and latitude coordinates.
The second table contains such columns to start with,
but the first table requires an initial step to convert
FK5 J2000.0 coordinates to galactic ones.
tcat
joins the two doctored tables together, to produce
a table which contains only these three columns, with all the rows
from both input tables, and sends the result directly
to a new or running instance of TOPCAT.
tcopy
: Table Format Converter
tcopy
is a table copying tool.
It simply copies a table from one place to another, but since
you can specify the input and output formats as desired, it works
as a converter from any of the supported
input formats
to any of the supported
output formats.
tcopy
is just a stripped-down version of
tpipe
- it doesn't do anything
that tpipe
can't, but the usage is slightly
simplified.
It is provided as a drop-in replacement for the old
tablecopy
(uk.ac.starlink.table.TableCopy
)
tool which was supplied with earlier versions of STIL and TOPCAT -
it has the same arguments and behaviour as tablecopy
,
but is implemented somewhat differently
and will in some cases be more efficient.
The usage of tcopy
is
    stilts <stilts-flags> tcopy ifmt=<in-format> ofmt=<out-format> [in=]<in-table> [out=]<out-table>
If you don't have the stilts script installed,
write "java -jar stilts.jar" instead of "stilts" - see Section 3.
The available <stilts-flags>
are listed
in Section 2.1.
Parameter values are assigned on the command line as explained in Section 2.3. They are as follows:
ifmt = <in-format>
(auto)
(the default),
then an attempt will be
made to detect the format of the table automatically.
This cannot always be done correctly however, in which case
the program will exit with an error explaining which
formats were attempted.
[Default: (auto)]
in = <in-table>
ifmt
parameter.
ofmt = <out-format>
(auto)
"
(the default),
then the output filename will be
examined to try to guess what sort of file is required
usually by looking at the extension.
If it's not obvious from the filename what output format is
intended, an error will result.
[Default: (auto)]
out = <out-table>
[Default: -]
Here are some examples of tcopy
in use:
stilts tcopy stars.fits stars.xml
stars.xml
filename is examined to make a guess at the kind of output to write:
the .xml
ending is taken to mean a TABLEDATA-encoded
VOTable.
stilts tcopy stars.fits stars.xml ifmt=fits ofmt=votable
stilts tcopy ofmt=text http://remote.host/data/vizer.xml.gz#4 -
#4
at the end of the URL
indicates that the data from the fifth TABLE
element
in the remote document are to be used. The gzip compression of
the table is taken care of automatically.
stilts tcopy ifmt=csv ofmt=latex spec.csv
    stilts -classpath /usr/local/jars/pg73jdbc3.jar \
           -Djdbc.drivers=org.postgresql.Driver \
           tcopy "jdbc:postgresql://localhost/imsim#SELECT ra, dec, Imag FROM dqc" \
           ofmt=fits wfslist.cat
jdbc.drivers
system property.
As you can see, using SQL from Java is a bit fiddly,
and there are other ways to perform this
setup than on the command line - see Section 3.4
and tpipe
's
omode=tosql
output mode.
tcube
: N-dimensional Histogram Calculator
tcube
constructs an N-dimensional histogram, or density map,
from N columns of an input table, and writes it out as an
N-dimensional data cube. The parameters you supply define which N
numeric columns of the input table you want to use and the dimensions
(bounds and pixel sizes) of the output grid.
Each table row then defines a point in N-dimensional space.
The program goes through each row, and if the point that row
defines falls within the bounds of the output grid you have defined,
increments the value associated with the corresponding pixel.
The resulting N-dimensional array, whose pixel values represent a
count of the rows associated with that region of the N-dimensional space,
is then written out as a FITS file.
In one dimension, this gives you a normal histogram of a given variable.
In two dimensions it might typically be used to plot the density on
the sky of objects from a catalogue.
As with some of the other generic table commands,
you can perform extensive pre-processing on the input table by
use of the icmd
parameter before the actual cube
counts are calculated.
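The counting step described above can be sketched in a few lines of Java. This is an illustration under stated assumptions rather than the tcube implementation: the class and method names are hypothetical, the grid is held as a flat array with the first axis varying fastest, the upper bound of each axis is treated as exclusive, and blank (NaN) values are skipped.

```java
// Hypothetical sketch of N-dimensional histogram counting; not tcube's own code.
class CubeSketch {

    /**
     * Counts rows into an N-dimensional grid held as a flat array.
     * Axis i has nBins[i] bins of width binSize[i] starting at lo[i].
     * In this sketch the first axis varies fastest in the flat array.
     */
    static long[] histogram(double[][] rows, double[] lo,
                            double[] binSize, int[] nBins) {
        int nDim = lo.length;
        int total = 1;
        for (int n : nBins) {
            total *= n;
        }
        long[] counts = new long[total];
        for (double[] row : rows) {
            int index = 0;
            int stride = 1;
            boolean inside = true;
            for (int i = 0; i < nDim && inside; i++) {
                double offset = (row[i] - lo[i]) / binSize[i];
                // Blank (NaN) values and points outside the bounds are skipped.
                if (Double.isNaN(offset) || offset < 0 || offset >= nBins[i]) {
                    inside = false;
                } else {
                    index += ((int) offset) * stride;
                    stride *= nBins[i];
                }
            }
            if (inside) {
                counts[index]++;   // one more row falls in this pixel
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Four points in two dimensions; the last lies outside the 2x2 grid.
        double[][] rows = { {0.5, 0.5}, {1.5, 0.5}, {1.5, 0.5}, {5.0, 5.0} };
        long[] counts = histogram(rows, new double[] {0, 0},
                                  new double[] {1, 1}, new int[] {2, 2});
        System.out.println(java.util.Arrays.toString(counts));
    }
}
```

With one column this reduces to an ordinary histogram; each additional column adds an axis to the grid.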
The usage of tcube
is
    stilts <stilts-flags> tcube ifmt=<in-format> istream=true|false icmd=<cmds> cols=<col-id> ... bounds=[<lo>]:[<hi>] ... binsizes=<size> ... nbins=<num> ... out=<out-file> bitpix=8|16|32|64 [in=]<table>
If you don't have the stilts script installed,
write "java -jar stilts.jar" instead of "stilts" - see Section 3.
The available <stilts-flags>
are listed
in Section 2.1.
Parameter values are assigned on the command line as explained in Section 2.3. They are as follows:
binsizes = <size> ...
nbins
parameter
must be supplied.
bitpix = 8|16|32|64
bounds = [<lo>]:[<hi>] ...
cols = <col-id> ...
<col-id>
elements,
separated by spaces, should be given.
Each one represents a column in the table, using either its
name or index.
The number of columns listed in the value of this
parameter defines the dimensionality of the output
data cube.
icmd = <cmds>
ifmt = <in-format>
(auto)
(the default),
then an attempt will be
made to detect the format of the table automatically.
This cannot always be done correctly however, in which case
the program will exit with an error explaining which
formats were attempted.
[Default: (auto)]
in = <table>
ifmt
parameter.
istream = true|false
in
table
will be read as a stream.
It is necessary to give the
ifmt
parameter
in this case.
Depending on the required operations and processing mode,
this may cause the read to fail (sometimes it is necessary
to read the input table more than once).
It is not normally necessary to set this flag;
in most cases the data will be streamed automatically
if that is the best thing to do.
However it can sometimes result in less resource usage when
processing large files in certain formats (such as VOTable).
[Default: false]
nbins = <num> ...
binsizes
parameter
must be supplied.
out = <out-file>
[Default: -]
    stilts tcube in=2QZ_6QZ_pubcat.fits out=ccm.fits \
                 cols='Bj_R U_Bj Bj' binsizes='0.05 0.05 0.5' bounds='-2:1 -3:2 :'
    stilts tcube in=iras_psc.vot out=iras_psc_map.fits \
                 icmd='addskycoords fk5 galactic ra dec glat glon' \
                 cols='glat glon' nbins='400 200'
addskycoords
filter is used to preprocess the data before the cube generation
step (see Section 5.1).
tmatch2
: Pair Crossmatcher
tmatch2
is an efficient and highly configurable
tool for crossmatching pairs of tables.
It can match rows between tables on the basis of their relative position
in the sky, or alternatively using many other criteria such as
separation in some isotropic or anisotropic Cartesian space,
identity of a key value, or some combination of these;
the full range of match criteria is discussed in Section 6.1.
You can choose whether you want to identify all the matches or
only the closest,
and what form the output table takes, for instance matched rows only,
or all rows from one or both tables, or only the unmatched rows.
The usage of tmatch2
is
    stilts <stilts-flags> tmatch2 ifmt1=<in-format> ifmt2=<in-format> icmd1=<cmds> icmd2=<cmds> matcher=<matcher-name> values1=<expr-list> values2=<expr-list> params=<match-params> join=1and2|1or2|all1|all2|1not2|2not1|1xor2 find=best|all ocmd=<cmds> omode=<out-mode> <mode-args> out=<out-table> ofmt=<out-format> [in1=]<table1> [in2=]<table2>
If you don't have the stilts script installed,
write "java -jar stilts.jar" instead of "stilts" - see Section 3.
The available <stilts-flags>
are listed
in Section 2.1.
Parameter values are assigned on the command line as explained in Section 2.3. They are as follows:
find = best|all
best
is selected, then only the best match
between the two tables will be retained; in this case
the data from a row of either input table will appear in
at most one row of the output table.
If all
is selected, then all pairs of rows
from the two input tables which match the input criteria
will be represented in the output table.
[Default: best]
icmd1 = <cmds>
icmd2 = <cmds>
ifmt1 = <in-format>
(auto)
(the default),
then an attempt will be
made to detect the format of the table automatically.
This cannot always be done correctly however, in which case
the program will exit with an error explaining which
formats were attempted.
[Default: (auto)]
ifmt2 = <in-format>
(auto)
(the default),
then an attempt will be
made to detect the format of the table automatically.
This cannot always be done correctly however, in which case
the program will exit with an error explaining which
formats were attempted.
[Default: (auto)]
in1 = <table1>
ifmt1
parameter.
in2 = <table2>
ifmt2
parameter.
join = 1and2|1or2|all1|all2|1not2|2not1|1xor2
1and2: An output row for each row represented in both input tables
1or2: An output row for each row represented in either or both of the input tables
all1: An output row for each matched or unmatched row in table 1
all2: An output row for each matched or unmatched row in table 2
1not2: An output row only for rows which appear in the first table but are not matched in the second table
2not1: An output row only for rows which appear in the second table but are not matched in the first table
1xor2: An output row only for rows represented in one of the input tables but not the other one
[Default: 1and2]
matcher = <matcher-name>
params
,
values1
and values2
parameters.
[Default: sky]
ocmd = <cmds>
ofmt = <out-format>
(auto)
"
(the default),
then the output filename will be
examined to try to guess what sort of file is required
usually by looking at the extension.
If it's not obvious from the filename what output format is
intended, an error will result.
This parameter must only be given if
omode
has its default value of "out
".
[Default: (auto)]
omode = <out-mode> <mode-args>
out
, which means that
the result will be written as a new table to disk or elsewhere,
as determined by the out
and ofmt
parameters.
However, there are other possibilities, which correspond
to uses to which a table can be put other than outputting it,
such as displaying metadata, calculating statistics,
or populating a table in an SQL database.
For some values of this parameter, additional parameters
(<mode-args>
)
are required to determine the exact behaviour.
Possible values are out
, meta
, stats
, count
, cgi
, discard
, topcat
, plastic
and tosql
.
Use the help=omode
flag
or see Section 5.4 for more information.
[Default: out]
out = <out-table>
omode
has its default value of "out
".
[Default: -]
params = <match-params>
matcher
parameter.
If it contains multiple values, they must be separated by spaces;
values which contain a space can be 'quoted' or "quoted".
values1 = <expr-list>
matcher
.
Depending on the kind of match, the number and type of
the values required will be different.
Multiple values should be separated by whitespace;
if whitespace occurs within a single value it must be
'quoted' or "quoted".
Elements of the expression list are commonly just column
names, but may be algebraic expressions calculated from
zero or more columns as explained in Section 7.
values2 = <expr-list>
matcher
.
Depending on the kind of match, the number and type of
the values required will be different.
Multiple values should be separated by whitespace;
if whitespace occurs within a single value it must be
'quoted' or "quoted".
Elements of the expression list are commonly just column
names, but may be algebraic expressions calculated from
zero or more columns as explained in Section 7.
Here are some examples of using tmatch2:
    stilts tmatch2 in1=obs_v.xml in2=obs_i.xml out=obs_iv.xml \
                   matcher=sky values1="ra dec" values2="ra dec" params="2"
    stilts tmatch2 survey.fits ifmt2=csv mycat.csv \
                   icmd1='addskycoords fk4 fk5 RA1950 DEC1950 RA2000 DEC2000' \
                   matcher=skyerr \
                   params=10 values1="RA2000 DEC2000 POS_ERR" values2="RA DEC 0" \
                   join=2not1 omode=count
skyerr
matcher is
used, which takes account of this; the third entry in the
values1
parameter is the POS_ERR column (in arcsec).
Since the second input table has no positional uncertainty information,
0 is used as the third entry in values2
.
The params
still has to contain a value which gives the
maximum error for matching (i.e. >= the largest value in the
POS_ERR column).
The join type is 2not1
, which means the output table
will only contain those entries which are in the second input table
but not in the first one.
The output table is not stored, but the number of rows it contains
(the number of objects represented in the CSV file but not the survey)
is written to the screen.
    stilts tmatch2 ifmt1=ascii ifmt2=ascii in1=cat1.txt in2=cat2.txt \
                   matcher=2d values1="X Y" values2="X Y" params="5" join=1and2 \
                   ocmd='addcol XDIFF X_1-X_2; addcol YDIFF Y_1-Y_2' \
                   ocmd='keepcols "XDIFF YDIFF"' omode=stats
keepcols
filter then throws all the other columns away,
retaining only these difference columns.
The final two-column table is not stored anywhere,
but (omode=stats
)
statistics including mean and standard deviation
are calculated on its columns and displayed to the screen.
Having done all this, you can examine the average X and Y differences
between the two input tables for matched rows, and if they differ
significantly from zero, you can conclude that there is a systematic
error between the positions in the two input files.
    stilts tmatch2 in1=mgc.fits in2=6dfgs.xml join=1and2 find=all \
                   matcher=sky+1d params='3 0.5' \
                   values1='ra dec bmag' values2='RA2000 DEC2000 B_MAG' \
                   out=pairs.fits
sky
and 1d
match criteria. This means that the only
rows which match are those which are
both within 3 arcsec of each other on the sky
and within 0.5 blue magnitudes.
Note that for both the params
and the
values1
and values2
parameters,
the items for the sky
matcher (RA and DEC)
are listed first,
followed by those for the 1d
matcher (in this case,
blue magnitude).
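The combined criterion in this last example can be written out as a small test on a pair of rows. The following is a hedged sketch, not the STILTS matching code: the class and method names are hypothetical, the coordinates are assumed to be in degrees, and the haversine formula is used here for the angular separation as one reasonable choice.

```java
// Hypothetical sketch of a combined sky + 1d match test; not tmatch2's own code.
class SkyPlus1dSketch {

    /** One arcsecond in radians. */
    static final double ARCSEC = Math.PI / (180.0 * 3600.0);

    /** Angular separation in radians (haversine formula); inputs in radians. */
    static double separation(double ra1, double dec1, double ra2, double dec2) {
        double sd = Math.sin((dec2 - dec1) / 2);
        double sr = Math.sin((ra2 - ra1) / 2);
        double h = sd * sd + Math.cos(dec1) * Math.cos(dec2) * sr * sr;
        return 2 * Math.asin(Math.min(1.0, Math.sqrt(h)));
    }

    /**
     * True only if the two rows are within maxSepArcsec on the sky
     * AND within maxMagDiff in magnitude. Positions are in degrees.
     */
    static boolean matches(double ra1Deg, double dec1Deg, double mag1,
                           double ra2Deg, double dec2Deg, double mag2,
                           double maxSepArcsec, double maxMagDiff) {
        double sep = separation(Math.toRadians(ra1Deg), Math.toRadians(dec1Deg),
                                Math.toRadians(ra2Deg), Math.toRadians(dec2Deg));
        boolean closeOnSky = sep <= maxSepArcsec * ARCSEC;
        boolean closeInMag = Math.abs(mag1 - mag2) <= maxMagDiff;
        return closeOnSky && closeInMag;
    }

    public static void main(String[] args) {
        // Two sources 2 arcsec apart in declination, 0.3 mag apart:
        // tested against limits of 3 arcsec and 0.5 mag.
        System.out.println(matches(10.0, 20.0, 15.0,
                                   10.0, 20.0 + 2.0 / 3600.0, 15.3,
                                   3.0, 0.5));
    }
}
```

The two limits passed to matches correspond to the two entries of params='3 0.5', in the same order as the matchers are named in sky+1d.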
tpipe
: Generic Table Pipeline Utility
tpipe
performs all kinds of general purpose manipulations
which take one table as input.
It is extremely flexible, and can do the following things
amongst others:
The basic operation of tpipe
is that it reads an
input table, performs zero or more processing steps on it,
and then does something with the output. There are therefore
three classes of things you need to tell it when it runs:
in
, ifmt
and
istream
parameters.
cmd
parameters, or the name of a file
containing the steps using the script
parameter.
The steps that you can perform are described in
Section 5.1.
omode
parameter.
By default, omode=out
,
in which case the table is written to a new table file in a format
determined by ofmt
. However, you can do other things
with the result such as
calculate the per-column statistics (omode=stats
),
view only the table and column metadata (omode=meta
),
display it directly in TOPCAT (omode=topcat
) etc.
The parameters mentioned above are listed in detail in the next section.
The usage of tpipe
is
    stilts <stilts-flags> tpipe ifmt=<in-format> istream=true|false script=<script-file> cmd=<cmds> omode=<out-mode> <mode-args> out=<out-table> ofmt=<out-format> [in=]<table>
If you don't have the stilts script installed,
write "java -jar stilts.jar" instead of "stilts" - see Section 3.
The available <stilts-flags>
are listed
in Section 2.1.
Parameter values are assigned on the command line as explained in Section 2.3. They are as follows:
cmd = <cmds>
script
and
cmd
flags should not be mixed in the same invocation.
ifmt = <in-format>
(auto)
(the default),
then an attempt will be
made to detect the format of the table automatically.
This cannot always be done correctly however, in which case
the program will exit with an error explaining which
formats were attempted.
[Default: (auto)]
in = <table>
ifmt
parameter.
istream = true|false
in
table
will be read as a stream.
It is necessary to give the
ifmt
parameter
in this case.
Depending on the required operations and processing mode,
this may cause the read to fail (sometimes it is necessary
to read the input table more than once).
It is not normally necessary to set this flag;
in most cases the data will be streamed automatically
if that is the best thing to do.
However it can sometimes result in less resource usage when
processing large files in certain formats (such as VOTable).
[Default: false]
ofmt = <out-format>
(auto)
"
(the default),
then the output filename will be
examined to try to guess what sort of file is required
usually by looking at the extension.
If it's not obvious from the filename what output format is
intended, an error will result.
This parameter must only be given if
omode
has its default value of "out
".
[Default: (auto)]
omode = <out-mode> <mode-args>
out
, which means that
the result will be written as a new table to disk or elsewhere,
as determined by the out
and ofmt
parameters.
However, there are other possibilities, which correspond
to uses to which a table can be put other than outputting it,
such as displaying metadata, calculating statistics,
or populating a table in an SQL database.
For some values of this parameter, additional parameters
(<mode-args>
)
are required to determine the exact behaviour.
Possible values are out
, meta
, stats
, count
, cgi
, discard
, topcat
, plastic
and tosql
.
Use the help=omode
flag
or see Section 5.4 for more information.
[Default: out]
out = <out-table>
omode
has its default value of "out
".
[Default: -]
script = <script-file>
script
and
cmd
flags should not be mixed in the same invocation.
Here are some examples of tpipe
in use with explanations
of what's going on. For simplicity these examples assume that you have the
stilts
script installed and are using a Unix-like shell;
see Section 3 for an explanation of how to invoke the command
if you just have the Java classes.
stilts tpipe cat.fits
omode=out
is assumed,
and output is to standard output in text
format.
stilts tpipe cmd='head 5' cat.fits.gz
    stilts tpipe ifmt=csv xxx.csv \
                 cmd='keepcols "index ra dec"' \
                 omode=out ofmt=fits xxx.fits
cmd
argument: the outer quotes
are so that the argument of the cmd
parameter itself
(keepcols "index ra dec"
)
is not split up by spaces (to protect it from the shell),
and the inner quotes are to keep the
colid-list
argument of the
keepcols
command together.
    stilts tpipe ifmt=votable \
                 cmd='addcol IV_SUM "(IMAG+VMAG)"' \
                 cmd='addcol IV_DIFF "(IMAG-VMAG)"' \
                 cmd='delcols "IMAG VMAG"' \
                 omode=out ofmt=votable \
                 < tab1.vot \
                 > tab2.vot
in
nor out
parameters
have been specified, the input and output are actually byte
streams on standard input and standard output of the
tpipe
command in this case.
The processing steps first add a column representing the sum,
then add a column representing the difference, then delete the
original columns.
    stilts tpipe cmd='addskycoords -inunit sex fk5 gal \
                      RA2000 DEC2000 GAL_LONG GAL_LAT' \
                 6dfgs.fits 6dfgs+gal.fits
    stilts -disk tpipe 2dfgrs_ngp.fits \
                 cmd='keepcols "SEQNUM AREA ECCENT"' \
                 cmd='sort -down AREA' \
                 cmd='head 20'
-disk
flag is supplied, which means that
temporary disk files rather than memory
will be used for caching table data.
    stilts tpipe 2dfgrs_ngp.fits \
                 cmd='keepcols "SEQNUM AREA ECCENT"' \
                 cmd='sorthead -down 20 AREA'
sorthead
filter is
in most cases faster and cheaper on memory (only 20 rows ever have
to be stored in this case), so this is generally a better approach
than combining the sort
and head
filters.
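Why only 20 rows need to be stored can be seen from a bounded-heap selection, sketched below. This illustrates the general technique and is not necessarily how the sorthead filter is implemented; the class and method names are hypothetical, and plain numbers stand in for table rows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of "keep the n largest" in one pass; not STILTS code.
class SortHeadSketch {

    /**
     * Returns the n largest values in descending order,
     * holding at most n values in memory at any time.
     */
    static List<Double> largest(Iterable<Double> values, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Min-heap: the smallest value retained so far sits at the head.
        PriorityQueue<Double> heap = new PriorityQueue<>();
        for (double v : values) {
            if (heap.size() < n) {
                heap.add(v);
            } else if (v > heap.peek()) {
                heap.poll();      // discard the smallest retained value
                heap.add(v);
            }
        }
        List<Double> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        List<Double> areas = List.of(3.0, 9.0, 1.0, 7.0, 5.0);
        System.out.println(largest(areas, 2));
    }
}
```

A full sort followed by head has to hold every row before discarding most of them, whereas this approach discards each row as soon as it is known not to be among the largest n.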
stilts tpipe omode=meta http://archive.org/data/survey.vot.Z
    stilts tpipe in=survey.fits cmd='select "skyDistance(hmsToRadians(RA),dmsToRadians(DEC), \
                                                         hmsToRadians(2,28,11),dmsToRadians(-6,49,45)) \
                                             < 5 * ARC_MINUTE"' \
                 omode=count
skyDistance
function is an expression which
calculates the distance between the position specified in a row
(as given by its RA and DEC columns) and a given point on the sky
(here, 02:28:11,-06:49:45).
Since skyDistance
's arguments and return value are in
radians, some conversions are required: the RA and DEC columns
are sexagesimal strings which are converted using the
hmsToRadians
and dmsToRadians
functions
respectively. Different versions of these functions (ones which take
numeric arguments) are used to convert the coordinates of the fixed
point to radians.
The result is compared to a multiple of the
ARC_MINUTE
constant, which is the size of an arcminute
in radians. Any rows of the input table for which this comparison
is true are included in the output.
An alternative function, skyDistanceDegrees
which works
in degrees, is also available.
The functions and constants used here are described in detail
in Section 7.4.5.
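The numeric conversions involved can be sketched in plain Java. The methods below are hypothetical re-implementations written for illustration, not the STILTS functions of the same names; they assume that hours, minutes and seconds are simply summed and scaled, and that the sign of the degrees argument applies to the whole angle.

```java
// Hypothetical re-implementations for illustration; not the STILTS functions.
class SexagesimalSketch {

    /** Size of one arcminute in radians. */
    static final double ARC_MINUTE = Math.PI / (180.0 * 60.0);

    /** Hours, minutes, seconds of time to radians (1 hour = 15 degrees). */
    static double hmsToRadians(double h, double m, double s) {
        return Math.toRadians(15.0 * (h + m / 60.0 + s / 3600.0));
    }

    /** Degrees, arcminutes, arcseconds to radians; the sign is taken from d. */
    static double dmsToRadians(double d, double m, double s) {
        // Treat -0.0 as negative so that angles between 0 and -1 degrees work.
        boolean negative = d < 0 || (d == 0 && 1.0 / d < 0);
        double magnitude = Math.abs(d) + m / 60.0 + s / 3600.0;
        return Math.toRadians(negative ? -magnitude : magnitude);
    }

    public static void main(String[] args) {
        // The fixed point used in the example above: 02:28:11, -06:49:45
        System.out.println(Math.toDegrees(hmsToRadians(2, 28, 11)));
        System.out.println(Math.toDegrees(dmsToRadians(-6, 49, 45)));
        // The selection threshold, 5 arcminutes, in radians
        System.out.println(5 * ARC_MINUTE);
    }
}
```

Working in radians throughout means the comparison against 5 * ARC_MINUTE needs no further unit conversion.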
    stilts tpipe ifmt=ascii survey.txt \
                 cmd='select "OBJTYPE == 3 && Z > 0.15"' \
                 cmd='keepcols "IMAG JMAG KMAG"' \
                 omode=stats
    stilts -classpath lib/drivers/mysql-connector-java.jar \
           -Djdbc.drivers=com.mysql.jdbc.Driver \
           tpipe in=x.fits cmd="explodeall" omode=tosql \
           protocol=mysql host=localhost database=ASTRO1 newtable=TABLEX \
           user=mbt
jdbc.drivers
system property is set to the JDBC driver
class name. The output will be written as a new table named TABLEX
in the MySQL database called ASTRO1 on a MySQL server on the
local host. The password, if required, will be prompted for,
as would any of the other required parameters if they had not been
given on the command line.
Any existing table in ASTRO1 with the name TABLEX is overwritten.
The only processing done here is by the explodeall
command,
which takes any columns which have fixed-size array values and
replaces them in the output with multiple scalar columns.
    java -classpath stilts.jar:lib/drivers/mysql-connector-java.jar -Djdbc.drivers=com.mysql.jdbc.Driver \
         uk.ac.starlink.ttools.Stilts \
         tpipe in=x.fits \
         cmd=explodeall \
         omode=out \
         out="jdbc:mysql://localhost/ASTRO1?user=mbt#TABLEX"
stilts
script to do it. Note that you cannot use java's
-jar
flag in this case, because doing it like that
would not permit access to the additional classes that contain
the JDBC driver.
In the second place we use omode=out
rather than
omode=tosql
. For this we need to supply an out
value which encodes the information about the SQL connection and
table in a special URL-like format. As you can see, this is a bit
arcane, which is why the omode=tosql
mode can be a help.
stilts tpipe USNOB.FITS cmd='every 1000000' omode=stats
votcopy
: VOTable Encoding Translator
The VOTable standard provides for three basic encodings
of the actual data within each table: TABLEDATA, BINARY and FITS.
TABLEDATA is a pure-XML encoding, which is relatively easy for humans
to read and write.
However, it is verbose and not very efficient for transmission
and processing,
for which reason the more compact BINARY format has been defined.
FITS format shares the advantages of BINARY, but is more likely to
be used where a VOTable is providing metadata 'decoration' for
an existing FITS table.
In addition, the BINARY and FITS encodings may carry their data
either inline
(as the base64-encoded text content of a STREAM
element)
or externally
(referenced by a STREAM
element's href
attribute).
These different formats have their different advantages and disadvantages. Since, to some extent, programmers are humans too, much existing VOTable software deals in TABLEDATA format even though it may not be the most efficient way to proceed. Conversely, you might wish to examine the contents of a BINARY-encoded table without use of any software more specialised than a text editor. So there are times when it is desirable to convert from one of these encodings to another.
votcopy
is a tool which translates between these
encodings while
making a minimum of other changes to the VOTable document.
The processing may result in some changes to lexical details
such as whitespace in start tags, but the element structure is not
modified. Unlike tpipe
it does not impose
STIL's model of what constitutes a table on the data between
reading it in and writing it out, so subtleties dependent on
the exact structure of the VOTable document will not be mangled.
The only important changes should be the contents of
DATA
elements in the document.
The usage of votcopy
is
    stilts <stilts-flags> votcopy charset=<xml-encoding> cache=true|false href=true|false base=<location> [in=]<location> [out=]<location> [format=]tabledata|binary|fits|empty
If you don't have the stilts script installed,
write "java -jar stilts.jar" instead of "stilts" - see Section 3.
The available <stilts-flags>
are listed
in Section 2.1.
Parameter values are assigned on the command line as explained in Section 2.3. They are as follows:
base = <location>
-href
flag is true.
Normally these are given names based on the name of the
output file.
But if this flag is given, the names will be based on the
<location>
string.
This flag is compulsory if href is true and out=-
(output is to standard out),
since in this case there is no default base name to use.
cache = true|false
[Default: false]
charset = <xml-encoding>
help=charset
for a full listing.
format = tabledata|binary|fits|empty
empty
is selected, then the tables will be
data-less (will contain no DATA element), leaving only
the document structure.
Data-less tables are legal VOTable elements.
[Default: tabledata]
href = true|false
base
flag.
[Default: false]
in = <location>
[Default: -]
out = <location>
[Default: -]
Normal use of votcopy
is pretty straightforward.
We give here a couple of examples of its input and output.
Here is an example VOTable document, cat.vot:
    <VOTABLE>
    <RESOURCE>
      <TABLE name="Authors">
        <FIELD name="AuthorName" datatype="char" arraysize="*"/>
        <DATA>
          <TABLEDATA>
            <TR><TD>Charles Messier</TD></TR>
            <TR><TD>Mark Taylor</TD></TR>
          </TABLEDATA>
        </DATA>
      </TABLE>
      <RESOURCE>
        <COOSYS equinox="J2000.0" epoch="J2000.0" system="eq_FK4"/>
        <TABLE name="Messier Objects">
          <FIELD name="Identifier" datatype="char" arraysize="10"/>
          <FIELD name="RA" datatype="double" units="degrees"/>
          <FIELD name="Dec" datatype="double" units="degrees"/>
          <DATA>
            <TABLEDATA>
              <TR> <TD>M51</TD> <TD>202.43</TD> <TD>47.22</TD> </TR>
              <TR> <TD>M97</TD> <TD>168.63</TD> <TD>55.03</TD> </TR>
            </TABLEDATA>
          </DATA>
        </TABLE>
      </RESOURCE>
    </RESOURCE>
    </VOTABLE>
Note that it contains more structure than just a flat table: there are two
TABLE elements, the RESOURCE element of the second one being nested
in the RESOURCE of the first.
Processing this document using a generic table tool such as
tpipe
or tcopy
would lose this structure.
To convert the data encoding to BINARY format, we simply execute
   stilts votcopy format=binary cat.vot
and the output is
   <?xml version="1.0"?>
   <VOTABLE>
     <RESOURCE>
       <TABLE name="Authors">
         <FIELD name="AuthorName" datatype="char" arraysize="*"/>
         <DATA>
           <BINARY>
             <STREAM encoding='base64'>
   AAAAD0NoYXJsZXMgTWVzc2llcgAAAAtNYXJrIFRheWxvcg==
             </STREAM>
           </BINARY>
         </DATA>
       </TABLE>
       <RESOURCE>
         <COOSYS equinox="J2000.0" epoch="J2000.0" system="eq_FK4"/>
         <TABLE name="Messier Objects">
           <FIELD name="Identifier" datatype="char" arraysize="10"/>
           <FIELD name="RA" datatype="double" units="degrees"/>
           <FIELD name="Dec" datatype="double" units="degrees"/>
           <DATA>
             <BINARY>
               <STREAM encoding='base64'>
   TTUxAAAAAAAAAEBpTcKPXCj2QEecKPXCj1xNOTcAAAAAAAAAQGUUKPXCj1xAS4PX
   Cj1wpA==
               </STREAM>
             </BINARY>
           </DATA>
         </TABLE>
       </RESOURCE>
     </RESOURCE>
   </VOTABLE>
Note that both tables in the document have been translated to BINARY format. The basic structure of the document is unchanged: the only differences are within the DATA elements. If we ran
   stilts votcopy format=tabledata
on either this output or the original input then the output would be identical (apart perhaps from whitespace) to the input table, since the data are originally in TABLEDATA format.
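To see that the BINARY streams above carry the same cell values as the TABLEDATA original, they can be decoded by hand. The following is a minimal Python sketch, not part of STILTS, assuming the VOTable BINARY layout: big-endian values, a 4-byte element count before each variable-length (arraysize="*") value, and null-padded fixed-length character fields. The function names are hypothetical.

```python
import base64
import struct


def decode_var_strings(stream_text):
    """Decode rows of a single variable-length char column (arraysize="*")."""
    data = base64.b64decode("".join(stream_text.split()))
    rows, pos = [], 0
    while pos < len(data):
        # Each variable-length value is preceded by a 4-byte big-endian count.
        (count,) = struct.unpack_from(">i", data, pos)
        pos += 4
        rows.append(data[pos:pos + count].decode("ascii"))
        pos += count
    return rows


def decode_messier_rows(stream_text):
    """Decode rows laid out as char[10], double, double (no padding)."""
    data = base64.b64decode("".join(stream_text.split()))
    row_format = struct.Struct(">10sdd")  # 10 + 8 + 8 = 26 bytes per row
    rows = []
    for ident, ra, dec in row_format.iter_unpack(data):
        # Fixed-length strings shorter than arraysize are padded with nulls.
        rows.append((ident.rstrip(b"\0").decode("ascii"), ra, dec))
    return rows


authors = decode_var_strings("AAAAD0NoYXJsZXMgTWVzc2llcgAAAAtNYXJrIFRheWxvcg==")
messier = decode_messier_rows(
    "TTUxAAAAAAAAAEBpTcKPXCj2QEecKPXCj1xNOTcAAAAAAAAAQGUUKPXCj1xAS4PX Cj1wpA=="
)
print(authors)  # ['Charles Messier', 'Mark Taylor']
print(messier)
```

The row layouts are hard-coded here from the FIELD declarations in the example; a general reader would have to build them from the FIELD elements of each table.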
To generate a VOTable document with the data in external files, the href parameter is used. We will output in FITS format this time. Executing:
   stilts votcopy format=fits href=true cat.vot fcat.vot
writes the following to the file fcat.vot:
   ...
   <DATA>
     <FITS>
       <STREAM href="fcat-1.fits"/>
     </FITS>
   </DATA>
   ...
   <DATA>
     <FITS>
       <STREAM href="fcat-2.fits"/>
     </FITS>
   </DATA>
   ...
(the unchanged parts of the document have been skipped here for brevity). The actual data are written in two additional files in the same directory as the output file, fcat-1.fits and fcat-2.fits. These filenames are based on the main output filename, but can be altered using the base flag if required. Note this has also given you FITS binary table versions of all the tables in the input VOTable document, which can be operated on by normal FITS-aware software quite separately from the VOTable if required.
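Since a document written with href=true is only complete together with its external files, it can be useful to list which files it refers to. Here is a minimal Python sketch, not part of STILTS, that collects the href attribute of every STREAM element. The function names and the cut-down sample document are hypothetical, and the helper that strips namespaces is there only so the sketch also works on documents that declare one.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def external_streams(votable_text):
    """Return the href of every STREAM element that points at external data."""
    root = ET.fromstring(votable_text)
    return [
        element.get("href")
        for element in root.iter()
        if local_name(element.tag) == "STREAM" and element.get("href") is not None
    ]


# Hypothetical cut-down version of the fcat.vot output described above.
SAMPLE = """<VOTABLE>
 <RESOURCE>
  <TABLE name="Authors">
   <FIELD name="AuthorName" datatype="char" arraysize="*"/>
   <DATA><FITS><STREAM href="fcat-1.fits"/></FITS></DATA>
  </TABLE>
  <TABLE name="Messier Objects">
   <FIELD name="Identifier" datatype="char" arraysize="10"/>
   <DATA><FITS><STREAM href="fcat-2.fits"/></FITS></DATA>
  </TABLE>
 </RESOURCE>
</VOTABLE>"""

print(external_streams(SAMPLE))  # ['fcat-1.fits', 'fcat-2.fits']
```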
votlint: VOTable Validity Checker
The VOTable standard, while not hugely complicated, has a number of subtleties and it's not difficult to produce VOTable documents which violate it in various ways. In fact it's probably true to say that most VOTable documents out there are not strictly legal. In some cases the errors are small and a parser is likely to process the document without noticing the trouble. In other cases, the errors are so serious that it's hard for any software to make sense of it. In many cases in between, different software will react in different ways, in the worst case appearing to parse a VOTable but in fact understanding the wrong data.
votlint is a program which can check a VOTable document and spot places where it does not conform to the VOTable standard, or places which look like they may not mean what the author intended. It is meant for use in two main scenarios:
Validating a VOTable document against the VOTable schema or DTD of course goes a long way towards checking a VOTable document for errors (though it's clear that many VOTable authors don't even go this far), but it by no means does the whole job, simply because the schema/DTD specification languages don't have the facilities to understand the data structure of a VOTable document. For instance the VOTable schema will allow any plain text content in a TD element, but whether this makes sense in a VOTable depends on the datatype attribute of the corresponding FIELD element. There are many other examples.
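The TD/FIELD dependency described above is the kind of check a schema cannot express but a few lines of code can. The following Python sketch illustrates the idea only; it is not how votlint is implemented. It assumes a document without XML namespaces, looks only at scalar numeric columns, and uses a hypothetical function name and sample document.

```python
import xml.etree.ElementTree as ET

# Illustrative subset of VOTable numeric datatypes mapped to Python parsers.
PARSERS = {"short": int, "int": int, "long": int, "float": float, "double": float}


def check_cells(votable_text):
    """Report scalar TD cells whose text does not parse as the FIELD datatype."""
    problems = []
    for table in ET.fromstring(votable_text).iter("TABLE"):
        fields = table.findall("FIELD")
        for irow, tr in enumerate(table.iter("TR"), start=1):
            for field, td in zip(fields, tr.findall("TD")):
                parse = PARSERS.get(field.get("datatype"))
                text = (td.text or "").strip()
                # Skip non-numeric columns, array columns and empty cells.
                if parse is None or field.get("arraysize") or not text:
                    continue
                try:
                    parse(text)
                except ValueError:
                    problems.append(
                        f"row {irow}, FIELD {field.get('name')}: "
                        f"{text!r} is not a valid {field.get('datatype')}"
                    )
    return problems


# Hypothetical document: schema-valid, but one cell contradicts its datatype.
SAMPLE = """<VOTABLE>
 <RESOURCE>
  <TABLE>
   <FIELD name="Identifier" datatype="char" arraysize="10"/>
   <FIELD name="RA" datatype="double"/>
   <FIELD name="Dec" datatype="double"/>
   <DATA>
    <TABLEDATA>
     <TR><TD>M51</TD><TD>202.43</TD><TD>47.22</TD></TR>
     <TR><TD>M97</TD><TD>168.63</TD><TD>north</TD></TR>
    </TABLEDATA>
   </DATA>
  </TABLE>
 </RESOURCE>
</VOTABLE>"""

for message in check_cells(SAMPLE):
    print(message)  # row 2, FIELD Dec: 'north' is not a valid double
```

Python's int() and float() are somewhat more lenient than the VOTable rules for numeric text, so this is a rough approximation of such a check rather than a faithful one.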
votlint tackles this by parsing the VOTable document in a way which understands its structure and assessing the content as critically as it can. For any incorrect or questionable content it finds, it will output a short message describing the problem and giving its location in the document. What you do with this information is then up to you.
Using votlint is very straightforward. The votable argument gives the location (filename or URL) of a VOTable document. If it is omitted, the document is read from standard input. Error and warning messages are written to standard error. Each message is prefixed with the location at which the error was found (if possible the line and column are shown, though this is dependent on your JVM's default XML parser). The processing is SAX-based, so arbitrarily long tables can be processed without heavy memory use.
votlint can't guarantee to pick up every possible error in a VOTable document, but it ought to pick up many of the most serious errors that are commonly made in authoring VOTables.
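To illustrate why SAX-style processing keeps memory use low and how a message can be tagged with a document location, here is a minimal Python sketch using the standard xml.sax module. It is not votlint's code (votlint is written in Java); the class name, the two checks chosen and the message wording are assumptions for illustration. Only a few counters are held at any time, however many rows the table has.

```python
import xml.sax


class RowChecker(xml.sax.ContentHandler):
    """Streaming check of TD counts per row and of the TABLE nrows attribute."""

    def __init__(self):
        super().__init__()
        self.messages = []
        self.locator = None
        self.nfield = self.nrow = self.ntd = 0
        self.nrows_attr = None

    def setDocumentLocator(self, locator):
        # The parser supplies a locator giving the position of the current event.
        self.locator = locator

    def report(self, level, text):
        line = self.locator.getLineNumber() if self.locator else "?"
        self.messages.append(f"{level} (l.{line}): {text}")

    def startElement(self, name, attrs):
        if name == "TABLE":
            self.nfield = self.nrow = 0
            self.nrows_attr = attrs.get("nrows")
        elif name == "FIELD":
            self.nfield += 1
        elif name == "TR":
            self.ntd = 0
        elif name == "TD":
            self.ntd += 1

    def endElement(self, name):
        if name == "TR":
            self.nrow += 1
            if self.ntd != self.nfield:
                self.report("WARNING", "Wrong number of TDs in row "
                            f"(expecting {self.nfield} found {self.ntd})")
        elif name == "TABLE" and self.nrows_attr is not None:
            if int(self.nrows_attr) != self.nrow:
                self.report("ERROR", f"Row count ({self.nrow}) not equal "
                            f"to nrows attribute ({self.nrows_attr})")


def check(document_bytes):
    handler = RowChecker()
    xml.sax.parseString(document_bytes, handler)
    return handler.messages


# Hypothetical document with one TD too many and a wrong nrows value.
SAMPLE = b"""<VOTABLE version="1.1">
<RESOURCE>
<TABLE nrows="2">
<FIELD name="RA" datatype="double"/>
<FIELD name="Dec" datatype="double"/>
<DATA>
<TABLEDATA>
<TR><TD>344.48</TD><TD>-29.618</TD><TD>extra</TD></TR>
</TABLEDATA>
</DATA>
</TABLE>
</RESOURCE>
</VOTABLE>"""

for message in check(SAMPLE):
    print(message)
```

The line numbers printed come from whatever the underlying parser reports through its locator, which is the same dependency on the XML parser noted above for votlint.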
The usage of votlint is
   stilts <stilts-flags> votlint validate=true|false version=1.0|1.1 [votable=]<location>
If you don't have the stilts script installed, write "java -jar stilts.jar" instead of "stilts" - see Section 3. The available <stilts-flags> are listed in Section 2.1.
Parameter values are assigned on the command line as explained in Section 2.3. They are as follows:
validate = true|false
If true, then as well as votlint's own checks, the document is validated against an appropriate version of the VOTable DTD, which picks up such things as the presence of unknown elements and attributes, elements in the wrong place, and so on. Sometimes however, particularly when XML namespaces are involved, the validator can get confused and may produce a lot of spurious errors. Setting this flag false prevents this validation step so that only votlint's own checks are performed. In this case many violations of the VOTable standard concerning document structure will go unnoticed.
[Default: true]
version = 1.0|1.1
votable = <location>
[Default: -]
Votlint checks that the XML input is well-formed, and, unless the validate=false parameter is supplied, that it validates against the 1.0 or 1.1 (as appropriate) DTD. Although VOTable 1.1 is properly defined against an XML Schema rather than a DTD, in conjunction with the other checks done, the DTD validation turns out to be pretty comprehensive. Some of the DTD validity checks are also done by votlint internally, so that some validity-type errors may give rise to more than one warning. In general, the program errs on the side of verbosity.
In addition to these checks, the following checks are carried out, and lead to ERROR reports if violations are found:
- TD contents incompatible with datatype/arraysize attributes declared in FIELD
- ... FIELD
- PARAM values incompatible with declared datatype/arraysize
- ... arraysize declarations
- TD elements with the wrong number of elements
- PARAM values with the wrong number of elements
- nrows attribute on TABLE element different from the number of rows actually in the table
- VOTABLE version attribute is unknown
- ref attributes without matching ID elements elsewhere in the document
- Same ID attribute value on multiple elements
Additionally, the following conditions, which are not actually forbidden by the VOTable standard, will generate WARNING reports. Some of these may result from harmless constructions, but it is wise at least to take a look at the input which caused them:
- Wrong number of TD elements in row of TABLEDATA table
- TABLE with no FIELD elements
- FIELD or PARAM elements with datatype of either char or unicodeChar and undeclared arraysize - this is a common error which can result in ignoring all but the first character in TD elements from a column
- ref attributes which reference other elements by ID where the reference makes no, or questionable, sense (e.g. FIELDref references FIELD in a different table)
- Multiple elements (such as FIELDs) with the same name attributes
Here is a brief example of running votlint against a (very short) imperfect VOTable document. If the document looks like this:
   <VOTABLE version="1.1">
   <RESOURCE>
   <TABLE nrows="2">
   <FIELD name="Identifier" datatype="char"/>
   <FIELD name="RA" datatype="double"/>
   <FIELD name="Dec" datatype="double"/>
   <DESCRIPTION>A very small table</DESCRIPTION>
   <DATA>
   <TABLEDATA>
   <TR>
   <TD>Fomalhaut</TD>
   <TD>344.48</TD>
   <TD>-29.618</TD>
   <TD>HD 216956</TD>
   </TR>
   </TABLEDATA>
   </DATA>
   </TABLE>
   </RESOURCE>
   </VOTABLE>
then the output of a votlint run looks like this:
   INFO (l.4): No arraysize for character, FIELD implies single character
   ERROR (l.7): Element "TABLE" does not allow "DESCRIPTION" here.
   WARNING (l.11): Characters after first in char scalar ignored (missing arraysize?)
   WARNING (l.15): Wrong number of TDs in row (expecting 3 found 4)
   ERROR (l.18): Row count (1) not equal to nrows attribute (2)
Note the warning at line 11 has resulted from the same error as the one at line 4 - because the FIELD element has no arraysize attribute, arraysize="1" (single character) is assumed, while the author almost certainly intended arraysize="*" (unknown length string).
By examining these warnings you can see what needs to be done to fix this table up. Here is what it should look like:
   <VOTABLE version="1.1">
   <RESOURCE>
   <TABLE nrows="1">                                          <!-- change row count -->
   <DESCRIPTION>A very small table</DESCRIPTION>              <!-- move DESCRIPTION -->
   <FIELD name="Identifier" datatype="char" arraysize="*"/>   <!-- add arraysize -->
   <FIELD name="RA" datatype="double"/>
   <FIELD name="Dec" datatype="double"/>
   <DATA>
   <TABLEDATA>
   <TR>
   <TD>Fomalhaut</TD>
   <TD>344.48</TD>
   <TD>-29.618</TD>
   </TR>                                                      <!-- remove extra TD -->
   </TABLEDATA>
   </DATA>
   </TABLE>
   </RESOURCE>
   </VOTABLE>
When fed this version, votlint gives no warnings.
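Because each report is a single line carrying a level, a location and a message, the output is easy to post-process, for instance to fail a build only on ERROR reports. Here is a minimal Python sketch, assuming the "LEVEL (l.N): message" layout seen in the example run; reports that also carry a column number, or that have no location at all, are not handled. The names are hypothetical.

```python
import re
from collections import Counter

# Matches lines such as: ERROR (l.18): Row count (1) not equal to nrows attribute (2)
MESSAGE = re.compile(r"^(?P<level>[A-Z]+) \(l\.(?P<line>\d+)\): (?P<text>.*)$")


def parse_report(report):
    """Split votlint-style output into (level, line number, text) tuples."""
    entries = []
    for raw in report.splitlines():
        match = MESSAGE.match(raw.strip())
        if match:
            entries.append((match["level"], int(match["line"]), match["text"]))
    return entries


REPORT = """\
INFO (l.4): No arraysize for character, FIELD implies single character
ERROR (l.7): Element "TABLE" does not allow "DESCRIPTION" here.
WARNING (l.11): Characters after first in char scalar ignored (missing arraysize?)
WARNING (l.15): Wrong number of TDs in row (expecting 3 found 4)
ERROR (l.18): Row count (1) not equal to nrows attribute (2)
"""

entries = parse_report(REPORT)
counts = Counter(level for level, _, _ in entries)
print(counts["ERROR"], counts["WARNING"], counts["INFO"])  # 2 2 1
print([line for level, line, _ in entries if level == "ERROR"])  # [7, 18]
```

As votlint writes its messages to standard error, that is the stream a wrapper script would need to capture.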
This is STILTS, Starlink Tables Infrastructure Library Tool Set. It is a collection of non-graphical utilities for general purpose table and VOTable manipulation developed by Starlink.
The initial development of STILTS was done under the UK's now-deceased Starlink project, without which it would not have been written.
Apart from the excellent Java 2 Standard Edition itself, the following external libraries provide important parts of STILTS's functionality:
Many people have contributed ideas and advice to the development of STILTS and its related products. I can't list all of them here, but my thanks are especially due to the following:
Releases to date have been as follows:
... "stilts", invoked using the stilts script or the stilts.jar jar file, and the various tasks are named as subsequent arguments on the command line. Command arguments are supplied after that. The new invocation syntax is described in detail elsewhere in this document. As well as invocation features such as improved on-line help, optional prompting, parameter defaulting, and more uniform access to common features, this will make it more straightforward to wrap these tasks for use in non-command-line environments, such as behind a SOAP or CORBA interface, or in a CEA-like execution environment.
tmatch2 has been introduced. This provides flexible and efficient crossmatching between two input tables. Future releases will provide commands for intra-table and multi-table matching.
tcat has been introduced, which allows two tables to be glued together top-to-bottom. This is currently working but very rudimentary - improvements will be forthcoming in future releases.
calc has been introduced, which performs one-line expression evaluations from the command line.
... tpipe and other commands have been introduced:
- addskycoords: calculates new celestial coordinate pair from existing ones (FK4, FK5, ecliptic, galactic, supergalactic)
- replacecol: replaces column data, using existing metadata
- badval: replaces given 'magic' value with null
- replaceval: replaces given 'magic' value with any specified value
- tablename: edits table name
- explodecols and explodeall commands replace explode
The new stream parameter of tpipe now allows you to write filter commands in an external file, to facilitate more manageable command lines.
Wildcarding for column specification is now allowed for some filter commands.
tcube
Command
- stats filter provides the same information as the old stats output mode, but allows much more flexible use of the results. It can also calculate many new quantities, including quantiles, skew and kurtosis.
- meta filter provides the same information as the old meta output mode, but allows much more flexible use of the results.
- assert filter provides in-pipeline logical assertions.
- uniq filter collapses multiple adjacent identical or similar rows.
- sorthead filter provides a (usually) more efficient method of doing what you could previously do by combining sort and head filters.
- colmeta filter adds/modifies metadata for selected columns.
- check filter checks table in stream - for debugging purposes only.
Additionally, usage of the sort filter has been changed so that it can now do everything that sortexpr used to be able to do; sortexpr is now withdrawn.
- plastic mode broadcasts the table to one or all registered PLASTIC listeners.
- cgi mode writes the table to standard output in a form suitable for output from a CGI script.
- discard mode throws away the table.
- topcat mode now attempts to use PLASTIC (amongst other methods) to contact TOPCAT.
- stats and meta modes are mildly deprecated in favour of the corresponding new filters (see above).
- csv-noheader format variant output handler added.
- roundDecimal and formatDecimal functions introduced for more control over visual appearance of numeric values.
- mark.workaround system property which can optionally work around a bug in some input streams ("Resetting to invalid mark" errors).
- ... ucd and utype attributes of TABLE element in votlint.
- istream=true is now less likely to cause a "Can't re-read stream" error.