Next Previous Up Contents
Next: Processing Filters
Up: Top
Previous: class


6 Table Pipelines

Several of the tasks available in STILTS take one or more input tables, do something or other with them, and produce one or more output tables. This is a pretty obvious way to go about things, and in the most straightforward case that's exactly what happens: you name one or more input tables, specify the processing parameters, and name an output table; the task then reads the input tables from disk, does the processing and writes the output table to disk.

However, many of the tasks in STILTS allow you to do pre-processing of the input tables before the main job, post-processing of the output table after the main job, and to decide what happens to the final tabular result, without any intermediate storage of the data. Examples of the kind of pre-processing you might want to do are to rearrange the columns so that they have the right units for the main task, or replace 'magic' values such as -999 with genuine blank values; the kind of post-processing you might want to do is to sort the rows in the output table or delete some of the columns you're not interested in. As for the destination of the final table, you might want to write it to disk, but equally you might not want to store it anywhere, but only be interested in counting the number of rows, or seeing the minima/maxima of a few of the columns, or you might want to send it straight to TOPCAT or some other table viewing application for interactive analysis.

Clearly, you could achieve the same effect by running multiple applications: preprocess your original input tables to write intermediate files on disk, run the main processing application which reads those files from disk and writes a new output file, run another application to postprocess the output file and write a new final output file, and finally do something with this such as counting the rows in it or viewing it in TOPCAT. However, by doing it all within a single task instead, no intermediate results have to be stored, and the whole sequence can be very much more efficient. You can think of this (if it helps) like a Unix pipeline, except what is being streamed from the start to the end of the pipe is not bytes, but table metadata and data. In most cases, the table data is streamed through the pipeline a row at a time, meaning that the amount of memory required is small (though in some cases, for instance row sorting and crossmatching, this is not possible).

Tasks which allow this pre/post-processing, or "filtering", have parameters with names like "cmd" which you use to specify processing steps. Tasks with multiple input tables (tmatch2, tskymatch2, tcatn, tjoin) may have parameters named icmd1, icmd2, ... for preprocessing the different input tables and ocmd for postprocessing the output table. tpipe does nothing except filtering, so there is no distinction between pre- and post-processing, and its filter parameter is just named cmd. tpipe additionally has a script parameter which allows you to use a text file to write the commands in, to prevent the command line getting too long. In both cases there is a parameter named omode which defines the "output mode", that is, what happens to the post-processed output table that comes out of the end of the pipeline.

Section 6.1 lists the processing steps available, and explains how to use them, Section 6.2 and Section 6.3 describe the syntax used in some of these filter commands for specifying columns, and Section 6.4 describes the available output modes. See the examples in the command reference, and particularly the tpipe examples, for some examples putting all this together.


Next Previous Up Contents
Next: Processing Filters
Up: Top
Previous: class

STILTS - Starlink Tables Infrastructure Library Tool Set
Starlink User Note256
STILTS web page: http://www.starlink.ac.uk/stilts/
Author email: m.b.taylor@bristol.ac.uk
Mailing list: topcat-user@jiscmail.ac.uk