Several of the tasks available in STILTS take one or more input tables, do something or other with them, and produce one or more output tables. This is a pretty obvious way to go about things, and in the most straightforward case that's exactly what happens: you name one or more input tables, specify the processing parameters, and name an output table; the task then reads the input tables from disk, does the processing and writes the output table to disk.
However, many of the tasks in STILTS allow you to do pre-processing of the input tables before the main job, post-processing of the output table after the main job, and to decide what happens to the final tabular result, without any intermediate storage of the data. Examples of the kind of pre-processing you might want to do are to modify column values so that they are in the right units for the main task, or to replace 'magic' values such as -999 with genuine blank values; the kind of post-processing you might want to do is to sort the rows in the output table or delete some of the columns you're not interested in. As for the destination of the final table, you might want to write it to disk, but equally you might not want to store it at all, being interested only in counting the number of rows or seeing the minima/maxima of a few of the columns, or you might want to send it straight to TOPCAT or some other table viewing application for interactive analysis.
Clearly, you could achieve the same effect by running multiple applications: preprocess your original input tables to write intermediate files on disk, run the main processing application which reads those files from disk and writes a new output file, run another application to postprocess the output file and write a new final output file, and finally do something with this such as counting the rows in it or viewing it in TOPCAT. However, by doing it all within a single task instead, no intermediate results have to be stored, and the whole sequence can be very much more efficient. You can think of this (if it helps) like a Unix pipeline, except what is being streamed from the start to the end of the pipe is not bytes, but table metadata and data. In most cases, the table data is streamed through the pipeline a row at a time, meaning that the amount of memory required is small (though in some cases, for instance row sorting and crossmatching, this is not possible).
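As a rough sketch of the idea (the file and column names are invented, and the cmd and omode parameters used here are introduced below), a single invocation can replace a magic value, filter rows and simply count the survivors, with no intermediate files:

   stilts tpipe in=survey.fits \
                cmd='replaceval -999 null GMAG' \
                cmd='select "GMAG < 20"' \
                omode=count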
Tasks which allow this pre/post-processing, or "filtering", have parameters with names like "cmd" which you use to specify processing steps. Tasks with multiple input tables (tmatch2, tskymatch2, tcatn, tjoin) may have parameters named icmd1, icmd2, ... for preprocessing the different input tables and ocmd for postprocessing the output table.
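By way of a hedged sketch (the table names, the SNR column and the 3 arcsecond match radius are invented for illustration, and RA/DEC are assumed to be in degrees), a sky crossmatch might pre-filter each input table and then sort the matched output on the Separation column added by the sky matcher:

   stilts tmatch2 in1=obs.fits icmd1='select "SNR > 5"' \
                  in2=ref.fits icmd2='keepcols "RA DEC ID"' \
                  matcher=sky params=3 \
                  values1='RA DEC' values2='RA DEC' \
                  ocmd='sort Separation' \
                  out=matched.fits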
tpipe does nothing except filtering, so there is no distinction between pre- and post-processing, and its filter parameter is just named cmd. tpipe additionally has a script parameter which allows you to write the filter commands in a text file, to prevent the command line getting too long.
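For example, a tpipe pipeline chaining several filters might look something like this (the file and column names are invented); the same filter commands could instead be written one per line in a text file, say cmds.txt, and referenced with script=cmds.txt:

   stilts tpipe in=gsc.fits \
                cmd='replaceval -999 null VMAG' \
                cmd='select "VMAG < 18"' \
                cmd='keepcols "RA DEC VMAG"' \
                cmd='sort VMAG' \
                out=bright.fits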
In both cases there is a parameter named omode which defines the "output mode", that is, what happens to the post-processed output table that comes out of the end of the pipeline.
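For instance (again with invented filenames), the final table can be summarised or sent straight to a running TOPCAT rather than written to disk; Section 6.4 gives the full list of modes, but invocations along these lines are possible:

   stilts tpipe in=survey.fits cmd='keepcols "GMAG BPMAG RPMAG"' omode=stats
   stilts tpipe in=survey.fits cmd='select "PARALLAX > 10"' omode=topcat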
Section 6.1 lists the processing steps available and explains how to use them; Section 6.2 and Section 6.3 describe the syntax used in some of these filter commands for specifying columns; and Section 6.4 describes the available output modes.
See the examples in the command reference, and particularly the tpipe examples, for illustrations of how all this fits together.
The following processing filters, each described in Section 6.1, are available:

   addcol
   addpixsample
   addresolve
   addskycoords
   assert
   badval
   cache
   check
   clearparams
   collapsecols
   colmeta
   constcol
   delcols
   every
   explodeall
   explodecols
   fixcolnames
   group
   head
   healpixmeta
   keepcols
   meta
   progress
   random
   randomview
   repeat
   replacecol
   replaceval
   rowrange
   select
   seqview
   setparam
   shuffle
   sort
   sorthead
   stats
   tablename
   tail
   transpose
   uniq
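As a final hedged sketch combining a few of the filters above (all file and column names are invented), the following replaces a bad value, computes a colour column, keeps the ten rows with the largest colour value, and writes out just two columns:

   stilts tpipe in=cat.fits \
                cmd='badval -999 BMAG VMAG' \
                cmd='addcol B_V "BMAG - VMAG"' \
                cmd='sorthead -down 10 B_V' \
                cmd='keepcols "ID B_V"' \
                out=reddest.fits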