5.1 Processing Filters

This section lists the filter commands which can be used for table pipeline processing, in conjunction with cmd- or script-type parameters.

You can string as many of these together as you like. On the command line, you can repeat the cmd (or icmd1, or ocmd...) parameter multiple times, or use one cmd parameter and separate different filter specifiers with semicolons (";"). The effect is the same.

It's important to note that each command in the sequence of processing steps acts on the table at that point in the sequence. Thus

   stilts tpipe cmd='delcols 1; delcols 1; delcols 1'
has the same effect as
   stilts tpipe cmd='delcols "1 2 3"'
since in the first case the columns are shifted left after each one is deleted, so the table seen by each step has one fewer column than the one before. Note also the use of quotes in the latter of the examples above, which is necessary so that the <colid-list> of the delcols command is interpreted as one argument not three separate words.
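The equivalence above can be checked with a small Python model of column deletion. This is only a sketch of the semantics described here, not STILTS code; the function name `delcols` and the list-of-names table representation are assumptions made for illustration.

```python
def delcols(columns, indices):
    """Return the column list with the given 1-based positions removed."""
    drop = set(indices)  # listing the same column twice is harmless
    return [c for i, c in enumerate(columns, start=1) if i not in drop]


cols = ["a", "b", "c", "d", "e"]

# Three separate steps: each one sees the table left by the previous step,
# so "column 1" is a different column every time.
stepwise = cols
for _ in range(3):
    stepwise = delcols(stepwise, [1])

# One step that names the first three columns of the original table.
one_shot = delcols(cols, [1, 2, 3])

print(stepwise, one_shot)
```

Both approaches leave only the last two columns, because the columns shift left after each deletion.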

The syntax of some of these arguments, such as <col-id>, <colid-list> and <expr>, is described elsewhere in this document.

addcol [-after <col-id> | -before <col-id>]
       [-units <units>] [-ucd <ucd>] [-desc <description>]
       <col-name> <expr>
Add a new column called <col-name> defined by the algebraic expression <expr>. By default the new column appears after the last column of the table, but you can position it either before or after a specified column using the -before or -after flags respectively. The -units, -ucd and -desc flags can be used to define metadata values for the new column.
addskycoords [-epoch <expr>] [-inunit deg|rad|sex] [-outunit deg|rad|sex]
             <insys> <outsys> <col-id1> <col-id2> <col-name1> <col-name2>
Add new columns to the table representing position on the sky. The values are determined by converting a sky position whose coordinates are contained in existing columns. The <col-id> arguments give identifiers for the two input coordinate columns in the coordinate system named by <insys>, and the <col-name> arguments name the two new columns, which will be in the coordinate system named by <outsys>. The <insys> and <outsys> coordinate system specifiers are each one of icrs, fk5, fk4, galactic, supergalactic or ecliptic.

The -inunit and -outunit flags may be used to indicate the units of the existing coordinates and the units for the new coordinates respectively; use one of degrees, radians or sexagesimal (may be abbreviated), otherwise degrees will be assumed. For sexagesimal, the two corresponding columns must be string-valued in forms like hh:mm:ss.s and dd:mm:ss.s respectively.

For certain conversions, the value specified by the -epoch flag is of significance. Where significant its value defaults to 2000.0.

assert <expr>
Check that a boolean expression is true for each row. If there is any row of the table for which the expression <expr> does not evaluate to true, execution terminates with an error. As long as no error occurs, the output table is identical to the input one.

The exception generated by an assertion violation is of class uk.ac.starlink.ttools.filter.AssertException, although that is not usually obvious if you are running from the shell in the usual way.

badval <bad-val> <colid-list>
For each column specified in <colid-list> any occurrence of the value <bad-val> is replaced by a blank entry.
cache
Stores in memory or on disk a temporary copy of the table at this point in the pipeline. This can provide improvements in efficiency if there is an expensive step upstream and a step which requires more than one read of the data downstream. If you see an error like "Can't re-read data from stream" then adding this step near the start of the filters might help.
check
Runs checks on the table at the indicated point in the processing pipeline. This is strictly a debugging measure, and may be time-consuming for large tables.
clearparams <pname> ...
Clears the value of one or more named parameters. Each of the <pname> values supplied may be either a parameter name or a simple wildcard expression matching parameter names. Currently the only wildcarding is a "*" to match any sequence of characters. clearparams * will clear all the parameters in the table. It is not an error to supply <pname>s which do not exist in the table - these have no effect.
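The wildcard rule can be modelled in a few lines of Python. This is a sketch under stated assumptions rather than the STILTS implementation: the helper name `clear_params`, the dict representation of table parameters and case-sensitive matching are all hypothetical.

```python
import re


def clear_params(params, patterns):
    """Return a copy of params without the entries matched by any pattern.

    Only "*" is treated as a wildcard (any sequence of characters);
    every other character is matched literally.
    """
    regexes = [
        re.compile(".*".join(re.escape(part) for part in pattern.split("*")))
        for pattern in patterns
    ]
    return {
        name: value
        for name, value in params.items()
        if not any(rx.fullmatch(name) for rx in regexes)
    }


params = {"author": "x", "obs_date": "2005-01-01", "obs_site": "y"}

# "missing" matches nothing, which is not an error.
kept = clear_params(params, ["obs_*", "missing"])
emptied = clear_params(params, ["*"])
print(kept, emptied)
```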
colmeta [-name <name>] [-units <units>] [-ucd <ucd>] [-desc <descrip>]
        <colid-list>
Modifies the metadata of one or more columns. Some or all of the name, units, ucd and description of the column(s), identified by <colid-list> can be set by using some or all of the listed flags. Typically, <colid-list> will simply be the name of a single column.
delcols <colid-list>
Delete the specified columns. The same column may harmlessly be specified more than once.
every <step>
Include only every <step>'th row in the result, starting with the first row.
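As a minimal illustration of which rows survive, the following Python sketch treats the table as a plain list of rows (an assumption made for the example; the function name `every` simply mirrors the filter name).

```python
def every(rows, step):
    """Keep the first row and every step'th row after it."""
    return rows[::step]


rows = list(range(1, 11))  # stand-in for rows 1..10
print(every(rows, 3))
```

With ten rows and a step of 3, rows 1, 4, 7 and 10 are retained.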
explodeall
Replaces any column which is an N-element array with N scalar columns. Only columns with fixed array sizes are affected.
explodecols <colid-list>
Takes a list of specified columns which represent N-element arrays and replaces each one with N scalar columns. Each of the columns specified by <colid-list> must have a fixed-length array type, though not all the arrays need to have the same number of elements.
head <nrows>
Include only the first <nrows> rows of the table.
keepcols <colid-list>
Select the columns from the input table which will be included in the output table. The output table will include only those columns listed in <colid-list>, in that order. The same column may be listed more than once, in which case it will appear in the output table more than once.
meta [<item> ...]
Provides information about the metadata for each column. This filter turns the table sideways, so that each row of the output corresponds to a column of the input. The columns of the output table contain metadata items such as column name, units, UCD etc corresponding to each column of the input table.

By default the output table contains columns for the items Index, Name, Class, Shape, Units, Description and UCD, as well as any table-specific column metadata items that the table contains. The output may be customised however by supplying one or more <item> headings. These may be selected from the list Index, Name, Class, Shape, Units, Description, UCD and UCD_desc, as well as any table-specific metadata. It is not an error to specify an item for which no metadata exists in any of the columns.

Any table parameters of the input table are propagated to the output one.

progress
Monitors progress by displaying the number of rows processed so far on the terminal (standard error). This number is updated every second or thereabouts; if all the processing is done in under a second you may not see any output. If the total number of rows in the table is known, an ASCII-art progress bar is updated, otherwise just the number of rows seen so far is written.
random
Ensures that steps downstream see the table as random access. Only useful for debugging.
replacecol [-name <name>] [-units <units>] [-ucd <ucd>] [-desc <descrip>]
           <col-id> <expr>
Replaces the content of a column with the value of an algebraic expression. The old values are discarded in favour of the result of evaluating <expr>. You can specify the metadata for the new column using the -name, -units, -ucd and -desc flags; for any of these items which you do not specify, they will take the values from the column being replaced. You can reference the replaced column in the expression, so for example "replacecol pixsize pixsize*2" just multiplies the values in column pixsize by 2.
replaceval <old-val> <new-val> <colid-list>
For each column specified in <colid-list> any instance of <old-val> is replaced by <new-val>. The value string 'null' can be used for either <old-val> or <new-val> to indicate a blank value.
select <expr>
Include in the output table only rows for which the expression <expr> evaluates to true. <expr> must be an expression which evaluates to a boolean value (true/false).
sequential
Ensures that steps downstream see the table as sequential access. Only useful for debugging.
setparam [-type byte|short|int|long|float|double|boolean|string]
         [-desc <descrip>] <pname> <pval>
Sets a named parameter in the table to a given value. The parameter named <pname> is set to the value <pval>. By default the type of the parameter is determined automatically (if it looks like an integer it's an integer etc) but this can be overridden using the -type flag. The parameter description may be set using the -desc flag.
sort [-down] [-nullsfirst] <key-list>
Sorts the table according to the value of one or more algebraic expressions. The sort key expressions appear, as separate (space-separated) words, in <key-list>; sorting is done on the first expression first, but if that results in a tie then the second one is used, and so on. Each expression must evaluate to a type that it makes sense to sort, for instance numeric. If the -down flag is used, the sort order is descending rather than ascending. Blank entries are usually considered to come at the end of the collation sequence, but if the -nullsfirst flag is given then they are considered to come at the start instead.
sorthead [-tail] [-down] [-nullsfirst] <nrows> <key-list>
Performs a sort on the table according to the value of one or more algebraic expressions, retaining only <nrows> rows at the head of the resulting sorted table. The sort key expressions appear, as separate (space-separated) words, in <key-list>; sorting is done on the first expression first, but if that results in a tie then the second one is used, and so on. If the -tail flag is used, then the last <nrows> rows rather than the first ones are retained. If the -down flag is used the sort order is descending rather than ascending. Blank entries are usually considered to come at the end of the collation sequence, but if the -nullsfirst flag is given then they are considered to come at the start instead. Each expression must evaluate to a type that it makes sense to sort, for instance numeric. This filter is functionally equivalent to using sort followed by head, but it can be done in one pass and is usually cheaper on memory and faster, as long as <nrows> is significantly lower than the size of the table.
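The following Python sketch illustrates why a combined sort-and-head can be cheaper than a full sort: a bounded selection only needs to remember <nrows> candidates at a time. It is an illustration of the idea only; whether STILTS uses a heap internally is not stated here, and the names `sort_key`, `sort_then_head` and `sorthead` are hypothetical.

```python
import heapq


def sort_key(value):
    """Order real values ascending, with blanks (None) after all of them."""
    return (value is None, 0 if value is None else value)


def sort_then_head(rows, nrows):
    """Full sort followed by truncation: holds every row in sorted order."""
    return sorted(rows, key=sort_key)[:nrows]


def sorthead(rows, nrows):
    """Single pass that tracks only the nrows smallest rows seen so far."""
    return heapq.nsmallest(nrows, rows, key=sort_key)


rows = [5.0, None, 1.5, 3.2, 0.7, 4.1]
print(sort_then_head(rows, 3), sorthead(rows, 3))
```

Both functions return the same rows; the second never needs to hold the whole sorted table.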
stats [<item> ...]
Calculates statistics on the data in the table. This filter turns the table sideways, so that each row of the output corresponds to a column of the input. The columns of the output table contain statistical items such as mean, standard deviation etc corresponding to each column of the input table.

By default the output table contains columns for the items Name, Mean, StDev, Minimum, Maximum and NGood. The output may be customised however by supplying one or more <item> headings. These may be selected from the list NGood, NBad, Mean, StDev, Variance, Skew, Kurtosis, Minimum, Maximum, Sum, MinPos, MaxPos, Cardinality, Median, Quartile1, Quartile2 and Quartile3, or have the form "Q.nn" to represent the quantile corresponding to the proportion 0.nn; for instance Q.5 is an alias for Median, and Q.25 for Quartile1.

Any parameters of the input table are propagated to the output one.

Note that quantile calculations (including median and quartiles) can be expensive on memory. If you want to calculate quantiles for large tables, it may be wise to use an earlier step in the pipeline to reduce the table to only those columns for which you need the quantiles. No interpolation is performed when calculating quantiles.
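The "Q.nn" naming convention can be sketched as a small Python helper that maps an item heading to the proportion it stands for. The function name `quantile_proportion` is hypothetical, and the proportions for Quartile2 and Quartile3 follow the usual definitions rather than anything stated above; only the Median and Quartile1 aliases are given in the text.

```python
def quantile_proportion(item):
    """Return the proportion (between 0 and 1) named by a quantile item."""
    # Median and Quartile1 are given in the text; the other two are assumed.
    aliases = {"Median": 0.5, "Quartile1": 0.25, "Quartile2": 0.5, "Quartile3": 0.75}
    if item in aliases:
        return aliases[item]
    if item.startswith("Q.") and item[2:].isdigit():
        # "Q.nn" stands for the proportion 0.nn
        return float("0." + item[2:])
    raise ValueError("not a quantile item: " + item)


print(quantile_proportion("Q.5"), quantile_proportion("Q.25"))
```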

tablename <name>
Sets the table's name attribute to the given string.
tail <nrows>
Include only the last <nrows> rows of the table.
transpose [-namecol <col-id>]
Transposes the input table so that columns become rows and vice versa. The -namecol flag can be used to specify a column in the input table which will provide the column names for the output table. The first column of the output table will contain the column names of the input table.
uniq [-count] [<colid-list>]
Eliminates adjacent rows which have the same values. If used with no arguments, then any row which has identical values to its predecessor is removed. If the <colid-list> parameter is given, then only the values in the specified columns need to be equal in order for the row to be removed.

If the -count flag is given, then an additional column with the name DupCount will be prepended to the table giving a count of the number of duplicated input rows represented by each output row. A unique row has a DupCount value of 1.
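The behaviour of uniq with -count can be modelled in Python as follows. This is a sketch of the semantics described above, not the STILTS implementation; rows are represented as tuples, `key_cols` holds 0-based column positions, and the function name `uniq_count` is hypothetical.

```python
from itertools import groupby


def uniq_count(rows, key_cols=None):
    """Collapse runs of adjacent equal rows, prepending a duplicate count.

    If key_cols is given, only those column positions are compared.
    The first row of each run is the one that is kept.
    """
    def key(row):
        if key_cols is None:
            return tuple(row)
        return tuple(row[i] for i in key_cols)

    out = []
    for _, run in groupby(rows, key=key):
        run = list(run)
        out.append((len(run),) + tuple(run[0]))
    return out


rows = [("a", 1), ("a", 1), ("b", 2), ("a", 1)]
print(uniq_count(rows))
```

Only adjacent duplicates are merged, so the final ("a", 1) row here survives as a separate row with a count of 1. Sorting the table first groups all equal rows together.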


STILTS - Starlink Tables Infrastructure Library Tool Set
Starlink User Note 256
STILTS web page: http://www.starlink.ac.uk/stilts/
Author email: m.b.taylor@bristol.ac.uk