tgroup
: Calculates aggregate functions on groups of rows
tgroup
identifies groups of rows in
a table based on the values in a given column or columns,
and calculates statistical quantities or otherwise collapses
down the multiple values from other columns into single values
representing each group.
It does the same job as a SELECT ... GROUP BY
statement
with aggregate functions in ADQL/SQL.
The keys
parameter defines how input rows are grouped,
and the aggcols
parameter defines what quantities to
aggregate from the rows in each group.
keys
specifies one or more values (column names or expresssions)
that must be the same for rows grouped together,
while aggcols
specifies zero or more columns to be
added based on the content of rows in each group.
The output table therefore contains
one column for each entry in keys
and one column for each entry in aggcols
,
and has one row for each group identified.
This command can therefore be used to count rows or calculate statistical quantities per group. A number of statistical aggregation methods are provided such as mean, median, minimum, maximum etc. For more specialised requirements, for instance quantiles or custom statistics, you can also use the array aggregators which generate an array containing all of the values in the group, and operate on the resulting column using one of the functions in the Arrays class.
By way of comparison, the tgroup
invocation:
stilts tgroup in=t keys="year detector" aggcols="0;count;num gmag;min;min_gmag gmag;mean"corresponds roughly to the ADQL query:
SELECT COUNT(*) AS num, MIN(gmag) AS min_gmag, MEAN(gmag), FROM t GROUP BY year, detector
See also the
tgridmap
and
tskymap
commands,
which provide similar functionality where the grouping is over
evenly spaced numeric/coordinate values.