Next Previous Up Contents

This section provides a bit more detail on the how the row matching
of local tables
(sections Section 5.2 and Section 5.3)
is done.
It is designed to give a rough idea to interested parties;
it is *not* a tutorial description from
first principles of how it all works.

The basic algorithm for matching is based on dividing up the space of possibly-matching rows into an (indeterminate) number of bins. These bins will typically correspond to disjoint cells of a physical or notional coordinate space, but need not do so. In the first step, each row of each table is assessed to determine which bins might contain matches to it - this will generally be the bin that it falls into and any "adjacent" bins within a distance corresponding to the matching tolerance. A reference to the row is associated with each such bin. In the second step, each bin is examined, and if two or more rows are associated with it every possible pair of rows in the associated set is assessed to see whether it does in fact constitute a matched pair. This will identify all and only those row pairs which are related according to the selected match criteria. During this process a number of optimisations may be applied depending on the details of the data and the requested match.

The matching algorithm described above is
roughly an *O(N log(N))* process,
where *N* is the total number of rows in all the tables participating
in a match. This is good news, since the naive interpretation
would be *O(N ^{2})*.
This can break down however if the matching
tolerance is such that the number of rows associated with some or most bins
gets large, in which case an

For more detail on the matching algorithms, see the
javadocs for the `uk.ac.starlink.table.join`

package,
or contact the author.

Next Previous Up Contents

Starlink User Note253

TOPCAT web page: http://www.starlink.ac.uk/topcat/

Author email: m.b.taylor@bristol.ac.uk

Mailing list: topcat-user@jiscmail.ac.uk