uk.ac.starlink.table.join
Interface MatchEngine

All Known Implementing Classes:
CartesianMatchEngine, CombinedMatchEngine, EqualsMatchEngine, HTMMatchEngine

public interface MatchEngine

Defines the details of object matching criteria. This interface provides methods for ascertaining whether two table rows are to be linked, which usually means that they are assumed to refer to the same object. The methods act on 'tuples': each tuple is an array of objects defining the relevant characteristics of a row. Tuples must be prepared to suit what a particular implementation of this interface knows how to deal with; that information can be obtained from the getTupleInfos() method. Typically a tuple will be a list of coordinates, such as RA and Dec.

The business end of the interface consists of two methods. One simply tests whether two tuples count as the same or not (in practice, this is likely to compare corresponding elements of the two submitted tuples allowing for some error in each one). The second is a bit more subtle: it must identify a set of bins into which possible matches for the tuple might fall. For the case of coordinate matching with errors, you would need to chop the whole possible space into a discrete set of zones, each with a given key, and return the key for each zone near enough to the submitted tuple (point) that it might contain a match for it.

Formally, the requirements for correct implementations of this interface are as follows:

  1. matches(t1,t2) == matches(t2,t1)
  2. matches(t1,t2) implies a non-empty intersection of getBins(t1) and getBins(t2)

The best efficiency will be achieved when:

  1. the intersection of getBins(t1) and getBins(t2) is as small as possible for non-matching t1 and t2 (preferably empty)
  2. the number of bins returned by getBins is as small as possible (preferably 1)

These two efficiency requirements usually conflict to some extent.

It may help to think of all this as a sort of fuzzy hash.
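
As an illustration of the two requirements, here is a minimal sketch of a one-dimensional tolerance matcher. The class ToleranceEngineSketch and its helper binsOverlap are hypothetical names invented for this example; the class is a standalone stand-in that mirrors only getBins and matches, not an implementation shipped with the library. It assumes a single numeric tuple element and treats a blank (null) element as matching nothing. Floating-point rounding at exact zone boundaries is not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical 1-D matcher: two values match if they differ by at most err. */
class ToleranceEngineSketch {
    private final double err;

    ToleranceEngineSketch(double err) {
        if (!(err > 0)) {
            throw new IllegalArgumentException("err must be positive");
        }
        this.err = err;
    }

    /** Returns the key of every zone of width err that overlaps [x - err, x + err]. */
    public Object[] getBins(Object[] tuple) {
        if (tuple[0] == null) {
            return new Object[0];           // a blank value can match nothing
        }
        double x = ((Number) tuple[0]).doubleValue();
        long lo = (long) Math.floor((x - err) / err);
        long hi = (long) Math.floor((x + err) / err);
        List<Object> keys = new ArrayList<>();
        for (long k = lo; k <= hi; k++) {
            keys.add(Long.valueOf(k));      // Long has value-based equals and hashCode
        }
        return keys.toArray();
    }

    /** Symmetric by construction, since |x1 - x2| == |x2 - x1|. */
    public boolean matches(Object[] tuple1, Object[] tuple2) {
        if (tuple1[0] == null || tuple2[0] == null) {
            return false;
        }
        double x1 = ((Number) tuple1[0]).doubleValue();
        double x2 = ((Number) tuple2[0]).doubleValue();
        return Math.abs(x1 - x2) <= err;
    }

    /** Helper for checking requirement 2: do two bin sets share any key? */
    static boolean binsOverlap(Object[] bins1, Object[] bins2) {
        Set<Object> shared = new HashSet<>(Arrays.asList(bins1));
        shared.retainAll(Arrays.asList(bins2));
        return !shared.isEmpty();
    }
}
```

In this sketch each tuple lands in up to three zones. Wider zones would mean fewer bins per tuple but more non-matching tuples sharing a bin, which is the efficiency trade-off described above.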


Method Summary
 Object[] getBins(Object[] tuple)
          Returns a set of keys for bins into which possible matches for a given tuple might fall.
 DescribedValue[] getMatchParameters()
          Returns a set of DescribedValue objects whose values can be changed to adjust the matching criteria.
 ValueInfo[] getTupleInfos()
          Returns a set of ValueInfo objects indicating what is required for the elements of each tuple.
 boolean matches(Object[] tuple1, Object[] tuple2)
          Indicates whether two tuples are to be linked.
 

Method Detail

getBins

public Object[] getBins(Object[] tuple)
Returns a set of keys for bins into which possible matches for a given tuple might fall. The returned objects can be anything, but should have their equals and hashCode methods implemented properly for comparison.

Parameters:
tuple - tuple for which the bins of possible matches are required
Returns:
set of bin keys which might be returned by invoking this method on other tuples which count as matches for the submitted tuple
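
The point about equals and hashCode matters most when a bin key is a composite object rather than a boxed number or string. The following is a minimal sketch of such a key for a two-dimensional grid of zones. CellKey is a hypothetical class invented for this example and is not part of the library.

```java
/** Hypothetical bin key naming one cell of a 2-D grid by its integer indices. */
final class CellKey {
    private final long ix;
    private final long iy;

    CellKey(long ix, long iy) {
        this.ix = ix;
        this.iy = iy;
    }

    /** Two keys are equal exactly when they name the same cell. */
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof CellKey)) {
            return false;
        }
        CellKey that = (CellKey) other;
        return ix == that.ix && iy == that.iy;
    }

    /** Consistent with equals: equal keys always produce equal hash codes. */
    @Override
    public int hashCode() {
        return 31 * Long.hashCode(ix) + Long.hashCode(iy);
    }

    @Override
    public String toString() {
        return "(" + ix + "," + iy + ")";
    }
}
```

Without these overrides, two separately constructed keys for the same cell would compare as different objects, so tuples falling in the same zone would never be recognised as sharing a bin.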

matches

public boolean matches(Object[] tuple1,
                       Object[] tuple2)
Indicates whether two tuples are to be linked.

Parameters:
tuple1 - one tuple
tuple2 - the other tuple
Returns:
true iff tuple1 should be considered a match for tuple2
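
One way calling code might combine getBins and matches is sketched below: file each row under its bin keys, then run matches only on rows that share a bin. This is an illustrative sketch, not a description of how the library itself performs matching. PairFinderSketch, TwoMethodEngine and findPairs are hypothetical names; TwoMethodEngine is a stand-in that mirrors just the two methods so that the example is self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PairFinderSketch {

    /** Hypothetical stand-in mirroring two methods of MatchEngine. */
    interface TwoMethodEngine {
        Object[] getBins(Object[] tuple);
        boolean matches(Object[] tuple1, Object[] tuple2);
    }

    /** Returns the index pairs [i, j], with i < j, whose tuples match. */
    static Set<List<Integer>> findPairs(List<Object[]> tuples,
                                        TwoMethodEngine engine) {
        // Step 1: file each row index under every bin its tuple may fall in.
        Map<Object, List<Integer>> rowsByBin = new HashMap<>();
        for (int i = 0; i < tuples.size(); i++) {
            for (Object bin : engine.getBins(tuples.get(i))) {
                rowsByBin.computeIfAbsent(bin, k -> new ArrayList<>()).add(i);
            }
        }

        // Step 2: only rows sharing a bin are candidates; confirm each
        // candidate pair with matches. The set removes pairs found twice
        // because the rows share more than one bin.
        Set<List<Integer>> pairs = new LinkedHashSet<>();
        for (List<Integer> rows : rowsByBin.values()) {
            for (int a = 0; a < rows.size(); a++) {
                for (int b = a + 1; b < rows.size(); b++) {
                    int i = rows.get(a);
                    int j = rows.get(b);
                    if (i != j && engine.matches(tuples.get(i), tuples.get(j))) {
                        pairs.add(Arrays.asList(i, j));
                    }
                }
            }
        }
        return pairs;
    }
}
```

The sketch relies on the requirement that matching tuples share at least one bin: any pair that never meets in a bin is never tested, so an engine that broke that requirement would silently lose matches.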

getTupleInfos

public ValueInfo[] getTupleInfos()
Returns a set of ValueInfo objects indicating what is required for the elements of each tuple. The length of this array is the number of elements in the tuple. Each element should at least have a defined name and content class. The info's nullable attribute has a special meaning: if true it means that it makes sense for this element of the tuple to be always blank (for instance assigned to no column).

Returns:
array of objects describing the requirements on each element of the tuples used for matching

getMatchParameters

public DescribedValue[] getMatchParameters()
Returns a set of DescribedValue objects whose values can be changed to adjust the matching criteria. Typically at least one of these will be some sort of separation tolerance which determines how close tuples must be to count as a match. This match engine's behaviour can be modified by calling DescribedValue.setValue(java.lang.Object) on the returned objects.

Returns:
array of described values which influence the match

Copyright © 2004 CLRC: Central Laboratory of the Research Councils. All rights reserved.