uk.ac.starlink.table.join
Interface MatchEngine

All Known Implementing Classes:
AbstractCartesianMatchEngine, CombinedMatchEngine, EqualsMatchEngine, SkyMatchEngine, SphericalPolarMatchEngine

public interface MatchEngine

Defines the details of object matching criteria. This interface provides methods for ascertaining whether two table rows are to be linked; this usually means that they are to be assumed to refer to the same object. The methods act on 'tuples': arrays of objects defining the relevant characteristics of a row. These tuples have to be prepared with an understanding of what a particular implementation of this interface knows how to deal with, which can be obtained from the getTupleInfos() method. Typically a tuple will be a list of coordinates, such as RA and Dec.

The business end of the interface consists of two methods. One tests whether two tuples count as matching and, if they do, assigns a closeness score (in practice, this is likely to compare corresponding elements of the two submitted tuples, allowing for some error in each one). The second is a bit more subtle: it must identify a set of bins into which possible matches for the tuple might fall. For coordinate matching with errors, you would need to chop the whole possible space into a discrete set of zones, each with a given key, and return the key of each zone near enough to the submitted tuple (point) that it might contain a match for it.
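To illustrate these two methods, here is a minimal sketch for a hypothetical 1-d engine in which tuples {x} match when |x1 - x2| <= err. The class name OneDimSketchEngine and the binning scheme (bins of width err, keyed by Long) are assumptions made for this example. They are not taken from any of the implementing classes listed above. For any two matching points, the intervals [x1 - err, x1 + err] and [x2 - err, x2 + err] overlap, so their bin sets always share at least one key.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical 1-d engine (not one of the listed implementations):
 * tuples {x} match when |x1 - x2| <= err.
 */
class OneDimSketchEngine {
    private final double err;

    OneDimSketchEngine(double err) {
        this.err = err;
    }

    /** Score in 0..1 for a match (0 = identical), -1.0 for no match. */
    double matchScore(Object[] tuple1, Object[] tuple2) {
        double x1 = ((Number) tuple1[0]).doubleValue();
        double x2 = ((Number) tuple2[0]).doubleValue();
        double d = Math.abs(x1 - x2);
        return d <= err ? d / err : -1.0;
    }

    /** Long keys of every width-err bin overlapping [x - err, x + err]. */
    Object[] getBins(Object[] tuple) {
        double x = ((Number) tuple[0]).doubleValue();
        long lo = (long) Math.floor((x - err) / err);
        long hi = (long) Math.floor((x + err) / err);
        List<Object> keys = new ArrayList<>();
        for (long k = lo; k <= hi; k++) {
            keys.add(k);  // autoboxed to Long, which has value-based equals/hashCode
        }
        return keys.toArray();
    }
}
```

For example, with err = 1.0 the tuple {2.5} falls into bins 1, 2 and 3, and {3.0} falls into bins 2, 3 and 4. The two tuples match with score 0.5.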

Formally, the requirements for correct implementations of this interface are as follows:

  1. matchScore(t1,t2) == matchScore(t2,t1)
  2. matchScore(t1,t2)>=0 implies a non-empty intersection of getBins(t1) and getBins(t2)
The best efficiency will be achieved when:
  1. the intersection of getBins(t1) and getBins(t2) is as small as possible for non-matching t1 and t2 (preferably empty)
  2. the number of bins returned by getBins is as small as possible (preferably 1)
These two efficiency requirements are usually conflicting to some extent.
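The two formal requirements can be checked mechanically on sample data. The sketch below assumes a cut-down hypothetical interface, CoreEngine, that contains only matchScore and getBins. It also uses a hypothetical exact-value engine. It is a testing aid for illustration and is not part of this API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical cut-down version of the two core MatchEngine methods. */
interface CoreEngine {
    double matchScore(Object[] t1, Object[] t2);
    Object[] getBins(Object[] tuple);
}

/** Hypothetical exact-value engine: one bin per distinct tuple value. */
class ExactValueEngine implements CoreEngine {
    public double matchScore(Object[] t1, Object[] t2) {
        return Arrays.equals(t1, t2) ? 0.0 : -1.0;
    }

    public Object[] getBins(Object[] tuple) {
        // Arrays.asList gives a List whose equals/hashCode compare by value.
        return new Object[] { Arrays.asList(tuple) };
    }
}

final class ContractCheck {
    /** True if both formal requirements hold for every pair of sample tuples. */
    static boolean satisfiesContract(CoreEngine engine, List<Object[]> samples) {
        for (Object[] t1 : samples) {
            for (Object[] t2 : samples) {
                double s12 = engine.matchScore(t1, t2);
                // Requirement 1: the score is symmetric.
                if (Double.compare(s12, engine.matchScore(t2, t1)) != 0) {
                    return false;
                }
                // Requirement 2: a match implies at least one shared bin.
                if (s12 >= 0 && !sharesBin(engine.getBins(t1), engine.getBins(t2))) {
                    return false;
                }
            }
        }
        return true;
    }

    static boolean sharesBin(Object[] bins1, Object[] bins2) {
        Set<Object> set = new HashSet<>(Arrays.asList(bins1));
        for (Object b : bins2) {
            if (set.contains(b)) {
                return true;
            }
        }
        return false;
    }
}
```

An engine that reports matches but keys its bins on object identity, for instance the tuple array itself, fails requirement 2 as soon as two distinct but matching tuples are compared.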

It may help to think of all this as a sort of fuzzy hash.
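The 'fuzzy hash' view leads directly to a hash-join style matcher. It indexes one table's rows by bin key and then scores only pairs of rows that share a bin. This is a sketch under stated assumptions: the BinnedMatcher interface and the FuzzyHashJoin class are hypothetical, and 'best match' is taken to mean the lowest non-negative score. The match algorithms that actually use MatchEngine may work differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical cut-down engine interface: just the two core methods. */
interface BinnedMatcher {
    double matchScore(Object[] t1, Object[] t2);
    Object[] getBins(Object[] tuple);
}

final class FuzzyHashJoin {
    /** For each row of a, the index of its best-matching row of b, or -1 if none. */
    static int[] bestMatches(BinnedMatcher engine, List<Object[]> a, List<Object[]> b) {
        // Index phase: bin key -> indices of rows of b that report that bin.
        Map<Object, List<Integer>> index = new HashMap<>();
        for (int j = 0; j < b.size(); j++) {
            for (Object key : engine.getBins(b.get(j))) {
                index.computeIfAbsent(key, k -> new ArrayList<>()).add(j);
            }
        }
        int[] best = new int[a.size()];
        for (int i = 0; i < a.size(); i++) {
            // Candidate phase: only rows sharing a bin can possibly match.
            Set<Integer> candidates = new HashSet<>();
            for (Object key : engine.getBins(a.get(i))) {
                candidates.addAll(index.getOrDefault(key, Collections.emptyList()));
            }
            // Scoring phase: the lowest non-negative score wins.
            best[i] = -1;
            double bestScore = Double.POSITIVE_INFINITY;
            for (int j : candidates) {
                double score = engine.matchScore(a.get(i), b.get(j));
                if (score >= 0 && score < bestScore) {
                    bestScore = score;
                    best[i] = j;
                }
            }
        }
        return best;
    }
}
```

Requirement 2 is what makes this shortcut safe. Any pair that matchScore would accept shares at least one bin, so the candidate phase never skips it. The efficiency requirements then govern how many useless candidates get scored.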


Field Summary
static Object[] NO_BINS
          Convenience constant - it's a zero-length array of objects, suitable for returning from getBins(java.lang.Object[]) if no match can result.
 
Method Summary
 boolean canBoundMatch()
          Indicates whether the getMatchBounds(java.lang.Comparable[], java.lang.Comparable[]) method can be invoked to provide some sort of useful result.
 Object[] getBins(Object[] tuple)
          Returns a set of keys for bins into which possible matches for a given tuple might fall.
 Comparable[][] getMatchBounds(Comparable[] minTuple, Comparable[] maxTuple)
          Given a range of tuple values, returns a range outside which no match to anything within that range can result.
 DescribedValue[] getMatchParameters()
          Returns a set of DescribedValue objects whose values can be modified to modify the matching criteria.
 ValueInfo[] getTupleInfos()
          Returns a set of ValueInfo objects indicating what is required for the elements of each tuple.
 double matchScore(Object[] tuple1, Object[] tuple2)
          Indicates whether two tuples count as matching each other, and if so how closely.
 

Field Detail

NO_BINS

public static final Object[] NO_BINS
Convenience constant - it's a zero-length array of objects, suitable for returning from getBins(java.lang.Object[]) if no match can result.

Method Detail

getBins

public Object[] getBins(Object[] tuple)
Returns a set of keys for bins into which possible matches for a given tuple might fall. The returned objects can be anything, but should have their equals and hashCode methods implemented properly for comparison.

Parameters:
tuple - tuple
Returns:
set of bin keys which might be returned by invoking this method on other tuples which count as matches for the submitted tuple
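Below is a sketch of getBins for a hypothetical 2-d Cartesian engine that matches within Euclidean distance err. Cell keys are built with Arrays.asList, whose List implementation compares by value, so equal cells from different calls are recognised as the same bin. The local NO_BINS constant mirrors the field described above. Returning it for a tuple with a null coordinate is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical 2-d grid binning for a Cartesian match within distance err. */
final class GridBins {
    static final Object[] NO_BINS = new Object[0];  // mirrors MatchEngine.NO_BINS

    /**
     * Returns keys of all square cells (side err) overlapping the box
     * [x-err, x+err] x [y-err, y+err]. Each key is a List<Long>.
     */
    static Object[] getBins(Object[] tuple, double err) {
        if (tuple[0] == null || tuple[1] == null) {
            return NO_BINS;  // in this sketch a blank coordinate never matches
        }
        double x = ((Number) tuple[0]).doubleValue();
        double y = ((Number) tuple[1]).doubleValue();
        long ix0 = cell(x - err, err), ix1 = cell(x + err, err);
        long iy0 = cell(y - err, err), iy1 = cell(y + err, err);
        List<Object> keys = new ArrayList<>();
        for (long i = ix0; i <= ix1; i++) {
            for (long j = iy0; j <= iy1; j++) {
                keys.add(Arrays.asList(i, j));  // value-based equals/hashCode
            }
        }
        return keys.toArray();
    }

    private static long cell(double v, double err) {
        return (long) Math.floor(v / err);
    }
}
```

Two points within distance err of each other differ by at most err in each coordinate. Their boxes therefore overlap, and so they share at least one cell.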

matchScore

public double matchScore(Object[] tuple1,
                         Object[] tuple2)
Indicates whether two tuples count as matching each other, and if so how closely. If tuple1 and tuple2 are considered a matching pair, a non-negative value should be returned indicating how close the match is: the higher the number, the worse the match, and a return value of zero indicates a 'perfect' match. If the two tuples do not constitute a matching pair, a negative number (conventionally -1.0) should be returned. This return value can be thought of as (and will often correspond physically with) the distance in some real or notional space between the points represented by the two submitted tuples.

If there's no reason to do otherwise, the range 0..1 is recommended for successful matches. However, if the result has some sort of physical meaning (such as a distance in real space), that may be used instead.

Parameters:
tuple1 - one tuple
tuple2 - the other tuple
Returns:
'distance' between tuple1 and tuple2; 0 is a perfect match, larger values indicate worse matches, negative values indicate no match
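The scoring conventions above can be sketched for a hypothetical 2-d proximity match with tolerance err. For brevity the points are passed as double[] rather than Object[] tuples. normalizedScore follows the recommended 0..1 range, and distanceScore returns the physical separation instead. Both return -1.0 for a non-match and are symmetric in their arguments, as requirement 1 demands.

```java
/** Hypothetical scoring for a 2-d proximity match with tolerance err. */
final class ProximityScore {
    static final double NO_MATCH = -1.0;  // conventional "no match" value

    /** Dimensionless score: 0 for coincident points, 1 at the tolerance limit. */
    static double normalizedScore(double[] p1, double[] p2, double err) {
        double d = Math.hypot(p1[0] - p2[0], p1[1] - p2[1]);
        return d <= err ? d / err : NO_MATCH;
    }

    /** Alternative: report the physical separation itself when it is meaningful. */
    static double distanceScore(double[] p1, double[] p2, double err) {
        double d = Math.hypot(p1[0] - p2[0], p1[1] - p2[1]);
        return d <= err ? d : NO_MATCH;
    }
}
```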

getTupleInfos

public ValueInfo[] getTupleInfos()
Returns a set of ValueInfo objects indicating what is required for the elements of each tuple. The length of this array is the number of elements in the tuple. Each element should at least have a defined name and content class. The info's nullable attribute has a special meaning: if true, it means that it makes sense for this element of the tuple to be always blank (for instance, assigned to no column).

Returns:
array of objects describing the requirements on each element of the tuples used for matching

getMatchParameters

public DescribedValue[] getMatchParameters()
Returns a set of DescribedValue objects whose values can be changed to adjust the matching criteria. Typically at least one of these will be some sort of tolerance separation which determines how close tuples must be to count as a match. This match engine's behaviour can be modified by calling DescribedValue.setValue(java.lang.Object) on the returned objects.

Returns:
array of described values which influence the match
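The pattern described here, in which changing a returned value changes subsequent matching, can be sketched in plain Java. ToleranceParam is a hypothetical stand-in for DescribedValue and exposes only getValue and setValue. TunableEngine is likewise hypothetical and takes plain doubles instead of tuples. In this sketch the engine reads the tolerance each time it scores a pair, rather than copying it at construction, so a setValue call takes effect immediately.

```java
/** Hypothetical stand-in for a DescribedValue holding a tolerance. */
final class ToleranceParam {
    private Object value;

    ToleranceParam(double initial) {
        value = initial;  // stored as a boxed Double
    }

    Object getValue() {
        return value;
    }

    void setValue(Object value) {
        this.value = value;
    }
}

/** Hypothetical engine whose tolerance is exposed as a mutable parameter. */
final class TunableEngine {
    private final ToleranceParam tolerance = new ToleranceParam(1.0);

    ToleranceParam[] getMatchParameters() {
        return new ToleranceParam[] { tolerance };
    }

    double matchScore(double x1, double x2) {
        // Read the current tolerance on every call.
        double err = ((Number) tolerance.getValue()).doubleValue();
        double d = Math.abs(x1 - x2);
        return d <= err ? d / err : -1.0;
    }
}
```

With the default tolerance of 1.0, the values 0.0 and 2.0 do not match. After getMatchParameters()[0].setValue(4.0) they match with score 0.5.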

getMatchBounds

public Comparable[][] getMatchBounds(Comparable[] minTuple,
                                     Comparable[] maxTuple)
Given a range of tuple values, returns a range outside which no match to anything within that range can result. If the tuples on which this engine works represent some kind of space, the input values and output values specify a hyper-rectangular region of this space. In the common case in which the match criteria are based on proximity in this space up to a certain error, this method should return a rectangle which is like the input one but broadened in each direction by an amount corresponding to the error.

Both the input and output rectangles are specified by tuples representing their opposite corners; equivalently, they are the minimum and maximum values of each tuple element. In either the input or output min/max tuples, any element may be null to indicate that no information is available on the bounds of that tuple element (coordinate).

This method can be used by match algorithms which know in advance the range of coordinates they will match against and wish to reduce workload by not attempting matches which are bound to fail.

For example, a 2-d Cartesian match engine with an isotropic match error of 0.5 would turn input values (minTuple,maxTuple) = ((0,200),(10,210)) into output values ((-0.5,199.5),(10.5,210.5)).
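The numbers in this example can be reproduced with a short sketch. It assumes Double elements instead of general Comparable ones and a single isotropic error err. The class name BoundsSketch is hypothetical. Null elements are passed through unchanged, since a null means that no bound information is available.

```java
/** Hypothetical isotropic-error broadening of a (minTuple, maxTuple) box. */
final class BoundsSketch {
    /** Returns {minTuple - err, maxTuple + err} element by element; nulls stay null. */
    static Double[][] broaden(Double[] minTuple, Double[] maxTuple, double err) {
        int n = minTuple.length;
        Double[] outMin = new Double[n];
        Double[] outMax = new Double[n];
        for (int i = 0; i < n; i++) {
            outMin[i] = minTuple[i] == null ? null : minTuple[i] - err;
            outMax[i] = maxTuple[i] == null ? null : maxTuple[i] + err;
        }
        return new Double[][] { outMin, outMax };
    }
}
```

Calling broaden with minTuple {0.0, 200.0}, maxTuple {10.0, 210.0} and err 0.5 yields {-0.5, 199.5} and {10.5, 210.5}.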

This method will only be called if canBoundMatch() returns true. Thus engines that cannot provide any useful information along these lines (for instance because none of their tuple elements is Comparable) do not need to implement it in a meaningful way.

Parameters:
minTuple - tuple consisting of the minimum values of each tuple element in a possible match (to put it another way - coordinates of one corner of a tuple-space rectangle containing such a match)
maxTuple - tuple consisting of the maximum values of each tuple element in a possible match (to put it another way - coordinates of the other corner of a tuple-space rectangle containing such a match)
Returns:
2-element array of tuples - effectively (minTuple,maxTuple) broadened by errors
See Also:
canBoundMatch()
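As described above, a match algorithm that knows in advance the range of coordinates it will match against might use the broadened bounds to discard hopeless rows before any binning. The sketch below is hypothetical in several respects. It takes the output bounds as Double arrays and the rows as double[], and it treats a null bound as 'no information'. It should only be applied when canBoundMatch() returns true; otherwise every row has to be kept.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-filter a match algorithm might apply using broadened bounds. */
final class BoundsPrefilter {
    /**
     * Keeps only the rows that lie inside the broadened box; a null bound
     * means "no information" and so never excludes anything.
     */
    static List<double[]> insideBounds(List<double[]> rows, Double[] lo, Double[] hi) {
        List<double[]> kept = new ArrayList<>();
        for (double[] t : rows) {
            boolean inside = true;
            for (int i = 0; i < t.length && inside; i++) {
                if ((lo[i] != null && t[i] < lo[i]) || (hi[i] != null && t[i] > hi[i])) {
                    inside = false;
                }
            }
            if (inside) {
                kept.add(t);
            }
        }
        return kept;
    }
}
```

With the bounds ((-0.5,199.5),(10.5,210.5)), a row at (11.0, 205.0) is dropped without being scored, while (5.0, 205.0) and (-0.4, 199.6) survive.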

canBoundMatch

public boolean canBoundMatch()
Indicates whether the getMatchBounds(java.lang.Comparable[], java.lang.Comparable[]) method can be invoked to provide some sort of useful result.

Returns:
true iff getMatchBounds may provide useful information

Copyright © 2004 CLRC: Central Laboratory of the Research Councils. All rights reserved.