public interface MatchEngine
Defines the details of object matching criteria.
This interface provides methods for ascertaining whether two table
rows are to be linked - this usually means that they are to be
assumed to refer to the same object.
The methods act on 'tuples' - each tuple is an array of objects defining the relevant
characteristics of a row. Of course these tuples have to be prepared
with understanding of what a particular implementation of this interface
knows how to deal with, which can be obtained from the getTupleInfos()
method. Typically a tuple will be a list of coordinates,
such as RA and Dec.
The business end of the interface consists of two methods. One tests whether two tuples count as matching or not, and assigns a closeness score if they are (in practice, this is likely to compare corresponding elements of the two submitted tuples allowing for some error in each one). The second is a bit more subtle: it must identify a set of bins into which possible matches for the tuple might fall. For the case of coordinate matching with errors, you would need to chop the whole possible space into a discrete set of zones, each with a given key, and return the key for each zone near enough to the submitted tuple (point) that it might contain a match for it.
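The two methods can be sketched for the simplest case, a 1-d Cartesian match with a fixed error. This is a minimal illustration, not the library's implementation: the class name `Simple1dMatcher`, the choice of bin width (twice the error) and the use of a negative score to signal "no match" are all assumptions made for the example, and the class does not implement the real interface.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical 1-d matcher mirroring the two core methods (illustration only). */
final class Simple1dMatcher {
    private final double err;       // maximum separation that counts as a match
    private final double binWidth;  // assumed zone size: twice the error

    Simple1dMatcher(double err) {
        this.err = err;
        this.binWidth = 2 * err;
    }

    /** Returns the separation if the tuples match, or -1 (assumed failure value) if not. */
    double matchScore(Object[] tuple1, Object[] tuple2) {
        double d = Math.abs(value(tuple1) - value(tuple2));
        return d <= err ? d : -1.0;  // a NaN separation fails the test, giving -1
    }

    /** Returns the key of every zone overlapping [x - err, x + err]. */
    Object[] getBins(Object[] tuple) {
        double x = value(tuple);
        if (Double.isNaN(x)) {
            return new Object[0];    // blank coordinate: no match can result
        }
        long lo = (long) Math.floor((x - err) / binWidth);
        long hi = (long) Math.floor((x + err) / binWidth);
        List<Object> bins = new ArrayList<>();
        for (long i = lo; i <= hi; i++) {
            bins.add(Long.valueOf(i));
        }
        return bins.toArray();
    }

    /** Extracts the single coordinate, or NaN if it is missing or non-numeric. */
    private static double value(Object[] tuple) {
        return tuple[0] instanceof Number ? ((Number) tuple[0]).doubleValue() : Double.NaN;
    }
}
```

Because each tuple reports every zone within `err` of its position, any two tuples that match are guaranteed to share at least one bin key. Tuples that share a bin need not match; the bins only narrow down the candidates.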
Formally, the requirements for correct implementations of this interface are as follows:
It may help to think of all this as a sort of fuzzy hash.
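One way a matching algorithm might exploit the fuzzy-hash idea is sketched below: rows of one table are filed under their bin keys in a hash map, and each row of the other table is scored only against rows that share a bin with it. The class `BinnedPairFinder`, its inline 1-d binning rule and the fixed error are assumptions for illustration, not part of this interface.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: bin-based candidate lookup for 1-d values with fixed error. */
final class BinnedPairFinder {

    /** Returns {indexInA, indexInB} for every pair separated by at most err. */
    static List<int[]> findPairs(double[] a, double[] b, double err) {
        double width = 2 * err;  // assumed zone size

        // File each row of table A under every bin key it might match in.
        Map<Long, List<Integer>> binMap = new HashMap<>();
        for (int i = 0; i < a.length; i++) {
            long lo = (long) Math.floor((a[i] - err) / width);
            long hi = (long) Math.floor((a[i] + err) / width);
            for (long key = lo; key <= hi; key++) {
                binMap.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
            }
        }

        // For each row of table B, score only the rows of A sharing a bin with it.
        List<int[]> pairs = new ArrayList<>();
        for (int j = 0; j < b.length; j++) {
            long lo = (long) Math.floor((b[j] - err) / width);
            long hi = (long) Math.floor((b[j] + err) / width);
            Set<Integer> candidates = new TreeSet<>();  // de-duplicates rows found in two bins
            for (long key = lo; key <= hi; key++) {
                candidates.addAll(binMap.getOrDefault(key, Collections.<Integer>emptyList()));
            }
            for (int i : candidates) {
                if (Math.abs(a[i] - b[j]) <= err) {
                    pairs.add(new int[] {i, j});
                }
            }
        }
        return pairs;
    }
}
```

The explicit comparison in the inner loop plays the role of the match score test; the map lookup replaces a comparison of every row against every other row.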
Field Summary | |
---|---|
static java.lang.Object[] | NO_BINS: Convenience constant - it's a zero-length array of objects, suitable for returning from getBins(java.lang.Object[]) if no match can result. |
Method Summary | |
---|---|
boolean | canBoundMatch(): Indicates that the getMatchBounds(uk.ac.starlink.table.join.NdRange[], int) method can be invoked to provide some sort of useful result. |
java.lang.Object[] | getBins(java.lang.Object[] tuple): Returns a set of keys for bins into which possible matches for a given tuple might fall. |
NdRange | getMatchBounds(NdRange[] inRanges, int index): Given a range of tuple values, returns a range outside which no match to anything within that range can result. |
DescribedValue[] | getMatchParameters(): Returns a set of DescribedValue objects whose values can be modified to modify the matching criteria. |
ValueInfo | getMatchScoreInfo(): Returns a description of the value returned by the matchScore(java.lang.Object[], java.lang.Object[]) method. |
double | getScoreScale(): Returns a scale value for the match score. |
DescribedValue[] | getTuningParameters(): Returns a set of DescribedValue objects whose values can be modified to tune the performance of the match. |
ValueInfo[] | getTupleInfos(): Returns a set of ValueInfo objects indicating what is required for the elements of each tuple. |
double | matchScore(java.lang.Object[] tuple1, java.lang.Object[] tuple2): Indicates whether two tuples count as matching each other, and if so how closely. |
Field Detail |
---|
static final java.lang.Object[] NO_BINS
Convenience constant - it's a zero-length array of objects, suitable for
returning from getBins(java.lang.Object[]) if no match can result.
Method Detail |
---|
java.lang.Object[] getBins(java.lang.Object[] tuple)
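Returns a set of keys for bins into which possible matches for a given tuple might fall.
tuple - the tuple for which bin keys are required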
double matchScore(java.lang.Object[] tuple1, java.lang.Object[] tuple2)
Indicates whether two tuples count as matching each other, and if so how closely. If there's no reason to do otherwise, the range 0..1 is recommended for successful matches. However, if the result has some sort of physical meaning (such as a distance in real space), that value may be used instead.
tuple1 - one tuple
tuple2 - the other tuple
ValueInfo getMatchScoreInfo()
Returns a description of the value returned by the
matchScore(java.lang.Object[], java.lang.Object[]) method.
The content class should be numeric (though it need not be Double),
and the name, description and units should describe whatever
physical significance the value has.
If the result of matchScore is not interesting
(for instance, if it's always either 0 or -1), null may be returned.
double getScoreScale()
Returns a scale value for the match score, chosen so that
matchScore/getScoreScale() is of order unity, and is thus
comparable between different match engines.
As a general rule, the result should be the maximum value ever
returned from the matchScore method,
corresponding to the least good successful match.
For binary MatchEngine implementations
(all matches are either score=0 or failures)
a value of 1 is recommended.
If nothing reliable can be said about the scale, NaN may be returned.
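The way a caller might use the scale to compare scores from different engines can be sketched as follows. The helper `ScoreScaling.normalise` is hypothetical, and treating a negative score as "no match" is an assumption of the example.

```java
/** Hypothetical helper showing how a score scale makes scores comparable. */
final class ScoreScaling {

    /**
     * Divides a raw match score by the engine's scale.
     * Returns NaN for a failed match (assumed here to be a negative score)
     * and also when the scale itself is NaN, i.e. unknown.
     */
    static double normalise(double score, double scale) {
        if (score < 0) {
            return Double.NaN;
        }
        return score / scale;
    }
}
```

For instance, a score of 0.25 from an engine whose scale is 0.5 and a score of 1.0 from an engine whose scale is 2.0 both normalise to 0.5, so the two matches can be regarded as about equally close.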
ValueInfo[] getTupleInfos()
DescribedValue[] getMatchParameters()
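Returns a set of DescribedValue objects whose values can be changed
to adjust the matching criteria. The values are changed by calling
DescribedValue.setValue(java.lang.Object) on the returned objects.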
DescribedValue[] getTuningParameters()
Returns a set of DescribedValue objects whose values can be changed
to tune the performance of the match. The values are changed by calling
DescribedValue.setValue(java.lang.Object) on the returned objects.
Changing these values will make no difference to the output of
matchScore(java.lang.Object[], java.lang.Object[]),
but may change the output of getBins(java.lang.Object[]).
This may change the CPU and memory requirements of the match,
but will not change the result. The default value should be
something sensible, so that setting the value of these parameters
is not in general required.
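The distinction between a tuning parameter and a match parameter can be illustrated with a small sketch. Here a hypothetical `binFactor` controls the zone size used for binning in a 1-d matcher; a plain setter stands in for the DescribedValue mechanism, and all names are assumptions made for the example.

```java
/** Hypothetical 1-d matcher with one tuning parameter (illustration only). */
final class TunableMatcher {
    private final double err;        // match parameter: affects the result
    private double binFactor = 2.0;  // tuning parameter: affects only the binning

    TunableMatcher(double err) {
        this.err = err;
    }

    /** Sets the zone size as a multiple of the match error. */
    void setBinFactor(double binFactor) {
        if (!(binFactor > 0)) {
            throw new IllegalArgumentException("binFactor must be positive");
        }
        this.binFactor = binFactor;
    }

    /** Independent of binFactor: separation if within err, else -1 (assumed failure value). */
    double matchScore(double x1, double x2) {
        double d = Math.abs(x1 - x2);
        return d <= err ? d : -1.0;
    }

    /** Depends on binFactor: keys of all zones overlapping [x - err, x + err]. */
    long[] getBins(double x) {
        double width = binFactor * err;
        long lo = (long) Math.floor((x - err) / width);
        long hi = (long) Math.floor((x + err) / width);
        long[] bins = new long[(int) (hi - lo + 1)];
        for (int i = 0; i < bins.length; i++) {
            bins[i] = lo + i;
        }
        return bins;
    }
}
```

Smaller zones mean more keys per tuple (more memory) but fewer unrelated rows per bin (fewer wasted score calculations); larger zones trade the other way. In both cases the set of matching pairs is the same.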
NdRange getMatchBounds(NdRange[] inRanges, int index)
Given a range of tuple values, returns a range outside which no match to anything within that range can result. Both the input and output ranges are rectangles in tuple space, each specified by two tuples representing its opposite corners; equivalently, these are the minimum and maximum values of each tuple element. In either the input or output min/max tuples, any element may be null to indicate that no information is available on the bounds of that tuple element (coordinate).
An array of n-dimensional ranges is given, though only one of them
(specified by the index value) forms the basis for
the output range. The other ranges in the input array may in some
cases be needed as context in order to do the calculation.
If the match error is fixed, only the single input n-d range is needed
to work out the single output range. However, if the errors are
obtained by looking at the tuples themselves (match errors are per-row)
then in general the broadening has to be done using the maximum
error of any of the tables involved in the match,
not just the one to be broadened.
For a long time, I didn't realise this, so versions of this software
up to STIL v3.0-14 (Oct 2015) were not correctly broadening these
ranges, leading to potentially missed associations near the edge
of bounded regions.
This method can be used by match algorithms which know in advance the range of coordinates they will match against and wish to reduce workload by not attempting matches which are bound to fail.
For example, a 2-d Cartesian match engine with an isotropic match error of 0.5 would turn an input range with minimum tuple (0,200) and maximum tuple (10,210) into an output range with minimum tuple (-0.5,199.5) and maximum tuple (10.5,210.5).
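The broadening described above can be sketched as follows, under the assumption that each table contributes a single maximum error (for a fixed match error, every entry is the same). The `Range` class is a hypothetical stand-in for an n-d range, not the library's NdRange, and `broaden` is not the real method.

```java
/** Hypothetical sketch of range broadening for an isotropic Cartesian match. */
final class RangeBroadening {

    /** Stand-in for an n-d range: per-element minimum and maximum, null where unknown. */
    static final class Range {
        final Double[] mins;
        final Double[] maxs;

        Range(Double[] mins, Double[] maxs) {
            this.mins = mins;
            this.maxs = maxs;
        }
    }

    /**
     * Broadens inRanges[index] by the largest error of ANY table in the match,
     * not just the error of the table being broadened.
     * maxErrs[i] is the assumed maximum match error among the rows of table i.
     */
    static Range broaden(Range[] inRanges, double[] maxErrs, int index) {
        double err = 0;
        for (double e : maxErrs) {
            err = Math.max(err, e);
        }
        Range in = inRanges[index];
        int n = in.mins.length;
        Double[] mins = new Double[n];
        Double[] maxs = new Double[n];
        for (int i = 0; i < n; i++) {
            // A null bound means "no information" and is passed through unchanged.
            mins[i] = in.mins[i] == null ? null : Double.valueOf(in.mins[i] - err);
            maxs[i] = in.maxs[i] == null ? null : Double.valueOf(in.maxs[i] + err);
        }
        return new Range(mins, maxs);
    }
}
```

With one table bounded by (0,200) and (10,210) and an error of 0.5, this gives (-0.5,199.5) and (10.5,210.5). If a second table in the same match has a maximum error of 2.0, the first table's range is instead broadened by 2.0 in every direction.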
This method will only be called if canBoundMatch()
returns true. Thus engines that cannot provide any useful
information along these lines (for instance because none of their
tuple elements is Comparable) do not need to
implement it in a meaningful way.
inRanges - array of input ranges for the tables on which
the match will take place;
each element bounds the values for each tuple
element in its corresponding table
in a possible match
(to put it another way - each element gives the
coordinates of the opposite corners of a tuple-space
rectangle covered by one input table)
index - the element of the inRanges array
for which the broadened output value is required
Returns the input range inRanges[index] broadened by errors.
See also canBoundMatch().
boolean canBoundMatch()
Indicates that the
getMatchBounds(uk.ac.starlink.table.join.NdRange[], int)
method can be invoked
to provide some sort of useful result.