public interface MatchKit
This interface consists of two methods. One tests whether two tuples count as matching or not, and assigns a closeness score if they are (in practice, this is likely to compare corresponding elements of the two submitted tuples allowing for some error in each one). The second is a bit more subtle: it must identify a set of bins into which possible matches for the tuple might fall. For the case of coordinate matching with errors, you would need to chop the whole possible space into a discrete set of zones, each with a given key, and return the key for each zone near enough to the submitted tuple (point) that it might contain a match for it.
Formally, the requirements for correct implementations of this interface are as follows:
- matchScore(t1,t2) == matchScore(t2,t1)
- matchScore(t1,t2)>=0 implies a non-zero intersection of getBins(t1) and getBins(t2)
- for efficiency, the intersection of getBins(t1) and getBins(t2) is as small as possible for non-matching t1 and t2 (preferably 0)
- for efficiency, the number of bins returned by getBins is as small as possible (preferably 1)
It may help to think of all this as a sort of fuzzy hash.
Instances of this class are not thread-safe, and should not be used from multiple threads concurrently.
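The requirements above can be illustrated with a minimal sketch, assuming one-dimensional tuples whose first element is a Number and a fixed matching tolerance err. The MatchKit interface declared here is only a local stand-in mirroring the constant and two methods documented on this page, so that the block compiles on its own; Interval1dKit, FuzzyHashCheck and sharesBin are hypothetical names, not part of the library.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Local stand-in for the interface documented on this page.
interface MatchKit {
    Object[] NO_BINS = new Object[0];
    Object[] getBins(Object[] tuple);
    double matchScore(Object[] tuple1, Object[] tuple2);
}

// Hypothetical kit: two tuples match when |x1 - x2| <= err.
class Interval1dKit implements MatchKit {
    private final double err;

    Interval1dKit(double err) {
        if (!(err > 0)) {
            throw new IllegalArgumentException("err must be positive");
        }
        this.err = err;
    }

    public Object[] getBins(Object[] tuple) {
        if (tuple.length == 0 || !(tuple[0] instanceof Number)) {
            return NO_BINS;
        }
        double x = ((Number) tuple[0]).doubleValue();
        if (Double.isNaN(x)) {
            return NO_BINS;
        }
        // The axis is chopped into zones of width 2*err, keyed by a Long.
        // Every zone overlapping [x - err, x + err] might contain a match.
        double zoneWidth = 2 * err;
        long lo = (long) Math.floor((x - err) / zoneWidth);
        long hi = (long) Math.floor((x + err) / zoneWidth);
        Object[] bins = new Object[(int) (hi - lo + 1)];
        for (int i = 0; i < bins.length; i++) {
            bins[i] = Long.valueOf(lo + i);
        }
        return bins;
    }

    public double matchScore(Object[] tuple1, Object[] tuple2) {
        if (!(tuple1[0] instanceof Number) || !(tuple2[0] instanceof Number)) {
            return -1.0;
        }
        double d = Math.abs(((Number) tuple1[0]).doubleValue()
                          - ((Number) tuple2[0]).doubleValue());
        // Scores for matches are scaled into 0..1; NaN input gives -1.
        return d <= err ? d / err : -1.0;
    }
}

// Helper for checking the second requirement on a pair of tuples.
class FuzzyHashCheck {
    static boolean sharesBin(MatchKit kit, Object[] t1, Object[] t2) {
        Set<Object> bins = new HashSet<>(Arrays.asList(kit.getBins(t1)));
        bins.retainAll(Arrays.asList(kit.getBins(t2)));
        return !bins.isEmpty();
    }
}
```

With this choice of zone width each tuple falls into about two bins, so the sketch trades a slightly larger bin count for simple arithmetic; a tighter scheme is possible.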
Modifier and Type | Field and Description |
---|---|
static Object[] | NO_BINS Convenience constant - it's a zero-length array of objects, suitable for returning from getBins(java.lang.Object[]) if no match can result. |
Modifier and Type | Method and Description |
---|---|
Object[] | getBins(Object[] tuple) Returns a set of keys for bins into which possible matches for a given tuple might fall. |
double | matchScore(Object[] tuple1, Object[] tuple2) Indicates whether two tuples count as matching each other, and if so how closely. |
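static final Object[] NO_BINS
Convenience constant - it's a zero-length array of objects, suitable for returning from getBins(java.lang.Object[]) if no match can result.
Object[] getBins(Object[] tuple)
Returns a set of keys for bins into which possible matches for a given tuple might fall. The returned keys should have their equals and hashCode methods implemented properly for comparison.
tuple - tuple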
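To show why equals and hashCode matter for bin keys, here is a minimal sketch of a custom key and a hash-based index over it. GridCell and BinIndex are hypothetical names introduced for illustration; nothing here is claimed to reflect how the library itself stores bins.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical bin key for a cell of a two-dimensional grid.
final class GridCell {
    private final int ix;
    private final int iy;

    GridCell(int ix, int iy) {
        this.ix = ix;
        this.iy = iy;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof GridCell)) {
            return false;
        }
        GridCell other = (GridCell) o;
        return ix == other.ix && iy == other.iy;
    }

    @Override
    public int hashCode() {
        return 31 * ix + iy;
    }
}

// Groups tuples by bin key so that candidates for a match can be looked up.
final class BinIndex {
    private final Map<Object, List<Object[]>> rowsByBin = new HashMap<>();

    void add(Object[] binKeys, Object[] tuple) {
        for (Object key : binKeys) {
            rowsByBin.computeIfAbsent(key, k -> new ArrayList<>()).add(tuple);
        }
    }

    // Arrays compare by identity, so a tuple filed under several bins
    // appears only once in the returned set.
    Set<Object[]> candidates(Object[] binKeys) {
        Set<Object[]> found = new LinkedHashSet<>();
        for (Object key : binKeys) {
            List<Object[]> rows = rowsByBin.get(key);
            if (rows != null) {
                found.addAll(rows);
            }
        }
        return found;
    }
}
```

Two GridCell objects built separately for the same cell land in the same map entry only because equals and hashCode agree; a key class that inherits the identity-based defaults from Object would never produce a shared bin.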
double matchScore(Object[] tuple1, Object[] tuple2)
Indicates whether two tuples count as matching each other, and if so how closely. If tuple1 and tuple2 are
considered as a matching pair, then a non-negative value should
be returned indicating how close the match is - the higher the
number the worse the match, and a return value of zero indicates
a 'perfect' match.
If the two tuples do not constitute a matching pair, then
a negative number (conventionally -1.0) should be returned.
This return value can be thought of as (and will often
correspond physically with) the distance in some real or notional
space between the points represented by the two submitted tuples.
If there's no reason to do otherwise, the range 0..1 is recommended for successful matches. However, if the result has some sort of physical meaning (such as a distance in real space), that may be used instead.
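The scoring convention can be sketched as follows, assuming tuples that hold two Numbers (x, y) and a positive maximum separation maxSep. ScoreSketch, score and bestMatch are hypothetical names used only for illustration; dividing the distance by maxSep is one way to land in the recommended 0..1 range.

```java
final class ScoreSketch {

    // Returns distance / maxSep for a matching pair, or -1.0 for no match.
    // Assumes maxSep > 0.
    static double score(Object[] t1, Object[] t2, double maxSep) {
        double dx = num(t1[0]) - num(t2[0]);
        double dy = num(t1[1]) - num(t2[1]);
        double dist = Math.sqrt(dx * dx + dy * dy);
        return dist <= maxSep ? dist / maxSep : -1.0;
    }

    // Picks the candidate with the lowest non-negative score,
    // or returns -1 if no candidate matches at all.
    static int bestMatch(Object[] target, Object[][] candidates, double maxSep) {
        int best = -1;
        double bestScore = Double.POSITIVE_INFINITY;
        for (int i = 0; i < candidates.length; i++) {
            double s = score(target, candidates[i], maxSep);
            if (s >= 0 && s < bestScore) {
                best = i;
                bestScore = s;
            }
        }
        return best;
    }

    private static double num(Object o) {
        return ((Number) o).doubleValue();
    }
}
```

Because lower scores mean closer matches and negative scores mean no match, a caller has to filter out negatives before taking the minimum, as bestMatch does.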
tuple1 - one tuple
tuple2 - the other tuple
score for tuple1 and tuple2; 0 is a perfect match, larger values indicate worse matches, negative values indicate no match
Copyright © 2025 Central Laboratory of the Research Councils. All Rights Reserved.