public class MatchStarTables extends Object
This class originally contained only static methods. Currently some methods are static and some are instance methods; those which use a ProgressIndicator or SplitProcessor are instance methods which use the values set up at construction time.

The methods in this class operate on Collection<RowLink>s rather than on LinkSets, to emphasise that they do not modify the contents of the collections. Such collections will typically be sorted into their natural sequence, see orderLinks(uk.ac.starlink.table.join.LinkSet).
Modifier and Type | Field and Description |
---|---|
static ValueInfo | GRP_ID_INFO: Defines the characteristics of a table column which represents the ID of a group of matched row objects. |
static ValueInfo | GRP_SIZE_INFO: Defines the characteristics of a table column which represents the number of matched row objects in a given group (with the same group ID). |
Constructor and Description |
---|
MatchStarTables(): Constructs a MatchStarTables with default characteristics. |
MatchStarTables(ProgressIndicator indicator, SplitProcessor<?> splitProcessor): Constructs a MatchStarTables with configuration. |
Modifier and Type | Method and Description |
---|---|
static MatchStarTables | createInstance(ProgressIndicator indicator, RowRunner rowRunner): Creates a MatchStarTables instance based on given optional progress indicator and row runner. |
Map<RowLink,LinkGroup> | findGroups(Collection<RowLink> links) |
static StarTable | makeInternalMatchTable(int iTable, Collection<RowLink> rowLinks, long rowCount): Analyses a set of RowLinks to mark as linked rows of a given table. |
StarTable | makeJoinTable(StarTable[] tables, Collection<RowLink> rowLinks, boolean addGroups, JoinFixAction[] fixActs, ValueInfo matchScoreInfo): Constructs a table made out of a set of constituent tables joined together according to a set of RowLinks describing row matches. |
static StarTable | makeSequentialJoinTable(StarTable[] tables, Collection<RowLink> rowLinks, JoinFixAction[] fixActs, ValueInfo matchScoreInfo): Constructs a non-random table made out of a set of possibly non-random constituent tables joined together according to a RowLink collection. |
static Collection<RowLink> | orderLinks(LinkSet linkSet): Best-efforts conversion of a LinkSet, which is what RowMatcher outputs, to a Collection of RowLinks, which is what's used by this class. |
public static final ValueInfo GRP_ID_INFO
public static final ValueInfo GRP_SIZE_INFO
public MatchStarTables()
public MatchStarTables(ProgressIndicator indicator, SplitProcessor<?> splitProcessor)
Constructs a MatchStarTables with configuration. The splitProcessor argument configures how potentially parallel processing is done.
indicator - progress indicator, or null for no logging
splitProcessor - parallel processing implementation, or null for default behaviour
public StarTable makeJoinTable(StarTable[] tables, Collection<RowLink> rowLinks, boolean addGroups, JoinFixAction[] fixActs, ValueInfo matchScoreInfo) throws InterruptedException
Constructs a table made out of a set of constituent tables joined together according to a set of RowLinks describing row matches. Each row of the output table corresponds to one RowLink entry in the set rowLinks; if that RowLink contains a row from one of the tables being joined here, the columns corresponding to that table are filled in. If it contains multiple rows from that table, an arbitrary one of them is filled in.

The tables array determines which tables' columns appear in the output table. It must have (at least) as many elements as the highest table index in the RowLink set. Table data will be picked from the nth table in this array for RowRef elements with a tableIndex of n. If the nth element is null, the corresponding columns will not appear in the output table.

The matchScoreInfo parameter is optional. If it is non-null, then an additional column, described by matchScoreInfo, will be added to the table, containing the score values from the RowLinks in rowLinks. The content class of matchScoreInfo should be Number or one of its subclasses.
tables - array of constituent tables
rowLinks - set of RowLink objects which define which rows in one table are associated with which rows in the others
addGroups - flag which indicates whether the output table should, if appropriate, include GRP_ID_INFO and GRP_SIZE_INFO columns
fixActs - actions to take for deduplicating column names (array of the same length as tables)
matchScoreInfo - may supply information about the meaning of the link scores
InterruptedException
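The row-filling rule described for makeJoinTable can be illustrated with a minimal, self-contained sketch. It does not use the STIL classes: the JoinSketch class, the array-based table representation and the {tableIndex, rowIndex} encoding of a row reference are all assumptions made for illustration, and the optional group and score columns are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the join rule; not the MatchStarTables implementation. */
final class JoinSketch {

    /**
     * Builds one output row per link. Each table is an array of rows, each row an
     * array of cells; a null table contributes no columns. Each link is a list of
     * {tableIndex, rowIndex} pairs (an assumed encoding of a row reference).
     */
    static List<Object[]> join(Object[][][] tables, List<List<int[]>> links) {
        // Work out where each table's columns start in the output row.
        int[] offset = new int[tables.length];
        int ncol = 0;
        for (int t = 0; t < tables.length; t++) {
            offset[t] = ncol;
            // Assumption: column count is taken from the first row; an empty table adds none.
            if (tables[t] != null && tables[t].length > 0) {
                ncol += tables[t][0].length;
            }
        }
        List<Object[]> out = new ArrayList<>();
        for (List<int[]> link : links) {
            Object[] row = new Object[ncol]; // cells stay null for tables absent from the link
            for (int[] ref : link) {
                Object[][] table = tables[ref[0]];
                if (table == null) {
                    continue; // null table: its columns do not appear in the output
                }
                Object[] src = table[ref[1]];
                // If a link holds several rows of one table, the last one seen wins here;
                // the documentation only promises that an arbitrary one is used.
                System.arraycopy(src, 0, row, offset[ref[0]], src.length);
            }
            out.add(row);
        }
        return out;
    }
}
```

In this sketch a link that references only the third of three tables yields a row whose leading cells are null, and a reference to a table whose array element is null is skipped.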
public static StarTable makeSequentialJoinTable(StarTable[] tables, Collection<RowLink> rowLinks, JoinFixAction[] fixActs, ValueInfo matchScoreInfo)
Constructs a non-random table made out of a set of possibly non-random constituent tables joined together according to a RowLink collection.
tables - array of constituent tables
rowLinks - link set defining the match
fixActs - actions to take for deduplicating column names (array of the same size as tables)
matchScoreInfo - may supply information about the meaning of the match scores, if present
public static StarTable makeInternalMatchTable(int iTable, Collection<RowLink> rowLinks, long rowCount)
Analyses a set of RowLinks to mark as linked rows of a given table. The returned table contains the columns GRP_ID_INFO and GRP_SIZE_INFO. Rows of the table linked together by rowLinks are assigned the same integer value in the new GRP_ID_INFO column, and the GRP_SIZE_INFO column indicates how many rows are linked together in this way. Each group corresponds to a single RowLink; if a row is part of more than one RowLink then only one of them will be recorded in the new columns.
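As a rough sketch of the group ID and group size assignment just described, the following self-contained code fills two columns from a list of links. The InternalMatchSketch class, the {tableIndex, rowIndex} encoding, IDs counting up from 1, an int row count and the choice to skip links with fewer than two rows in the table are all assumptions for illustration, not documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of group marking; not the MatchStarTables implementation. */
final class InternalMatchSketch {

    /**
     * Returns two columns of length rowCount: element 0 holds group IDs and
     * element 1 holds group sizes. Each link is a list of {tableIndex, rowIndex}
     * pairs (an assumed encoding of a row reference).
     */
    static Integer[][] groupColumns(int iTable, List<List<int[]>> links, int rowCount) {
        Integer[] grpId = new Integer[rowCount];   // null means no recorded group
        Integer[] grpSize = new Integer[rowCount];
        int nextId = 0;
        for (List<int[]> link : links) {
            // Pick out the rows of this link that belong to the table of interest.
            List<Integer> rows = new ArrayList<>();
            for (int[] ref : link) {
                if (ref[0] == iTable) {
                    rows.add(ref[1]);
                }
            }
            if (rows.size() < 2) {
                continue; // assumption: a single row is not treated as a group
            }
            nextId++;
            for (int row : rows) {
                // A row in several links keeps only the last link seen, so only
                // one of its groups is recorded (which one is arbitrary).
                grpId[row] = nextId;
                grpSize[row] = rows.size();
            }
        }
        return new Integer[][] { grpId, grpSize };
    }
}
```

One group per link keeps the bookkeeping to a single pass, at the cost noted above: a row shared between links reports only one of its groups.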
Any rows linked in rowLinks which do not refer to the table at index iTable have null entries in these columns.
iTable - the index of the table in which internal matches are to be sought
rowLinks - a collection of RowLink objects linking groups of rows together
rowCount - number of rows in the returned table (must be large enough to accommodate the indices in rowLinks)
public Map<RowLink,LinkGroup> findGroups(Collection<RowLink> links) throws InterruptedException
Returns a mapping from RowLinks to LinkGroups which describes connected groups of links in the input collection. A related group is one in which the RowRefs of its constituent RowLinks form a connected graph in which RowRefs are the nodes and RowLinks are the edges. A LinkGroup with a link count of more than one therefore represents an ambiguous match, that is one in which one or more of its RowRefs is contained in more than one RowLink in the original RowLink collection.

The returned map contains entries only for non-trivial LinkGroups, that is ones which contain more than one link.
links - link set representing a set of matches
InterruptedException
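The connected-group idea behind findGroups can be sketched with a union-find structure over the links. This is only an illustration under stated assumptions: the LinkGroupSketch class, the use of strings such as "t0:r1" as row reference keys and the use of link indices in place of RowLink and LinkGroup objects are hypothetical, and the actual implementation may work differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of link grouping; not the MatchStarTables implementation. */
final class LinkGroupSketch {

    /** Finds the representative of link i, shortening the path as it goes. */
    private static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];
            i = parent[i];
        }
        return i;
    }

    /**
     * Maps each link index to the indices of all links in its connected group.
     * Each link is a list of row reference keys (an assumed string encoding).
     * Links that share no row reference with any other link are omitted.
     */
    static Map<Integer, List<Integer>> findGroups(List<List<String>> links) {
        int[] parent = new int[links.size()];
        for (int i = 0; i < parent.length; i++) {
            parent[i] = i;
        }
        // Two links are connected when they share at least one row reference.
        Map<String, Integer> firstLinkForRef = new HashMap<>();
        for (int i = 0; i < links.size(); i++) {
            for (String ref : links.get(i)) {
                Integer other = firstLinkForRef.putIfAbsent(ref, i);
                if (other != null) {
                    parent[find(parent, i)] = find(parent, other);
                }
            }
        }
        // Collect the members of each group under its representative.
        Map<Integer, List<Integer>> membersByRoot = new TreeMap<>();
        for (int i = 0; i < links.size(); i++) {
            membersByRoot.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(i);
        }
        // Keep only non-trivial groups, with one entry per member link.
        Map<Integer, List<Integer>> result = new TreeMap<>();
        for (List<Integer> members : membersByRoot.values()) {
            if (members.size() > 1) {
                for (int i : members) {
                    result.put(i, members);
                }
            }
        }
        return result;
    }
}
```

With four links where the first two share one row reference and the second and fourth share another, the sketch reports links 0, 1 and 3 as one group and leaves link 2 out of the map, mirroring the rule that only non-trivial groups get entries.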
public static Collection<RowLink> orderLinks(LinkSet linkSet)
Best-efforts conversion of a LinkSet, which is what RowMatcher outputs, to a Collection of RowLinks, which is what's used by this class. This tries LinkSet.toSorted(), but in case that fails for lack of memory (not that likely, but could happen) it will write a message through the logging system and return an unordered result instead.
linkSet - unordered LinkSet
public static MatchStarTables createInstance(ProgressIndicator indicator, RowRunner rowRunner)
Creates a MatchStarTables instance based on given optional progress indicator and row runner.
indicator - progress indicator, or null for no logging
rowRunner - parallel processing implementation, or null for default behaviour
Copyright © 2024 Central Laboratory of the Research Councils. All Rights Reserved.