uk.ac.starlink.table.join
Class MatchStarTables

java.lang.Object
  extended by uk.ac.starlink.table.join.MatchStarTables

public class MatchStarTables
extends Object

Provides factory methods for producing tables which represent the result of row matching.


Field Summary
static ValueInfo GRP_ID_INFO
          Defines the characteristics of a table column which represents the ID of a group of matched row objects.
static ValueInfo GRP_SIZE_INFO
          Defines the characteristics of a table column which represents the number of matched row objects in a given group (with the same group ID).
 
Constructor Summary
MatchStarTables()
           
 
Method Summary
static Map findGroups(LinkSet links)
          Returns a mapping from RowLinks to LinkGroups which describes connected groups of links in the input LinkSet.
static StarTable makeInternalMatchTable(int iTable, LinkSet rowLinks, long rowCount)
          Analyses a set of RowLinks to mark which rows of a given table are linked together.
static StarTable makeJoinTable(StarTable[] tables, LinkSet rowLinks, boolean addGroups, JoinFixAction[] fixActs, ValueInfo matchScoreInfo)
          Constructs a table made out of a set of constituent tables joined together according to a LinkSet describing row matches.
static StarTable makeJoinTable(StarTable table1, StarTable table2, LinkSet pairs, JoinType joinType, boolean addGroups, JoinFixAction[] fixActs, ValueInfo matchScoreInfo)
          Constructs a table made out of two constituent tables joined together according to a LinkSet describing row matches and a JoinType determining which conditions on a RowLink produce an output row.
static StarTable makeParallelMatchTable(StarTable table, int iTable, LinkSet links, int width, int minSize, int maxSize, JoinFixAction[] fixActs)
          Constructs a new wide table from a single given base table and a set of RowLinks.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

GRP_ID_INFO

public static final ValueInfo GRP_ID_INFO
Defines the characteristics of a table column which represents the ID of a group of matched row objects.


GRP_SIZE_INFO

public static final ValueInfo GRP_SIZE_INFO
Defines the characteristics of a table column which represents the number of matched row objects in a given group (with the same group ID).

Constructor Detail

MatchStarTables

public MatchStarTables()
Method Detail

makeJoinTable

public static StarTable makeJoinTable(StarTable table1,
                                      StarTable table2,
                                      LinkSet pairs,
                                      JoinType joinType,
                                      boolean addGroups,
                                      JoinFixAction[] fixActs,
                                      ValueInfo matchScoreInfo)
Constructs a table made out of two constituent tables joined together according to a LinkSet describing row matches and a JoinType determining which conditions on a RowLink produce an output row. The columns of the resulting table are made by appending the columns of the constituent tables side by side.

The two input tables take the place of the tables array used by the other makeJoinTable method: table1 supplies the data for RowRef elements with a tableIndex of 0, and table2 supplies the data for those with a tableIndex of 1.

The matchScoreInfo parameter is optional. If it is non-null, then an additional column, described by matchScoreInfo, will be added to the table containing the score values from any RowLink2s in pairs. The content class of matchScoreInfo should be Number or one of its subclasses.

This is a convenience method which calls the other makeJoinTable method.

Parameters:
table1 - first input table
table2 - second input table
pairs - set of links each representing a matched pair of rows between table1 and table2. Contents of this set may be modified by this routine
joinType - describes how the input list of matched pairs is used to generate an output sequence of rows
addGroups - flag which indicates whether the output table should, if appropriate, include GRP_ID_INFO and GRP_SIZE_INFO columns
fixActs - actions to take for deduplicating column names (two-element array, one entry for each input table)
matchScoreInfo - may supply information about the meaning of the match scores
Returns:
table representing the join
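The joinType parameter controls which rows survive the join. As a rough illustration only, the following self-contained sketch models three join modes that behave like SQL inner, left outer and full outer joins over matched pairs of row indices. The enum Mode, its constants and the join method are hypothetical names invented for this sketch; they are not the constants of the JoinType class, and the sketch assumes each row takes part in at most one pair.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of pairwise join semantics; not the JoinType API. */
public class PairJoinSketch {

    /** Hypothetical modes, loosely analogous to SQL inner, left outer and full outer joins. */
    enum Mode { MATCHED_ONLY, ALL_FROM_FIRST, ALL_FROM_EITHER }

    /**
     * Returns output rows as {row1, row2} index pairs, with -1 where a side has no match.
     * Each element of pairs is a matched {row1, row2} pair.
     */
    static List<int[]> join(List<int[]> pairs, int nrow1, int nrow2, Mode mode) {
        List<int[]> out = new ArrayList<>();
        boolean[] used1 = new boolean[nrow1];
        boolean[] used2 = new boolean[nrow2];
        // Every matched pair always yields an output row.
        for (int[] p : pairs) {
            out.add(new int[] { p[0], p[1] });
            used1[p[0]] = true;
            used2[p[1]] = true;
        }
        // Unmatched rows of the first table, with blank second-table columns.
        if (mode != Mode.MATCHED_ONLY) {
            for (int i = 0; i < nrow1; i++) {
                if (!used1[i]) out.add(new int[] { i, -1 });
            }
        }
        // Unmatched rows of the second table, with blank first-table columns.
        if (mode == Mode.ALL_FROM_EITHER) {
            for (int j = 0; j < nrow2; j++) {
                if (!used2[j]) out.add(new int[] { -1, j });
            }
        }
        return out;
    }
}
```

For example, with pairs {0, 2} and {3, 1} between a 4-row and a 3-row table, MATCHED_ONLY yields 2 rows, ALL_FROM_FIRST yields 4 (adding unmatched rows 1 and 2 of the first table), and ALL_FROM_EITHER yields 5 (also adding unmatched row 0 of the second table).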

makeJoinTable

public static StarTable makeJoinTable(StarTable[] tables,
                                      LinkSet rowLinks,
                                      boolean addGroups,
                                      JoinFixAction[] fixActs,
                                      ValueInfo matchScoreInfo)
Constructs a table made out of a set of constituent tables joined together according to a LinkSet describing row matches. The columns of the resulting table are made by appending the columns of the constituent tables side by side. Each row in the resulting table corresponds to one RowLink entry in the set rowLinks; if that RowLink contains a row from one of the tables being joined here, the columns corresponding to that table are filled in. If it contains multiple rows from that table, an arbitrary one of them is filled in.

The tables array determines which tables' columns appear in the output table. It must have more elements than the highest table index in the RowLink set, since table data for RowRef elements with a tableIndex of n is picked from the nth element of this array. If the nth element is null, the corresponding columns will not appear in the output table.

The matchScoreInfo parameter is optional. If it is non-null, then an additional column, described by matchScoreInfo, will be added to the table containing the score values from the RowLinks in rowLinks. The content class of matchScoreInfo should be Number or one of its subclasses.

Parameters:
tables - array of constituent tables
rowLinks - set of RowLink objects which define which rows in one table are associated with which rows in the others
addGroups - flag which indicates whether the output table should, if appropriate, include GRP_ID_INFO and GRP_SIZE_INFO columns
fixActs - actions to take for deduplicating column names (array of the same length as tables)
matchScoreInfo - may supply information about the meaning of the link scores
Returns:
table representing the join
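The row-assembly rule described above (columns appended side by side, null tables omitted, an arbitrary row chosen when a link holds several from one table, and an optional score column) can be sketched with plain arrays. This is a minimal illustration under stated assumptions, not library code: the class Ref is a hypothetical stand-in for RowRef, tables are held in memory as Object[][] cell arrays, each included table is assumed to have at least one row, and "arbitrary" is resolved here by taking the first matching reference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of side-by-side row assembly for an N-way join (not library code). */
public class JoinRowSketch {

    /** Hypothetical stand-in for RowRef: a row index within the table at tableIndex. */
    static final class Ref {
        final int tableIndex;
        final int row;

        Ref(int tableIndex, int row) {
            this.tableIndex = tableIndex;
            this.row = row;
        }
    }

    /**
     * Builds the output row for one link. tables[n] holds the cells of table n
     * (rows by columns), or is null if that table's columns are to be left out.
     * If withScore is true, score (which may be null) is appended as a final column.
     */
    static Object[] assembleRow(Object[][][] tables, List<Ref> link, boolean withScore, Double score) {
        List<Object> cells = new ArrayList<>();
        for (int t = 0; t < tables.length; t++) {
            if (tables[t] == null) {
                continue;                                      // omitted table: contributes no columns
            }
            int ncol = tables[t][0].length;                    // assumes at least one row per table
            Ref chosen = null;
            for (Ref r : link) {
                if (r.tableIndex == t) {
                    chosen = r;                                // several refs to table t: keep the first
                    break;
                }
            }
            if (chosen == null) {
                cells.addAll(Arrays.asList(new Object[ncol])); // no row from table t: null cells
            } else {
                cells.addAll(Arrays.asList(tables[t][chosen.row]));
            }
        }
        if (withScore) {
            cells.add(score);
        }
        return cells.toArray();
    }
}
```

With tables {{"a0", 1}, {"a1", 2}}, null and {{"c0"}, {"c1"}}, a link to row 1 of table 0 and row 0 of table 2 with score 0.5 yields the row a1, 2, c0, 0.5; the null middle table contributes no columns.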

makeInternalMatchTable

public static StarTable makeInternalMatchTable(int iTable,
                                               LinkSet rowLinks,
                                               long rowCount)
Analyses a set of RowLinks to mark which rows of a given table are linked together. The result of this method is a two-column table whose rows correspond one-to-one with the rows of the table with index iTable in the link set. The output columns are defined by the constants GRP_ID_INFO and GRP_SIZE_INFO. Rows of the table linked together by rowLinks are assigned the same integer value in the new GRP_ID_INFO column, and the GRP_SIZE_INFO column indicates how many rows are linked together in this way. Each group corresponds to a single RowLink; if a row is part of more than one RowLink, only one of them is recorded in the new columns. Rows of the table that are not linked by any RowLink in rowLinks have null entries in these columns.

Parameters:
iTable - the index of the table in which internal matches are to be sought
rowLinks - a collection of RowLink objects linking groups of rows together
rowCount - number of rows in the returned table (must be large enough to accommodate the indices in rowLinks)
Returns:
a new two-column table describing internal row matches, whose rows correspond one-to-one with those of the input table
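The grouping performed by makeInternalMatchTable can be illustrated with a short sketch that fills the two output columns as Integer arrays. The class and method names are hypothetical, each link is represented simply as an array of row indices within table iTable, group IDs are numbered from 1 as an arbitrary choice, and a row already recorded by an earlier link keeps that first group, which is one way of satisfying the rule that only one group is recorded per row.

```java
import java.util.List;

/** Hypothetical sketch of internal-match group labelling (not library code). */
public class InternalMatchSketch {

    /**
     * Returns {grpId, grpSize}, each of length rowCount. Each element of links lists
     * the row indices (in table iTable) joined by one link; rows in no link stay null.
     */
    static Integer[][] label(List<int[]> links, int rowCount) {
        Integer[] grpId = new Integer[rowCount];
        Integer[] grpSize = new Integer[rowCount];
        int nextId = 1;                      // IDs start at 1: an arbitrary choice
        for (int[] rows : links) {
            int id = nextId++;
            for (int r : rows) {
                if (grpId[r] == null) {      // a row in several links keeps its first group
                    grpId[r] = id;
                    grpSize[r] = rows.length;
                }
            }
        }
        return new Integer[][] { grpId, grpSize };
    }
}
```

With rowCount 6 and links {0, 2, 5} and {1, 3}, the ID column becomes 1, 2, 1, 2, null, 1 and the size column 3, 2, 3, 2, null, 3; row 4 is in no link and so stays null.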

makeParallelMatchTable

public static StarTable makeParallelMatchTable(StarTable table,
                                               int iTable,
                                               LinkSet links,
                                               int width,
                                               int minSize,
                                               int maxSize,
                                               JoinFixAction[] fixActs)
Constructs a new wide table from a single given base table and a set of RowLinks. The resulting table consists of a number of sections of the original table placed side by side, so it has width times as many columns as table. Each row is constructed from one or more rows of the original table; each output row corresponds to a single RowLink. Only row links which have at least minSize entries and no more than maxSize entries are converted into output rows; if a link has more entries than width, the extras are discarded. Any row references in a RowLink not corresponding to table index iTable are ignored.

Parameters:
table - input table
iTable - index corresponding to this table in the links set
links - collection of RowLink objects describing the matches. Contents of this collection are modified by this routine
width - width of the output table as a multiple of the width of the input table
minSize - minimum number of entries in a RowLink to count as an output row
maxSize - maximum number of entries in a RowLink to count as an output row
fixActs - actions to take for deduplicating column names (width-element array, or null)
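The layout produced by makeParallelMatchTable can be sketched as follows. This is an illustrative sketch only: the names are hypothetical, each link is given as an array of row indices already restricted to table iTable, the size filter is applied to that restricted count, and sections of the output row for which a link has no entry are assumed to be left null.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the parallel (wide) match table layout (not library code). */
public class ParallelMatchSketch {

    /**
     * table holds rows of ncol cells; each element of links lists the row indices,
     * already restricted to table iTable, joined by one link. Returns one row of
     * width * ncol cells per link whose size lies between minSize and maxSize.
     */
    static List<Object[]> build(Object[][] table, int ncol, List<int[]> links,
                                int width, int minSize, int maxSize) {
        List<Object[]> out = new ArrayList<>();
        for (int[] rows : links) {
            if (rows.length < minSize || rows.length > maxSize) {
                continue;                                   // link size out of range: no output row
            }
            Object[] wide = new Object[width * ncol];       // sections with no entry stay null
            int nused = Math.min(rows.length, width);       // entries beyond width are discarded
            for (int k = 0; k < nused; k++) {
                // Copy input row rows[k] into section k of the wide row.
                System.arraycopy(table[rows[k]], 0, wide, k * ncol, ncol);
            }
            out.add(wide);
        }
        return out;
    }
}
```

For a 4-row, 2-column table with links {0, 3}, {1} and {2, 0, 1}, width 2, minSize 2 and maxSize 3, the single-entry link is dropped, and the three-entry link keeps rows 2 and 0 while row 1 is discarded because it exceeds the width.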

findGroups

public static Map findGroups(LinkSet links)
Returns a mapping from RowLinks to LinkGroups which describes connected groups of links in the input LinkSet. A connected group is one in which the RowRefs of its constituent RowLinks form a connected graph, with RowRefs as the nodes and RowLinks as the edges. A LinkGroup with a link count of more than one therefore represents an ambiguous match, that is, one in which one or more of its RowRefs is contained in more than one RowLink in the original LinkSet.

The returned map contains entries only for non-trivial LinkGroups, that is ones which contain more than one link.

Parameters:
links - link set representing a set of matches
Returns:
RowLink -> LinkGroup mapping describing connected groups in links
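Connected groups of this kind are conventionally found with a union-find (disjoint-set) structure. The sketch below shows one way to do it, not necessarily how findGroups is implemented: RowRefs are represented by hypothetical string keys such as "0:17" (table 0, row 17), links are identified by their position in a list, every link is assumed to contain at least one RowRef, and the group label is simply the root key of each component.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical union-find sketch of grouping connected links (not library code). */
public class LinkGroupSketch {

    private final Map<String, String> parent = new HashMap<>();

    /** Returns the root of x's component, registering x if unseen, with path compression. */
    private String find(String x) {
        parent.putIfAbsent(x, x);
        String root = x;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        while (!x.equals(root)) {
            String next = parent.get(x);
            parent.put(x, root);
            x = next;
        }
        return root;
    }

    private void union(String a, String b) {
        String ra = find(a);
        String rb = find(b);
        if (!ra.equals(rb)) {
            parent.put(ra, rb);
        }
    }

    /**
     * links.get(i) holds the RowRef keys of link i. Returns link index -> group label,
     * containing entries only for groups made up of more than one link.
     */
    static Map<Integer, String> findGroups(List<List<String>> links) {
        LinkGroupSketch uf = new LinkGroupSketch();
        for (List<String> refs : links) {
            for (String r : refs) {
                uf.union(refs.get(0), r);              // a link connects all of its RowRefs
            }
        }
        String[] rootOf = new String[links.size()];
        Map<String, Integer> linksPerGroup = new HashMap<>();
        for (int i = 0; i < links.size(); i++) {
            rootOf[i] = uf.find(links.get(i).get(0));
            linksPerGroup.merge(rootOf[i], 1, Integer::sum);
        }
        Map<Integer, String> result = new HashMap<>();
        for (int i = 0; i < links.size(); i++) {
            if (linksPerGroup.get(rootOf[i]) > 1) {    // keep only non-trivial groups
                result.put(i, rootOf[i]);
            }
        }
        return result;
    }
}
```

Given links {0:1, 1:5}, {0:1, 1:6} and {0:2, 1:7}, the first two share RowRef 0:1 and so form one ambiguous group of two links, while the third link forms a trivial group and is left out of the result, mirroring the rule that only non-trivial groups are returned.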

Copyright © 2004 CLRC: Central Laboratory of the Research Councils. All rights reserved.