Index files

The indexing program scbindex builds disk-based indexes, which are interrogated by the extractor program (the same program as the browser, scb). There are three indexes:

Tasks: For each package, a list of all the tasks which have been identified is kept. The definition of what constitutes a task is rather loose, but it is intended to cover programs which can be invoked by typing their names from the Unix shell, ICL, or some other external environment. Only the name of each task is stored here; for each task there is a corresponding entry in the Routines index giving its location within that package. This index also serves as a list of all the packages which have been indexed.
Files: An index of every file which makes up the indexed source code set, indexed by bare filename (i.e. the tail of the filename, excluding path information). This index also holds one special entry for each package recording where the source code is stored.
Routines: An index of every C and Fortran function and subroutine, indexed by the name used by the Unix linker on the supported systems; thus C functions are indexed by their function names as written in the source code, and Fortran routines by the function or subroutine name in lower case, followed by a single underscore. C preprocessor macro functions are also indexed, but Fortran statement functions are not. The location stored for each function gives only the file in which it can be found, not the position within that file.
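The key convention for the Routines index can be expressed as a small function. This is a minimal sketch, not code from scbindex: the function name `linker_name` and the language labels `"c"` and `"fortran"` are hypothetical, and it encodes only the rule stated above (C names unchanged, Fortran names lower-cased with a single trailing underscore).

```python
def linker_name(name: str, language: str) -> str:
    """Return the key a routine would have in the Routines index (hypothetical helper).

    C functions and C preprocessor macro functions keep the name as written
    in the source; Fortran functions and subroutines are lower-cased and
    given a single trailing underscore, as the Unix linker sees them.
    """
    if language == "c":
        return name
    if language == "fortran":
        return name.lower() + "_"
    raise ValueError(f"unsupported language: {language!r}")


if __name__ == "__main__":
    # Illustrative routine names only.
    print(linker_name("CALC_MEAN", "fortran"))  # calc_mean_
    print(linker_name("calcMean", "c"))         # calcMean
```

Because C names are case-sensitive and Fortran names are not, lower-casing only the Fortran names lets a single lookup key serve both languages, matching what the linker itself uses.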

The Tasks index is stored in a plain text file called tasks. Because it is line-formatted, it generally has to be read in its entirety to find any required piece of information. Since it is plain text it can be examined using a normal text editor (although some of the lines may be rather long).

The Files and Routines indexes each resemble a table which maps a key (the name of the file or routine), and optionally a Starlink package name, to a location in the source tree. For each name which occurs at all, the index contains a list of one or more locations, one for each package in which it occurs. The index can be interrogated either by requesting any location for a given name, or by requesting a location which is preferentially within a given package (using the `name' and `package' arguments of the CGI script respectively). An important consequence is that if the same file name, or the same function/routine name, occurs twice within a single package, only one instance will be indexed, and the browser program will never access the other. The same name may, however, occur in different packages without causing clashes.
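The lookup semantics just described can be modelled in memory as a dictionary from each name to a list of (package, location) pairs. This is a hypothetical sketch rather than the StarIndex interface: the class `NameIndex`, its methods, and the package and path strings are illustrative. Two details are assumptions the text does not settle: which of two same-package entries survives (here, the first one added), and what a package-preferring lookup returns when the name does not occur in that package (here, the first recorded location).

```python
from typing import Dict, List, Optional, Tuple


class NameIndex:
    """Hypothetical model of the Files/Routines index semantics."""

    def __init__(self) -> None:
        # name -> list of (package, location) pairs, at most one per package
        self._entries: Dict[str, List[Tuple[str, str]]] = {}

    def add(self, name: str, package: str, location: str) -> bool:
        """Record a location; return False if name is already indexed in package."""
        locs = self._entries.setdefault(name, [])
        if any(pkg == package for pkg, _ in locs):
            # A second instance in the same package is not indexed (assumed:
            # the first one wins).
            return False
        locs.append((package, location))
        return True

    def lookup(self, name: str, package: Optional[str] = None) -> Optional[str]:
        """Return a location for name, preferring one inside package if given."""
        locs = self._entries.get(name)
        if not locs:
            return None
        if package is not None:
            for pkg, loc in locs:
                if pkg == package:
                    return loc
        # No package requested, or name absent from it: fall back (assumed
        # policy) to the first recorded location.
        return locs[0][1]


if __name__ == "__main__":
    idx = NameIndex()
    idx.add("calc_mean_", "pkga", "pkga/calc.f")
    idx.add("calc_mean_", "pkgb", "pkgb/stats.f")
    idx.add("calc_mean_", "pkga", "pkga/other.f")  # clash: ignored
    print(idx.lookup("calc_mean_", "pkgb"))        # pkgb/stats.f
    print(idx.lookup("calc_mean_"))                # pkga/calc.f
```

The third `add` call shows the clash described above: `pkga/other.f` is never reachable, whereas the entry in `pkgb` coexists with the one in `pkga`.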

The Files and Routines indexes are handled by the program as StarIndex objects (named file and func respectively), that is, objects of a type defined in the supplied module StarIndex.pm. The corresponding files are much larger than the tasks file (a couple of megabytes or more each), but because of the way they are implemented (as a hash of flattened lists tied to a DBM file of some sort), any given entry can be accessed by key very quickly. The design of these objects was dictated chiefly by the requirements of the browser program, as explained in section 5.4. More detailed documentation of the implementation of these indexes can be found in the StarIndex.pm module itself.
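The idea of a hash of flattened lists stored in a DBM file can be sketched with Python's standard `dbm` module standing in for the tied hash. This is not the StarIndex.pm implementation: the helpers `store` and `fetch`, the tab separator, and the (package, location) value layout are all assumptions made for illustration. The point it shows is that each key maps to a single string, so one keyed read retrieves every location for a name without scanning the file.

```python
import dbm
import os
import tempfile
from typing import List, Optional, Tuple

SEP = "\t"  # assumed separator between flattened list elements


def store(db, name: str, pairs: List[Tuple[str, str]]) -> None:
    """Flatten [(pkg, loc), ...] to 'pkg<TAB>loc<TAB>pkg<TAB>loc' under key name."""
    flat = [item for pair in pairs for item in pair]
    db[name] = SEP.join(flat)  # dbm encodes str keys and values to bytes


def fetch(db, name: str, package: Optional[str] = None) -> Optional[str]:
    """Read one key, unflatten its value, and prefer a location in package."""
    try:
        raw = db[name]  # a single keyed lookup; values come back as bytes
    except KeyError:
        return None
    items = raw.decode("utf-8").split(SEP)
    pairs = list(zip(items[0::2], items[1::2]))
    if not pairs:
        return None
    for pkg, loc in pairs:
        if pkg == package:
            return loc
    return pairs[0][1]  # assumed fallback: first recorded location


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "func")
        with dbm.open(path, "c") as db:
            store(db, "calc_mean_", [("pkga", "pkga/calc.f"),
                                     ("pkgb", "pkgb/stats.f")])
        with dbm.open(path, "r") as db:
            print(fetch(db, "calc_mean_", "pkgb"))  # pkgb/stats.f
```

Flattening the list into one value trades a little parsing on each read for the ability to use a plain key/value store, which only understands strings.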

It is possible to examine the Files and Routines indexes directly from the command line by using the supplied dbmcat utility (see section 6.1).

SCB --- Source Code Browser
Starlink User Note 225
M. B. Taylor
10 December 1999
E-mail: ussc@star.rl.ac.uk