Next: Additional utilities
Up: Internal workings
Previous: Operation of the indexer program


Operation of the extractor program

The extractor program scb functions in two main modes: as a normal program which can be invoked from the Unix shell and outputs plain text (described in section 4.1), and as a CGI program which is run under the control of an HTTP server to produce (predominantly) HTML output (described in section 4.2).

In command line mode, the main function of the extractor program is to locate and output the file or routine requested using whatever arguments it has been given (see section 4.1), which may require pulling it out from within tar archives. If no such file can be found, an error message is written.
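
The lookup just described can be illustrated with a short Python sketch. It is not the scb implementation: the function name find_source, the directory layout, and the decision to scan the directory tree directly are all assumptions made for illustration. It looks for a plain file first, then inside tar archives, and reports failure by returning None.

```python
import tarfile
from pathlib import Path
from typing import Optional


def find_source(name: str, root: Path) -> Optional[bytes]:
    """Return the contents of the file called `name` found under `root`.

    Hypothetical helper: plain files are tried first, then the members of
    any *.tar archive under `root`. Returns None if nothing matches, in
    which case the caller would write an error message.
    """
    # 1. A plain file somewhere under the root directory.
    for path in sorted(root.rglob(name)):
        if path.is_file():
            return path.read_bytes()

    # 2. A member with that base name inside a tar archive.
    for archive in sorted(root.rglob("*.tar")):
        with tarfile.open(archive) as tar:
            for member in tar.getmembers():
                if member.isfile() and Path(member.name).name == name:
                    extracted = tar.extractfile(member)
                    if extracted is not None:
                        return extracted.read()
    return None
```

A caller would print the returned bytes, or an error message when the result is None.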

In CGI mode, the program has two additional responsibilities. Firstly, if invoked without sufficient arguments to identify a file for output, it must present some useful alternative in HTML. Secondly, when presenting source code, it must mark up appropriate words as HTML anchors.

When supplied with insufficient arguments it may simply present a formatted (and hopefully informative) error message. More commonly, however (for instance when invoked with no arguments at all), it presents a form and/or a series of links which let the user re-invoke the program so as to display the required source file, or at least get closer to doing so.

Marking the source code up with HTML anchor tags is done partly by the extractor program itself and partly by separate language-specific tagging routines. Having extracted the source code, the extractor checks whether a suitable tagging routine exists. If not, some basic markup is done and the file is written more or less raw. If such a routine does exist, it is called, and adds HTML-like tags indicating the positions of routine definitions and code references. The extractor program then goes through the tagged text and converts it to actual HTML before writing it out. The following example should clarify this. If the original code is:

#include "header.h"
int code (int argc, char **argv) {
   do_stuff();
}

it will be changed by the tagging routine to read:

#include "<a href='INCLUDE-header.h'>header.h</a>"
int <a name='code'>code</a> (int argc, char **argv) {
   <a href='do_stuff'>do_stuff</a>();
}

which, assuming the code appears within a package named "pack", will finally be modified for output by the extractor to look something like:

#include "<a href='scb.pl?header.h&package=pack&type=file'>header.h</a>"
int <a name='code'>code</a> (int argc, char **argv) {
   <a href='scb.pl?do_stuff&package=pack&type=func#do_stuff'>do_stuff</a>();
}
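
The second conversion step can be sketched in Python as follows. This is an illustration only, not the scb code: the function name finalise_links and the regular expression are assumptions, and the URL layout is copied from the example output above. Anchors of the form <a name='...'> do not match the pattern and so pass through unchanged.

```python
import re
from urllib.parse import quote

# Matches the opening tag of an intermediate link: <a href='target'>
HREF = re.compile(r"<a href='([^']*)'>")


def finalise_links(tagged: str, package: str, script: str = "scb.pl") -> str:
    """Rewrite tagger-style hrefs into URLs that re-invoke the extractor."""

    def rewrite(match: re.Match) -> str:
        target = match.group(1)
        if target.startswith("INCLUDE-"):
            # Reference to an include file: link to the whole file.
            name = target[len("INCLUDE-"):]
            url = f"{script}?{quote(name)}&package={quote(package)}&type=file"
        else:
            # Reference to a routine: add a fragment so that the browser
            # jumps to the definition within the file.
            name = quote(target)
            url = f"{script}?{name}&package={quote(package)}&type=func#{name}"
        return f"<a href='{url}'>"

    return HREF.sub(rewrite, tagged)
```

Applied to the tagged text of the example with package "pack", this reproduces the final form shown above.
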
The hyperlinks written by the extractor program are therefore constructed (see section 4.1) so that each points to a routine or file of the given name in the current package if one exists, and otherwise resolves to one from another package. This can sometimes resolve to the wrong routine, for instance if there are multiple routines of the same name inside the package, or none inside but one in each of several external packages. However, it has a good chance of resolving to the right routine, especially for routines named in the usual Starlink way of pre_name. Importantly, it has no need to attempt to understand the contents of include files, link scripts or makefiles, which would make the whole business a great deal more complicated.
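
The resolution rule amounts to "current package first, otherwise some other package". A minimal Python sketch, assuming a hypothetical in-memory index that maps each name to the packages defining it (the real index format is not described in this section), might look like this. Taking the first external package in the list is an arbitrary choice made for the sketch, and it is exactly the case in which the wrong routine may be selected.

```python
from typing import Dict, List, Optional

# Hypothetical index: routine or file name -> packages which define it.
Index = Dict[str, List[str]]


def resolve_package(name: str, current: str, index: Index) -> Optional[str]:
    """Pick the package which a reference to `name` should link to."""
    packages = index.get(name, [])
    if current in packages:
        # Prefer a definition in the package being browsed.
        return current
    # Otherwise fall back on another package, if any defines the name.
    return packages[0] if packages else None
```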

Note that the link to the function (although not the link to the include file) specifies a fragment within the document (the part after the `#' symbol). This is important if the referenced function is just one of many within a large source file (frequently true in C source, although less so in Fortran). Links to locations within a file work because of the <a name='...'> tags at function definitions.

A couple of other subtleties are observed in creating these hyperlinks. Firstly, if no file or routine of the given name exists in any package, the link is not created; instead the word which would otherwise have been a link is output in bold, to indicate that it looks as if it ought to reference another file but does not. Secondly, if the referenced item exists in the same source file, a truncated URL giving only the fragment position in the same file is written. This avoids reloading the same file (which smart browsers may in any case avoid) when all that is required is to move around in it.
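
These two special cases, together with the ordinary case, can be sketched in Python as below. The function render_reference and its arguments are hypothetical, and the URL layout follows the earlier example rather than any documented interface. Here target_file is the file found to contain the routine, or None if no package provides one.

```python
from html import escape
from typing import Optional
from urllib.parse import quote


def render_reference(word: str, target_file: Optional[str], current_file: str,
                     package: str, script: str = "scb.pl") -> str:
    """Return the HTML for one routine reference in the source listing."""
    text = escape(word)
    if target_file is None:
        # Nothing of this name exists anywhere: bold text, no link.
        return f"<b>{text}</b>"
    if target_file == current_file:
        # Defined in the file being shown: fragment-only URL, no reload.
        return f"<a href='#{quote(word)}'>{text}</a>"
    # Defined elsewhere: full URL which re-invokes the extractor.
    url = f"{script}?{quote(word)}&package={quote(package)}&type=func#{quote(word)}"
    return f"<a href='{url}'>{text}</a>"
```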




SCB --- Source Code Browser
Starlink User Note 225
M. B. Taylor
10 December 1999
E-mail:ussc@star.rl.ac.uk