votlint: Validates VOTable documents
The VOTable standard, while not hugely complicated, has a number of subtleties and it's not difficult to produce VOTable documents which violate it in various ways. In some cases the errors are small and a parser is likely to process the document without noticing the trouble. In other cases, the errors are so serious that it's hard for any software to make sense of it. In many cases in between, different software will react in different ways, in the worst case appearing to parse a VOTable but in fact understanding the wrong data.
votlint is a program which can check a VOTable document
and spot places where it does not conform to the VOTable standard,
or places which look like they may not mean what the author intended.
It is meant for use in two main scenarios:
Validating a VOTable document against the VOTable schema or DTD
of course goes a long way towards checking a VOTable document for errors,
but it by no means does the whole job, simply because the schema/DTD
specification languages don't have the facilities
to understand the data structure
of a VOTable document. For instance the VOTable schema
will allow any plain text content in a TD element, but whether
this makes sense in a VOTable depends on the datatype
attribute of the corresponding FIELD element. There are many
other examples.
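The TD/FIELD example above can be illustrated with a minimal Python sketch. This is not how votlint itself is implemented; it only shows the kind of cross-element check that a schema cannot express. The sample document, the check_cells helper and the two-entry PARSERS table are all made up for illustration. The sketch assumes an un-namespaced document with a single TABLE, and Python's int and float are slightly more lenient than the VOTable rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second row puts text in an "int" column.
# A schema would accept it, since TD may hold any plain text.
SAMPLE = """<VOTABLE><RESOURCE><TABLE>
<FIELD name="id" datatype="int"/>
<FIELD name="flux" datatype="double"/>
<DATA><TABLEDATA>
<TR><TD>1</TD><TD>2.5</TD></TR>
<TR><TD>two</TD><TD>3.0</TD></TR>
</TABLEDATA></DATA>
</TABLE></RESOURCE></VOTABLE>"""

# Only two datatypes are handled here; the VOTable standard defines more.
PARSERS = {"int": int, "double": float}


def check_cells(xml_text):
    """Return messages for TD values that do not parse as the FIELD datatype."""
    table = ET.fromstring(xml_text).find(".//TABLE")
    fields = table.findall("FIELD")
    problems = []
    for row_index, tr in enumerate(table.iter("TR"), start=1):
        # Pair each cell with the FIELD that declares its column.
        for field, td in zip(fields, tr.findall("TD")):
            parse = PARSERS.get(field.get("datatype"))
            text = (td.text or "").strip()
            if parse is None or text == "":
                continue  # unknown datatype or empty cell: not checked here
            try:
                parse(text)
            except ValueError:
                problems.append(
                    f"row {row_index}: bad {field.get('datatype')} "
                    f"value '{text}' in column {field.get('name')}"
                )
    return problems


problems = check_cells(SAMPLE)
for message in problems:
    print(message)
```

The point of the sketch is that the check needs information from two different places in the document (the FIELD declaration and the TD content), which is exactly what the schema/DTD languages cannot relate.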
votlint tackles this by parsing the VOTable document
in a way which understands its structure and assessing the content
as critically as it can. For any incorrect or questionable content
it finds, it will output a short message describing the problem
and giving its location in the document. What you do with this
information is then up to you.
Using votlint is very straightforward.
The votable argument gives the location (filename or URL) of a VOTable document.
If it is omitted, the document will be read from standard input.
Error and warning messages will be written on standard output.
Each message is prefixed with the location at which the error was
found (if possible the line and column are shown, though this is
dependent on your JVM's default XML parser).
If multiple instances of the same problem are found,
by default only a few repeats of the message are reported;
this can be controlled with the maxrepeat parameter.
The processing is SAX-based, so arbitrarily long tables can
be processed without heavy memory use.
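To make the behaviour described above more concrete, here is a rough Python sketch of a SAX-style checker that prefixes each message with a location and stops repeating a message after a set number of occurrences. It is an illustration only, not votlint's code: the StreamingChecker class, its max_repeat argument, the single "int" check and the message wording are all assumptions, and votlint's real message format may well differ.

```python
import xml.sax

# Hypothetical sample with three bad values in an "int" column.
SAMPLE = b"""<VOTABLE><RESOURCE><TABLE>
<FIELD name="id" datatype="int"/>
<DATA><TABLEDATA>
<TR><TD>a</TD></TR>
<TR><TD>b</TD></TR>
<TR><TD>3</TD></TR>
<TR><TD>c</TD></TR>
</TABLEDATA></DATA>
</TABLE></RESOURCE></VOTABLE>"""


class StreamingChecker(xml.sax.ContentHandler):
    """Checks cells as they stream past; no table is held in memory."""

    def __init__(self, max_repeat):
        super().__init__()
        self.max_repeat = max_repeat
        self.datatypes = []   # datatype of each FIELD in the current TABLE
        self.column = 0       # index of the next TD within the current TR
        self.cell = None      # text chunks of the TD being read, if any
        self.counts = {}      # message text -> number of occurrences
        self.messages = []    # messages actually reported
        self.locator = None

    def setDocumentLocator(self, locator):
        # The parser may or may not supply position information.
        self.locator = locator

    def report(self, text):
        self.counts[text] = self.counts.get(text, 0) + 1
        if self.counts[text] <= self.max_repeat:
            line = self.locator.getLineNumber() if self.locator else None
            prefix = f"line {line}: " if line is not None else ""
            self.messages.append(prefix + text)

    def startElement(self, name, attrs):
        if name == "TABLE":
            self.datatypes = []
        elif name == "FIELD":
            self.datatypes.append(attrs.get("datatype"))
        elif name == "TR":
            self.column = 0
        elif name == "TD":
            self.cell = []

    def characters(self, content):
        if self.cell is not None:
            self.cell.append(content)

    def endElement(self, name):
        if name != "TD":
            return
        text = "".join(self.cell).strip()
        self.cell = None
        known = self.column < len(self.datatypes)
        datatype = self.datatypes[self.column] if known else None
        self.column += 1
        if datatype == "int" and text:
            try:
                int(text)
            except ValueError:
                self.report("bad int value in TD")


handler = StreamingChecker(max_repeat=2)
xml.sax.parseString(SAMPLE, handler)
for message in handler.messages:
    print(message)
```

Because the handler keeps only the FIELD datatypes and the cell currently being read, its memory use does not grow with the number of rows, which is the property the SAX-based design is after.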
votlint can't guarantee to pick up every possible
error in a VOTable document, but it ought to pick up many of the
most serious errors that are commonly made in authoring VOTables.
Note: votlint's handling of XML namespaces
seems to be somewhat dependent on the XML parser in use.
As far as I can see, Crimson (the default in many JREs) works for any
namespace arrangement, but Xerces seems to have problems when validating
documents which use namespace prefixes. I am not sure about other parsers.
This probably won't cause you trouble, but if it does you may need to
set validate=false to work around it.
Contact the author if this seems to be a serious issue for you.
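As a sketch of how the parameters mentioned in this section fit together, the following Python helper assembles a votlint command line and, optionally, runs it and collects the messages from standard output. The helper names votlint_command and run_votlint are hypothetical, as is the file name table.vot. The launcher name "stilts" is an assumption about how the tool is installed; adjust it if your installation invokes votlint differently. Only the votable, maxrepeat and validate parameters described above are used.

```python
import subprocess


def votlint_command(votable=None, maxrepeat=None, disable_validation=False,
                    launcher="stilts"):
    """Build an argument list using only the parameters described above.

    The launcher name is an assumption, not something this section states.
    """
    command = [launcher, "votlint"]
    if votable is not None:
        # Filename or URL; if omitted, votlint reads standard input.
        command.append(f"votable={votable}")
    if maxrepeat is not None:
        command.append(f"maxrepeat={maxrepeat}")
    if disable_validation:
        # Workaround for the namespace-related parser trouble noted above.
        command.append("validate=false")
    return command


def run_votlint(**kwargs):
    """Run votlint and return its messages, which arrive on standard output."""
    result = subprocess.run(votlint_command(**kwargs),
                            capture_output=True, text=True)
    return result.stdout.splitlines()


command = votlint_command(votable="table.vot", maxrepeat=10,
                          disable_validation=True)
print(" ".join(command))
```

Calling run_votlint requires a working installation, so the sketch only prints the assembled command.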