votlint: Validates VOTable documents
The VOTable standard, while not hugely complicated, has a number of subtleties, and it's not difficult to produce VOTable documents which violate it in various ways. In fact it's probably true to say that most VOTable documents out there are not strictly legal. In some cases the errors are small and a parser is likely to process the document without noticing the trouble. In other cases the errors are so serious that it's hard for any software to make sense of the document. In many cases in between, different software will react in different ways, in the worst case appearing to parse a VOTable but in fact understanding the wrong data.
votlint is a program which can check a VOTable document
and spot places where it does not conform to the VOTable standard,
or places which look like they may not mean what the author intended.
It is meant for use in two main scenarios:
Validating a VOTable document against the VOTable schema or DTD
of course goes a long way towards checking a VOTable document for errors
(though it's clear that many VOTable authors don't even go this far),
but it by no means does the whole job, simply because the schema/DTD
specification languages don't have the facilities
to understand the data structure
of a VOTable document. For instance the VOTable schema
will allow any plain text content in a
TD element, but whether
this makes sense in a VOTable depends on the
datatype attribute of the corresponding
FIELD element. There are many other such examples.
votlint tackles this by parsing the VOTable document
in a way which understands its structure and assessing the content
as critically as it can. For any incorrect or questionable content
it finds, it will output a short message describing the problem
and giving its location in the document. What you do with this
information is then up to you.
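
To make the TD/FIELD point concrete, here is a minimal sketch in Python of a structure-aware check that a schema cannot express. It is not votlint's implementation (votlint is a Java program); the function name `check_cells`, the `PARSERS` table and the message wording are all invented for illustration. It assumes a small document with unprefixed, un-namespaced element names, and it only looks at scalar numeric columns.

```python
import xml.etree.ElementTree as ET

# Hypothetical and deliberately incomplete mapping from VOTable datatype
# names to a Python callable that raises ValueError on unparsable text.
# (Real VOTable rules are richer, e.g. hexadecimal integers are ignored here.)
PARSERS = {
    "short": int,
    "int": int,
    "long": int,
    "float": float,
    "double": float,
}

def check_cells(votable_text):
    """Return messages for scalar cells whose text does not parse
    as the datatype declared by the corresponding FIELD."""
    root = ET.fromstring(votable_text)
    messages = []
    for table in root.iter("TABLE"):
        fields = table.findall("FIELD")
        for irow, tr in enumerate(table.iter("TR"), start=1):
            for field, td in zip(fields, tr.findall("TD")):
                parser = PARSERS.get(field.get("datatype"))
                text = (td.text or "").strip()
                # Skip unknown types, array-valued fields and empty cells.
                if parser is None or field.get("arraysize") or not text:
                    continue
                try:
                    parser(text)
                except ValueError:
                    messages.append(
                        f"row {irow}, column {field.get('name')}: "
                        f"bad {field.get('datatype')} value {text!r}"
                    )
    return messages

# A made-up document that a schema would accept, since TD may hold any text.
SAMPLE = """<VOTABLE version="1.3">
 <RESOURCE>
  <TABLE>
   <FIELD name="id" datatype="int"/>
   <FIELD name="flux" datatype="double"/>
   <DATA><TABLEDATA>
    <TR><TD>1</TD><TD>3.5</TD></TR>
    <TR><TD>two</TD><TD>4.25</TD></TR>
   </TABLEDATA></DATA>
  </TABLE>
 </RESOURCE>
</VOTABLE>"""

for message in check_cells(SAMPLE):
    print(message)
```

The sketch loads the whole document into memory for brevity, which would not suit very large tables.
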
Using votlint is very straightforward.
If an input location (filename or URL) is supplied, the VOTable document is read from there.
Otherwise, the document will be read from standard input.
Error and warning messages will be written on standard error.
Each message is prefixed with the location at which the error was
found (if possible the line and column are shown, though this is
dependent on your JVM's default XML parser).
The processing is SAX-based, so arbitrarily long tables can
be processed without heavy memory use.
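
The general shape of SAX-based checking with located messages can be sketched in Python using the standard `xml.sax` module. This is only an illustration of the technique under stated assumptions, not votlint's own code: the class `RowWidthChecker`, the function `check_stream`, the particular check (each TR should hold as many TD cells as the table has FIELDs) and the message format are all made up here, and element names are assumed to be unprefixed.

```python
import io
import sys
import xml.sax

class RowWidthChecker(xml.sax.ContentHandler):
    """Streams through a document keeping only a few counters, so memory
    use does not grow with the number of rows."""

    def __init__(self, out=sys.stderr):
        super().__init__()
        self.out = out
        self.locator = None
        self.nfield = 0      # FIELDs seen in the current TABLE
        self.ntd = 0         # TDs seen in the current TR
        self.messages = 0    # number of messages written

    def setDocumentLocator(self, locator):
        # Called by the parser; gives access to the current position.
        self.locator = locator

    def startElement(self, name, attrs):
        if name == "TABLE":
            self.nfield = 0
        elif name == "FIELD":
            self.nfield += 1
        elif name == "TR":
            self.ntd = 0
        elif name == "TD":
            self.ntd += 1

    def endElement(self, name):
        if name == "TR" and self.ntd != self.nfield:
            line = self.locator.getLineNumber()
            col = self.locator.getColumnNumber()
            print(f"line {line}, column {col}: row has {self.ntd} cells, "
                  f"expected {self.nfield}", file=self.out)
            self.messages += 1

def check_stream(stream, out=sys.stderr):
    """Parse a binary stream incrementally; return the message count."""
    handler = RowWidthChecker(out)
    xml.sax.parse(stream, handler)
    return handler.messages

# A made-up document whose second row is one cell short.
ROWS_SAMPLE = b"""<VOTABLE version="1.3">
 <RESOURCE>
  <TABLE>
   <FIELD name="id" datatype="int"/>
   <FIELD name="flux" datatype="double"/>
   <DATA><TABLEDATA>
    <TR><TD>1</TD><TD>3.5</TD></TR>
    <TR><TD>2</TD></TR>
   </TABLEDATA></DATA>
  </TABLE>
 </RESOURCE>
</VOTABLE>"""

# Messages go to standard error by default.
check_stream(io.BytesIO(ROWS_SAMPLE))
```

To read from standard input instead, `sys.stdin.buffer` could be passed as the stream. How precise the reported line and column are depends on the underlying parser, much as the text above notes for the JVM's parser.
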
votlint can't guarantee to pick up every possible
error in a VOTable document, but it ought to pick up many of the
most serious errors that are commonly made in authoring VOTables.
votlint's handling of XML namespaces
seems to be somewhat dependent on the XML parser in use.
As far as I can see, Crimson (the default in many JREs) works for any
namespace arrangements, but Xerces seems to have problems when validating
documents which use namespace prefixes. Not sure about other parsers.
This probably won't cause you trouble, but if it does you may need to
set validate=false to work around it.
Contact the author if this seems to be a serious issue for you.