uk.ac.starlink.util
Class DataSource

java.lang.Object
  |
  +--uk.ac.starlink.util.DataSource
Direct Known Subclasses:
FileDataSource, ResourceDataSource, URLDataSource

public abstract class DataSource
extends Object

Represents a stream-like source of data. Instances of this class can be used to encapsulate the data available from a stream. The idea is that the stream should return the same sequence of bytes each time.

As well as the ability to return a stream, a DataSource may also have a position, which corresponds to the 'ref' or 'frag' part of a URL (the bit after the #). This is an indication of a location in the stream; it is a string, and its interpretation is entirely up to the application (though it may be specified by the documentation of specific DataSource subclasses).

As well as providing the facility for several different objects to get their own copy of the underlying input stream, this class also handles decompression of the stream. Compression types are as understood by the associated Compression class.

For efficiency, a buffer of the bytes at the start of the stream called the 'intro buffer' is recorded the first time that the stream is read. This can then be used for magic number queries cheaply, without having to open a new input stream. In the case that the whole input stream is shorter than the intro buffer, the underlying input stream never has to be read again.

Any implementation which implements getRawInputStream() in such a way as to return different byte sequences on different occasions may lead to unpredictable behaviour from this class.
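
The intro-buffer idea described above can be illustrated with a small stand-alone sketch. It is not the DataSource implementation: the class RepeatableSourceSketch, its Supplier-based opener and the openCount field are hypothetical names used only for illustration. It shows the first bytes being read once and then served from a cache, with the cached length being the smaller of the limit and the stream length.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.function.Supplier;

/** Hypothetical stand-in for the caching idea; not the DataSource code. */
final class RepeatableSourceSketch {
    private final Supplier<InputStream> opener; // must yield the same bytes on every call
    private final int introLimit;
    private byte[] intro;                       // filled in on first use
    int openCount;                              // number of streams opened so far

    RepeatableSourceSketch(Supplier<InputStream> opener, int introLimit) {
        this.opener = opener;
        this.introLimit = introLimit;
    }

    /** Returns up to introLimit leading bytes, reading the stream only once. */
    byte[] getIntro() {
        if (intro == null) {
            try (InputStream in = opener.get()) {
                openCount++;
                byte[] buf = new byte[introLimit];
                int n = 0;
                while (n < introLimit) {
                    int got = in.read(buf, n, introLimit - n);
                    if (got < 0) {
                        break;                  // stream is shorter than the limit
                    }
                    n += got;
                }
                intro = Arrays.copyOf(buf, n);  // length is min(introLimit, stream length)
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return intro;
    }
}
```

If the opener returned different bytes on different calls, the cached intro would no longer match later streams, which is why the same-bytes contract matters.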

See Also:
Compression

Field Summary
static int DEFAULT_INTRO_LIMIT
           
 
Constructor Summary
DataSource()
          Constructs a DataSource with a default size of intro buffer.
DataSource(int introLimit)
          Constructs a DataSource with a given size of intro buffer.
 
Method Summary
 void close()
          Closes any open streams owned and not yet dispatched by this DataSource.
 DataSource forceCompression(Compression compress)
          Returns a DataSource representing the same underlying stream, but with a forced compression mode compress.
 Compression getCompression()
          Returns an object which will handle any required decompression for this stream.
 InputStream getHybridInputStream()
          Returns an input stream which appears just the same as the one returned by getInputStream(), but only incurs the expense of obtaining an actual input stream (by calling getRawInputStream()) if more bytes are read than the cached magic number.
 InputStream getInputStream()
          Returns an InputStream containing the whole of this DataSource.
static InputStream getInputStream(String location)
          Returns an input stream based on the given location string.
 byte[] getIntro()
          Returns the intro buffer, first reading it if this hasn't been done before.
 int getIntroLimit()
          Returns the maximum length of the intro buffer.
 long getLength()
          Returns the length of the stream returned by getInputStream in bytes, if known.
 String getName()
          Returns a name for this source.
 String getPosition()
          Returns the position associated with this source.
protected abstract  InputStream getRawInputStream()
          Provides a new InputStream for this data source.
 long getRawLength()
          Returns the length in bytes of the stream returned by getRawInputStream, if known.
 String getSystemId()
          Returns a System ID for this DataSource; this is a string representation of a file name or URL, as used by Source and friends.
 URL getURL()
          Returns a URL which corresponds to this data source, if one exists.
static DataSource makeDataSource(String loc)
          Attempts to make a source given a name identifying its location.
static DataSource makeDataSource(URL url)
          Makes a source from a URL.
 void setCompression(Compression compress)
          Sets the compression to be associated with this data source.
 void setIntroLimit(int limit)
          Sets the maximum size of the intro buffer to a new value.
 void setName(String name)
          Sets the name of this source.
 void setPosition(String position)
          Sets the position associated with this source.
 String toString()
          Returns a short description of this source (name plus compression type).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

DEFAULT_INTRO_LIMIT

public static final int DEFAULT_INTRO_LIMIT
See Also:
Constant Field Values
Constructor Detail

DataSource

public DataSource(int introLimit)
Constructs a DataSource with a given size of intro buffer.

Parameters:
introLimit - the maximum number of bytes in the intro buffer

DataSource

public DataSource()
Constructs a DataSource with a default size of intro buffer.

Method Detail

getRawInputStream

protected abstract InputStream getRawInputStream()
                                          throws IOException
Provides a new InputStream for this data source. This method should be implemented by subclasses to provide a new InputStream giving the raw content of the source each time it is called. The general contract of this method is that each time it is called it will return a stream with the same content.

Returns:
an InputStream containing the data of this source
Throws:
IOException

getURL

public URL getURL()
Returns a URL which corresponds to this data source, if one exists. A URL.openConnection() method call on the URL returned by this method should provide a stream with the same content as the getRawInputStream() method of this data source. If no such URL exists or is known, then null should be returned.

If this source has a non-null position value, it will be appended to the main part of the URL after a '#' character (as the URL's ref part).

Returns:
a URL corresponding to this source, or null
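
How a position ends up as the ref part can be sketched with the JDK alone. The helper below is hypothetical (PositionUrlSketch and withPosition are not part of this class) and uses java.net.URI rather than URL for convenience; it only illustrates appending a position after a '#' character.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper; illustrates the '#' convention only. */
final class PositionUrlSketch {
    /** Returns base with position as its fragment; a null position leaves base unchanged. */
    static URI withPosition(URI base, String position) {
        if (position == null) {
            return base;
        }
        try {
            // The three-argument constructor assembles scheme:scheme-specific-part#fragment.
            return new URI(base.getScheme(), base.getSchemeSpecificPart(), position);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```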

getIntroLimit

public int getIntroLimit()
Returns the maximum length of the intro buffer.

Returns:
maximum length of the intro buffer

setIntroLimit

public void setIntroLimit(int limit)
Sets the maximum size of the intro buffer to a new value. Setting the intro limit to a new value will discard any state which this source has, so for reasons of efficiency it's not a good idea to call this method except immediately after the source has been constructed and before any reads have taken place.

Parameters:
limit - the new maximum length of the intro buffer

getRawLength

public long getRawLength()
Returns the length in bytes of the stream returned by getRawInputStream, if known. If the length is not known then -1 should be returned. The implementation of this method in DataSource returns -1; subclasses should override it if they can determine their length.

Returns:
the length of the raw input stream, or -1

getLength

public long getLength()
Returns the length of the stream returned by getInputStream in bytes, if known. A return value of -1 indicates that the length is unknown. The return value of this method may change from -1 to a positive value during the life of this object if it happens to work out how long it is.

Returns:
the length of the stream in bytes, or -1

getName

public String getName()
Returns a name for this source. This name is mainly intended as a label identifying the source for use in informational messages; it is not in general intended to be used to provide an absolute reference to the source. Thus, for instance, if the source references a file, its name might be a relative pathname or simple filename, rather than its absolute pathname. To identify the source absolutely, the getURL() method (or some suitable class-specific method) should be used. If this source has a position, it should probably form part of this name.

Returns:
a name

setName

public void setName(String name)
Sets the name of this source.

Parameters:
name - a name
See Also:
getName()

getPosition

public String getPosition()
Returns the position associated with this source. It is a string giving an indication of the part of the stream which is of interest. Its interpretation is up to the application.

Returns:
the position string, or null

setPosition

public void setPosition(String position)
Sets the position associated with this source. It is a string giving an indication of the part of the stream which is of interest. Its interpretation is up to the application.

Parameters:
position - the new position (may be null)

getSystemId

public String getSystemId()
Returns a System ID for this DataSource; this is a string representation of a file name or URL, as used by Source and friends. The return value may be null if none is known. This does not contain any reference to the position.

Returns:
the System ID string for this source, or null

getCompression

public Compression getCompression()
                           throws IOException
Returns an object which will handle any required decompression for this stream. A raw data stream is read and its magic number (first few bytes) matched against known patterns to determine if any known compression method is in use. If no known compression is being used, the value Compression.NONE is returned.

Returns:
a Compression object encoding this stream
Throws:
IOException
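
Magic-number matching of the kind described here can be sketched as follows. This is an illustration only, not the Compression class: MagicSketch and its Kind enum are hypothetical, and the set of formats checked is an assumption. The byte signatures themselves are the standard ones for gzip (1f 8b), bzip2 ("BZh") and Unix compress (1f 9d).

```java
/** Hypothetical illustration of magic-number matching; not the Compression class. */
final class MagicSketch {
    enum Kind { NONE, GZIP, BZIP2, COMPRESS }

    /** Guesses the compression format from the leading bytes of a stream. */
    static Kind detect(byte[] intro) {
        if (intro.length >= 2 && (intro[0] & 0xff) == 0x1f && (intro[1] & 0xff) == 0x8b) {
            return Kind.GZIP;
        }
        if (intro.length >= 3 && intro[0] == 'B' && intro[1] == 'Z' && intro[2] == 'h') {
            return Kind.BZIP2;
        }
        if (intro.length >= 2 && (intro[0] & 0xff) == 0x1f && (intro[1] & 0xff) == 0x9d) {
            return Kind.COMPRESS;
        }
        return Kind.NONE; // no known signature matched
    }
}
```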

getIntro

public byte[] getIntro()
                throws IOException
Returns the intro buffer, first reading it if this hasn't been done before. The intro buffer will contain the first few bytes of the decompressed stream. The number of bytes it contains (the size of the returned byte[] array) will be the smaller of introLimit and the length of the underlying uncompressed stream.

The returned buffer is the original, not a copy - don't change its contents!

Returns:
the first few bytes of the uncompressed stream, up to a limit of introLimit
Throws:
IOException

setCompression

public void setCompression(Compression compress)
Sets the compression to be associated with this data source. In general it will not be necessary or advisable to call this method, since this object will figure it out using magic numbers of the underlying stream. It can be used if the compression method is known, or to force use of a particular compression; in particular setCompression(Compression.NONE) can be used to force direct examination of the underlying stream without decompression, even if the underlying stream is in fact compressed.

The effects of setting a compression to a mode (other than NONE) which does not match the actual compression mode of the underlying stream are undefined, so this method should be used with care.

Parameters:
compress - the compression mode encoding the underlying stream

forceCompression

public DataSource forceCompression(Compression compress)
Returns a DataSource representing the same underlying stream, but with a forced compression mode compress. The returned DataSource object may be the same object as this one, but if this one has a compression mode different from compress, a new one will be created. As with setCompression(uk.ac.starlink.util.Compression), the consequences of using a different value of compress than the correct one (other than Compression.NONE) are unpredictable.

Parameters:
compress - the compression mode to be used for the returned data source
Returns:
a data source with the same underlying stream as this, but a compression mode given by compress

getInputStream

public InputStream getInputStream()
                           throws IOException
Returns an InputStream containing the whole of this DataSource. If compression is detected in the underlying stream, it will be decompressed. The returned stream should be closed by the user when no longer required.

Returns:
an input stream that reads from the beginning of the underlying data source, decompressing it if appropriate
Throws:
IOException
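
Transparent decompression of this kind can be sketched with the JDK alone. The class below is a hypothetical illustration (AutoGunzipSketch is not part of this package) and handles only gzip, since that is the one format the standard library can decode; it peeks at the first two bytes, pushes them back, and wraps the stream only if the gzip signature is present.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.zip.GZIPInputStream;

/** Hypothetical illustration: decompress only when a gzip signature is seen. */
final class AutoGunzipSketch {
    static InputStream open(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 2);
        byte[] magic = new byte[2];
        int n = 0;
        while (n < 2) {
            int got = in.read(magic, n, 2 - n);
            if (got < 0) {
                break;                  // fewer than two bytes available
            }
            n += got;
        }
        if (n > 0) {
            in.unread(magic, 0, n);     // put the peeked bytes back
        }
        boolean gzip = n == 2
                && (magic[0] & 0xff) == 0x1f
                && (magic[1] & 0xff) == 0x8b;
        return gzip ? new GZIPInputStream(in) : in;
    }
}
```

As with the stream returned by getInputStream(), the caller is responsible for closing the result.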

getHybridInputStream

public InputStream getHybridInputStream()
                                 throws IOException
Returns an input stream which appears just the same as the one returned by getInputStream(), but only incurs the expense of obtaining an actual input stream (by calling getRawInputStream()) if more bytes are read than the cached magic number. This is an efficient way to read if you need an InputStream but may only end up reading the first few bytes of it.

Returns:
an input stream that reads from the beginning of the underlying data source, decompressing it if appropriate
Throws:
IOException
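
The lazy behaviour described for this method can be sketched as a stream that serves cached bytes first and opens the real stream only on demand. This is an assumption-laden illustration, not the actual implementation: HybridStreamSketch and its Opener interface are hypothetical, and decompression is left out.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical illustration: cached bytes first, real stream only if needed. */
final class HybridStreamSketch extends InputStream {
    /** Opens a fresh stream positioned at the start of the data. */
    interface Opener {
        InputStream open() throws IOException;
    }

    private final byte[] intro;   // bytes already cached from the start of the data
    private final Opener opener;
    private int pos;              // next index to serve from intro
    private InputStream tail;     // opened lazily, once intro is exhausted

    HybridStreamSketch(byte[] intro, Opener opener) {
        this.intro = intro;
        this.opener = opener;
    }

    @Override
    public int read() throws IOException {
        if (pos < intro.length) {
            return intro[pos++] & 0xff;
        }
        if (tail == null) {
            tail = opener.open();
            // Discard the bytes that were already served from the cache.
            for (int i = 0; i < intro.length; i++) {
                if (tail.read() < 0) {
                    break;
                }
            }
        }
        return tail.read();
    }

    @Override
    public void close() throws IOException {
        if (tail != null) {
            tail.close();
        }
    }
}
```

A caller that reads no further than the cached bytes never triggers the opener at all.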

close

public void close()
Closes any open streams owned and not yet dispatched by this DataSource. Should be called if this object is no longer required, or if it may not be required for some time. Calling this method does not prevent any other method being called on this object in the future. This method throws no checked exceptions; any IOException thrown while closing owned streams is simply discarded.


toString

public String toString()
Returns a short description of this source (name plus compression type).

Overrides:
toString in class Object
Returns:
description of this DataSource

makeDataSource

public static DataSource makeDataSource(String loc)
                                 throws IOException
Attempts to make a source given a name identifying its location. Currently this must be either a file name or a URL. If an existing file or valid URL exists with the given name, a DataSource based on it will be returned. Otherwise an IOException will be thrown.

If a '#' character exists in the string, text after it will be interpreted as a position value. Otherwise, the position is considered to be null.

Parameters:
loc - the location of the data, with optional position
Returns:
a DataSource based on the data at loc
Throws:
IOException - if loc does not name an existing readable file or valid URL
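
The handling of the '#' character can be sketched as below. LocationSketch is a hypothetical helper, and splitting at the first '#' (rather than the last) is an assumption; the documentation above only says that text after the '#' is taken as the position.

```java
/** Hypothetical illustration of separating a location from its position. */
final class LocationSketch {
    /** Returns {main part, position}; position is null when there is no '#'. */
    static String[] split(String loc) {
        int hash = loc.indexOf('#'); // assumption: the first '#' is the separator
        if (hash < 0) {
            return new String[] { loc, null };
        }
        return new String[] { loc.substring(0, hash), loc.substring(hash + 1) };
    }
}
```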

makeDataSource

public static DataSource makeDataSource(URL url)
Makes a source from a URL. If url is a file-protocol URL referencing an existing file then a FileDataSource will be returned, otherwise it will be a URLDataSource. Under certain circumstances, it may be more efficient to use a FileDataSource than a URLDataSource, which is why this method may be worth using.

Parameters:
url - location of the data stream
Returns:
data source which returns the data at url
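
The file-versus-URL decision can be sketched with the JDK alone. FileUrlSketch and localFile are hypothetical names, and the exact test this method applies is not documented here; the sketch simply checks for the file protocol and an existing file.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;

/** Hypothetical illustration of spotting a file-protocol URL that names an existing file. */
final class FileUrlSketch {
    /** Returns the local file behind url, or null if url is not an existing local file. */
    static File localFile(URL url) {
        if (!"file".equals(url.getProtocol())) {
            return null;
        }
        try {
            File f = new File(url.toURI());
            return f.exists() ? f : null;
        } catch (URISyntaxException | IllegalArgumentException e) {
            return null; // not convertible to a plain file path
        }
    }
}
```

A non-null result corresponds to the case where a FileDataSource would be returned; null corresponds to the URLDataSource case.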

getInputStream

public static InputStream getInputStream(String location)
                                  throws IOException
Returns an input stream based on the given location string. The location may be a URL or filename or the string "-" to represent System.in, and may represent compressed or uncompressed data; the returned stream will be an uncompressed version.

Parameters:
location - URL, filename or "-"
Returns:
uncompressed stream containing the data at location
Throws:
FileNotFoundException - if location is not an existing file or a valid URL
IOException - if there is an error obtaining the stream
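
The way a location string might be resolved can be sketched as follows. LocationStreamSketch is hypothetical, the order of the checks is an assumption, and decompression is omitted for brevity; the sketch only shows "-" mapping to System.in, then a filename, then a URL.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URI;

/** Hypothetical illustration of resolving "-", a filename, or a URL to a stream. */
final class LocationStreamSketch {
    static InputStream open(String location) throws IOException {
        if ("-".equals(location)) {
            return System.in;                       // "-" stands for standard input
        }
        File file = new File(location);
        if (file.exists()) {
            return new FileInputStream(file);
        }
        try {
            return URI.create(location).toURL().openStream();
        } catch (IllegalArgumentException | MalformedURLException e) {
            // Neither an existing file nor a usable absolute URL.
            throw new FileNotFoundException("No file or URL: " + location);
        }
    }
}
```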

Copyright © 2004 CLRC: Central Laboratory of the Research Councils. All rights reserved.