public abstract class DataSource extends Object
As well as the ability to return a stream, a DataSource may also have a position, which corresponds to the 'ref' or 'frag' part of a URL (the bit after the #). This is an indication of a location in the stream; it is a string, and its interpretation is entirely up to the application (though it may be specified by the documentation of specific DataSource subclasses).
As well as providing the facility for several different objects to get their own copy of the underlying input stream, this class also handles decompression of the stream. Compression types are as understood by the associated Compression class.
For efficiency, a buffer of the bytes at the start of the stream called the 'intro buffer' is recorded the first time that the stream is read. This can then be used for magic number queries cheaply, without having to open a new input stream. In the case that the whole input stream is shorter than the intro buffer, the underlying input stream never has to be read again.
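A minimal sketch of how an intro buffer can be filled and then used for a cheap magic-number query. It is independent of the Starlink code: the class IntroBufferSketch and its methods readIntro and startsWith are hypothetical. The returned array is never longer than the stream itself, which is why a short stream never has to be reopened.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical sketch of an intro buffer; not the Starlink DataSource code.
final class IntroBufferSketch {

    private IntroBufferSketch() {
    }

    /**
     * Reads up to limit bytes from the start of in. The returned array's
     * length is the smaller of limit and the number of bytes in the stream.
     */
    static byte[] readIntro(InputStream in, int limit) throws IOException {
        byte[] buf = new byte[limit];
        int n = 0;
        // A single read() may return fewer bytes than requested, so loop
        // until the buffer is full or the stream ends.
        while (n < limit) {
            int got = in.read(buf, n, limit - n);
            if (got < 0) {
                break;
            }
            n += got;
        }
        return n == limit ? buf : Arrays.copyOf(buf, n);
    }

    /** Cheap magic-number query against an already-read intro buffer. */
    static boolean startsWith(byte[] intro, byte... magic) {
        if (intro.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (intro[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

For a 5-byte stream and a limit of 16, readIntro returns all 5 bytes, so the whole stream is already in memory.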
Any implementation which implements getRawInputStream() in such a way as to return different byte sequences on different occasions may lead to unpredictable behaviour from this class.
See Also: Compression
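Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_INTRO_LIMIT - the default size of the intro buffer, used by the DataSource() constructor |
static String | MARK_WORKAROUND_PROPERTY - name of the system property on which the value of getMarkWorkaround() depends |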
Constructor and Description |
---|
DataSource() - Constructs a DataSource with a default size of intro buffer. |
DataSource(int introLimit) - Constructs a DataSource with a given size of intro buffer. |
Modifier and Type | Method and Description |
---|---|
void | close() - Closes any open streams owned and not yet dispatched by this DataSource. |
DataSource | forceCompression(Compression compress) - Returns a DataSource representing the same underlying stream, but with a forced compression mode compress. |
Compression | getCompression() - Returns an object which will handle any required decompression for this stream. |
InputStream | getHybridInputStream() - Returns an input stream which appears just the same as the one returned by getInputStream(), but only incurs the expense of obtaining an actual input stream (by calling getRawInputStream()) if more bytes are read than the cached magic number. |
InputStream | getInputStream() - Returns an InputStream containing the whole of this DataSource. |
static InputStream | getInputStream(String location, boolean allowSystem) - Returns an input stream based on the given location string. |
byte[] | getIntro() - Returns the intro buffer, first reading it if this hasn't been done before. |
int | getIntroLimit() - Returns the maximum length of the intro buffer. |
long | getLength() - Returns the length in bytes of the stream returned by getInputStream, if known. |
static boolean | getMarkWorkaround() - Returns true if we are working around potential bugs in the InputStream.mark(int)/InputStream.reset() methods (common, including in J2SE classes). |
String | getName() - Returns a name for this source. |
String | getPosition() - Returns the position associated with this source. |
protected abstract InputStream | getRawInputStream() - Provides a new InputStream for this data source. |
long | getRawLength() - Returns the length in bytes of the stream returned by getRawInputStream, if known. |
String | getSystemId() - Returns a System ID for this DataSource; this is a string representation of a file name or URL, as used by Source and friends. |
URL | getURL() - Returns a URL which corresponds to this data source, if one exists. |
static DataSource | makeDataSource(String loc) - Attempts to make a source given a string identifying its location as a file, URL or system command output. |
static DataSource | makeDataSource(String loc, boolean allowSystem) - Attempts to make a source given a string identifying its location as a file, URL or optionally a system command output. |
static DataSource | makeDataSource(URL url) - Makes a source from a URL. |
void | setCompression(Compression compress) - Sets the compression to be associated with this data source. |
void | setIntroLimit(int limit) - Sets the maximum size of the intro buffer to a new value. |
static void | setMarkWorkaround(boolean workaround) - Sets whether we want to work around bugs in InputStream mark/reset methods. |
void | setName(String name) - Sets the name of this source. |
void | setPosition(String position) - Sets the position associated with this source. |
String | toString() - Returns a short description of this source (name plus compression type). |
public static final int DEFAULT_INTRO_LIMIT
public static final String MARK_WORKAROUND_PROPERTY
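public DataSource(int introLimit)
Constructs a DataSource with a given size of intro buffer.
Parameters:
introLimit - the maximum number of bytes in the intro buffer
public DataSource()
Constructs a DataSource with a default size of intro buffer.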
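protected abstract InputStream getRawInputStream() throws IOException
Provides a new InputStream for this data source. Implementations should return the same byte sequence on every call; otherwise the behaviour of this class may be unpredictable.
Throws:
IOException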
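Because the real DataSource needs the Starlink library, the contract for getRawInputStream() is sketched here with a hypothetical abstract base, RawSourceSketch, and an in-memory subclass, BytesRawSource. Each call hands out a fresh stream over the same bytes, so repeated full reads agree, which is what the class description requires.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for the abstract part of DataSource.
abstract class RawSourceSketch {

    /** Must return a new stream with identical content on every call. */
    protected abstract InputStream getRawInputStream() throws IOException;

    /** Reads the whole raw stream; relies on getRawInputStream() being repeatable. */
    byte[] readAll() throws IOException {
        try (InputStream in = getRawInputStream()) {
            return in.readAllBytes();
        }
    }
}

// Hypothetical in-memory subclass: each call wraps the same array in a new stream.
final class BytesRawSource extends RawSourceSketch {
    private final byte[] data;

    BytesRawSource(byte[] data) {
        this.data = data.clone();  // defensive copy so callers cannot alter the content
    }

    @Override
    protected InputStream getRawInputStream() {
        return new ByteArrayInputStream(data);
    }
}
```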
public URL getURL()
Returns a URL which corresponds to this data source, if one exists. A URL.openConnection() method call on the URL returned by this method should provide a stream with the same content as the getRawInputStream() method of this data source. If no such URL exists or is known, then null should be returned.
If this source has a non-null position value, it will be appended to the main part of the URL after a '#' character (as the URL's ref part).
Returns:
a URL corresponding to this source, or null
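One way to append a position as a URL's ref part is to rebuild the URL through java.net.URI, whose three-argument constructor quotes characters that are illegal in a fragment. This is a sketch under that assumption; PositionUrlSketch and withPosition are hypothetical names, and any existing ref on the base URL is replaced.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper: attaches a position to a URL as its ref ("#...") part.
final class PositionUrlSketch {

    private PositionUrlSketch() {
    }

    /** Returns base with position after '#', or base itself if position is null. */
    static URL withPosition(URL base, String position)
            throws URISyntaxException, MalformedURLException {
        if (position == null) {
            return base;
        }
        URI uri = base.toURI();
        // Rebuild from scheme + scheme-specific part; any old fragment is dropped.
        URI withRef = new URI(uri.getScheme(), uri.getSchemeSpecificPart(), position);
        return withRef.toURL();
    }
}
```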
public int getIntroLimit()
Returns the maximum length of the intro buffer.
public void setIntroLimit(int limit)
Sets the maximum size of the intro buffer to a new value.
Parameters:
limit - the new maximum length of the intro buffer
public long getRawLength()
Returns the length in bytes of the stream returned by getRawInputStream, if known. If the length is not known then -1 should be returned.
The implementation of this method in DataSource returns -1; subclasses should override it if they can determine their length.
public long getLength()
Returns the length in bytes of the stream returned by getInputStream, if known. A return value of -1 indicates that the length is unknown. The return value of this method may change from -1 to a positive value during the life of this object if it happens to work out how long it is.
public String getName()
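Returns a name for this source. This name serves as a label for the source; to obtain the actual location of the data, the getURL() method (or some suitable class-specific method) should be used. If this source has a position, it should probably form part of this name.
public void setName(String name)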
Sets the name of this source.
Parameters:
name - a name
See Also:
getName()
public String getPosition()
Returns the position associated with this source.
Returns:
the position string, or null
public void setPosition(String position)
Sets the position associated with this source.
Parameters:
position - the new position (may be null)
public String getSystemId()
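Returns a System ID for this DataSource; this is a string representation of a file name or URL, as used by Source and friends. The return value may be null if none is known. This does not contain any reference to the position.
Returns:
the System ID string for this source, or null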
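public Compression getCompression() throws IOException
Returns an object which will handle any required decompression for this stream. If the stream is not compressed, Compression.NONE is returned.
Throws:
IOException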
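This page does not say how the compression mode is determined. As an illustration only, assuming detection from leading magic bytes such as those held in the intro buffer, a format can be guessed by comparing against the standard signatures for gzip (1f 8b), Unix compress (1f 9d) and bzip2 ("BZh"). MagicSketch and its Kind enum are hypothetical and are not the Compression class.

```java
// Hypothetical sketch; the real Compression class may detect formats differently.
final class MagicSketch {

    enum Kind { NONE, GZIP, BZIP2, COMPRESS }

    private MagicSketch() {
    }

    /** Guesses a compression kind from the first bytes of a stream. */
    static Kind detect(byte[] intro) {
        if (intro.length >= 2 && (intro[0] & 0xff) == 0x1f) {
            if ((intro[1] & 0xff) == 0x8b) {
                return Kind.GZIP;      // gzip: 1f 8b
            }
            if ((intro[1] & 0xff) == 0x9d) {
                return Kind.COMPRESS;  // Unix compress (.Z): 1f 9d
            }
        }
        if (intro.length >= 3 && intro[0] == 'B' && intro[1] == 'Z' && intro[2] == 'h') {
            return Kind.BZIP2;         // bzip2: "BZh"
        }
        return Kind.NONE;              // no known signature: treat as uncompressed
    }
}
```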
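public byte[] getIntro() throws IOException
Returns the intro buffer, first reading it if this hasn't been done before. The number of bytes it contains is the smaller of introLimit and the length of the underlying uncompressed stream.
The returned buffer is the original, not a copy; don't change its contents!
Returns:
the first few bytes of the stream, up to a maximum of introLimit
Throws:
IOException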
public void setCompression(Compression compress)
Sets the compression to be associated with this data source. In particular, setCompression(Compression.NONE) can be used to force direct examination of the underlying stream without decompression, even if the underlying stream is in fact compressed.
The effects of setting a compression to a mode (other than NONE) which does not match the actual compression mode of the underlying stream are undefined, so this method should be used with care.
Parameters:
compress - the compression mode encoding the underlying stream
public DataSource forceCompression(Compression compress)
Returns a DataSource representing the same underlying stream, but with a forced compression mode compress.
The returned DataSource object may be the same object as this one, but if this one has a different compression mode from compress, a new one will be created. As with setCompression(uk.ac.starlink.util.Compression), the consequences of using a different value of compress than the correct one (other than Compression.NONE) are unpredictable.
Parameters:
compress - the compression mode to be used for the returned data source
Returns:
a data source with the same underlying stream as this one, but with compression mode compress
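The forceCompression contract (return this object when its mode already matches, otherwise a new object over the same stream) is a copy-on-difference pattern. A minimal sketch with a hypothetical immutable holder, SourceSketch, in which the shared underlying stream is modelled simply as a shared byte array and the mode as a small hypothetical enum.

```java
import java.util.Objects;

// Hypothetical sketch of "same object if unchanged, otherwise a new view".
final class SourceSketch {

    enum Mode { NONE, GZIP, BZIP2 }

    private final byte[] raw;  // shared underlying data; never copied
    private final Mode mode;

    SourceSketch(byte[] raw, Mode mode) {
        this.raw = Objects.requireNonNull(raw);
        this.mode = Objects.requireNonNull(mode);
    }

    Mode getCompression() {
        return mode;
    }

    /** Returns this if the mode already matches, otherwise a new source over the same data. */
    SourceSketch forceCompression(Mode compress) {
        return compress == mode ? this : new SourceSketch(raw, compress);
    }

    /** True if both objects share the same underlying data. */
    boolean sharesDataWith(SourceSketch other) {
        return raw == other.raw;
    }
}
```

Returning this when nothing changes avoids allocating a new object and keeps any state already cached on it, such as an intro buffer.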
public InputStream getInputStream() throws IOException
Returns an InputStream containing the whole of this DataSource.
Throws:
IOException
public InputStream getHybridInputStream() throws IOException
Returns an input stream which appears just the same as the one returned by getInputStream(), but only incurs the expense of obtaining an actual input stream (by calling getRawInputStream()) if more bytes are read than the cached magic number. This is an efficient way to read if you need an InputStream but may only end up reading the first few bytes of it.
Throws:
IOException
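A minimal sketch of the hybrid-stream idea: bytes are served from an already-read intro buffer, and the expensive stream is opened only when the caller reads past it. If the intro buffer is shorter than its limit it already holds the whole stream, so end-of-stream is reported without opening anything. HybridStreamSketch and its Opener interface are hypothetical, and for simplicity the sketch assumes the opened stream is uncompressed and begins with the intro bytes, which it skips.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of a lazily opened "hybrid" stream; not the Starlink code.
final class HybridStreamSketch extends InputStream {

    /** Opens the full underlying stream; invoked at most once. */
    interface Opener {
        InputStream open() throws IOException;
    }

    private final byte[] intro;
    private final int introLimit;
    private final Opener opener;
    private int introPos;      // next byte to serve from the intro buffer
    private InputStream raw;   // stays null until needed
    private boolean opened;

    HybridStreamSketch(byte[] intro, int introLimit, Opener opener) {
        this.intro = intro;
        this.introLimit = introLimit;
        this.opener = opener;
    }

    boolean isOpened() {
        return opened;
    }

    // InputStream's default bulk read() calls this method repeatedly;
    // a real implementation would also override read(byte[], int, int).
    @Override
    public int read() throws IOException {
        if (introPos < intro.length) {
            return intro[introPos++] & 0xff;
        }
        // A short intro buffer already holds the whole stream.
        if (intro.length < introLimit) {
            return -1;
        }
        if (!opened) {
            opened = true;
            raw = opener.open();
            // Discard the prefix already delivered from the intro buffer.
            for (int i = 0; i < intro.length; i++) {
                if (raw.read() < 0) {
                    return -1;
                }
            }
        }
        return raw.read();
    }

    @Override
    public void close() throws IOException {
        if (raw != null) {
            raw.close();
        }
    }
}
```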
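public void close()
Closes any open streams owned and not yet dispatched by this DataSource. Any IOExceptions thrown while closing owned streams are simply discarded.
public String toString()
Returns a short description of this source (name plus compression type).
Overrides:
toString in class Object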
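The close() method described above discards any IOException raised while closing owned streams. A minimal sketch of that "close quietly" pattern, using a hypothetical helper QuietCloser.closeAllQuietly: every stream gets a close attempt even if an earlier one fails, and the failure count is returned only so the behaviour can be checked.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

// Hypothetical helper illustrating "close everything, discard IOExceptions".
final class QuietCloser {

    private QuietCloser() {
    }

    /** Closes each item; returns how many close() calls threw an IOException. */
    static int closeAllQuietly(List<? extends Closeable> owned) {
        int failures = 0;
        for (Closeable c : owned) {
            try {
                c.close();
            } catch (IOException e) {
                failures++;  // exception discarded; carry on with the rest
            }
        }
        return failures;
    }
}
```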
public static DataSource makeDataSource(String loc) throws IOException
Attempts to make a source given a string identifying its location as a file, URL or system command output.
If a '#' character exists in the string, text after it will be interpreted as a position value. Otherwise, the position is considered to be null.
Note: this method presents a security risk if the loc string is vulnerable to injection. Consider using the variant method makeDataSource(loc,false) in such cases.
This method just calls makeDataSource(loc,true).
Parameters:
loc - the location of the data, with optional position
Returns:
a DataSource based on the data at loc
Throws:
IOException - if loc does not name an existing readable file or valid URL
public static DataSource makeDataSource(String loc, boolean allowSystem) throws IOException
Attempts to make a source given a string identifying its location as a file, URL or optionally a system command output.
The supplied loc may be one of the following:
- a filename
- a URL
- if allowSystem=true: a string preceded by "<" or followed by "|", giving a shell command line (may not work on all platforms)
If a '#' character exists in the string, text after it will be interpreted as a position value. Otherwise, the position is considered to be null.
Note: setting allowSystem=true may introduce a security risk if the loc string is vulnerable to injection.
Parameters:
loc - the location of the data, with optional position
allowSystem - whether to allow system commands using the format above
Returns:
a DataSource based on the data at loc
Throws:
IOException - if loc does not name an existing readable file or valid URL
public static DataSource makeDataSource(URL url)
Makes a source from a URL. If url is a file-protocol URL referencing an existing file then a FileDataSource will be returned, otherwise it will be a URLDataSource. Under certain circumstances, it may be more efficient to use a FileDataSource than a URLDataSource, which is why this method may be worth using.
Parameters:
url - location of the data stream
Returns:
a data source which returns the data at url
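The location-string conventions used by makeDataSource(String, boolean), makeDataSource(URL) and getInputStream(String, boolean) can be sketched as plain string and URL tests. Everything here is a hypothetical illustration, not the Starlink parser: LocationSketch and its methods are invented names, the split is made at the first '#' (an assumption, since the documentation only says "a '#' character"), and a one-letter URI scheme is treated as a Windows drive letter rather than a URL (also an assumption).

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical sketch of the location-string rules; not the Starlink parser.
final class LocationSketch {

    enum Kind { COMMAND, STDIN, URL_STRING, FILENAME }

    private LocationSketch() {
    }

    /** Returns {location, position}; position is null when there is no '#'. */
    static String[] splitPosition(String loc) {
        int hash = loc.indexOf('#');  // assumption: split at the first '#'
        return hash < 0
            ? new String[] {loc, null}
            : new String[] {loc.substring(0, hash), loc.substring(hash + 1)};
    }

    /** Classifies a location string; shell commands only when allowSystem is true. */
    static Kind classify(String loc, boolean allowSystem) {
        if (allowSystem && (loc.startsWith("<") || loc.endsWith("|"))) {
            return Kind.COMMAND;
        }
        if (loc.equals("-")) {
            return Kind.STDIN;
        }
        try {
            URI uri = new URI(loc);
            // Assumption: a one-letter scheme is a drive letter such as "C:".
            if (uri.getScheme() != null && uri.getScheme().length() > 1) {
                return Kind.URL_STRING;
            }
        } catch (URISyntaxException e) {
            // Not a valid URI; fall through and treat it as a file name.
        }
        return Kind.FILENAME;
    }

    /** True if url uses the file protocol and names an existing file. */
    static boolean isExistingFileUrl(URL url) {
        if (!"file".equals(url.getProtocol())) {
            return false;
        }
        try {
            return new File(url.toURI()).exists();
        } catch (URISyntaxException | IllegalArgumentException e) {
            return false;
        }
    }
}
```

With allowSystem=false, a string such as "<cat x" is not treated as a command, which is the safer behaviour recommended when the location may come from untrusted input.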
public static InputStream getInputStream(String location, boolean allowSystem) throws IOException
Returns an input stream based on the given location string. The location may be one of the following:
- a filename
- a URL
- "-" for standard input
- if allowSystem=true: a string preceded by "<" or followed by "|", giving a shell command line (may not work on all platforms)
Note: setting allowSystem=true may introduce a security risk if the location string is vulnerable to injection.
Parameters:
location - URL, filename, "cmdline|"/"<cmdline", or "-"
allowSystem - whether to allow system commands using the format above
Returns:
an input stream reading the data at location
Throws:
FileNotFoundException - if location cannot be interpreted as a source of bytes
IOException - if there is an error obtaining the stream
public static boolean getMarkWorkaround()
Returns true if we are working around potential bugs in the InputStream.mark(int)/InputStream.reset() methods (common, including in J2SE classes). The return value is dependent on the system property named MARK_WORKAROUND_PROPERTY.
public static void setMarkWorkaround(boolean workaround)
Sets whether we want to work around bugs in InputStream mark/reset methods.
Parameters:
workaround - true to employ the workaround
Copyright © 2025 Central Laboratory of the Research Councils. All Rights Reserved.