public class ParquetTableWriter extends Object implements StarTableWriter, DocumentedIOHandler
Constructor and Description |
---|
ParquetTableWriter() |
Modifier and Type | Method and Description |
---|---|
boolean | docIncludesExample(): Indicates whether the serialization of some (short) example table should be added to the user documentation for this handler. |
org.apache.parquet.hadoop.metadata.CompressionCodecName | getCompressionCodec(): Returns the compression type used for data output. |
String[] | getExtensions(): Returns the list of filename extensions recognised by this handler. |
String | getFormatName(): Gives the name of the format which is written by this writer. |
String | getMimeType(): Returns a string suitable for use as the value of a MIME Content-Type header. |
VOTableVersion | getVOTableVersion(): Returns the version of VOTable used to write metadata, if any. |
String | getXmlDescription(): Returns user-directed documentation in XML format. |
Boolean | isDictionaryEncoding(): Returns the dictionary encoding flag. |
boolean | isGroupArray(): Indicates how array-valued columns are written. |
boolean | isVOTableMetadata(): Returns the flag that indicates storing metadata in a dummy VOTable. |
boolean | looksLikeFile(String location): Indicates whether the destination is of a familiar form for this kind of writer. |
void | setCompressionCodec(org.apache.parquet.hadoop.metadata.CompressionCodecName codec): Sets the compression type for data output. |
void | setDictionaryEncoding(Boolean useDict): Sets the dictionary encoding flag. |
void | setGroupArray(boolean groupArray): Configures how array-valued columns are written. |
void | setVOTableMetadata(boolean votMeta): Sets the flag to indicate storing metadata in a dummy VOTable. |
void | setVOTableVersion(VOTableVersion votVersion): Sets the version of VOTable used to write metadata, if any. |
void | writeStarTable(StarTable table, OutputStream out): Writes a StarTable object to a given output stream. |
void | writeStarTable(StarTable table, String location, StarTableOutput sto): Writes a StarTable object to a given location. |
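Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface DocumentedIOHandler: matchesExtension, readText, toLink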
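public String getFormatName()
Description copied from interface: StarTableWriter
Gives the name of the format which is written by this writer.
Specified by:
getFormatName in interface StarTableWriter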
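public String[] getExtensions()
Description copied from interface: DocumentedIOHandler
Returns the list of filename extensions recognised by this handler.
Specified by:
getExtensions in interface DocumentedIOHandler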
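As an illustration of how a list of extensions such as the one returned by getExtensions() can be compared with an output location, here is a minimal sketch in plain Java. It is not the library's implementation: the class name, the helper hasKnownExtension and the extension values "parquet" and "parq" are assumptions made for the example.

```java
import java.util.Locale;

public class ExtensionMatchSketch {

    /**
     * Returns true if the text after the last dot in the location
     * equals one of the given extensions, ignoring case.
     * Hypothetical helper, not part of the documented API.
     */
    public static boolean hasKnownExtension(String location, String[] extensions) {
        int dot = location.lastIndexOf('.');
        if (dot < 0 || dot == location.length() - 1) {
            return false;  // no extension present
        }
        String ext = location.substring(dot + 1).toLowerCase(Locale.ROOT);
        for (String candidate : extensions) {
            if (candidate.toLowerCase(Locale.ROOT).equals(ext)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] exts = { "parquet", "parq" };  // assumed extension values
        System.out.println(hasKnownExtension("catalogue.parquet", exts));
        System.out.println(hasKnownExtension("catalogue.csv", exts));
    }
}
```

A check of this kind only guesses the format from the name; it says nothing about whether the location can actually be written.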
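public boolean looksLikeFile(String location)
Description copied from interface: StarTableWriter
Indicates whether the destination is of a familiar form for this kind of writer. Implementations should return true for values of location which look like the normal form for their output format, for instance one with the usual file extension.
Specified by:
looksLikeFile in interface StarTableWriter
Parameters:
location - the location name (probably filename)
Returns:
true iff it looks like a file this writer would normally write
public String getMimeType()
Description copied from interface: StarTableWriter
Returns a string suitable for use as the value of a MIME Content-Type header. If no suitable MIME type is available or known, one of "application/octet-stream" (for binary formats) or "text/plain" (for ASCII ones) is recommended.
Specified by:
getMimeType in interface StarTableWriter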
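public boolean docIncludesExample()
Description copied from interface: DocumentedIOHandler
Indicates whether the serialization of some (short) example table should be added to the user documentation for this handler. Instances for which the Documented.getXmlDescription() method already includes some example output should return false.
Specified by:
docIncludesExample in interface DocumentedIOHandler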
public String getXmlDescription()
Description copied from interface: Documented
Returns user-directed documentation in XML format.
The output should be a sequence of one or more <P> elements, using XHTML-like XML. Since rendering may be done in a number of contexts, however, use of the full range of XHTML elements is discouraged. Where possible, the content should stick to simple markup such as the elements P, A, UL, OL, LI, DL, DT, DD, EM, STRONG, I, B, CODE, TT, PRE.
Specified by:
getXmlDescription in interface Documented
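The markup recommendation for getXmlDescription() can be checked mechanically. The following sketch uses only the JDK's XML parser to list any elements in a description fragment that fall outside the simple set named above. The class and method names are hypothetical, the comparison is case-insensitive by assumption, and nothing here is part of the documented API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class SimpleMarkupCheck {

    // The simple elements listed in the getXmlDescription documentation.
    private static final Set<String> SIMPLE = new HashSet<>(Arrays.asList(
        "P", "A", "UL", "OL", "LI", "DL", "DT", "DD",
        "EM", "STRONG", "I", "B", "CODE", "TT", "PRE"));

    /**
     * Returns the upper-cased names of elements in the fragment that are
     * not in the simple set. The fragment is wrapped in a dummy root
     * element so that a sequence of P elements parses as one document.
     */
    public static Set<String> nonSimpleElements(String xmlFragment) {
        String wrapped = "<DOC>" + xmlFragment + "</DOC>";
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(wrapped)));
            Set<String> extras = new TreeSet<>();
            // "*" matches every descendant element of the dummy root.
            NodeList all = doc.getDocumentElement().getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                String name = all.item(i).getNodeName().toUpperCase(Locale.ROOT);
                if (!SIMPLE.contains(name)) {
                    extras.add(name);
                }
            }
            return extras;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Fragment is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nonSimpleElements("<p>Uses <code>gzip</code> by default.</p>"));
    }
}
```

Since the documentation says "where possible", a non-empty result is a hint for review rather than an error.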
public void writeStarTable(StarTable table, String location, StarTableOutput sto) throws IOException
Description copied from interface: StarTableWriter
Writes a StarTable object to a given location. Implementations are free to interpret the location argument in any way appropriate for them. Typically, however, the location will simply be used to get an output stream (for instance interpreting it as a filename). In this case the sto argument should normally be used to turn location into a stream. StreamStarTableWriter provides a suitable implementation for this case.
Specified by:
writeStarTable in interface StarTableWriter
Parameters:
table - table to write
location - destination for the table
sto - StarTableOutput which dispatched this request
Throws:
TableFormatException - if the table cannot be written to location
IOException - if there is some I/O error
public void writeStarTable(StarTable table, OutputStream out) throws IOException
Description copied from interface: StarTableWriter
Writes a StarTable object to a given output stream. The implementation can assume that out is suitable for direct writing (for instance it should not normally wrap it in a BufferedOutputStream), and should not close it at the end of the call.
Not all table writers are capable of writing to a stream; an implementation may throw a TableFormatException to indicate that it cannot do so.
Specified by:
writeStarTable in interface StarTableWriter
Parameters:
table - the table to write
out - the output stream to which the table should be written
Throws:
TableFormatException - if this table cannot be written to a stream
IOException - if there is some I/O error
@ConfigMethod(property="groupArray", usage="true|false", example="false", doc="<p>Controls the low-level detail of how array-valued columns\nare written.\nFor an array-valued int32 column named IVAL,\n<code>groupArray=false</code> will write it as\n\"<code>repeated int32 IVAL</code>\"\nwhile <code>groupArray=true</code> will write it as\n\"<code>optional group IVAL (LIST) {repeated group list\n{optional int32 element}}</code>\".\n</p><p>Although setting it <code>false</code> may be slightly more\nefficient, the default is <code>true</code>,\nsince if any of the columns have array values that either\nmay be null or may have elements which are null,\ngroupArray-style declarations for all columns are required\nby the <webref url=\'https://github.com/apache/parquet-format/blob/apache-parquet-format-2.10.0/LogicalTypes.md\'>Parquet file format</webref>:\n<blockquote><em>\n\"A repeated field that is neither contained by a LIST- or\nMAP-annotated group nor annotated by LIST or MAP should be\ninterpreted as a required list of required elements where\nthe element type is the type of the field.\nImplementations should use either LIST and MAP annotations\nor unannotated repeated fields, but not both. When using the\nannotations, no unannotated repeated types are allowed.\"\n</em></blockquote>\n</p><p>If this option is set false and an attempt is made to write\nnull arrays or arrays with null values, writing will fail.\n</p>") public void setGroupArray(boolean groupArray)
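Configures how array-valued columns are written. If false, an array-valued column is written as a repeated primitive; if true, it's an optional group containing a repeated group containing an optional primitive. True is the default; set it false with care, since that precludes null array values or array elements.
Parameters:
groupArray - true for grouped arrays, false for repeated primitives
public boolean isGroupArray()
Indicates how array-valued columns are written.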
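To make the two array layouts concrete, the sketch below builds the Parquet schema declaration text for an array-valued column under each setting, following the int32 IVAL example quoted in the groupArray option description. The class and method names are hypothetical, and the code only formats strings; it does not call the Parquet library.

```java
public class ArraySchemaSketch {

    /**
     * Returns the schema declaration for an array-valued column,
     * in the two forms described for the groupArray option.
     */
    public static String arrayColumnDecl(String primitiveType, String name,
                                         boolean groupArray) {
        if (groupArray) {
            // LIST-annotated group: the array and its elements may each be null.
            return "optional group " + name + " (LIST) {repeated group list {optional "
                 + primitiveType + " element}}";
        }
        // Bare repeated primitive: a required list of required elements.
        return "repeated " + primitiveType + " " + name;
    }

    public static void main(String[] args) {
        System.out.println(arrayColumnDecl("int32", "IVAL", false));
        // prints: repeated int32 IVAL
        System.out.println(arrayColumnDecl("int32", "IVAL", true));
        // prints: optional group IVAL (LIST) {repeated group list {optional int32 element}}
    }
}
```

The grouped form is longer, but its two optional levels are what allow null arrays and null elements to be represented.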
@ConfigMethod(property="compression", example="gzip", usage="uncompressed|snappy|zstd|gzip|lz4_raw", doc="<p>Configures the type of compression used for output.\nSupported values are probably\n<code>uncompressed</code>, <code>snappy</code>,\n<code>zstd</code>, <code>gzip</code> and <code>lz4_raw</code>.\nOthers may be available if the relevant codecs are on the\nclasspath at runtime.\nIf no value is specified, the parquet-mr library default\nis used, which is probably <code>uncompressed</code>.</p>") public void setCompressionCodec(org.apache.parquet.hadoop.metadata.CompressionCodecName codec)
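Sets the compression type for data output. Supported values are probably uncompressed, snappy, gzip, lz4_raw.
Parameters:
codec - compression type
public org.apache.parquet.hadoop.metadata.CompressionCodecName getCompressionCodec()
Returns the compression type used for data output.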
@ConfigMethod(property="usedict", example="false", doc="<p>Determines whether dictionary encoding is used for output.\nThis will work well to compress the output\nfor columns with a small number of distinct values.\nEven when this setting is true,\ndictionary encoding is abandoned once many values\nhave been encountered (the dictionary gets too big).\nIf no value is specified, the parquet-mr library default\nis used, which is probably <code>true</code>.\n</p>") public void setDictionaryEncoding(Boolean useDict)
Sets the dictionary encoding flag.
Parameters:
useDict - true to use dictionary encoding, false to use other methods
public Boolean isDictionaryEncoding()
Returns the dictionary encoding flag.
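The usedict description notes that dictionary encoding suits columns with few distinct values and is abandoned once the dictionary grows too large. The following is a conceptual sketch of that idea in plain Java, not the parquet-mr algorithm: the class, the maxDictSize threshold and the all-or-nothing fallback are assumptions chosen to keep the example short.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class DictionaryEncodingSketch {

    /** A dictionary of distinct values plus one index per input value. */
    public static final class Encoded {
        public final List<String> dictionary;
        public final int[] indices;

        Encoded(List<String> dictionary, int[] indices) {
            this.dictionary = dictionary;
            this.indices = indices;
        }
    }

    /**
     * Tries to dictionary-encode the values. Returns an empty Optional
     * if more than maxDictSize distinct values are seen, in which case
     * the caller would fall back to some other encoding.
     */
    public static Optional<Encoded> encode(List<String> values, int maxDictSize) {
        Map<String, Integer> dict = new LinkedHashMap<>();
        int[] indices = new int[values.size()];
        for (int i = 0; i < values.size(); i++) {
            String value = values.get(i);
            Integer index = dict.get(value);
            if (index == null) {
                if (dict.size() >= maxDictSize) {
                    return Optional.empty();  // dictionary too big: abandon
                }
                index = dict.size();
                dict.put(value, index);
            }
            indices[i] = index;
        }
        // LinkedHashMap keeps insertion order, so key order matches the indices.
        return Optional.of(new Encoded(new ArrayList<>(dict.keySet()), indices));
    }
}
```

For a column of spectral types such as G, K, G, M, G, this stores three dictionary entries and five small indices instead of five strings, which is where the space saving comes from.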
@ConfigMethod(property="votmeta", example="false", doc="<p>If true, rich metadata for the table will be written out\nin the form of a DATA-less VOTable that is stored in the\nparquet extra metadata key-value list under the key\n<code>IVOA.VOTable-Parquet.content</code>,\naccording to the\n<webref url=\'https://www.ivoa.net/documents/Notes/VOParquet/\'>VOParquet convention</webref> (version 1.0).\nThis enables items such as Units, UCDs and column descriptions, that would otherwise be lost in the serialization,\nto be stored in the output parquet file.\nThis information can then be recovered by parquet readers\nthat understand this convention.\n</p>") public void setVOTableMetadata(boolean votMeta)
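Sets the flag to indicate storing metadata in a dummy VOTable.
Parameters:
votMeta - true to store rich metadata as VOTable text
public boolean isVOTableMetadata()
Returns the flag that indicates storing metadata in a dummy VOTable.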
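The @ConfigMethod annotations above name four configuration properties: groupArray, compression, usedict and votmeta. The sketch below assembles them into a handler specification string of the form name(key=value,...). That syntax, and the format name "parquet", are assumptions about how the surrounding library consumes these properties; the helper itself is hypothetical and only builds a string.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class HandlerSpecSketch {

    /**
     * Builds a specification such as "parquet(compression=gzip,usedict=false)".
     * With no options, the bare format name is returned.
     */
    public static String handlerSpec(String formatName, Map<String, String> options) {
        if (options.isEmpty()) {
            return formatName;
        }
        StringJoiner joiner = new StringJoiner(",", formatName + "(", ")");
        for (Map.Entry<String, String> option : options.entrySet()) {
            joiner.add(option.getKey() + "=" + option.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Property names and example values are taken from the annotations above.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("groupArray", "false");
        options.put("compression", "gzip");
        options.put("usedict", "false");
        options.put("votmeta", "false");
        System.out.println(handlerSpec("parquet", options));
    }
}
```

The same settings can be applied programmatically through setGroupArray, setCompressionCodec, setDictionaryEncoding and setVOTableMetadata on a writer instance.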
public void setVOTableVersion(VOTableVersion votVersion)
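Sets the version of VOTable used to write metadata, if any.
Parameters:
votVersion - preferred VOTable version, or null for default
public VOTableVersion getVOTableVersion()
Returns the version of VOTable used to write metadata, if any.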
Copyright © 2025 Central Laboratory of the Research Councils. All Rights Reserved.