public class ParquetTableWriter extends Object implements StarTableWriter, DocumentedIOHandler
Constructor and Description |
---|
ParquetTableWriter() |

Modifier and Type | Method and Description |
---|---|
boolean | docIncludesExample() Indicates whether the serialization of some (short) example table should be added to the user documentation for this handler. |
org.apache.parquet.hadoop.metadata.CompressionCodecName | getCompressionCodec() Returns the compression type used for data output. |
String[] | getExtensions() Returns the list of filename extensions recognised by this handler. |
String | getFormatName() Gives the name of the format which is written by this writer. |
String | getMimeType() Returns a string suitable for use as the value of a MIME Content-Type header. |
String | getXmlDescription() Returns user-directed documentation in XML format. |
Boolean | isDictionaryEncoding() Returns the dictionary encoding flag. |
boolean | isGroupArray() Indicates how array-valued columns are written. |
boolean | looksLikeFile(String location) Indicates whether the destination is of a familiar form for this kind of writer. |
void | setCompressionCodec(org.apache.parquet.hadoop.metadata.CompressionCodecName codec) Sets the compression type for data output. |
void | setDictionaryEncoding(Boolean useDict) Sets the dictionary encoding flag. |
void | setGroupArray(boolean groupArray) Configures how array-valued columns are written. |
void | writeStarTable(StarTable table, OutputStream out) Writes a StarTable object to a given output stream. |
void | writeStarTable(StarTable table, String location, StarTableOutput sto) Writes a StarTable object to a given location. |
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface DocumentedIOHandler: matchesExtension, readText, toLink
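Before the method details, a minimal end-to-end sketch may help. It assumes STIL (`uk.ac.starlink.table`), its parquet module, and the parquet-mr dependencies are on the classpath; the tiny `RowListStarTable` built here is purely illustrative.

```java
import java.io.FileOutputStream;
import java.io.OutputStream;

import uk.ac.starlink.table.ColumnInfo;
import uk.ac.starlink.table.RowListStarTable;
import uk.ac.starlink.parquet.ParquetTableWriter;

public class WriteParquetSketch {
    public static void main(String[] args) throws Exception {
        // A tiny in-memory table with one int32 column named IVAL.
        ColumnInfo ival = new ColumnInfo("IVAL", Integer.class, "example column");
        RowListStarTable table = new RowListStarTable(new ColumnInfo[] { ival });
        table.addRow(new Object[] { Integer.valueOf(23) });
        table.addRow(new Object[] { Integer.valueOf(42) });

        // Write it out as parquet via the stream-based writeStarTable method.
        ParquetTableWriter writer = new ParquetTableWriter();
        try (OutputStream out = new FileOutputStream("example.parquet")) {
            writer.writeStarTable(table, out);
        }
    }
}
```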
public String getFormatName()
Description copied from interface: StarTableWriter
Gives the name of the format which is written by this writer.
Specified by:
    getFormatName in interface StarTableWriter
public String[] getExtensions()
Description copied from interface: DocumentedIOHandler
Returns the list of filename extensions recognised by this handler.
Specified by:
    getExtensions in interface DocumentedIOHandler
public boolean looksLikeFile(String location)
Description copied from interface: StarTableWriter
Indicates whether the destination is of a familiar form for this kind of writer. This method returns true for values of location which look like the normal form for their output format, for instance one with the usual file extension.
Specified by:
    looksLikeFile in interface StarTableWriter
Parameters:
    location - the location name (probably filename)
Returns:
    true iff it looks like a file this writer would normally write

public String getMimeType()
Description copied from interface: StarTableWriter
Returns a string suitable for use as the value of a MIME Content-Type header. If no suitable MIME type is available, one of "application/octet-stream" (for binary formats) or "text/plain" (for ASCII ones) is recommended.
Specified by:
    getMimeType in interface StarTableWriter
public boolean docIncludesExample()
Description copied from interface: DocumentedIOHandler
Indicates whether the serialization of some (short) example table should be added to the user documentation for this handler. If the output of the Documented.getXmlDescription() method already includes some example output, this should return false.
Specified by:
    docIncludesExample in interface DocumentedIOHandler
public String getXmlDescription()
Description copied from interface: Documented
Returns user-directed documentation in XML format.
The output should be a sequence of one or more <P> elements, using XHTML-like XML. Since rendering may be done in a number of contexts however, use of the full range of XHTML elements is discouraged. Where possible, the content should stick to simple markup such as the elements P, A, UL, OL, LI, DL, DT, DD, EM, STRONG, I, B, CODE, TT, PRE.
Specified by:
    getXmlDescription in interface Documented
public void writeStarTable(StarTable table, String location, StarTableOutput sto) throws IOException
Description copied from interface: StarTableWriter
Writes a StarTable object to a given location.
Implementations are free to interpret the location argument in any way appropriate for them. Typically however the location will simply be used to get an output stream (for instance interpreting it as a filename). In this case the sto argument should normally be used to turn location into a stream. StreamStarTableWriter provides a suitable implementation for this case.
Specified by:
    writeStarTable in interface StarTableWriter
Parameters:
    table - table to write
    location - destination for startab
    sto - StarTableOutput which dispatched this request
Throws:
    TableFormatException - if startab cannot be written to location
    IOException - if there is some I/O error

public void writeStarTable(StarTable table, OutputStream out) throws IOException
Description copied from interface: StarTableWriter
Writes a StarTable object to a given output stream. The implementation can assume that out is suitable for direct writing (for instance it should not normally wrap it in a BufferedOutputStream), and should not close it at the end of the call.
Not all table writers are capable of writing to a stream; an implementation may throw a TableFormatException to indicate that it cannot do so.
Specified by:
    writeStarTable in interface StarTableWriter
Parameters:
    table - the table to write
    out - the output stream to which startab should be written
Throws:
    TableFormatException - if this table cannot be written to a stream
    IOException - if there is some I/O error

@ConfigMethod(property="groupArray", usage="true|false", example="false", doc="<p>Controls the low-level detail of how array-valued columns\nare written.\nFor an array-valued int32 column named IVAL,\n<code>groupArray=false</code> will write it as\n\"<code>repeated int32 IVAL</code>\"\nwhile <code>groupArray=true</code> will write it as\n\"<code>optional group IVAL (LIST) { repeated group list\n{ optional int32 item} }</code>\".\nI don\'t know why you\'d want to do it the latter way,\nbut some other parquet writers seem to do that by default,\nso there must be some good reason.\n</p>")
public void setGroupArray(boolean groupArray)
Configures how array-valued columns are written. If false, an array-valued column is written as a repeated primitive; if true, it is written as an optional group containing a repeated group containing an optional primitive. The latter way seems unnecessarily complicated to me, but it seems to be what python writes.
Parameters:
    groupArray - true for grouped arrays, false for repeated primitives

public boolean isGroupArray()
Indicates how array-valued columns are written.
Returns:
    true for grouped arrays, false for repeated primitives
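A short sketch of the groupArray option; it assumes a ParquetTableWriter instance as documented above, and the schema strings in the comments are those quoted in the config documentation.

```java
// Choose the low-level parquet layout for array-valued columns.
ParquetTableWriter writer = new ParquetTableWriter();

writer.setGroupArray(false);   // arrays written as "repeated int32 IVAL"
assert !writer.isGroupArray();

writer.setGroupArray(true);    // nested "optional group ... (LIST)" style,
                               // as some other parquet writers emit by default
```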
@ConfigMethod(property="compression", example="gzip", usage="uncompressed|snappy|gzip|lz4_raw", doc="<p>Configures the type of compression used for output.\nSupported values are probably\n<code>uncompressed</code>, <code>snappy</code>,\n<code>gzip</code> and <code>lz4_raw</code>.\nOthers may be available if the relevant codecs are on the\nclasspath at runtime.\nIf no value is specified, the parquet-mr library default\nis used, which is probably <code>uncompressed</code>.</p>")
public void setCompressionCodec(org.apache.parquet.hadoop.metadata.CompressionCodecName codec)
Sets the compression type for data output. Supported values are probably uncompressed, snappy, gzip and lz4_raw.
Parameters:
    codec - compression type

public org.apache.parquet.hadoop.metadata.CompressionCodecName getCompressionCodec()
Returns the compression type used for data output.
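A compression-selection sketch; CompressionCodecName is the parquet-mr enum named in the signature, and SNAPPY is used here purely as an example value.

```java
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

// Select snappy compression for subsequent writes.
ParquetTableWriter writer = new ParquetTableWriter();
writer.setCompressionCodec(CompressionCodecName.SNAPPY);  // or GZIP, UNCOMPRESSED, ...
CompressionCodecName codec = writer.getCompressionCodec();
```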
@ConfigMethod(property="usedict", example="false", doc="<p>Determines whether dictionary encoding is used for output.\nThis will work well to compress the output\nfor columns with a small number of distinct values.\nEven when this setting is true,\ndictionary encoding is abandoned once many values\nhave been encountered (the dictionary gets too big).\nIf no value is specified, the parquet-mr library default\nis used, which is probably <code>true</code>.\n</p>")
public void setDictionaryEncoding(Boolean useDict)
Sets the dictionary encoding flag.
Parameters:
    useDict - true to use dictionary encoding, false to use other methods

public Boolean isDictionaryEncoding()
Returns the dictionary encoding flag.
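Finally, a sketch combining dictionary encoding with the location-based write method. It assumes `table` is some existing StarTable and that StarTableOutput's no-argument constructor is available (it is part of STIL); the output filename is illustrative.

```java
import uk.ac.starlink.table.StarTable;
import uk.ac.starlink.table.StarTableOutput;
import uk.ac.starlink.parquet.ParquetTableWriter;

// Dictionary encoding compresses well for columns with few distinct values.
ParquetTableWriter writer = new ParquetTableWriter();
writer.setDictionaryEncoding(Boolean.TRUE);

// StarTable table = ...;  // obtained elsewhere
// The StarTableOutput turns the location string into an output stream.
writer.writeStarTable(table, "catalog.parquet", new StarTableOutput());
```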
Copyright © 2024 Central Laboratory of the Research Councils. All Rights Reserved.