public class ParquetTableWriter extends Object implements StarTableWriter, DocumentedIOHandler
Constructor and Description |
---|
ParquetTableWriter() |

Modifier and Type | Method and Description |
---|---|
boolean | docIncludesExample() Indicates whether the serialization of some (short) example table should be added to the user documentation for this handler. |
org.apache.parquet.hadoop.metadata.CompressionCodecName | getCompressionCodec() Returns the compression type used for data output. |
String[] | getExtensions() Returns the list of filename extensions recognised by this handler. |
String | getFormatName() Gives the name of the format which is written by this writer. |
String | getMimeType() Returns a string suitable for use as the value of a MIME Content-Type header. |
String | getXmlDescription() Returns user-directed documentation in XML format. |
Boolean | isDictionaryEncoding() Returns the dictionary encoding flag. |
boolean | isGroupArray() Indicates how array-valued columns are written. |
boolean | looksLikeFile(String location) Indicates whether the destination is of a familiar form for this kind of writer. |
void | setCompressionCodec(org.apache.parquet.hadoop.metadata.CompressionCodecName codec) Sets the compression type for data output. |
void | setDictionaryEncoding(Boolean useDict) Sets the dictionary encoding flag. |
void | setGroupArray(boolean groupArray) Configures how array-valued columns are written. |
void | writeStarTable(StarTable table, OutputStream out) Writes a StarTable object to a given output stream. |
void | writeStarTable(StarTable table, String location, StarTableOutput sto) Writes a StarTable object to a given location. |
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface DocumentedIOHandler: matchesExtension, readText, toLink
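Before the method details, a minimal end-to-end sketch may help. It assumes STIL (`uk.ac.starlink.table`), its parquet module, and the parquet-mr dependencies are on the classpath; the tiny `RowListStarTable` built here is purely illustrative.

```java
import java.io.FileOutputStream;
import java.io.OutputStream;

import uk.ac.starlink.table.ColumnInfo;
import uk.ac.starlink.table.RowListStarTable;
import uk.ac.starlink.parquet.ParquetTableWriter;

public class WriteParquetSketch {
    public static void main(String[] args) throws Exception {
        // A tiny in-memory table with one int32 column named IVAL.
        ColumnInfo ival = new ColumnInfo("IVAL", Integer.class, "example column");
        RowListStarTable table = new RowListStarTable(new ColumnInfo[] { ival });
        table.addRow(new Object[] { Integer.valueOf(23) });
        table.addRow(new Object[] { Integer.valueOf(42) });

        // Write it out as parquet via the stream-based writeStarTable method.
        ParquetTableWriter writer = new ParquetTableWriter();
        try (OutputStream out = new FileOutputStream("example.parquet")) {
            writer.writeStarTable(table, out);
        }
    }
}
```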
public String getFormatName()
Description copied from interface: StarTableWriter
Gives the name of the format which is written by this writer.
Specified by:
    getFormatName in interface StarTableWriter
public String[] getExtensions()
Description copied from interface: DocumentedIOHandler
Returns the list of filename extensions recognised by this handler.
Specified by:
    getExtensions in interface DocumentedIOHandler
public boolean looksLikeFile(String location)
Description copied from interface: StarTableWriter
Indicates whether the destination is of a familiar form for this kind of writer. This method returns true for values of location which look like the normal form for their output format, for instance one with the usual file extension.
Specified by:
    looksLikeFile in interface StarTableWriter
Parameters:
    location - the location name (probably filename)
Returns:
    true iff it looks like a file this writer would normally write

public String getMimeType()
Description copied from interface: StarTableWriter
Returns a string suitable for use as the value of a MIME Content-Type header. If no suitable MIME type is available, one of "application/octet-stream" (for binary formats) or "text/plain" (for ASCII ones) is recommended.
Specified by:
    getMimeType in interface StarTableWriter
public boolean docIncludesExample()
Description copied from interface: DocumentedIOHandler
Indicates whether the serialization of some (short) example table should be added to the user documentation for this handler. If the output of the Documented.getXmlDescription() method already includes some example output, this should return false.
Specified by:
    docIncludesExample in interface DocumentedIOHandler
public String getXmlDescription()
Description copied from interface: Documented
Returns user-directed documentation in XML format.
The output should be a sequence of one or more <P> elements, using XHTML-like XML. Since rendering may be done in a number of contexts however, use of the full range of XHTML elements is discouraged. Where possible, the content should stick to simple markup such as the elements P, A, UL, OL, LI, DL, DT, DD, EM, STRONG, I, B, CODE, TT, PRE.
Specified by:
    getXmlDescription in interface Documented
public void writeStarTable(StarTable table, String location, StarTableOutput sto) throws IOException
Description copied from interface: StarTableWriter
Writes a StarTable object to a given location.
Implementations are free to interpret the location argument in any way appropriate for them. Typically however the location will simply be used to get an output stream (for instance interpreting it as a filename). In this case the sto argument should normally be used to turn location into a stream. StreamStarTableWriter provides a suitable implementation for this case.
Specified by:
    writeStarTable in interface StarTableWriter
Parameters:
    table - table to write
    location - destination for startab
    sto - StarTableOutput which dispatched this request
Throws:
    TableFormatException - if startab cannot be written to location
    IOException - if there is some I/O error

public void writeStarTable(StarTable table, OutputStream out) throws IOException
Description copied from interface: StarTableWriter
Writes a StarTable object to a given output stream. The implementation can assume that out is suitable for direct writing (for instance it should not normally wrap it in a BufferedOutputStream), and should not close it at the end of the call.
Not all table writers are capable of writing to a stream; an implementation may throw a TableFormatException to indicate that it cannot do so.
Specified by:
    writeStarTable in interface StarTableWriter
Parameters:
    table - the table to write
    out - the output stream to which startab should be written
Throws:
    TableFormatException - if this table cannot be written to a stream
    IOException - if there is some I/O error

@ConfigMethod(property="groupArray", usage="true|false", example="false", doc="<p>Controls the low-level detail of how array-valued columns\nare written.\nFor an array-valued int32 column named IVAL,\n<code>groupArray=false</code> will write it as\n\"<code>repeated int32 IVAL</code>\"\nwhile <code>groupArray=true</code> will write it as\n\"<code>optional group IVAL (LIST) { repeated group list\n{ optional int32 item} }</code>\".\nI don\'t know why you\'d want to do it the latter way,\nbut some other parquet writers seem to do that by default,\nso there must be some good reason.\n</p>")
public void setGroupArray(boolean groupArray)
Configures how array-valued columns are written. If false, an array-valued column is written as a repeated primitive; if true, it is written as an optional group containing a repeated group containing an optional primitive. The latter way seems unnecessarily complicated to me, but it seems to be what python writes.
Parameters:
    groupArray - true for grouped arrays, false for repeated primitives

public boolean isGroupArray()
Indicates how array-valued columns are written.
Returns:
    true for grouped arrays, false for repeated primitives
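A short sketch of the groupArray option; it assumes a ParquetTableWriter instance as documented above, and the schema strings in the comments are those quoted in the config documentation.

```java
// Choose the low-level parquet layout for array-valued columns.
ParquetTableWriter writer = new ParquetTableWriter();

writer.setGroupArray(false);   // arrays written as "repeated int32 IVAL"
assert !writer.isGroupArray();

writer.setGroupArray(true);    // nested "optional group ... (LIST)" style,
                               // as some other parquet writers emit by default
```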
@ConfigMethod(property="compression", example="gzip", usage="uncompressed|snappy|gzip|lz4_raw", doc="<p>Configures the type of compression used for output.\nSupported values are probably\n<code>uncompressed</code>, <code>snappy</code>,\n<code>gzip</code> and <code>lz4_raw</code>.\nOthers may be available if the relevant codecs are on the\nclasspath at runtime.\nIf no value is specified, the parquet-mr library default\nis used, which is probably <code>uncompressed</code>.</p>")
public void setCompressionCodec(org.apache.parquet.hadoop.metadata.CompressionCodecName codec)
Sets the compression type for data output. Supported values are probably uncompressed, snappy, gzip and lz4_raw.
Parameters:
    codec - compression type

public org.apache.parquet.hadoop.metadata.CompressionCodecName getCompressionCodec()
Returns the compression type used for data output.
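A compression-selection sketch; CompressionCodecName is the parquet-mr enum named in the signature, and SNAPPY is used here purely as an example value.

```java
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

// Select snappy compression for subsequent writes.
ParquetTableWriter writer = new ParquetTableWriter();
writer.setCompressionCodec(CompressionCodecName.SNAPPY);  // or GZIP, UNCOMPRESSED, ...
CompressionCodecName codec = writer.getCompressionCodec();
```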
@ConfigMethod(property="usedict", example="false", doc="<p>Determines whether dictionary encoding is used for output.\nThis will work well to compress the output\nfor columns with a small number of distinct values.\nEven when this setting is true,\ndictionary encoding is abandoned once many values\nhave been encountered (the dictionary gets too big).\nIf no value is specified, the parquet-mr library default\nis used, which is probably <code>true</code>.\n</p>")
public void setDictionaryEncoding(Boolean useDict)
Sets the dictionary encoding flag.
Parameters:
    useDict - true to use dictionary encoding, false to use other methods

public Boolean isDictionaryEncoding()
Returns the dictionary encoding flag.
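Finally, a sketch combining dictionary encoding with the location-based write method. It assumes `table` is some existing StarTable and that StarTableOutput's no-argument constructor is available (it is part of STIL); the output filename is illustrative.

```java
import uk.ac.starlink.table.StarTable;
import uk.ac.starlink.table.StarTableOutput;
import uk.ac.starlink.parquet.ParquetTableWriter;

// Dictionary encoding compresses well for columns with few distinct values.
ParquetTableWriter writer = new ParquetTableWriter();
writer.setDictionaryEncoding(Boolean.TRUE);

// StarTable table = ...;  // obtained elsewhere
// The StarTableOutput turns the location string into an output stream.
writer.writeStarTable(table, "catalog.parquet", new StarTableOutput());
```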
Copyright © 2024 Central Laboratory of the Research Councils. All Rights Reserved.