Exporting parquet files with snappy compression
I tried to export a 3-million-plus-row dataset as a Parquet file to HDFS to feed a Hive external table. The Parquet file comes to around 6 GB, while the same data is 5.8 GB when exported as CSV.

I would like to apply some compression when exporting to Parquet, because I believe Paxata already compresses the Parquet files it stores in its library (otherwise it could not store all of these files if Parquet took more space than CSV; I have many more files that are huge in rows and columns). Can the same compression be applied to exports as well? If applied, would an external Hive table created on top of the exported file still be able to read the data? Also, does a REST API command exist for this? The Paxata version I use is Release 2018.2.7.11.2697.
1 Reply
sayyar
Linear Actuator

Hi @shivkumr,

Paxata currently does not support compression during export. We will add it to the backlog of requested features. Thank you for the feedback!

Regards,
Shyam Ayyar
Product Manager