I tried to export a dataset of 3 million plus rows to HDFS as a Parquet file to feed a Hive external table. It comes to around 6 GB in size; the same dataset is 5.8 GB when exported as CSV.
I would like to apply some compression when exporting it as Parquet, because I believe Paxata applies compression when storing its own library files in Parquet format (otherwise it could not store all these files if Parquet took more space than CSV, and I have many more files that are huge in both rows and columns).

Can the same compression be applied to exports as well? If it is applied, will an external Hive table created on top of the exported files still be able to read the data? Also, is there a REST API command for this? The Paxata version I use is Release 2018.2.7.11.2697.
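For context, the kind of external table I plan to create on top of the exported files would look roughly like this (table name, columns, and HDFS path are just placeholders for illustration):

CREATE EXTERNAL TABLE paxata_export_example (
  id BIGINT,
  name STRING
)
STORED AS PARQUET
LOCATION 'hdfs:///data/paxata/exports/example/';

My understanding is that Hive reads the compression codec from the Parquet file metadata itself, so a table defined like this should not need any extra properties just to read compressed Parquet, but I would like to confirm that for the compression Paxata would apply on export.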