HDFS Encryption (Cloudera)

Paxata can import data from and export data to an HDFS cluster in both encrypted and unencrypted zones. Most of the configuration is done on the Cloudera Manager side.

Here is Cloudera's overview of HDFS encryption:

https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cdh_sg_hdfs_encryption.html

Here are Cloudera's detailed steps for enabling HDFS encryption in Cloudera Manager:

https://www.cloudera.com/documentation/enterprise/5-6-x/topics/sg_hdfs_encryption_wizard.html
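As part of that validation, it can help to confirm that the directory you plan to give Paxata actually lies inside an encryption zone. `hdfs crypto -listZones` (run as the HDFS superuser) lists the zones; the Python sketch below parses its output, assuming one zone per line in the form `<zone path> <key name>`. The function names, zone paths, and key names here are hypothetical and only illustrate the check.

```python
import posixpath
import subprocess


def list_zones():
    """Run `hdfs crypto -listZones` (needs HDFS superuser rights) and return stdout."""
    result = subprocess.run(
        ["hdfs", "crypto", "-listZones"], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_zones(output):
    """Map each encryption-zone path to its key name.

    Assumes one zone per line: the zone path, whitespace, then the key name.
    """
    zones = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            zones[parts[0]] = parts[1]
    return zones


def zone_for(path, zones):
    """Return the (longest) zone containing the plain HDFS `path`, or None if unencrypted."""
    path = posixpath.normpath(path)
    best = None
    for zone in zones:
        root = posixpath.normpath(zone)
        # Match the zone itself or anything strictly below it ("/a/b" must not match "/a/bc").
        inside = path == root or path.startswith(root.rstrip("/") + "/")
        if inside and (best is None or len(root) > len(best)):
            best = root
    return best
```

On a gateway host this would be called as `zone_for("/secure/paxata/library", parse_zones(list_zones()))`, with a hypothetical target path; a `None` result means the path is not in an encryption zone.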

The last step of that tutorial is Validate Data Encryption. Once validation succeeds outside the Paxata UI, you can proceed with the Paxata-side configuration:

1. Add the Paxata server to the HDFS cluster in Cloudera Manager and assign it the "HDFS Gateway" role.
2. In /usr/local/paxata/server/config/filesystem.properties, set px.library.storage.fs.rootDirectory to the directory in the HDFS encrypted zone.
3. If encryption requires additional jars, copy them to /usr/local/paxata/server/lib (for data library storage) and to /usr/local/paxata/server/connectors/repo/paxata/connector-DIST-hdfs/VERS/lib/ (for the HDFS connector), then set the jar files' ownership to paxata.
4. Copy all files under the original data storage directory (the unencrypted-zone path that px.library.storage.fs.rootDirectory pointed to before step 2) to the encrypted-zone path you set in step 2.
5. Restart the Paxata server service.
6. Validate data library storage by uploading a local file in the Paxata UI and clicking Finish. If the dataset appears on the Library page, the write to the HDFS encrypted zone succeeded. Then click the dataset's preview icon; if you can preview the data, the read from the HDFS encrypted zone succeeded.
7. Validate the HDFS connector by creating two HDFS connectors: one with its root directory in the unencrypted zone and one with its root directory in the encrypted zone. Try importing and exporting datasets with both data sources.
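Steps 2 to 4 can be scripted. The Python sketch below is one possible approach, not a Paxata-provided tool: `set_property` rewrites the px.library.storage.fs.rootDirectory line (it handles only simple single-line `key=value` or `key: value` entries), `copy_jars` copies jar files into the lib directories and chowns them to paxata, and `copy_library_cmd` builds an `hdfs dfs -cp` command for step 4. The function names, sample paths, and the assumption that the property value is a plain HDFS path are hypothetical. The connector directory from step 3 still contains the DIST and VERS placeholders, which you must fill in for your installation. A copy (rather than a move) is used because HDFS does not allow renaming files into or out of an encryption zone.

```python
import re
import shutil
from pathlib import Path

ROOT_KEY = "px.library.storage.fs.rootDirectory"


def set_property(text, key, value):
    """Step 2: return properties text with `key` set to `value` (appended if absent).

    Handles only simple single-line `key=value` or `key: value` entries.
    """
    pattern = re.compile(r"^\s*" + re.escape(key) + r"\s*[=:]")
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = f"{key}={value}"
            break
    else:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def copy_jars(jars, dest_dirs, owner="paxata"):
    """Step 3: copy extra encryption jars into each lib directory and set their owner."""
    for dest in dest_dirs:
        for jar in jars:
            target = Path(dest) / Path(jar).name
            shutil.copy2(jar, target)
            shutil.chown(target, user=owner)


def copy_library_cmd(src_dir, dst_dir):
    """Step 4: argv that copies the old library contents into the encrypted zone.

    The `*` glob is expanded by the HDFS shell itself, so no local shell is needed.
    """
    return ["hdfs", "dfs", "-cp", src_dir.rstrip("/") + "/*", dst_dir]
```

For example, with `props = Path("/usr/local/paxata/server/config/filesystem.properties")`, step 2 becomes `props.write_text(set_property(props.read_text(), ROOT_KEY, "/secure/paxata/library"))` (hypothetical zone path), and `subprocess.run(copy_library_cmd(old_root, new_root), check=True)` performs the step 4 copy.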

If HDFS encryption validation fails for any reason, the backout plan is to revert the change from step 2 so that data library storage points back to the unencrypted zone.
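The backout is simplest if you keep a copy of filesystem.properties from before step 2. A minimal sketch, assuming a hypothetical `.pre-encryption` backup suffix: call `backup()` before editing the file and `restore()` if validation fails. After restoring, the Paxata server service presumably needs another restart (as in step 5) for the reverted setting to take effect.

```python
import shutil
from pathlib import Path

PROPS = Path("/usr/local/paxata/server/config/filesystem.properties")
SUFFIX = ".pre-encryption"  # hypothetical backup suffix


def backup(props=PROPS):
    """Save a copy of filesystem.properties before changing rootDirectory (step 2)."""
    saved = props.with_name(props.name + SUFFIX)
    shutil.copy2(props, saved)
    return saved


def restore(props=PROPS):
    """Backout: put the saved copy back so the library points at the unencrypted zone."""
    saved = props.with_name(props.name + SUFFIX)
    if not saved.exists():
        raise FileNotFoundError(f"no backup found at {saved}")
    shutil.copy2(saved, props)
```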