Paxata Backup Basics

In this article, you will learn about the basics of Paxata backup tasks.

Overview

Three components require backup to protect against data loss on the running servers:
  1. Metadata Storage (MongoDB)
  2. Data Library Storage (HDFS)
  3. Properties Files (particularly pes.properties)
Notably, Pipeline cache files on executors do not need to be backed up, because a lost cache is recovered automatically through cache retrieval.

Basic Tools

Many backup tools exist for each component. This article recommends the most basic tool that can complete each backup task on its own. More advanced tools may offer better reliability and manageability.

Metadata Storage (MongoDB)

mongodump --out /tmp/mongobackup_`date +"%m-%d-%y"`

https://paxata.desk.com/customer/portal/articles/2773557-best-practice-for-mongo-backup-
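
The one-line command above can be wrapped in a small script so that each run writes to its own dated directory. This is a minimal sketch, not a Paxata-supplied tool: the function names dump_dir and backup_metadata are hypothetical, the /tmp default simply mirrors the example above, and the ISO date format (YYYY-MM-DD) is an assumption chosen because it sorts chronologically.

```shell
#!/bin/sh
# Hypothetical helpers; names and defaults are illustrative only.

# Print the dump directory for a backup root ($1) and an optional
# date stamp ($2, defaults to today's date as YYYY-MM-DD).
dump_dir() {
    printf '%s/mongobackup_%s\n' "$1" "${2:-$(date +%Y-%m-%d)}"
}

# Dump all databases into a dated directory under the backup root
# ($1, defaults to /tmp). Returns non-zero if mongodump fails.
backup_metadata() {
    target="$(dump_dir "${1:-/tmp}" "$2")"
    mongodump --out "$target" || return 1
    echo "MongoDB dump written to $target"
}
```

After sourcing the script, calling backup_metadata with no arguments writes to /tmp as in the command above, and backup_metadata /backups writes under /backups instead.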

Data Library Storage (HDFS)

DistCp copies a directory from HDFS to another cluster or to an S3 bucket.

hadoop distcp hdfs://CDH5-nameservice/user/paxata/library s3a://bucket/librarybackup

https://www.cloudera.com/documentation/enterprise/5-5-x/topics/cdh_admin_distcp_data_cluster_migrate...
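
For repeated backups, DistCp's -update option copies only files that are missing or different at the destination. The sketch below assumes that incremental behavior is wanted; the function name sync_library is hypothetical. Note that with -update, DistCp copies the contents of the source directory into the target rather than the source directory itself, so the resulting layout can differ from a plain copy into an existing target.

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative only.

# Incrementally copy an HDFS directory ($1) to a destination ($2),
# such as another cluster or an s3a:// bucket path.
sync_library() {
    src="$1"
    dst="$2"
    echo "Copying $src to $dst"
    # -update skips files that already match at the destination.
    hadoop distcp -update "$src" "$dst"
}
```

With the example paths from this article, the call would be: sync_library hdfs://CDH5-nameservice/user/paxata/library s3a://bucket/librarybackup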

Cloudera BDR is an enterprise solution based on DistCp:
https://www.cloudera.com/documentation/enterprise/5-9-x/topics/cm_bdr_howto_hdfs.html

Properties Files (particularly pes.properties)

Upload the files from the server's local file system to an S3 bucket:

cd /usr/local/paxata/server/config
aws s3 sync . s3://bucket/propertybackup
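
Before uploading, it can help to preview what the sync would transfer. The sketch below uses the --dryrun option of aws s3 sync for that purpose; the function names preview_properties and backup_properties are hypothetical, and the propertybackup prefix is taken from the example above. The config directory is passed as a path instead of using cd, which should be equivalent.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative only.
CONFIG_DIR="/usr/local/paxata/server/config"

# List what would be uploaded to the bucket ($1) without
# transferring anything.
preview_properties() {
    aws s3 sync "$CONFIG_DIR" "s3://$1/propertybackup" --dryrun
}

# Upload new or changed properties files to the bucket ($1).
backup_properties() {
    aws s3 sync "$CONFIG_DIR" "s3://$1/propertybackup"
}
```

Running preview_properties bucket first and then backup_properties bucket reproduces the upload above with a check step in front of it.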
