Data Anonymization

BJ
Linear Actuator

Hi,

What capabilities does Paxata provide to anonymize customer data?

The customer data can be provided by an on-premises customer in flat files.

 

Regards,

Bhanu

 

5 Replies
akshay
DataRobot Employee

DataRobot DataPrep has multiple guardrail options which can help admins make sure only certain people/groups have access to the data.

 

We can also build DataPrep projects and use Find + Replace to mask certain rows and columns of data. Please let me know if this helps or if you need any additional information.

 

Thanks,

Akshay

BJ
Linear Actuator

Can you say more about the Find + Replace capability for anonymization?

Is it limited to replacing the values with null values or some junk or dummy patterns such as $$$$ or ######?

Does the attribute maintain its intrinsic behavior when the column is anonymized?

Do you also have anonymization approaches to maintain referential integrity?
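
To make the referential integrity question concrete, here is a minimal sketch in plain Python, outside Paxata. It is only an illustration of the general idea, not a Paxata feature: the same source key is always mapped to the same token (here with a keyed hash from the standard library), so joins between tables still work after anonymization. The table contents, the `tokenize` helper, and `SECRET_KEY` are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be stored and rotated securely.
SECRET_KEY = b"replace-with-a-real-secret"

def tokenize(value, key=SECRET_KEY):
    """Map a value to a stable token: same input and key give the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "cust_" + digest[:12]

# Two hypothetical tables that share the customer_id key.
customers = [
    {"customer_id": "C-1001", "city": "Pune"},
    {"customer_id": "C-1002", "city": "Austin"},
]
orders = [
    {"order_id": 1, "customer_id": "C-1001"},
    {"order_id": 2, "customer_id": "C-1002"},
    {"order_id": 3, "customer_id": "C-1001"},
]

# Apply the same mapping to the key column in both tables.
anon_customers = [{**c, "customer_id": tokenize(c["customer_id"])} for c in customers]
anon_orders = [{**o, "customer_id": tokenize(o["customer_id"])} for o in orders]

# Every anonymized order still points at an anonymized customer.
known_ids = {c["customer_id"] for c in anon_customers}
orphans = [o for o in anon_orders if o["customer_id"] not in known_ids]
print("orphaned orders:", len(orphans))
```

A random replacement applied separately to each table would break this link, which is the behavior I am asking about.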

akshay
DataRobot Employee

Could you give me an example for the kind of anonymization that you're trying to do?

 

BJ
Linear Actuator

Hi, These are the Data Anonymization techniques. What can we do in Paxata to anonymize data?

• Data masking - Substitute the characters with null values, * or #

• Pseudonymization - Replace with a similar false identity, e.g. a dummy name

• Generalization - Remove specific identifiers, e.g. retain only the year of the date of birth

• Synthetic data generation - System-generated synthetic values produced by applying statistical functions (mean, median)

 

Can we build custom scripts (e.g. in SQL or Python) in Paxata to perform anonymization on datasets?

 

akshay
DataRobot Employee

All of these can be done in a DataPrep project. Once that is done, you can give users access to the project's output dataset for their data prep work.

 

1) Data masking - use Find + Replace in Paxata: find the characters that you want to mask and replace them with Null, * or #.

2) Pseudonymization - you can do a lookup that maps all the names in your data to dummy names.

3) Generalization - can be done using computed columns, as in the date of birth example that you mentioned (different generalizations will need different approaches).

4) Synthetic data generation - you can use the mean/median compute function to populate a column with just the mean/median values, then use Find + Replace to substitute them for the values that you want to replace.
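
To show what each of the four steps produces, here is a minimal sketch in plain Python. It is not Paxata syntax (Paxata does this through Find + Replace, lookups, and computed columns in the UI); the sample rows, column names, and helper functions below are hypothetical and only illustrate the logic.

```python
import statistics
from datetime import date

def mask(value, keep_last=4, mask_char="#"):
    """1) Data masking: hide all but the last `keep_last` characters."""
    if value is None:
        return None
    hidden = max(len(value) - keep_last, 0)
    return mask_char * hidden + value[hidden:]

def build_name_lookup(names):
    """2) Pseudonymization: map each distinct real name to one dummy name."""
    lookup = {}
    for name in names:
        if name not in lookup:
            lookup[name] = f"Person_{len(lookup) + 1:04d}"
    return lookup

def generalize_dob(dob):
    """3) Generalization: keep only the year of the date of birth."""
    return dob.year

def replace_with_median(values):
    """4) Replace every value in a column with the column median."""
    med = statistics.median(values)
    return [med] * len(values)

# Hypothetical sample data.
rows = [
    {"name": "Alice Smith", "card": "4111111111111111", "dob": date(1985, 3, 14), "income": 52000},
    {"name": "Bob Jones", "card": "5500000000000004", "dob": date(1990, 11, 2), "income": 61000},
    {"name": "Alice Smith", "card": "4111111111111111", "dob": date(1985, 3, 14), "income": 48000},
]

lookup = build_name_lookup(r["name"] for r in rows)
medians = replace_with_median([r["income"] for r in rows])

anonymized = [
    {
        "name": lookup[r["name"]],
        "card": mask(r["card"]),
        "dob_year": generalize_dob(r["dob"]),
        "income": m,
    }
    for r, m in zip(rows, medians)
]

for row in anonymized:
    print(row)
```

Because the lookup in step 2 gives the same dummy name to every occurrence of a real name, repeated customers stay recognizable as the same person in the output.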

 

Note: All of this has to be done manually once by an admin user before it can be automated, and the output of these projects would contain the anonymized data.

 

As of today, we do not support Python or SQL transformations in Paxata. However, we are working on bringing Spark SQL data transformation to the product in the near future.