Enable column lineage for transparency & troubleshooting
Why read this tip?
Understanding data lineage is critical. It helps build trust in the data and in the business decisions that follow from it. Your data preparation work will often produce dozens of steps and business rules, with some assumptions along the way (which can be annotated in the Versions panel, but that's a tip for another day!). Being able to trace back through every step that affected a particular key attribute or feature matters because it shows how the data was sourced and compiled, and it gives you an easy way to troubleshoot mistakes in derived columns.
Here we go
This tip demonstrates the built-in data lineage features of Data Prep. In this scenario, we highlight the usefulness and simplicity of "Show Lineage Mode."
Within a Data Prep project, hover over any column header to reveal the lineage icon:
Hovering directly over the icon will expand the label and allow you to click Show Lineage Mode for that column. You will now see the Steps Panel appear (if it was hidden). Any step that impacts the values in that column is highlighted with a blue border (e.g., the "Change" step in the following image). (Note: For viewing convenience, the steps before, between, and after are collapsed.)
To expand/view all steps in context of the highlighted lineage, click Show All Steps:
To exit lineage mode, click (X) or Clear Lineage. (Note: Entering "Edit" mode will also get you out of lineage mode.)
You can quickly cycle through the complete column lineage by clicking Show Lineage Mode for each column (as shown in step 2).
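If you are curious how a tool might decide which steps to highlight, here is a minimal sketch of the idea in Python. To be clear, this is not how Data Prep is implemented; the `Step` class, its `reads`/`writes` fields, and the `column_lineage` function are all hypothetical names made up for illustration. The assumption is that each step declares which columns it reads and which it writes, and that lineage is found by walking the steps backward from the column you picked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical model of one preparation step (not a Data Prep API)."""
    name: str
    reads: frozenset   # columns whose values the step looks at
    writes: frozenset  # columns whose values the step creates or changes


def column_lineage(steps, column):
    """Return the indexes of steps that affect the values in `column`.

    Walks the steps from last to first. A step is part of the lineage if it
    writes a column we are tracking; the columns it reads are then tracked
    too, so upstream steps that fed into it are picked up as well.
    """
    tracked = {column}
    impacted = []
    for index in range(len(steps) - 1, -1, -1):
        step = steps[index]
        if step.writes & tracked:
            impacted.append(index)
            tracked |= step.reads
    impacted.reverse()  # report in the order the steps were applied
    return impacted


# Made-up project with five steps.
steps = [
    Step("Import", frozenset(), frozenset({"price", "qty", "region"})),
    Step("Change price", frozenset({"price"}), frozenset({"price"})),
    Step("Filter region", frozenset({"region"}), frozenset()),
    Step("Compute total", frozenset({"price", "qty"}), frozenset({"total"})),
    Step("Change region", frozenset({"region"}), frozenset({"region"})),
]

total_lineage = column_lineage(steps, "total")
region_lineage = column_lineage(steps, "region")
print([steps[i].name for i in total_lineage])
print([steps[i].name for i in region_lineage])
```

In this sketch the filter step is never highlighted because it removes rows rather than changing values in a column. Whether a real tool treats filters that way is a design choice, and the sketch makes no claim about what Data Prep does.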
I hope this tip is helpful to you. I’m going to be creating more 'Tip of the Day' blog posts for Data Prep. If there’s a tip you’d like me to cover, please let me know - @akshay