The Shape feature is very powerful. It provides an opportunity to do multiple tasks within a single step. At a minimum it enables you to:
- Eliminate fields you no longer wish to retain in your dataset
- Merge duplicate records into a single record based on the fields you have chosen to remain in your shape
Furthermore - depending on the shape option you provide you also get the opportunity to:
- Transform the layout of your data
- Generate new fields
- Derive values including distinct counts, sums, and averages (for more details please refer to the online documentation).
The cost of this operation is that the data that is removed will not be carried forward or accessible once the shape occurs. One thing you can consider is the use of a
lens to preserve the base detail, publish the content to create a detail answerset, and use that answerset as the basis of a new project.
What your project would look like is:
- Load your file
- Do all the work before the shaping requirements
- Create a lens on the step before the shaping
- Publish the data
- Then do to the step with the shape
- Create a lens based on that output
- Publish the second lens
- Create a new project
- Load the original detailed answerset
- Do your 3rd shape
- Do a lookup/join on the output of the 2nd answerset to combine the results
One other thing to consider: Window Functions. It won't help in this case, but if you are using a Shape -> Group by and doing things like Sum, Average, Count, First, Last, Min, Max, or Medium - you can do a Window Function without having to resort to a shape.
The reason for this is that when you perform a calculation in Paxata, the default behavior is the perform the calculation on the row of data you are working on. The computed columns work within the row. Filters can be used to control which rows the computed column will be calculated.
To compare the values across multiple rows - you either need to reshape the data using a Shape operation or you can define a window of values using the Window Functions Feature. It was introduced in 2019.2 and extended upon in the 2020.1 release. The notion is that you specify a field you wish to perform the aggregate, and you define a field that will serve as the window or grouping of records the aggregation should consider. You can compute # companies in a territory or # companies that purchased a product. Count Distinct is not in the Window functions as of yet.
To get complete details on the Windows Function feature - click
here.