A question that has come up a few times over the last few months, is How can I calculate a running/cumulative total in Dataprep?
Let's look at two methods to achieve this:
The below dataset will be used in both examples:
Importing it into DataRobot Dataprep looks like this:
Now let's get started....
Call the Column Dummy
Enter the value 1 (the value is of no consequence). The screen will look like this:
then
To show a cumulative total populate the window with:
Alternatively one can use the Spark SQL transformation by clicking the icon
(Note for convenience I renamed the columns to have no spaces or special characters).
The syntax I have used to use the Spark SQL cumsum function is:
SELECT SaleDate, SalesReturns, SUM(SalesReturns)
over (ORDER BY SaleDate ROWS BETWEEN unbounded preceding AND CURRENT ROW ) cumsum
FROM dataset
ORDER BY SaleDate
Screenshot below:
And there you have it, two ways to calculate a running/cumulative total in DataRobot Dataprep!