
Error when creating a Datetime Partitioning Spec in REST API

cpg
Image Sensor


Hello all,

 

I have been able to successfully run a time series model through the python API with relative ease:

 

partitioning_spec = dr.DatetimePartitioningSpecification(
    'DT', use_time_series=True, multiseries_id_columns=['Source'])
project.set_target('Value', partitioning_method=partitioning_spec)
 
---
 
I have not, however, had the same success with the same data and the same techniques (at least as far as I can tell) using the REST API:
 
{
  "useTimeSeries": true,
  "datetimePartitionColumn": "DT",
  "multiseriesIdColumns": ["Source"]
}
 
I encounter the following error, which I can't find discussed online and can't work out what it is trying to tell me:
 

error{"message": "Column DT has not been analyzed for time series modeling with the specified multiseries id columns ['Source']."}

 

---

 

For reference, my test data looks like this:

DT          Source  Value
01-01-2000  f1          1
01-01-2000  f2        100
01-02-2000  f1          2
01-02-2000  f2        105
...

 

You should add a step that waits for a 200 response code. Multiseries analysis can take some time, and if you send the next command before it has completed, it produces this error. Check the code below. Each STP below is a separate request in the REST API.

STP

multiseries(POST)

BODY-Json

{
"datetimePartitionColumn": "PERIOD",
"multiseriesIdColumns": ["SERIAL_ID"]
}

STP
getresponse(GET)
BODY
{
"datetimePartitionColumn": "PERIOD",
"multiseriesIdColumns": ["SERIAL_ID"]
}
TEST

var project_id = pm.variables.get("projectId");
var res = JSON.parse(responseBody);
if (pm.response.code === 200) {
    var detectedCount = res.detectedMultiseriesIdColumns.length;
    if (detectedCount === 1) {
        postman.setNextRequest("run");
        console.log("OK");
    } else {
        console.log(detectedCount);
        setTimeout(function () {}, 20000);
        postman.setNextRequest("multiseries");
    }
} else {
    setTimeout(function () {}, 20000);
    postman.setNextRequest("getresponse");
    console.log("getrestrepeat");
}


STP

run (PATCH)

BODY

{
"target": "TARGET",
"mode": "quick",
"featureDerivationWindowStart": -6,
"featureDerivationWindowEnd": 0,
"forecastWindowStart": 1,
"forecastWindowEnd": 3,
"numberOfBacktests": 2,
"useTimeSeries": true,
"datetimePartitionColumn": "PERIOD",
"multiseriesIdColumns": [
"SERIAL_ID"
],
"cvMethod": "datetime",
"blendBestModels": false,
"windowsBasisUnit": "MONTH"
}
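For anyone following the same flow outside Postman, the retry logic above can be sketched in Python. This is a minimal sketch, not an official client: the request bodies mirror the STPs above, but `BASE_URL`, `API_TOKEN`, and `PROJECT_ID` are placeholders, and the assumption that the same multiseriesProperties route can be polled with GET is taken from the collection above rather than verified against the API reference.

```python
import time
import requests  # third-party: pip install requests

BASE_URL = "https://app.datarobot.com/api/v2"  # placeholder
API_TOKEN = "YOUR_TOKEN"                       # placeholder
PROJECT_ID = "YOUR_PROJECT_ID"                 # placeholder
HEADERS = {"Authorization": "Token " + API_TOKEN}


def multiseries_body(date_col, id_cols):
    """Build the multiseriesProperties request body used in the STP above."""
    return {"datetimePartitionColumn": date_col,
            "multiseriesIdColumns": list(id_cols)}


def analysis_done(response_json):
    """Mirror the Postman test: proceed once exactly one id column is detected."""
    return len(response_json.get("detectedMultiseriesIdColumns", [])) == 1


def wait_for_multiseries_analysis(date_col, id_cols, poll_seconds=20, max_polls=30):
    """POST the analysis request, then re-check until the column is detected."""
    url = f"{BASE_URL}/projects/{PROJECT_ID}/multiseriesProperties/"
    requests.post(url, json=multiseries_body(date_col, id_cols), headers=HEADERS)
    for _ in range(max_polls):
        resp = requests.get(url, headers=HEADERS)
        if resp.status_code == 200 and analysis_done(resp.json()):
            return True  # safe to send the partitioning/aim request now
        time.sleep(poll_seconds)
    return False
```

Only after `wait_for_multiseries_analysis` returns True would you send the `run (PATCH)` body shown above.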


Hi, we also had the same error. How can we train a time series model via the API? Although we set "useTimeSeries": true, it behaves as if it were a regression project.


In addition, I have printed out the response from the multiseriesProperties POST and it appears to be a success, despite still leading to the original error when calling the datetimePartitioning POST.

 

HTTP/1.1 202 ACCEPTED
Date: Wed, 13 Jul 2022 04:38:31 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 0
Connection: keep-alive
Server: openresty
Location: https://app.datarobot.com/api/v2/status/2e9fc7bf-0c3a-44e7-bd91-1cf37e0ec90b/
Pragma: no-cache
Cache-Control: no-store
x-request-id: e01b988aa7db813d3b284ee3f52c9ecf
Strict-Transport-Security: max-age=16070400; includeSubDomains
X-Frame-Options: SAMEORIGIN
Referrer-Policy: origin-when-cross-origin
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
X-DataRobot-Request-ID: e01b988aa7db813d3b284ee3f52c9ecf
Expect-CT: max-age=86400, enforce


While I do greatly appreciate the effort put forth, I do not understand how that example relates to my specific issue.  I am able to make DataRobot REST API calls in general, and am working on partitioning before training. The example does not seem to mention partitioning or training or multiseries properties analysis. It is entirely possible I am missing something here, and I apologize if that is the case. I do appreciate your patience with me thus far.

 

Here is an example that I have run which returns the error: 

error{"message": "Column DT has not been analyzed for time series modeling with the specified multiseries id columns ['Source']."}. 

 

// begin partitioning ------------------------------------------------------------------------------
HttpPost partition = new HttpPost(this.url + "/projects/" + projectID + "/datetimePartitioning/");
HttpPost props = new HttpPost(this.url + "/projects/" + projectID + "/multiseriesProperties/");
JSONObject parametersTS = new JSONObject();
JSONObject propsParams = new JSONObject();

parametersTS.put("useTimeSeries", true);
parametersTS.put("datetimePartitionColumn", DateCol);
parametersTS.put("multiseriesIdColumns", new String[]{SourceCol});
propsParams.put("datetimePartitionColumn", DateCol);
propsParams.put("multiseriesIdColumns", new String[]{SourceCol});
// parametersTS.put("forecastWindowEnd", 1);
// parametersTS.put("forecastWindowStart", 1);
// parametersTS.put("featureDerivationWindowEnd", 0);
// parametersTS.put("featureDerivationWindowStart", -10);

props.setEntity(new StringEntity(propsParams.toString(), ContentType.APPLICATION_JSON));
props.addHeader("Authorization", "Token " + this.apiToken);
CloseableHttpResponse modelResponse = httpClient.execute(props);
if (modelResponse.getStatusLine().getStatusCode() >= 300) {
    String errorString = EntityUtils.toString(modelResponse.getEntity());
    props.releaseConnection();
    error.add(errorString);
    return error;
}
props.releaseConnection();

partition.setEntity(new StringEntity(parametersTS.toString(), ContentType.APPLICATION_JSON));
partition.addHeader("Authorization", "Token " + this.apiToken);
CloseableHttpResponse modelResponse2 = httpClient.execute(partition);
if (modelResponse2.getStatusLine().getStatusCode() >= 300) {
    String errorString = EntityUtils.toString(modelResponse2.getEntity());
    partition.releaseConnection();
    error.add(errorString);
    return error;
}
partition.releaseConnection();
 
---
This is all run after the data has already been imported, but maybe I need to pass a data file to one of these calls somehow for proper analysis to take place? I have also tried experimenting with when the connections are released, but this has not improved my outcome.
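One detail worth noting in the sequence above: the multiseriesProperties POST shown earlier returned 202 ACCEPTED with a Location header pointing at a status URL, which means the analysis runs asynchronously, and a datetimePartitioning call sent before that job finishes would produce exactly the "has not been analyzed" error. A minimal sketch of waiting on that status URL, in Python for brevity (the redirect-on-completion behavior is the usual DataRobot async-job pattern, assumed here rather than confirmed in this thread; the `get` parameter is only there so the function can be exercised without a network):

```python
import time
import requests  # third-party: pip install requests


def wait_for_async_job(status_url, api_token, poll_seconds=5, max_polls=60, get=None):
    """Poll a DataRobot async status URL until the job finishes.

    While the job is running, GET on the status URL returns 200; on
    completion the API redirects (303) to the finished resource.
    """
    headers = {"Authorization": "Token " + api_token}
    if get is None:
        def get(url):
            return requests.get(url, headers=headers, allow_redirects=False)
    for _ in range(max_polls):
        resp = get(status_url)
        if resp.status_code == 303:        # job finished; result at Location
            return resp.headers["Location"]
        if resp.status_code >= 400:
            raise RuntimeError("status check failed: %d" % resp.status_code)
        time.sleep(poll_seconds)
    raise TimeoutError("analysis job did not finish within the polling window")
```

In the Java code above, the equivalent would be reading the Location header from the 202 response to the multiseriesProperties POST and looping on it before sending the datetimePartitioning request.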
dalilaB
DataRobot Alumni

Here is an example

 

 

 

# Imports assumed by this snippet
import json
import re
import sys

import pandas as pd
import requests

import datarobot as dr

#Get the deployment
def find_or_return_deployment(ts_setting):
    deployments = dr.Deployment.list()
    found_dep = [x for x in deployments if re.search(ts_setting["deploy_name"],x.label)]
    if found_dep:
        return found_dep[0]
    else:
        deployment = dr.Deployment.create_from_learning_model(
                        model.id, label=ts_setting["deploy_name"], description=ts_setting["deploy_desc"],
                        default_prediction_server_id=prediction_server.id)
        return deployment
#Code for scoring
class DataRobotPredictionError(Exception):
    """Raised if there are issues getting predictions from DataRobot"""


def make_datarobot_deployment_predictions(
        data,
        deployment_id,
        forecast_point=None,
        predictions_start_date=None,
        predictions_end_date=None,
):
    """
    Make predictions on data provided using DataRobot deployment_id provided.
    See docs for details:
         https://app.datarobot.com/docs/predictions/api/dr-predapi.html

    Parameters
    ----------
    data : str
        Feature1,Feature2
        numeric_value,string
    deployment_id : str
        Deployment ID to make predictions with.
    forecast_point : str, optional
        Forecast point as timestamp in ISO format
    predictions_start_date : str, optional
        Start of predictions as timestamp in ISO format
    predictions_end_date : str, optional
        End of predictions as timestamp in ISO format

    Returns
    -------
    Response schema:
        https://app.datarobot.com/docs/predictions/api/dr-predapi.html#response-schema

    Raises
    ------
    DataRobotPredictionError if there are issues getting predictions from DataRobot
    """
    # Set HTTP headers. The charset should match the contents of the file.
    headers = {
        'Content-Type': 'text/plain; charset=UTF-8',
        'Authorization': 'Bearer {}'.format(API_KEY),
        'DataRobot-Key': DATAROBOT_KEY,
    }

    url = API_URL.format(deployment_id=deployment_id)

    # Prediction Explanations:
    # See the documentation for more information:
    # https://app.datarobot.com/docs/predictions/api/dr-predapi.html#request-pred-explanations
    # Should you wish to include Prediction Explanations or Prediction Warnings in the result,
    # change the parameters below accordingly:

    params = {
        'forecastPoint': forecast_point,
        'predictionsStartDate': predictions_start_date,
        'predictionsEndDate': predictions_end_date,
        # Explanations are requested here; remove this line if they are not needed
        'maxExplanations': 3,
        # 'thresholdHigh': 0.5,
        # 'thresholdLow': 0.15,
        # Uncomment this for Prediction Warnings, if enabled for your deployment.
        # 'predictionWarningEnabled': 'true',
    }

    # Make API request for predictions
    predictions_response = requests.post(url, data=data, headers=headers, params=params)
    _raise_dataroboterror_for_status(predictions_response)
    # Return a Python dict following the schema in the documentation
    return predictions_response.json()


def _raise_dataroboterror_for_status(response):
    """Raise DataRobotPredictionError if the request fails along with the response returned"""
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError:
        err_msg = '{code} Error: {msg}'.format(
            code=response.status_code, msg=response.text)
        raise DataRobotPredictionError(err_msg)
#Score
#Notice that I saved the dataset to a csv file, so I can read it as binary
filename = "score_data.csv"
data = open(filename, 'rb').read()
data_size = sys.getsizeof(data)
try:
    predictions = make_datarobot_deployment_predictions(
        data,
        ts_setting["deployment_id"],
        forecast_point=ts_setting["ForcastPoint"],
    )
except DataRobotPredictionError as exc:
    print(exc)
#serialize the response to a JSON string
result_js = json.dumps(predictions)
#flatten the nested response into pandas DataFrames
tp1 = pd.json_normalize(predictions['data'], record_path=['predictionValues'])
tp2_exp = pd.json_normalize(predictions['data'])
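As a small illustration of those last two `json_normalize` calls, here is a runnable sketch with a made-up response in the nested shape the prediction API returns (the field names `data`, `predictionValues`, `label`, and `value` follow the response schema linked in the docstring above; the actual values are invented):

```python
import pandas as pd

# Hypothetical prediction response, shaped like the API's response schema
predictions = {
    "data": [
        {"seriesId": "f1", "timestamp": "2000-01-03T00:00:00Z",
         "prediction": 3.1,
         "predictionValues": [{"label": "Value", "value": 3.1}]},
        {"seriesId": "f2", "timestamp": "2000-01-03T00:00:00Z",
         "prediction": 108.0,
         "predictionValues": [{"label": "Value", "value": 108.0}]},
    ]
}

# record_path flattens the nested predictionValues lists into rows
tp1 = pd.json_normalize(predictions["data"], record_path=["predictionValues"])
print(tp1.columns.tolist())  # -> ['label', 'value']

# without record_path, one row per prediction; the nested list stays as a column
tp2 = pd.json_normalize(predictions["data"])
```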

 

 

 

Thank you!

 

I've messed around with this a little now and just want to be clear about the correct process here.

 

I need to POST api/v2/projects/{projectId}/multiseriesProperties/

with the body { "datetimePartitionColumn": "DT", "multiseriesIdColumns": ["Source"] }

and then POST api/v2/projects/{projectId}/datetimePartitioning/

with the body mentioned in my original post?

 

This currently still gives me the same error message, so maybe I am still tripping on something here. Maybe I need to link these posts somehow?

 

Thank you for getting me this far.

Bogdan Tsal-Tsalko
Data Scientist

You'll need to invoke this multiseriesProperties route to trigger the computation first.
The python client automagically kicks off multiseries analysis, as does the browser app, but if you want to use the REST API it's a few more steps to take.

Source is suggested as a viable option if I go through this whole process in the web interface.

 

Screen Shot 2022-07-07 at 3.16.22 PM.png

 

The data tab after this failure does not seem extremely interesting.

Screen Shot 2022-07-07 at 3.14.24 PM.png

 

Thank you for having a look.


I messed up my example data and will edit it immediately. The source should not change each day.

Bogdan Tsal-Tsalko
Data Scientist

Hi!

Could you show a screenshot of the DataRobot Data tab after the dataset is uploaded to the platform but before modeling has started? I would like to see the feature types, to check that the platform recognized them correctly. Also, try setting up multiseries in the platform and check whether DataRobot suggests "Source" as a viable option for a multiseries ID column, because in your example dataset it looks like Source changes each day, rather than multiple Sources being available for each date.
