Regarding how to proceed: am I understanding correctly that you are now getting a green connection?
If you have successfully created a "Databricks (Spark)" data connection in the Data Connection UI here: https://app.datarobot.com/classic-account/data-connections
Then you should be able to successfully set up a Job Definition using that connection.
Just in case I misunderstood something, I want to emphasize that the correct connection to choose is the one marked with a green circle below:
Here is how I got my "correct" JDBC URL, by the way:
I log in to Databricks, go to my compute cluster configuration, scroll to the bottom for Advanced Options, click the JDBC/ODBC tab, and pick the version before 2.6.22.
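For anyone following along, the legacy-format URL copied from that tab looks roughly like the example below. The host, workspace ID, and HTTP path are placeholders, not real values; your own values come from your cluster's JDBC/ODBC tab:

```
jdbc:spark://adb-1234567890123456.7.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/1234567890123456/0123-456789-abcdefgh
```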
Hi @shoebkh , thank you for reporting the issue. Could you please share the deployment ID?
Hello @shoebkh , thank you again for your report. We've identified a potential issue and would like some additional information: are you using a Databricks connection for the output of your Job Definition? If that's the case, please note that Job Definitions do not currently support output to native Databricks connections (identified by a red icon in the application). Please configure another Databricks connection and use that with Job Definitions instead.
Please let us know if the suggestion was helpful.
Hi @andrius-senulis Thank you for your response.
Yes so there is a lab called integrating Snowflake with DataRobot.
I'm trying to set up a job definition in DataRobot where the prediction source and destination are both Databricks instead of Snowflake (as explained in the lab). When I try to save and run the job, I get an 'Internal server error' message. Can you help?
Even if I manage to establish a connection to Databricks within the job definition, will it still fail to work properly?
I'll attach some screenshots for your reference.
Deployment ID: 664f579b5ff11c508a669fb5
Sorry @andrius-senulis could you explain this a bit more?
"If that's the case, please note that Job Definitions does not currently support output to native Databricks connections (identified by a red icon in the application). Please configure and use another Databricks connection instead for use with Job Definitions."
The highlighted red word "Databricks": did you mean DataRobot?
Is there a way to hop on a call for support?
Hi @shoebkh I'll try to provide some help.
When configuring a new connection, you are probably using a dialog similar to this one
Unfortunately the Databricks adapter which I've crossed out in red is not currently compatible with Job Definitions.
Instead, please use another Databricks adapter like the one I've circled in blue (or any other one named "Databricks" with the same icon).
Please let us know if you need any further assistance. Thank you.
I have Databricks (Spark) in the connection options, but I wasn't able to establish a connection to my Databricks account using this option.
To clarify, when I used the Databricks option (the red icon), I managed to connect to the database and see all the files. However, the connection would fail at the end, displaying an internal error when I pressed the "Run Predictions Now" button.
I'm also trying to use the prediction API within Databricks to make predictions. I'm stuck on the part where I need to use a file that exists in my Azure Databricks environment as the scoring data. I want DataRobot to write back the predictions into a new table in Databricks. Is this even possible?
Here is the code snippet I found in the lab. How should the type and file parameters change to achieve what I mentioned above, where my score file exists in the Azure Databricks environment and I want the predictions written back into Databricks with the unique ID?
"
job = dr.BatchPredictionJob.score(
deployment=deployment.id,
passthrough_columns=['wine_id'],
intake_settings={
'type': 'localFile',
'file': './winequality-white-score.csv'
},
output_settings={
'type': 'localFile',
'path': './winequality-white-predictions-231211.csv'
}
)
"
How can I modify this code to work with files in my Azure Databricks environment?
Hi @shoebkh
Thank you for your reply.
I would like to provide some context: DataRobot supports different modes of accessing databases. Most of these are based on the JDBC API, but some are not. Unfortunately, when it comes to connections to databases, batch predictions is currently only compatible with JDBC.
The Databricks connection type you see (with the red icon) is not based on JDBC and so, while you can use it to explore your database from the UI, it is not currently compatible with batch predictions. However, there should be a JDBC version of that connection type available. I'm currently investigating why your account does not show it.
As for your code snippet, you would have to modify it so that `type` is `jdbc`, and provide the appropriate parameters. You can find documentation on those parameters here:
https://docs.datarobot.com/en/docs/api/reference/batch-prediction-api/output-options.html#jdbc-write
And you can find a guide on configuring batch predictions output with JDBC via the API here:
https://docs.datarobot.com/en/docs/api/guide/python/jdbc-nb.html#configure-output-settings
Please note that for this latter approach your data store and credentials must be configured beforehand. Since you currently don't seem to have access to the JDBC-based Databricks connection driver, you won't be able to do that yet. I'm investigating why that is and will reach out again as soon as possible.
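To make this concrete, here is a rough sketch of how the earlier snippet's settings could change for JDBC intake and output. This follows the Batch Prediction documentation linked above, but the IDs, schema, and table names below are placeholders you'd replace with your own, and the exact fields should be checked against those docs:

```python
# Sketch only: IDs, schema, and table names are placeholders, not real values.
# 'data_store_id' should point to a JDBC-based Databricks (Spark) data
# connection, and 'credential_id' to a stored access-token credential.

intake_settings = {
    'type': 'jdbc',
    'data_store_id': 'YOUR_DATASTORE_ID',
    'credential_id': 'YOUR_CREDENTIAL_ID',
    'schema': 'default',
    'table': 'winequality_white_score',
}

output_settings = {
    'type': 'jdbc',
    'data_store_id': 'YOUR_DATASTORE_ID',
    'credential_id': 'YOUR_CREDENTIAL_ID',
    'schema': 'default',
    'table': 'winequality_white_predictions',
    'statement_type': 'insert',          # insert prediction rows
    'create_table_if_not_exists': True,  # let DataRobot create the table
}

# With the datarobot client this would then be submitted much like before
# (requires a live deployment, so it is shown commented out here):
# job = dr.BatchPredictionJob.score(
#     deployment=deployment.id,
#     passthrough_columns=['wine_id'],
#     intake_settings=intake_settings,
#     output_settings=output_settings,
# )
```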
Hi @shoebkh
I've previously worked on the Databricks integration when it was first introduced. I just did a quick test and was able to successfully connect to Databricks and both read and write.
Does your JDBC URL start with `jdbc:databricks:...` or `jdbc:spark:...`? At this point we only support the old format (spark) which might be why you had connectivity issues.
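As a quick sanity check (plain Python, the helper name is just illustrative), you can verify which format your URL uses before configuring the connection:

```python
def uses_supported_driver(jdbc_url: str) -> bool:
    """Return True if the URL uses the legacy Spark driver prefix,
    which per the post above is the format currently supported."""
    return jdbc_url.startswith('jdbc:spark:')

# Legacy format: supported
print(uses_supported_driver('jdbc:spark://adb-123.azuredatabricks.net:443/default'))       # True
# Newer format: would need to be replaced with the legacy-format URL
print(uses_supported_driver('jdbc:databricks://adb-123.azuredatabricks.net:443/default'))  # False
```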
Here is a screenshot of my Data connection
I connect using an access token as the credential.
Here is how I set up the intake adapter within the Job Definition UI:
Here is how I set up the output adapter:
All rows were successfully scored:
Hope this might help.