Problem getting BigQuery as data source working in DataRobot

Hi.

I'm currently experimenting with BigQuery as a data source. I do this by clicking the person icon -> "Data Connections" -> "Add new data connection" and then filling out the OAuth credentials (OAuthType stays at 2, which seems to be the default value). But when I click "Test connection", all I get is an empty popup window titled "Test Data Connection", with no confirmation of whether I've filled in the fields correctly.

I've also tried creating a new project with the newly created data source, but no BigQuery datasets are shown. The only time I do get a response from DataRobot is in "Credentials Management" -> "Add new (Credential)", where I associate the newly created data connection with the credential. Once done, a plug/socket icon appears in the list, and when I click the icon it changes to a green light, suggesting that the connection test works. What am I missing in order to see my BigQuery data when creating a project?

Also, are you using the Simba driver behind the scenes? I ask because their documentation suggests one can provide a service-account key file. With the OAuth credentials it seems we'd have to create an actual user in Google Cloud, since OAuthRefreshToken isn't available for service accounts.
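For reference, this is the kind of service-account setup I mean from Simba's documentation: a connection string using OAuthType=0 (service account) instead of OAuthType=2 (pre-generated refresh token). This is only a sketch; the project ID, account email, and key path are placeholders, and the URL is wrapped here for readability:

```
jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;
    ProjectId=my-project;
    OAuthType=0;
    OAuthServiceAcctEmail=my-service-account@my-project.iam.gserviceaccount.com;
    OAuthPvtKeyPath=/path/to/keyfile.json;
```

That would avoid having to create a real Google user just for DataRobot.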

I'm using Chrome without any ad blocker. The Chrome network debugger shows no activity when I click the "Test connection" button.

/Jack


3 Replies
JoshKF
DataRobot Alumni

Hi Jack,

I think I can help with these questions:

  • For the first issue, we've done some digging and it looks like a feature-flag issue. We're working on having this updated for you to fully enable the BigQuery functionality.
  • Regarding your second question: although we don't use Simba behind the scenes, we do use another driver (from Google) which is almost identical. We don't currently support using a key file for authentication.

Please let me know if this is helpful and, if you try again, whether you're successful in connecting to BigQuery.

Thanks!

Hi @JoshKF 

I just tried it, and it's working now, so thanks for enabling the feature. I still have a couple of questions, though.

 

1) This time, when clicking "Test connection", I see a button named "Sign in using Google"; I guess this is the enabled feature. But Google warns against signing in from DataRobot in a popup window (after expanding the Advanced section):

Google hasn’t verified this app
The app is requesting access to sensitive info in your Google Account. Until the developer (oauth@datarobot.com) verifies this app with Google, you shouldn't use it.

Hide Advanced
Continue only if you understand the risks and trust the developer (oauth@datarobot.com).

Go to datarobot.com (unsafe)

By clicking "Go to datarobot.com (unsafe)" I'm taken to the next page, which says something like "datarobot.com wants access to your Google account" and asks permission to view BigQuery data and manage data in Google Cloud (all of which is fair enough). I'm unsure whether this app verification is something that should be done within our Google Cloud project, or something that you, DataRobot, should do. It would increase trust between the two sites.

2) I've tried creating a couple of projects and can now see my BigQuery tables in DataRobot. But how is table size determined on your side? I've selected a few tables that I believe are well below the standard CSV file-size limit of 2 GB, yet they are rejected for exceeding the size limit; see below.

 

| Table name | BigQuery `__TABLES__.size_bytes`* | Physical file size | DataRobot size | Result |
| --- | --- | --- | --- | --- |
| random_table_1 | 1,593,548,791 | 1,244,365,169 bytes | 2287 MB | Error: "Dataset with size 2287 MB exceeds the download limit of 2048 MB." |
| random_table_2 | 1,585,121,517 | N/A | 2075 MB | Error: "Dataset with size 2075 MB exceeds the download limit of 2048 MB." |

*: `__TABLES__` is a BigQuery metadata table.
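To sanity-check the numbers myself, here's a quick sketch (assuming your 2048 MB limit means MiB, i.e. 2^20 bytes; the error message doesn't say). The raw `size_bytes` values come out well under the limit, which is why I'm wondering what your size figure is based on:

```python
# Compare BigQuery's reported storage size against an assumed 2048 MiB ingest limit.
LIMIT_MIB = 2048

def mib(size_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return size_bytes / 2**20

tables = {
    "random_table_1": 1_593_548_791,  # __TABLES__.size_bytes values from the table above
    "random_table_2": 1_585_121_517,
}

for name, size_bytes in tables.items():
    size = mib(size_bytes)
    verdict = "over" if size > LIMIT_MIB else "under"
    print(f"{name}: {size:.0f} MiB ({verdict} the {LIMIT_MIB} MiB limit)")
```

Both tables come out around 1.5 GiB by that measure, yet you report 2287 MB and 2075 MB, so presumably your size is computed from something other than `size_bytes` (the uncompressed CSV export, perhaps?).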

 

Thanks.


Hi @jlee As these questions require information specific to your account, they both look like good questions for DataRobot Support to review with you, since Support will have your account information to hand (such as the cloud server you are using, the data ingest limits on your account, etc.). I would suggest sending these questions to support@datarobot.com, and they can assist you from there.