How to create a Snowflake data source using Python?

Bruce
DC Motor

The title says it all. 

 

I have tried a variety of approaches, which, somewhat against form, I won't expand on here. What I am looking for is a direct way to create a project from a query in Snowflake, using the datarobot and snowflake Python modules.

 

While I can open a connector and download data, I am stuck with downloading to CSV and then creating the project from the CSV. I want to create the project directly from the query, rather than via a CSV file.

 

Is there a standard simple and direct way to do this - for example using the connection to the Snowflake database? Or is this known to be a problem?

6 Replies
dalilaB
Data Scientist

I will check and get back to you.

IraWatt
Laser

Hey @Bruce,

I haven't had time to test this yet, but to achieve it I would download Snowflake's JDBC driver from here:

IraWatt_0-1652442456898.png

Then use it to create a Data Store (see "Database Connectivity" in the DataRobot Python Client 2.28.0 documentation, readthedocs-hosted.com).

Then query it by creating a Data Source (see the same "Database Connectivity" page of the DataRobot Python Client 2.28.0 documentation).

Then create a project using that data source.

dalilaB
Data Scientist

Here is a potential solution. Note the SQL button in the screenshot:
Screenshot 2022-05-13 at 6.00.09 PM.png


 

Or in more detail, this is what I worked out and got working.

Not the actual code I used, use at your own risk, for entertainment purposes only.

 

 

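import datarobot as dr

# trainName, guessName, username, password and pname are placeholders that you define yourself.
# Assumes the DataRobot client is already configured with your endpoint and API token.

# Pick the Snowflake driver that is already installed on the DataRobot server.
driver = [dd for dd in dr.DataDriver.list() if dd.canonical_name == "Snowflake (3.13.9)"][0]

# -------------------
# Data store: the JDBC connection to Snowflake.

trainStore = dr.DataStore.create(
    data_store_type="jdbc",
    canonical_name=trainName,
    driver_id=driver.id,
    jdbc_url="jdbc:snowflake://xxxxxxxxxxxxxxxxxxxxx"
)

# -------------------
# Data source for the training data: a query run against the data store.

trainSource = dr.DataSource.create(
    data_source_type="jdbc",
    canonical_name=trainName,
    params=dr.DataSourceParameters(
        data_store_id=trainStore.id,
        query="select * from xxxxxxxxxxxxxxxxxxx")
)

# -------------------
# Second data source (guessName), using the same data store.

guessSource = dr.DataSource.create(
    data_source_type="jdbc",
    canonical_name=guessName,
    params=dr.DataSourceParameters(
        data_store_id=trainStore.id,
        query="select * from xxxxxxxxxxxxxxxxxx")
)

# Create a dataset from the second data source.
theDataset = dr.Dataset.create_from_data_source(
    data_source_id=guessSource.id,
    password=password,
    username=username)

# --------------------------------------
# Create the project directly from the training data source, with no CSV step.

project = dr.Project.create_from_data_source(
    data_source_id=trainSource.id,
    username=username,
    password=password,
    project_name=pname
)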

# -------------------
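
For anyone wondering what goes in jdbc_url: Snowflake JDBC URLs generally have the form jdbc:snowflake://<account_identifier>.snowflakecomputing.com/?<params>. The following is only a minimal sketch of assembling one. The helper name snowflake_jdbc_url and the account, database and warehouse values are hypothetical and not taken from my setup, so check which parameters your own account needs.

```python
from urllib.parse import urlencode


def snowflake_jdbc_url(account, **params):
    """Build a Snowflake JDBC URL; keyword arguments become query-string options."""
    base = f"jdbc:snowflake://{account}.snowflakecomputing.com/"
    return f"{base}?{urlencode(params)}" if params else base


# Hypothetical account, database and warehouse names, for illustration only.
url = snowflake_jdbc_url("myorg-myaccount", db="ANALYTICS", warehouse="COMPUTE_WH")
print(url)
```

The resulting string is what would be passed as jdbc_url when creating the data store.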

@IraWatt 

 

I thought I needed to download the JDBC driver as well, but installing it turns out to be a sys admin action. The driver already exists, and you can find it among the DataRobot resources.
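
Since the installed driver version will differ between installations, matching on an exact name such as "Snowflake (3.13.9)" may fail with an IndexError. Here is a small sketch of a more forgiving lookup. The helper find_driver is hypothetical, and the stand-in driver objects exist only so the example runs without a DataRobot connection; with a configured client you would pass dr.DataDriver.list() instead.

```python
from types import SimpleNamespace


def find_driver(drivers, name_prefix="Snowflake"):
    """Return the first driver whose canonical_name starts with name_prefix."""
    drivers = list(drivers)
    matches = [d for d in drivers if d.canonical_name.startswith(name_prefix)]
    if not matches:
        available = sorted(d.canonical_name for d in drivers)
        raise LookupError(f"No driver starting with {name_prefix!r}; available: {available}")
    return matches[0]


# Stand-in objects that mimic the id and canonical_name attributes of a driver.
fake_drivers = [
    SimpleNamespace(id="a1", canonical_name="PostgreSQL (42.2.5)"),
    SimpleNamespace(id="b2", canonical_name="Snowflake (3.13.9)"),
]
driver = find_driver(fake_drivers)
print(driver.id, driver.canonical_name)
```

If no Snowflake driver is listed, the error message shows what is available, which is a useful thing to send to your sys admin.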

 

 

Nice job, @IraWatt 
