I need help accessing the local file system on Azure Databricks.
I tried to use this as an example:
and get stuck when trying to access the local filesystem -- in this case, dbfs.
I get the error below:
I have not used dbfs myself. It is an abstraction layer, though, and per some of the examples in this article, it seems that using /dbfs/<path>/input.csv and /dbfs/<path>/output.csv may work, since the DataRobot SDK does not understand the dbfs:/ reference itself.
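To make that concrete, here is a minimal sketch of translating a dbfs:/ URI into the /dbfs FUSE-mounted path that plain-Python code (including the SDK's file readers) can open directly. The helper name and the example paths are just for illustration; it assumes the cluster exposes the standard /dbfs mount.

```python
def to_fuse_path(path: str) -> str:
    """Rewrite a dbfs:/ URI to the /dbfs FUSE mount path.

    Plain-Python libraries don't understand dbfs:/ URIs, but on
    Databricks clusters DBFS is mounted at /dbfs, so the same file
    is reachable as an ordinary POSIX path.
    """
    if path.startswith("dbfs:/"):
        return "/dbfs/" + path[len("dbfs:/"):].lstrip("/")
    return path

# Hypothetical paths, purely for illustration:
input_path = to_fuse_path("dbfs:/FileStore/scoring/input.csv")
# input_path is now "/dbfs/FileStore/scoring/input.csv", which e.g.
# pandas.read_csv(input_path) can open where the dbfs:/ URI would fail.
```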
Note also that there are scenarios where a model can be brought into the Spark environment to score data through a Spark dataframe as well.
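One common shape for that is a per-batch scoring function of the kind Spark's DataFrame.mapInPandas expects. The sketch below is an assumption-laden illustration, not DataRobot's API: predict_fn stands in for whatever scoring call you actually have (e.g. an exported model), and the schema string in the commented usage is hypothetical.

```python
import pandas as pd

def make_scorer(predict_fn):
    """Return a function with the shape mapInPandas expects: it takes
    an iterator of pandas DataFrames (one per Spark batch) and yields
    each batch with a 'prediction' column appended.

    predict_fn is a stand-in for your real scoring call; it takes a
    DataFrame and returns a sequence of predictions.
    """
    def score_batches(batches):
        for batch in batches:
            out = batch.copy()
            out["prediction"] = predict_fn(batch)
            yield out
    return score_batches

# On a cluster you would apply it roughly like this (schema illustrative):
# scored = spark_df.mapInPandas(
#     make_scorer(model.predict),
#     schema="row_id long, feature double, prediction double",
# )
```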
Also note that I typically advise keeping only a surrogate key column (or columns) for the join, and joining the predictions back to the original dataset if desired. Passing through all columns takes compute time, and certainly network time moving the additional data around, even though in many instances it is data you already have on the client/source side.
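The key-only pattern looks like this in pandas; column names and the stand-in scoring step are hypothetical:

```python
import pandas as pd

# Original dataset: a surrogate key, model features, and extra columns.
source = pd.DataFrame({
    "row_id":  [101, 102, 103],   # surrogate key
    "feature": [0.5, 1.5, 2.5],
    "notes":   ["a", "b", "c"],   # columns the model doesn't need
})

# Send only the key and the features the model actually uses.
to_score = source[["row_id", "feature"]]

# Stand-in for the real scoring output: key plus prediction.
scored = to_score.assign(prediction=to_score["feature"] * 10)[["row_id", "prediction"]]

# Join the predictions back to the full dataset on the surrogate key,
# instead of shipping every column through the scoring request.
result = source.merge(scored, on="row_id", how="left")
```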