Deploying a Model to Hadoop

This article shows how you can score big datasets stored in HDFS using an in-memory model created by DataRobot. (Note that in-place scoring on Hadoop is not available for Managed AI Cloud deployments.)

Scoring Data on Hadoop

DataRobot allows you to perform distributed scoring using a DataRobot-built model from within the Deploy to Hadoop tab.

Figure 1. Deploy to Hadoop tab

As you can see in Figure 1, DataRobot asks for the input and output files and then generates a datarobot-scoring command to run on a specified Hadoop host. This command lets you run the model on huge datasets without the network congestion you would incur by moving the data around your network or sending it through a POST request.
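As a rough sketch of the shape of such a command (the exact flags and paths of datarobot-scoring vary by installation; the HDFS paths and option names below are illustrative assumptions, not the documented interface):

```shell
# Hypothetical example only: flag names and HDFS paths are assumptions.
INPUT=/user/alice/loans.csv         # HDFS path to the dataset to score (assumed)
OUTPUT=/user/alice/loans_scored     # HDFS path for the predictions (assumed)

# Build the command as a string so its shape is visible without running it.
CMD="datarobot-scoring --input $INPUT --output $OUTPUT"
echo "$CMD"
```

Because the data never leaves the cluster, only the model travels over the network, which is what keeps scoring fast at this scale.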

Advanced Options

If you want to change the Spark job settings, click the Advanced options toggle. This lets you manually tune the resources the job will require.
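The settings exposed here correspond to standard Spark resource parameters. As a sketch, these are the kinds of values you might tune (the flag names follow spark-submit conventions; how DataRobot passes them to the job is an assumption):

```shell
# Illustrative Spark resource settings (standard spark-submit flags);
# the specific values are examples, not recommendations.
SPARK_OPTS="--num-executors 10 --executor-memory 8g --executor-cores 4"
echo "$SPARK_OPTS"
```

Raising executor memory helps with wide datasets, while more executors increases parallelism across HDFS blocks.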

Finally, you can use the datarobot-scoring command directly from the command line, or set up an Oozie job to schedule time-based execution of the model. Be aware that the GUI handles downloading the model file for you; running the command directly from the command line involves a few extra steps, such as downloading the .drx file from the Downloads tab.
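For the scheduling route, the standard Oozie CLI submits a coordinator job that wraps the scoring command. A minimal sketch, assuming an Oozie server URL and a properties file you have prepared (both placeholders here):

```shell
# Hypothetical example: the server URL and properties file are placeholders.
OOZIE_URL="http://oozie-host:11000/oozie"    # Oozie server endpoint (assumed)

# "oozie job -config ... -run" is the standard Oozie CLI submission form;
# the coordinator definition referenced by the properties file would invoke
# the datarobot-scoring command on the schedule you define.
CMD="oozie job -oozie $OOZIE_URL -config coordinator.properties -run"
echo "$CMD"
```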

More Information

If you’re a licensed DataRobot customer, search the in-app Platform Documentation for Deploy to Hadoop tab and Using Hadoop Scoring from the command line.

Last update: 05-08-2020 11:05 AM