Let’s assume you have reviews or tweets as a text feature in your model. You believe you could get an uplift if only you could capture the sentiment in the text. You can do this simply by modifying the blueprint.
For instance, here is a simple blueprint. To keep things simple, I’m showing a blueprint with only a text feature, but a real model would have other features too. The first step is to copy and edit your blueprint.
If you hover over either the Matrix of word-grams counts… node or the Elastic-Net Classifier… node in my blueprint (below), you can see the type of input required for that task and the type of output it returns. For example, the Matrix of word-grams counts… task requires input of type Text and returns a data frame with all numeric features (as shown in the image).
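To make the "text in, numeric data frame out" idea concrete, here is a minimal plain-Python sketch of a word-gram count matrix. It is only an illustration, not DataRobot's implementation: the function name `word_gram_counts`, the whitespace tokenization, and the sample reviews are all my own assumptions.

```python
from collections import Counter

def word_gram_counts(docs, n=1):
    """Return (vocabulary, matrix), where matrix[i][j] counts gram j in doc i.

    Hypothetical helper for illustration; tokenization is a plain
    lowercase whitespace split, which is an assumption.
    """
    tokenized = [doc.lower().split() for doc in docs]
    # Count every run of n consecutive words in each document.
    grams_per_doc = [
        Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        for tokens in tokenized
    ]
    # One numeric column per distinct gram seen in any document.
    vocabulary = sorted(set().union(*grams_per_doc))
    matrix = [[counts[gram] for gram in vocabulary] for counts in grams_per_doc]
    return vocabulary, matrix

reviews = ["great phone great price", "bad battery"]
vocab, matrix = word_gram_counts(reviews)
print(vocab)
print(matrix)
```

Each review becomes one row of numbers, which is the kind of all-numeric output a downstream classifier can consume.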
For this tip, we want to capture the sentiment in the text feature. I start by hovering over the Text variables node and clicking +.
In the displayed menu, I expand Preprocessing > Text Preprocessing and see a large number of options for text manipulation. (Some of these are already available in Advanced Tuning for some blueprints, but others can be accessed only here.) For this tip, select TextBlob Sentiment Featurizer.
You’ll see a new node in your blueprint, outlined in red. When you hover over the node, you can see that it requires text, so we’ll connect it to the Text variables node. It returns a data frame with numeric features. Because the TextBlob Sentiment Featurizer is a preprocessing task, we also need to connect its output to our model task, Elastic-Net Classifier…
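If it helps to picture what a sentiment featurizer contributes, here is a toy plain-Python sketch of the same contract: text goes in, a few numeric columns come out. To be clear, this is not TextBlob's algorithm and not what DataRobot runs; the tiny lexicon, the function name `toy_sentiment_features`, and the output column names are hypothetical and exist only for illustration.

```python
# Hypothetical toy lexicon; a real sentiment library uses a far larger one
# and more careful rules (negation, intensifiers, and so on).
TOY_LEXICON = {"great": 1.0, "good": 0.5, "bad": -0.5, "terrible": -1.0}

def toy_sentiment_features(texts):
    """Turn each text into a row of numeric sentiment features."""
    rows = []
    for text in texts:
        # Keep the score of every word that appears in the lexicon.
        scores = [TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON]
        # Average polarity in [-1, 1]; 0.0 when no sentiment word is found.
        polarity = sum(scores) / len(scores) if scores else 0.0
        rows.append({"polarity": polarity, "n_sentiment_words": len(scores)})
    return rows

features = toy_sentiment_features(
    ["great phone good price", "terrible battery", "it arrived"]
)
for row in features:
    print(row)
```

In the blueprint, numeric columns like these sit alongside the word-gram counts as inputs to the classifier, which is why the new node has to be wired into the model task.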
Your new blueprint is set. Now you just need to train it to build a new model. (Before training, you can change the feature list or the training sample size.)
Here is the model on the Leaderboard, shown among the top four models. For the best model, which we trained on 80% of the data, we added SPACY from Text Preprocessing.
Is there a way to extract values of the intermediate features (model outputs) you are adding to the blueprint?
Thank you for posting such informative and clear instructions. Since DataRobot doesn't have a dedicated `Text` mode, I am trying to figure out how to get sentiments/topics from my inputs.
The Clustering option in unsupervised mode does not show in my DataRobot interface and there doesn't appear to be a way to predict sentiment using pretrained models. My project needs to have the sentiment, clusters, and prediction for each record.