FAQs: Setting Up Models

This section provides answers to frequently asked questions related to setting up modeling. If you don't find an answer for your question, you can ask it now; use Post your Comment (below) to get your question answered.

 

Is it possible to do feature transformation in DataRobot?

Yes, DataRobot provides a number of ways for you to transform your features within the platform. You can apply transformations such as Log(x) and x^2, and you can create custom transformations using the f(x) transform option. Custom transformations allow you to create new variables that are a function of other variables in your data.

Is it possible to do feature transformations in DataRobot.png

 

More information for DataRobot users: search in-app documentation for Feature transformations.
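
To make the idea concrete, here is a minimal sketch in plain Python (not DataRobot code) of what Log(x), x^2, and a custom function of other variables compute. The column names income and debt, the debt_ratio feature, and the choice of the natural logarithm are assumptions made for illustration.

```python
import math

# Hypothetical rows; the column names are illustrative only.
rows = [
    {"income": 1000.0, "debt": 250.0},
    {"income": 4000.0, "debt": 1000.0},
]

def add_transforms(row):
    """Return a copy of the row with three derived features added."""
    out = dict(row)
    out["log_income"] = math.log(row["income"])      # Log(x), natural log assumed
    out["income_sq"] = row["income"] ** 2            # x^2
    out["debt_ratio"] = row["debt"] / row["income"]  # custom function of two columns
    return out

transformed = [add_transforms(r) for r in rows]
```

Each new feature is computed row by row from existing columns, which is the same idea the f(x) transform option expresses inside the platform.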

Can I specify which target value will be used as the positive class in DataRobot?

For a binary classification problem, you can choose which target value is treated as the positive class. In Advanced Options, open the Additional tab and set the Positive Class Assignment, as shown in the image below:

Can I specify which target will be used as the positive class.png

 

More information for DataRobot users: search in-app documentation for Show Advanced Options link, then locate information for “Positive class assignment (binary classification only).”

Can I define the optimization metric myself?

DataRobot chooses from a comprehensive set of metrics and recommends one well-suited for your data. If you would like to use a different metric, DataRobot provides a set of optimization metrics in the Advanced Options tab under Additional as shown in the image below. You can also share with Support or your CFDS any metric that you would like to see implemented in DataRobot.

Can I define the optimization metric myself.png

 

More information for DataRobot users: search in-app documentation for Optimization metrics.

Do I have to fix class imbalance on the dataset before loading my data into DataRobot?

Class imbalance is an issue only if we evaluate models using simple metrics like percentage accuracy. DataRobot directly optimizes models for objectives that are both aligned with the project metric and robust to imbalanced targets. A good example of this is the use of LogLoss as an optimization metric. LogLoss is known to be robust to imbalanced datasets.

More information for DataRobot users: search in-app documentation for Optimization metrics, then locate information for "LogLoss / Weighted LogLoss."
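
The difference between accuracy and LogLoss on an imbalanced target can be seen in a small worked example. This is a sketch in plain Python using the standard definitions of both metrics; the two sets of predicted probabilities are made up for illustration and do not come from DataRobot.

```python
import math

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of rows whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

def log_loss(y_true, y_prob):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Imbalanced target: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5

# Model A ignores the minority class and predicts 0.01 for every row.
ignores_minority = [0.01] * 100
# Model B gives the positives a clearly higher probability (still below 0.5).
ranks_positives_higher = [0.05] * 95 + [0.40] * 5

acc_a = accuracy(y_true, ignores_minority)
acc_b = accuracy(y_true, ranks_positives_higher)
loss_a = log_loss(y_true, ignores_minority)
loss_b = log_loss(y_true, ranks_positives_higher)
```

Both models score the same accuracy because neither crosses the 0.5 threshold for any row, yet Model B has a noticeably lower LogLoss, so a LogLoss-based comparison prefers the model that actually separates the classes.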

How do I force a feature to have a monotonic relationship with the target?

To do this, create a feature list containing the features you want to be monotonically increasing (and a separate list for features that you want to be monotonically decreasing, if needed). Then, specify those feature lists in Advanced Options > monotonicity constraints. These feature lists will be a subset of the feature list you use for modeling.

How do I force a feature to have a monotonic.png

 

More information for DataRobot users: search in-app documentation for Monotonic constraints or Monotonic modeling considerations.
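
As a reminder of what the constraint means, the following sketch checks whether a set of predictions is monotonically increasing in one feature. It is a plain Python illustration of the definition, not the mechanism DataRobot uses to enforce the constraint, and the function name is hypothetical.

```python
def is_monotonic_increasing(feature_values, predictions):
    """True if predictions never decrease as the feature value increases."""
    pairs = sorted(zip(feature_values, predictions))
    return all(later[1] >= earlier[1] for earlier, later in zip(pairs, pairs[1:]))

# Predictions rise with the feature value: constraint satisfied.
satisfied = is_monotonic_increasing([3, 1, 2], [30.0, 10.0, 20.0])
# The prediction drops between feature values 2 and 3: constraint violated.
violated = is_monotonic_increasing([1, 2, 3], [10.0, 30.0, 20.0])
```

To check a monotonically decreasing relationship with the same function, negate the predictions before passing them in.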

What data partition is used in the histograms on the Data tab?

Histograms produced via the first Exploratory Data Analysis (i.e., right after DataRobot is done uploading your data) use all of the data, up to 500MB. For datasets larger than 500MB, a random 500MB sample is used. For histograms produced by the second Exploratory Data Analysis process (i.e., after you have told DataRobot what the target variable is), all the data except for the holdout (if any) and rows with missing target values are used.

What data partition is used in the histograms on the Data tab.png

More information for DataRobot users: search in-app documentation for Feature details page, then locate information for "Working with the Histogram chart."

What does DataRobot do if there are missing values in my target?

During Exploratory Data Analysis (EDA) and modeling, DataRobot ignores records with missing values in the target.

Can I share my project for others to view and/or work on?

Yes. When you share a project (from the Manage Projects control center), you can give the recipient the role of owner, user, or observer. Observers can only view the project. Users can do everything owners can do except delete the project or unlock the holdout.

Can I share my project for others to view and_or work on.png

 

More information for DataRobot users: search in-app documentation for Create and Manage projects.

How do I delete a project?

Note: On Managed AI Cloud, deleted projects cannot be recovered. For On-Premise AI Cluster, Private AI Cloud, and Hybrid AI Cloud deployments, deleted projects can be recovered only by the system administrator.

Click the folder icon on the top right corner as shown in the image below.

How do I delete a project 1.png

Then click Manage Projects to see a list of projects.

How do I delete a project 2.png

Select the dropdown menu in the Actions column for the project you want to delete.

How do I delete a project 3.png

Click Delete Project from that menu. When prompted for confirmation, click Delete.

How do I delete a project 4.png

Do I have to upload my data again if I want to start my project over from the beginning?

No, you can easily make a copy of the project from the Manage Projects page. Click the folder icon on the top right corner.

Do I have to upload 1.png

Then click Manage Projects to see a list of projects. This will take you to the Projects page.

Do I have to upload 2.png

 

On the Projects page, select the dropdown menu in the Actions column for the project you want to copy.

Do I have to upload 3.png


Click Copy Project from that menu. DataRobot will make a new instance (a clone) of the project and take you back to the setup screens. Any feature lists created in the original project persist in the cloned project.

Do I have to upload 4.png


How can I apply weights to my data?

Weighting data instances (rows) is a way to assign a relative importance to each of them in model fitting. To apply weights to your data, add a new column of weights to the dataset, with each weight being a number greater than zero. Name the column clearly, for example, "PriorityWeight."

Here’s how: After you import your data and select a target variable, the Data page appears. From this page you can click Show Advanced Options to access the advanced modeling parameters.

How do I apply weights 1.png

Under Additional, you will find Weight. Enter the name of the feature containing the weight information, such as "PriorityWeight."


How do I apply weights 2.png
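
To see how a weight column changes the relative importance of rows, consider the simplest possible "model", a mean. This is a plain Python sketch, not DataRobot's fitting procedure; the target values and the contents of the "PriorityWeight" column are made up.

```python
def weighted_mean(values, weights):
    """Mean in which each row counts in proportion to its weight."""
    if any(w <= 0 for w in weights):
        raise ValueError("each weight must be greater than zero")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

target = [10.0, 20.0, 30.0]
priority_weight = [1.0, 1.0, 2.0]  # hypothetical "PriorityWeight" column

unweighted = weighted_mean(target, [1.0, 1.0, 1.0])
weighted = weighted_mean(target, priority_weight)
```

Giving the third row a weight of 2 pulls the estimate from 20.0 to 22.5, the same result as duplicating that row. Only the ratios between weights matter, not their absolute size.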

What do the "few values," "duplicate," etc. prefixes on feature names mean?

These informational tags identify feature characteristics discovered by DataRobot in its Exploratory Data Analysis (EDA) process. Features with these tags are deemed uninformative, and the gray text in front of the feature name describes the reason the feature was found to be uninformative. These features are excluded from the list of Informative Features that DataRobot creates.

What do the few values.png

More information for DataRobot users: search in-app documentation for Feature lists.

Does DataRobot detect reference IDs in datasets?

DataRobot attempts to detect reference IDs in datasets. If found, these features will be labeled with an informational tag in the Data page. For smaller datasets (typically those with fewer than 2000 records), attempting to automatically identify Reference ID columns can lead to false positives (i.e., incorrectly labeling columns as Reference ID). Therefore, especially with smaller datasets, you should manually preprocess the data to remove reference IDs or create a feature list that excludes them.

Does DataRobot detect reference IDs.png

More information for DataRobot users: search in-app documentation for Overview and EDA, or search for Feature lists and then locate information for "Data page informational tags."

What do the histograms in the Data page represent?

In the Data page, for each feature in the dataset there is a histogram which shows the number of rows of data that have a specific feature value. For categorical features, the height of the bar indicates the number of rows of data which have that feature value. For numeric features, the values are grouped into ranges (bins), and the histogram shows the number of rows for which the feature has a value within the range of that bin.

What do the histograms in the Data page represent.png
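
The two kinds of histogram described above can be sketched in a few lines of plain Python. The equal-width binning rule and the sample values are assumptions for illustration; the way DataRobot chooses its bins is not described here.

```python
from collections import Counter

def categorical_histogram(values):
    """Count the rows that have each distinct category value."""
    return Counter(values)

def numeric_histogram(values, n_bins):
    """Count rows per equal-width bin between the minimum and maximum value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so every row falls into the first bin.
        return [len(values)] + [0] * (n_bins - 1)
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

colors = ["red", "blue", "red", "green", "red"]
ages = [18, 22, 25, 31, 38, 40, 58]

color_counts = categorical_histogram(colors)
age_counts = numeric_histogram(ages, n_bins=4)
```

Here the bar for "red" has height 3, and the four age bins (each 10 years wide, starting at 18) hold 3, 1, 2, and 1 rows respectively.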

What does the small 'i' symbol in the feature list signify?

This symbol indicates that the feature has been derived, either by you or by DataRobot. DataRobot automatically derives date-related features from dates (for example, day of week and month of year), and these are marked with the ‘i’ symbol. If you create a new feature (by var type transform or create f(x) transform), the ‘i’ symbol marks that feature as well.

What does the small i symbol.png
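
Date-derived features of the kind mentioned above can be illustrated with the Python standard library. The feature names and the exact set of features in this sketch are assumptions; they are not the names DataRobot assigns.

```python
from datetime import date

def derive_date_features(d):
    """Split one date into simple calendar features."""
    return {
        "day_of_week": d.isoweekday(),  # 1 = Monday ... 7 = Sunday
        "day_of_month": d.day,
        "month": d.month,
        "year": d.year,
    }

features = derive_date_features(date(2020, 5, 27))
```

For 27 May 2020, a Wednesday, this yields day_of_week 3 and month 5. Each of these is a new column computed from the original date column.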

Data preparation is a large part of my job. How does DataRobot help me with these tasks?

DataRobot starts by performing Exploratory Data Analysis (EDA) on your data. This includes activities such as encoding variables, cleaning up missing values, transforming features, identifying potential target leakage, searching for interactions, identifying non-linearities, and so forth. Preparation tasks such as merging multiple data sources into a single dataset can be accomplished with the DataRobot Paxata data prep tools.

More information for DataRobot users: search in-app documentation for Overview and EDA. Also, you can find information for DataRobot Paxata data prep in the DataRobot Paxata Official Cloud Documentation.

How does DataRobot handle Natural Language Processing (NLP)?

DataRobot supports a wide array of Natural Language Processing (NLP) tasks. When text fields are detected in your data, DataRobot automatically detects the language and applies appropriate preprocessing. This may include advanced tokenization, data cleaning (stop word removal, stemming, etc.), and vectorization methods. DataRobot supports n-gram matrix (bag-of-words, bag-of-characters) analysis as well as word embedding techniques such as Word2Vec and fastText with both CBOW and Skip-Gram learning methods. Additional capabilities include Naive Bayes SVM and cosine similarity analysis. For visualization, there are per-class word clouds for text analysis. DataRobot is continuously expanding its NLP capabilities.

More information for DataRobot users: search in-app documentation for Coefficients (and preprocessing details).
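
For readers unfamiliar with n-gram matrices, the sketch below builds bag-of-words and bag-of-characters counts for a single document in plain Python. The whitespace tokenizer and lowercasing are simplifying assumptions and do not reflect DataRobot's preprocessing.

```python
from collections import Counter

def word_ngrams(text, n):
    """Return all runs of n consecutive lowercase words."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def char_ngrams(text, n):
    """Return all runs of n consecutive lowercase characters."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]

doc = "the cat sat on the mat"
bag_of_words = Counter(word_ngrams(doc, 1))       # unigram counts
bigrams = Counter(word_ngrams(doc, 2))            # two-word phrases
bag_of_chars = Counter(char_ngrams("banana", 2))  # character bigrams
```

Stacking one such count vector per document, with one column per distinct n-gram, gives the n-gram matrix that a model is then trained on.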

Can I control how to group or partition my data for model training?

Yes. After you load data and select a target, you can choose Advanced Options (at the bottom of the Data page). From the displayed Advanced Options page, you can set the size of the Training, Validation, and Holdout data partitions. You can also set the number of partitions for cross-validation and the method by which those partitions are created. The default partitioning method is “random” for regression and “stratified” for classification, but other appropriate partitioning methods are possible. For time-dependent data, you can select Date/Time partitioning (aka OTV or Out of Time Validation). Column-based partitioning (Partition Feature) or Group partitioning (Group) can be used to create a more deterministic partitioning method.

Can I control how to group or partition my data for model training.png
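
The difference between stratified and group partitioning can be sketched in plain Python. Both functions below are simplified illustrations with hypothetical names; they show the idea behind each method, not DataRobot's implementation.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Assign each row to one of k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [None] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[idx] = position % k  # deal the rows of each class round-robin
    return folds

def group_folds(groups, k):
    """Assign rows so that all rows sharing a group value land in the same fold."""
    fold_of_group = {}
    for g in groups:
        if g not in fold_of_group:
            fold_of_group[g] = len(fold_of_group) % k
    return [fold_of_group[g] for g in groups]

labels = [0] * 8 + [1] * 2
strat = stratified_folds(labels, k=2)
grouped = group_folds(["a", "b", "a", "c", "b"], k=2)
```

With stratification, each of the two folds receives one of the two positive rows. With group partitioning, rows from the same group (for example, the same customer) never end up on both sides of a split.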

How does DataRobot handle imbalanced data (class imbalance)?

DataRobot has a number of guardrails to make sure that imbalanced data is treated appropriately. One guardrail is a set of metrics that remain robust even when the target variable is imbalanced, including LogLoss and the Matthews Correlation Coefficient (MCC). You can find the LogLoss metric and Max MCC in the Metrics dropdown menu on the Leaderboard. MCC is also found in the ROC Curve tab.

How does DataRobot handle.png

For more information, see this Imbalanced Data DataRobot Community post.
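
The Matthews Correlation Coefficient can be computed directly from the four cells of a confusion matrix. The sketch below uses the standard formula in plain Python; the confusion-matrix counts are made up, and returning 0.0 when the denominator is zero is a common convention assumed here.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0

# 95 negatives and 5 positives; the model predicts "negative" for every row.
always_negative = mcc(tp=0, fp=0, tn=95, fn=5)   # accuracy is 0.95
# A model that finds 4 of the 5 positives at the cost of 5 false alarms.
finds_positives = mcc(tp=4, fp=5, tn=90, fn=1)   # accuracy is 0.94
# A perfect classifier.
perfect = mcc(tp=5, fp=0, tn=95, fn=0)
```

The always-negative model scores 0.0 on MCC despite its 95% accuracy, while the model that actually finds positives scores well above zero despite slightly lower accuracy.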

How do I set exposures and offsets?

Offsets and exposures are commonly used for insurance loss modeling. They are treated as special features in data analysis and prediction. You can add exposures and offsets to your data from the Advanced Options page.

How do I set exposures and offsets.png

More information for DataRobot users: search in-app documentation for Show Advanced Options link, then locate information for “Using Exposure, Offset, and Count of Events.”
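
In a count model with a log link, the usual convention is that an offset is added to the linear predictor with a fixed coefficient of 1, and an exposure enters as the offset log(exposure). The sketch below illustrates that convention in plain Python. It is a general GLM illustration with made-up numbers, under the assumption that the same convention applies; it is not a description of DataRobot internals.

```python
import math

def expected_claims(linear_predictor, exposure=1.0, offset=0.0):
    """Expected count under a log link: exp(eta + offset + log(exposure))."""
    return math.exp(linear_predictor + offset + math.log(exposure))

# Same risk profile (linear predictor of -2.0), different time on risk.
full_year = expected_claims(-2.0, exposure=1.0)
half_year = expected_claims(-2.0, exposure=0.5)

# Supplying log(0.5) as an offset is equivalent to an exposure of 0.5.
via_offset = expected_claims(-2.0, offset=math.log(0.5))
```

A policy in force for half a year is expected to produce half the claims of an otherwise identical full-year policy, which is why exposure is treated as a special feature rather than an ordinary predictor.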

What do the green "importance" bars represent on the Data tab?

The importance bars show the degree to which a feature is correlated with the target. These bars are based on ACE scores, or "Alternating Conditional Expectations" scores. ACE scores are capable of detecting non-linear relationships with the target, but as they are univariate they are unable to detect interaction effects between features.

What do the green importance bars.png

More information: see the paper, “Estimating Optimal Transformations for Multiple Regression Using the ACE Algorithm”.

What do Autopilot, Quick, and Manual modes do?

These modeling modes determine which blueprints are run and how much of the data is used. In Manual mode, you choose which specific blueprints are used. In Quick mode, a subset of the Autopilot blueprints is run against 32% and 64% of the data. In Autopilot mode, DataRobot selects and runs the best predictive blueprints given the distribution of the target variable and all the other variables in your data; it does this in a survival-of-the-fittest mode. The modeling mode dropdown menu is located beside Modeling Mode, just below the Start button.

What do Autopilot.png

More information for DataRobot users: search in-app documentation for Modeling workflow, then locate information for “Setting the modeling mode.”

What happens to my deployment if I delete a model that it is using?

DataRobot will not allow you to delete a model that is deployed.

Can I derive a new feature in DataRobot?

Yes, you can create new features that are derived from others in your data. To create a new derivative feature, click on the symbol to the left of the feature name in the Data page. Select f(your feature name here), and enter a formula to be used to calculate the new feature’s value.

Can i derive a new feature in DataRobot.png


More information for DataRobot users: search in-app documentation for Feature Transformations.

What Exploratory Data Analysis does DataRobot do?

DataRobot automatically performs Exploratory Data Analysis (EDA) on datasets that you load. The type of EDA depends on the size of the file. EDA happens in phases.

  • The first phase of EDA happens when the data is initially ingested, and the analysis is done on the full dataset (or a 500MB sample if dataset > 500MB). This phase determines feature type, summary statistics, and frequency distribution for top 50 items, and also identifies informative features.
  • The second phase of EDA is done on the same dataset as the first one, but it excludes holdout and any rows where the target is missing. DataRobot recalculates summary statistics and computes ACE scores during the second EDA.

More information for DataRobot users: search in-app documentation for Overview and EDA.
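
The row selection for the second EDA phase can be sketched as a simple filter. The row structure, the id field, and the function name below are hypothetical; only the rule itself (exclude holdout rows and rows with a missing target) comes from the description above.

```python
def second_eda_rows(rows, target, holdout_ids):
    """Keep rows that are outside the holdout and have a non-missing target."""
    return [
        r for r in rows
        if r["id"] not in holdout_ids and r.get(target) is not None
    ]

rows = [
    {"id": 1, "churned": 0},
    {"id": 2, "churned": None},  # missing target: excluded
    {"id": 3, "churned": 1},     # in holdout: excluded
    {"id": 4, "churned": 1},
]

kept = second_eda_rows(rows, target="churned", holdout_ids={3})
```

Only rows 1 and 4 remain, so summary statistics recalculated in the second phase can differ from those shown after the first phase.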

What is the default partitioning used in DataRobot?

By default, DataRobot splits the data into 20% holdout (test) and 80% over five-fold cross-validation (training and validation). These values can be changed in the Advanced Options > Partitioning page.

What is the default partitioning used in DataRobot.png
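
As a quick arithmetic check of the default split, the sketch below computes row counts for a 20% holdout and five cross-validation folds. The rounding rule and the way leftover rows are distributed are assumptions; DataRobot may round differently.

```python
def default_partition_sizes(n_rows, holdout_fraction=0.20, n_folds=5):
    """Return (holdout_rows, [rows_per_fold]) for the described default split."""
    holdout = round(n_rows * holdout_fraction)
    remaining = n_rows - holdout
    base, extra = divmod(remaining, n_folds)
    # Spread any leftover rows over the first few folds.
    folds = [base + (1 if i < extra else 0) for i in range(n_folds)]
    return holdout, folds

holdout, folds = default_partition_sizes(10000)
```

For 10,000 rows this gives a 2,000-row holdout and five folds of 1,600 rows each, so each cross-validation round trains on 6,400 rows and validates on 1,600.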

DataRobot is suggesting regression, but can I force it to do classification?

Yes. Below the window in which you specify the target, simply click Switch To Classification. This option will only be enabled if the numeric feature has no more than 100 unique values.

DataRobot is suggesting regression.png
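
The eligibility rule above amounts to a count of distinct target values. A minimal sketch in plain Python follows; ignoring missing values when counting is an assumption, and the function name is hypothetical.

```python
def can_switch_to_classification(target_values, max_unique=100):
    """True if the numeric target has no more than max_unique distinct values."""
    distinct = {v for v in target_values if v is not None}
    return len(distinct) <= max_unique

ratings = [1, 2, 3, 4, 5, 3, 2, None, 5]        # 5 distinct values
prices = [round(i * 1.37, 2) for i in range(500)]  # 500 distinct values

ratings_ok = can_switch_to_classification(ratings)
prices_ok = can_switch_to_classification(prices)
```

A 1-to-5 rating column qualifies, while a continuous price column with hundreds of distinct values does not.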

How do I delete a model?

From the Leaderboard: Select the model you wish to delete by clicking the box to the left of the model name, and then click Menu from above the list.

How do I delete a model 1.png


Clicking Menu will open a dropdown menu. Click Delete Selected Models.

How do I delete a model 2.png

Confirm the deletion.


How do I delete a model 3.png


What does the yellow triangle warning indicate?

It indicates that the feature is at risk of target leakage. Target leakage occurs when a feature contains information that would not be known at prediction time; such a feature often looks like too good a predictor of the target. When leakage is first detected, you will see a detailed warning message:

What does the yellow i warning indicate.png

Do I have to use the GUI or can I interact programmatically?

You are not limited to the GUI. Using the R and Python clients, you can do almost everything that you can do via the GUI. DataRobot also provides a REST API.

More information: DataRobot Python client documentation or DataRobot R client documentation.

More information for DataRobot users: see API documentation at the DataRobot Support site.
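
As a rough illustration of programmatic access, the sketch below builds (but does not send) an HTTP request to list projects through the REST API, using only the Python standard library. The endpoint URL, the /projects/ route, and the bearer-token header are assumptions based on common DataRobot API usage; confirm them against the API documentation for your deployment. The token value is a placeholder.

```python
import urllib.request

def build_list_projects_request(endpoint, api_token):
    """Build a GET request for the project list; nothing is sent here."""
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/projects/",
        headers={"Authorization": "Bearer " + api_token},
        method="GET",
    )

# Assumed endpoint for the managed cloud; on-premise installs use their own host.
request = build_list_projects_request(
    "https://app.datarobot.com/api/v2", "YOUR_API_TOKEN"
)

# To send it (requires a valid token and network access):
# with urllib.request.urlopen(request) as response:
#     body = response.read()
```

The R and Python clients wrap calls like this one, so in practice most users work with the client libraries rather than raw HTTP.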

Does DataRobot integrate with Excel?

Yes, DataRobot licensed users can install an Excel add-in to help harness the power of DataRobot within a familiar Excel environment. The DataRobot Excel Add-In supports Microsoft Excel client-installed (not cloud-based) Windows versions 2010 through Office 365. It requires .NET version 4.6.1. If you have the dependency installed, you can simply run the .msi installer; otherwise, run setup.exe first to install all required dependencies, then run the installer.

More information for DataRobot users: search in-app documentation for Excel Add-in.

Does DataRobot integrate with Alteryx?

Yes, DataRobot Tools allow you to create projects and make predictions without leaving the Alteryx interface.

You can download the DataRobot Tools for Alteryx from here: https://s3.amazonaws.com/datarobot-public-external-connectors/DataRobotTools.yxi.

Once the download completes, double-click the file and follow the instructions to install.

More information for DataRobot users: search in-app documentation for Tools for Alteryx.

Does DataRobot integrate with Tableau?

Yes, the DataRobot extensions for Tableau, downloadable from the Tableau Extensions Gallery, are configured to work with DataRobot Cloud. If your organization runs On-Premise AI Cluster, Private AI Cloud, Hybrid AI Cloud, or EU Managed AI Cloud, you must change the extension configuration to work with your deployment.

More information for DataRobot users: search in-app documentation for Modifying the Tableau Extension URL.

Does DataRobot have ETL Capabilities?

DataRobot has expanded its ETL (Extract-Transform-Load) capabilities with the acquisition and integration of Paxata. Please contact your DataRobot representative for more information.

More information: DataRobot press release, "DataRobot Acquires Paxata to Bolster its End-to-End AI Capabilities."

Also, you can find information from DataRobot Paxata Official Cloud Documentation.

What is the difference between prediction and modeling servers?

Modeling servers power all the analysis you do from the GUI and from R/Python clients. Modeling worker resources are typically used to build models, hence they are called "modeling workers." Prediction servers are used solely for making predictions and handling prediction requests on deployed models. Keeping these resources separate and stand-alone ensures that a queue shared by different request types and worker types does not become a bottleneck in your AI processes. If your deployed model makes real-time predictions, using a dedicated prediction server helps ensure its performance.

More information for DataRobot users: search in-app documentation for Standalone Prediction Server.

How does DataRobot interact with SAS? What can I do with my SAS models?

DataRobot can import a SAS file directly (*.sas7bdat). You can also call DataRobot via the API from within SAS by using PROC HTTP.

What file types can DataRobot ingest?

DataRobot can ingest text, Excel, SAS, and various compressed files. Specifically, it can ingest the following file types: .csv, .tsv, .dsv, .xls, .xlsx, .sas7bdat, .geojson, .bz2, .gz, .zip, and .tgz.

What types of data can DataRobot ingest.png

More information for DataRobot users: search in-app documentation for Overview and EDA.
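
If you prepare files in a script before uploading, a quick pre-check against the extension list above can catch unsupported files early. This is a convenience sketch in plain Python with a hypothetical function name. It checks only the file name and says nothing about whether the contents are valid.

```python
# Extensions copied from the list of file types in the answer above.
SUPPORTED_EXTENSIONS = (
    ".csv", ".tsv", ".dsv", ".xls", ".xlsx", ".sas7bdat",
    ".geojson", ".bz2", ".gz", ".zip", ".tgz",
)

def is_supported_file(filename):
    """True if the file name ends with one of the listed extensions."""
    return filename.lower().endswith(SUPPORTED_EXTENSIONS)

checks = {
    name: is_supported_file(name)
    for name in ["loans.csv", "LOANS.XLSX", "claims.csv.gz", "notes.docx"]
}
```

The comparison is case-insensitive, so LOANS.XLSX passes, while notes.docx is rejected because .docx is not in the list.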

What sources can DataRobot ingest from?

DataRobot can ingest data from a JDBC-enabled data source, URL, Hadoop Distributed File System, a local file, or from a DataRobot AI Catalog.

What sources can DataRobot ingest from.png

 

More information for DataRobot users: search in-app documentation for Overview and EDA.

Last update:
‎05-27-2020 08:58 AM