Exploratory Data Analysis (EDA)

When you upload your training data, DataRobot automatically performs Exploratory Data Analysis (EDA) to profile the data.

In Figure 1, we have a dataset already uploaded. In the Data Quality Assessment report, we can see this dataset has 51 features and 10,000 records, and that no initial downsampling was done.

Figure 1. Data fields and Data Quality Assessment report

DataRobot provides the data type along with the number of unique and missing values for all fields. For numeric fields it also provides the mean, standard deviation, median, min, and max values.

Figure 2. Summary Statistics
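For readers who want to reproduce a similar summary outside the platform, the following is a minimal pandas sketch of the statistics listed above. It is not DataRobot's implementation; the `summarize` helper and the toy columns `age` and `city` are hypothetical and used only for illustration.

```python
import pandas as pd

def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column summary roughly mirroring the statistics listed above."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),   # data type of each field
        "unique": df.nunique(),           # distinct non-missing values
        "missing": df.isna().sum(),       # count of missing values
    })
    # Numeric-only statistics; non-numeric fields end up with NaN here.
    numeric = df.select_dtypes(include="number")
    stats = numeric.agg(["mean", "std", "median", "min", "max"]).T
    return summary.join(stats)

# Hypothetical toy data, not the dataset shown in the figures.
df = pd.DataFrame({
    "age": [25, 32, None, 41],
    "city": ["Boston", "Austin", "Boston", None],
})
summary = summarize(df)
print(summary)
```

Note that `std` here is pandas' sample standard deviation; whether DataRobot reports the sample or population version is not stated in this article.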

When you click a numeric field, DataRobot shows a histogram of its values. You can adjust the number of bins to see a more granular or a more aggregated histogram.

Figure 3. Field Histogram
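The effect of changing the bin count can be illustrated with NumPy. This is only a sketch on hypothetical values, not how DataRobot computes its histograms: the same data is binned twice, once coarsely and once finely.

```python
import numpy as np

# Hypothetical numeric field; any 1-D array without missing values works.
values = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 9], dtype=float)

def histogram(values, bins):
    """Return (counts, edges) for equal-width bins spanning min..max."""
    counts, edges = np.histogram(values, bins=bins)
    return counts, edges

# Fewer bins give a more aggregated view, more bins a more granular one.
coarse_counts, coarse_edges = histogram(values, bins=2)
fine_counts, fine_edges = histogram(values, bins=8)

print(coarse_counts, coarse_edges)
print(fine_counts, fine_edges)
```

With two bins the single large value is lumped in with 5; with eight bins it appears as an isolated bar separated by empty bins, which is the kind of detail a finer histogram exposes.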

You can also see the most frequent values and their counts by clicking Frequent Values.

Figure 4. Frequent Values

If you click Table, DataRobot presents the field's values in tabular form, sorted by “Count” in descending order.

Figure 5. Table of values
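A comparable value/count table can be built with pandas as a rough local equivalent of the Table view. The field name `num_visits` and the column labels `Value` and `Count` are assumptions chosen for this sketch.

```python
import pandas as pd

# Hypothetical field values.
field = pd.Series([3, 1, 3, 2, 3, 1], name="num_visits")

table = (
    field.value_counts()        # sorted by count, descending, by default
    .rename_axis("Value")       # label the index holding the distinct values
    .reset_index(name="Count")  # turn it into a two-column table
)
print(table)
```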

When you drill down into a categorical field, you can view it in the Frequent Values and Table formats. If the field has missing values, DataRobot also creates a separate bucket for them, as seen in Figure 6.

Figure 6. Categorical values
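The idea of giving missing values their own bucket can be sketched in pandas as follows. The bucket label `"Missing"` and the `color` field are assumptions for illustration; the label DataRobot itself displays may differ.

```python
import pandas as pd

# Hypothetical categorical field with two missing entries.
field = pd.Series(["red", "blue", None, "red", None, "red"], name="color")

# Replace missing entries with an explicit label so they are counted
# as their own category instead of being dropped by value_counts().
counts = field.fillna("Missing").value_counts()
print(counts)
```

Without the `fillna` step, `value_counts()` silently excludes missing entries, so the counts would no longer add up to the number of records.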

DataRobot also lets you drill into text fields in the same way as categorical fields.

Figure 7. Text values

More Information

If you’re a licensed DataRobot customer, search the in-app Platform Documentation for Overview and EDA and Time series modeling.

Version history: Revision 3 of 3. Last update: 07-01-2020 05:26 PM