The Data Quality Assessment reports that my dataset has no inliers. I have a few questions about inliers:
I found a helpful explanation in the online help: https://app2.datarobot.com/docs/data/data-mgmt/data-analysis/data-quality.html#inliers
Inliers are values that are consistent with the bulk of the data, but wrong for a particular row (for example, a car rental company using a local zip code for an international customer). If not handled, they could negatively affect model performance.
How they are detected: For each value recorded for a feature, DataRobot counts how often that value occurs in the feature and collects these frequencies into an array. Inlier candidates are the values whose frequencies are outliers in that array. To reduce false positives, DataRobot then applies another condition, keeping as inliers only those values for which:
frequency > 50 * (number of non-missing rows in the feature) / (number of unique non-missing values in the feature)
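The two-step detection above can be sketched in Python as follows. This is an illustration, not DataRobot's implementation: the documentation does not say which outlier rule is applied to the frequency array, so Tukey's upper fence (Q3 + 1.5 × IQR) is an assumption here, as are the function name `find_inliers`, its parameters, and the use of `None` for missing values. Only the second filter (frequency > 50 × non-missing rows / unique values) comes from the quoted docs.

```python
from collections import Counter
from statistics import quantiles


def find_inliers(values, multiplier=50.0, iqr_k=1.5):
    """Hypothetical sketch of frequency-based inlier detection.

    values: sequence of raw feature values; None marks a missing value.
    Returns the set of values flagged as inliers.
    """
    non_missing = [v for v in values if v is not None]
    counts = Counter(non_missing)  # value -> number of rows holding it
    if len(counts) < 2:
        return set()  # too few distinct values to compute quartiles

    freqs = list(counts.values())
    # Step 1 (ASSUMPTION): candidates are values whose frequency lies above
    # Tukey's upper fence on the array of frequencies.
    q1, _, q3 = quantiles(freqs, n=4)
    upper_fence = q3 + iqr_k * (q3 - q1)
    candidates = {v for v, f in counts.items() if f > upper_fence}

    # Step 2 (from the docs): keep candidates whose frequency exceeds
    # 50 * (non-missing rows) / (unique non-missing values).
    threshold = multiplier * len(non_missing) / len(counts)
    return {v for v in candidates if counts[v] > threshold}


# A local zip code reused for many rows among 200 otherwise unique zips.
zips = ["10001"] * 500 + [f"{i:05d}" for i in range(20000, 20200)] + [None] * 10
print(find_inliers(zips))  # -> {'10001'}
```

In the example, 700 non-missing rows and 201 unique values give a threshold of about 174, so the 500-row zip code passes both filters.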
The algorithm allows inlier detection in numeric features with many unique values where, due to the number of values, inliers wouldn’t be noticeable in a histogram plot. Note that this is a conservative approach for features with a smaller number of unique values. Additionally, it does not detect inliers in features with fewer than 50 unique values.
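Why the 50-unique-value limit follows from the threshold can be worked out directly, assuming "frequency" means the raw row count of a value. Let N be the number of non-missing rows, U the number of unique non-missing values, and f_v the count of value v. Because the counts sum to N, the threshold is 50 times the average frequency, and no single value can occur in more than N rows:

```latex
\bar{f} = \frac{1}{U}\sum_{v} f_v = \frac{N}{U},
\qquad
f_v > 50\,\frac{N}{U} = 50\,\bar{f},
\qquad
N \ge f_v > \frac{50N}{U} \;\Longrightarrow\; U > 50 \quad (N > 0).
```

Under this reading, a feature with 50 or fewer unique values can never produce an inlier, which would explain an assessment reporting none for such a feature.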
How they are handled: A binary column is automatically added inside a blueprint to flag rows with inliers. This allows the model to incorporate possible patterns behind abnormal values. No additional user action is required.
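To make the handling step concrete, here is a minimal sketch of what such a flag column amounts to. DataRobot adds it inside the blueprint for you, so this is only an illustration; the function `add_inlier_flag` and the `<feature>_is_inlier` column name are hypothetical, and rows are modeled as plain dictionaries.

```python
def add_inlier_flag(rows, feature, inliers):
    """Return copies of rows with a hypothetical 0/1 '<feature>_is_inlier' column.

    rows: list of dicts mapping column name -> value.
    inliers: set of values previously flagged as inliers for `feature`.
    """
    flag_name = f"{feature}_is_inlier"
    flagged = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are not mutated
        new_row[flag_name] = 1 if row.get(feature) in inliers else 0
        flagged.append(new_row)
    return flagged


rows = [{"zip": "10001", "amount": 5}, {"zip": "20001", "amount": 7}]
print(add_inlier_flag(rows, "zip", {"10001"}))
```

Keeping the original value and adding a separate indicator, rather than overwriting the value, lets a model learn both the normal effect of the value and any pattern specific to the flagged rows.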