This walkthrough shows how to investigate a human resources model and remove gender discrimination from it.
We are going to use data from a hypothetical large and successful manufacturer of modern high-tech products, “Megacorp.” Whenever this corporation advertises new job vacancies, its human resources team is overwhelmed by the number of people applying for a particular role.
They know from experience that while unstructured interviews consistently receive the highest ratings for perceived effectiveness from hiring managers, dozens of studies have found them to be among the worst predictors of actual on-the-job performance.
The company wants an automated process that filters the resumes and produces a short list of applicants who best match the advertised position and should therefore be invited to interview.
Megacorp has a database containing the resumes and hiring results of applicants from the past few years. The database includes variables such as age, gender, and education for job applicant profiles. In addition, Megacorp wants to use the text from the resumes, including participation in extracurricular activities, to automate the process of picking the top candidates.
The new Human Resources Manager has prioritized the need for greater gender diversity at Megacorp. She has heard that if a model that predicts the best candidates to call for an interview is built from biased data, it will produce biased predictions.
In this demo, we are going to use DataRobot to build the aforementioned models with an eye towards avoiding gender bias. We are going to do this in four steps:
Step 1. Build a naive model that includes all features that Megacorp has for historic candidates. This model will demonstrate how to use DataRobot’s model insights tools to highlight ways in which direct gender discrimination was included in the naive model.
Step 2. Improve on the naive model by removing the gender variable and creating a new model. Then, deploy the new model and later use it to score new female applicants with similar characteristics to the applicants used to build this model. This step will help us evaluate the effect of removing the gender variable from the naive model.
Step 3. Attempt to improve on the model that excludes the gender variable by identifying other features in the dataset that also encode an individual’s gender. We will identify those features by building another model that uses the original dataset but includes the gender variable as the target. Then, use the Feature Impact tab to observe what features are highly predictive of gender.
Step 4. Having identified all the features that encode gender in the original dataset, we build our final model with those features removed. This should be our best model: one that doesn’t contain any features that either directly or indirectly encode an applicant’s gender.
The data we are going to use to build the naive model contains information about past applicants such as whether or not they were hired, their age, gender, education level, profile information, content about previous jobs, extracurricular activities, and so on (Figure 1). The target variable for this model is Hired. The goal is to use this data to build a naive model that will predict the probability that an applicant will be hired. These probabilities will be used to rank future applicants and call the top 20% for an interview.
Figure 1. Sample data used to build the naive model
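The shortlisting logic this model feeds into can be sketched in a few lines. The applicant IDs and predicted probabilities below are hypothetical, purely to illustrate ranking and selecting the top 20%:

```python
# Hypothetical example: rank applicants by predicted hire probability
# and shortlist the top 20% for interviews. IDs and scores are made up.
applicants = [
    ("A001", 0.91), ("A002", 0.34), ("A003", 0.78),
    ("A004", 0.12), ("A005", 0.55), ("A006", 0.67),
    ("A007", 0.23), ("A008", 0.88), ("A009", 0.41), ("A010", 0.05),
]

# Sort by predicted probability, highest first.
ranked = sorted(applicants, key=lambda a: a[1], reverse=True)

# Call the top 20% in for an interview (at least one applicant).
cutoff = max(1, int(len(ranked) * 0.20))
shortlist = [applicant_id for applicant_id, _ in ranked[:cutoff]]
print(shortlist)  # ['A001', 'A008']
```

In production the probabilities would come from the deployed model's prediction output rather than a hard-coded list.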
After building the model for the first step, we use the Feature Impact tab of the best performing model to examine which features most influence the performance of this model. The Feature Impact tab shows 11 features, one of which is Gender (Figure 2). All else being equal, the Partial Dependence plot of the Gender variable in the Feature Effects tab shows that men are twice as likely to be hired as women (Figure 3). We need to remove Gender from the feature list in order to avoid this direct gender discrimination in our model.
Figure 2. Feature Impact tab of the naive model
Figure 3. Feature Effects tab of the naive model with the partial dependence plot of the gender variable.
We can do this by using the Create Feature List tab under the Data tab, selecting all features, and removing the gender feature (Figure 4). Then, restart Autopilot and deploy the best model. Now we need to retrieve a set of new female applicants from our recruiters (Figure 5). The new applicants have similar profiles to the male resumes that were previously submitted. We will run these against the improved model to see which of the new female candidates should be called for interviews.
Figure 4. Using the Create Feature List tab to remove the gender variable
Figure 5. A sample of the new female applicants to be scored
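Conceptually, the Create Feature List step amounts to dropping the column that directly encodes gender before modeling. A minimal pandas sketch, with placeholder columns standing in for Megacorp's actual schema:

```python
import pandas as pd

# Hypothetical applicant data mirroring the kinds of columns described above.
applicants = pd.DataFrame({
    "Age": [29, 41, 35],
    "Gender": ["F", "M", "F"],
    "EducationLevel": ["Masters", "Bachelors", "PhD"],
    "Hired": [1, 0, 1],
})

# Equivalent of the Create Feature List step: keep every column
# except the one that directly encodes gender.
reduced_features = [c for c in applicants.columns if c != "Gender"]
reduced = applicants[reduced_features]
print(list(reduced.columns))  # ['Age', 'EducationLevel', 'Hired']
```

In DataRobot itself this is done through the UI (or the Python client's feature-list methods) rather than by editing the dataset.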
Looking at the performance of the deployed model, the acceptance rate has changed, suggesting data drift in the target variable (Figure 6). This was not expected: the recruiters were submitting female resumes very similar to the male resumes they had previously submitted, and gender had been removed from the model.
Figure 6. The target variable "Hired" shows Feature Drift
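A crude version of this drift check can be expressed as a comparison of acceptance rates between training time and the new scoring batch. The rates and threshold below are illustrative assumptions, not values from Megacorp's deployment:

```python
# A minimal drift check on the acceptance rate, assuming we logged the
# share of applicants scored above the interview threshold in training
# versus the new batch. The rates below are illustrative.
training_acceptance_rate = 0.20   # top 20% called for interview historically
new_batch_acceptance_rate = 0.08  # observed on the new female applicants

# Flag drift when the rate moves by more than half its baseline value.
drift_ratio = abs(new_batch_acceptance_rate - training_acceptance_rate) / training_acceptance_rate
drifted = drift_ratio > 0.5
print(drifted)  # True
```

DataRobot's deployment monitoring computes drift statistics automatically; this sketch only conveys the intuition behind the alert.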
One explanation is that at least one other feature in the reduced feature list is highly predictive of gender. To validate this assumption, we are going to build a new model that uses the same features as our initial model but predicts the gender variable instead of the hired variable. This will allow us to observe all features in this dataset that are predictive of the gender variable.
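The idea behind this gender-target model can be illustrated with a deliberately crude proxy check: for each feature, ask how accurately the majority gender per feature value predicts gender. Everything here, including the four rows of data, is hypothetical; the actual modeling in DataRobot is far more sophisticated:

```python
from collections import Counter, defaultdict

# Tiny hypothetical dataset: does any feature predict Gender well?
rows = [
    {"Gender": "M", "ExtraCurriculars": "mens rowing",   "EducationLevel": "Bachelors"},
    {"Gender": "M", "ExtraCurriculars": "football",      "EducationLevel": "Masters"},
    {"Gender": "F", "ExtraCurriculars": "netball",       "EducationLevel": "Bachelors"},
    {"Gender": "F", "ExtraCurriculars": "womens tennis", "EducationLevel": "Masters"},
]

def proxy_accuracy(rows, feature):
    """Accuracy of predicting Gender from the majority gender per feature value."""
    by_value = defaultdict(Counter)
    for r in rows:
        by_value[r[feature]][r["Gender"]] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return correct / len(rows)

# ExtraCurriculars separates the genders perfectly; EducationLevel does not.
print(proxy_accuracy(rows, "ExtraCurriculars"))  # 1.0
print(proxy_accuracy(rows, "EducationLevel"))    # 0.5
```

A feature whose proxy accuracy is far above the base rate is a candidate gender proxy, which is exactly what Feature Impact surfaces in the next step.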
In the Feature Impact tab of the latest model we see that the text feature ExtraCurriculars has the biggest influence on model performance compared to all other features (Figure 7). The Word Cloud for this text feature shows terms such as mens, mens debating, football hockey, and rowing being highly associated with being male, while terms such as women, netball, softball, or womens tennis are highly associated with being female (Figure 8). This feature acts as a proxy for gender and needs to be either removed from or modified in our original dataset.
Figure 7. Feature Impact tab of model with "Gender" as the target variable
Figure 8. Word Cloud of the "ExtraCurriculars" feature
At this point we have two features we need to remove from the original dataset in order to avoid building a biased model: Gender and ExtraCurriculars. The latter feature can still be useful if we transform it into new features that retain some of the information encoded in it. To this end, we will create three new features to replace it:
Total sports activities declared by each candidate (numSports)
Total debating activities declared by each candidate (numDebates)
Total of all activities that the candidate has declared (numActivities)
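A sketch of this transformation, assuming simple keyword vocabularies for sports and debating terms (the real transformation would need a curated vocabulary for Megacorp's data):

```python
import re

# Hypothetical vocabularies used to bucket activities.
SPORTS = {"rowing", "football", "hockey", "netball", "softball", "tennis"}
DEBATES = {"debating", "debate"}

def activity_features(extracurriculars: str) -> dict:
    """Replace the raw text with gender-neutral activity counts."""
    words = re.findall(r"[a-z]+", extracurriculars.lower())
    num_sports = sum(w in SPORTS for w in words)
    num_debates = sum(w in DEBATES for w in words)
    return {
        "numSports": num_sports,
        "numDebates": num_debates,
        "numActivities": num_sports + num_debates,
    }

print(activity_features("mens rowing, mens debating"))
# {'numSports': 1, 'numDebates': 1, 'numActivities': 2}
```

Note that gendered qualifiers like "mens" or "womens" are discarded entirely; only activity counts survive into the new dataset.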
After creating the new dataset, we use it to kick off a new project. Interrogating the best-performing model shows that it is free of gender bias. On top of that, the model successfully separates out the bottom 20% of job applicants, who are very unlikely to get the job, and ranks the top 20% of job applicants, who are twice as likely to get the job as an average applicant (Figure 9).
Figure 9. The Lift Chart of the final model
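The lift figures behind a chart like this reduce to a simple calculation: the hire rate in a score-sorted slice divided by the overall hire rate. The scores and outcomes below are hypothetical, constructed to mirror the 2x top-decile lift described above:

```python
# Lift of the top 20% versus the overall hire rate, using hypothetical
# predicted scores and actual outcomes for ten past applicants.
scored = [  # (predicted probability, actually hired)
    (0.95, 1), (0.90, 1), (0.80, 1), (0.70, 1), (0.60, 1),
    (0.50, 0), (0.40, 0), (0.30, 0), (0.20, 0), (0.10, 0),
]

scored.sort(key=lambda s: s[0], reverse=True)
overall_rate = sum(hired for _, hired in scored) / len(scored)

top = scored[: len(scored) // 5]          # top 20% by predicted score
top_rate = sum(hired for _, hired in top) / len(top)

bottom = scored[-(len(scored) // 5):]     # bottom 20% by predicted score
bottom_rate = sum(hired for _, hired in bottom) / len(bottom)

print(top_rate / overall_rate)  # 2.0 -> top 20% hired at twice the average rate
print(bottom_rate)              # 0.0 -> bottom 20% very unlikely to be hired
```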
The Feature Impact tab of this model shows that education level, number of internships, number of extracurricular activities, and distance from home to office are among the most important features for the final model (Figure 10); the importance of these features also seems reasonable for this model.
Figure 10. Feature Impact tab of the final model
Drilling down into the Feature Effects tab for EducationLevel, Internships, and numActivities tells a story that makes logical sense. The higher the level of education an applicant has, the more likely they are to be hired, with the largest jump between a high school education and an undergraduate degree (Figure 11). This makes sense for a high-tech employer like Megacorp.
Figure 11. The partial dependence plot of the "Education" variable
Similarly, the more internships an applicant has done, the higher their chances of being hired, but the key driver of success is simply whether the applicant has done any internships at all (Figure 12). Having extracurricular activities also makes an applicant more successful; however, there is no benefit in doing more than one, and it doesn’t matter whether it is a sport or the debate club (Figure 13).
Figure 12. The partial dependence plot of the "Internship" variable
Figure 13. The partial dependence plot of the "numActivities" variable
All of these insights suggest that the model is ready to be integrated into Megacorp’s business process.