HR Bias

This walkthrough shows you how to investigate a human resources hiring model and remove gender discrimination from it.

[video]

We are going to use data from a hypothetical large and successful manufacturer of modern high-tech products, “Megacorp.” Whenever the corporation advertises a new job vacancy, its human resources team is overwhelmed by the number of people applying for the role.

They know from experience that while unstructured interviews consistently receive the highest ratings for perceived effectiveness from hiring managers, dozens of studies have found them to be among the worst predictors of actual on-the-job performance.

The company wants an automated process to filter through the resumes and produce a short list of applicants who best match the advertised position and should therefore be invited to interview.

Megacorp has a database containing the resumes and hiring results of applicants from the past few years. The database includes variables such as age, gender, education, etc. for job applicant profiles. In addition, Megacorp wants to use the text from the resumes, including participation in extracurricular activities, to automate the process of picking the top candidates.

The new Human Resources Manager has prioritized the need for greater gender diversity at Megacorp. She has heard that if a model that predicts the best candidates to call for an interview is built from biased data, it will produce biased predictions.

In this demo, we are going to use DataRobot to build the aforementioned models with an eye towards avoiding gender bias. We are going to do this in four steps:

Step 1. Build a naive model that includes all features that Megacorp has for historic candidates. We will then use DataRobot’s model insights tools to highlight the ways in which direct gender discrimination is encoded in this naive model.

Step 2. Improve on the naive model by removing the gender variable and creating a new model. Then, deploy the new model and later use it to score new female applicants with similar characteristics to the applicants used to build this model. This step will help us evaluate the effect of removing the gender variable from the naive model.

Step 3. Attempt to improve on the model that excludes the gender variable by identifying other features in the dataset that also encode an individual’s gender. We will identify those features by building another model that uses the original dataset but has the gender variable as the target. Then, use the Feature Impact tab to observe which features are highly predictive of gender.

Step 4. Having identified all the features that encode gender in the original dataset, we build our final model with those features removed. This should be our best model, containing no features that directly or indirectly encode an applicant’s gender.

The data we are going to use to build the naive model contains information about past applicants such as whether or not they were hired, their age, gender, education level, profile information, content about previous jobs, extracurricular activities, and so on (Figure 1). The target variable for this model is Hired. The goal is to use this data to build a naive model that will predict the probability that an applicant will be hired. These probabilities will be used to rank future applicants and call the top 20% for an interview.

Figure 1. Sample data used to build the naive model
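If you prefer to script this walkthrough rather than click through the UI, the same project can be started with the DataRobot Python client. The sketch below is illustrative only: the endpoint, API token, file name, and project name are placeholders, and behavior may vary slightly between client versions.

```python
import datarobot as dr

# Connect to DataRobot (endpoint and token are placeholders).
dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

# Step 1: kick off the naive project with every available feature and "Hired" as the target.
naive_project = dr.Project.start(
    sourcedata="megacorp_hiring_history.csv",  # hypothetical file name for the historic data
    target="Hired",
    project_name="Megacorp hiring - naive model",
)
naive_project.wait_for_autopilot()
```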

After building the model for the first step, we use the Feature Impact tab of the best performing model to examine which features most influence the performance of this model. The Feature Impact tab shows 11 features, one of which is Gender (Figure 2). All else being equal, the partial dependence plot of the Gender variable in the Feature Effects tab shows that men are twice as likely to be hired as women (Figure 3). We need to remove Gender from the feature list in order to avoid this direct gender discrimination in our model.

Figure 2. Feature Impact tab of the naive model

Figure 3. Feature Effects tab of the naive model with the partial dependence plot of the gender variable.
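For readers following along in code, the same Feature Impact information can be pulled for the leaderboard’s top model with the Python client (the project ID below is a placeholder):

```python
import datarobot as dr

project = dr.Project.get("NAIVE_PROJECT_ID")   # placeholder project ID
best_model = project.get_models()[0]           # leaderboard is sorted by validation score

# Compute (or fetch previously computed) Feature Impact and flag the Gender feature.
impact = best_model.get_or_request_feature_impact()
for row in sorted(impact, key=lambda r: r["impactNormalized"], reverse=True):
    flag = "  <-- direct gender signal" if row["featureName"] == "Gender" else ""
    print(f"{row['featureName']:<25}{row['impactNormalized']:.3f}{flag}")
```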

We can do this by using the Create Feature List tab under the Data tab, selecting all features, and removing the gender feature (Figure 4). Then, restart Autopilot and deploy the best model. Now we need to retrieve a set of new female applicants from our recruiters (Figure 5). The new applicants have profiles similar to the male resumes that were previously submitted. We will run these against the improved model to see which of the new female candidates should be called for interviews.

Figure 4. Using the Create Feature List tab to remove the gender variable

Figure 5. A sample of the new female applicants to be scored
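Here is a hedged sketch of this step in code, assuming the default "Informative Features" list and using modeling predictions instead of a deployment for brevity (IDs and file names are placeholders):

```python
import datarobot as dr

project = dr.Project.get("NAIVE_PROJECT_ID")  # placeholder project ID

# Build a feature list identical to the informative features, minus Gender, and rerun Autopilot.
informative = next(fl for fl in project.get_featurelists() if fl.name == "Informative Features")
no_gender = project.create_featurelist(
    name="No Gender",
    features=[f for f in informative.features if f != "Gender"],
)
project.start_autopilot(featurelist_id=no_gender.id)
project.wait_for_autopilot()

# Score the batch of new female applicants with the retrained top model.
best_model = project.get_models()[0]
dataset = project.upload_dataset("new_female_applicants.csv")  # hypothetical file name
predictions = best_model.request_predictions(dataset.id).get_result_when_complete()

# Rank by predicted probability of being hired and call the top 20% for an interview.
shortlist = predictions.sort_values("positive_probability", ascending=False)
shortlist = shortlist.head(int(0.2 * len(shortlist)))
```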

Looking at the performance of the deployed model, it appears the acceptance rate has changed, suggesting data drift in the target variable (Figure 6). This was not expected: the recruiters were submitting female resumes that were very similar to the male resumes they had previously submitted, and gender had been removed from the model.

Figure 6. The target variable "Hired" shows Feature Drift

One explanation is that at least one feature remaining in the feature list (after the gender variable was excluded) is itself highly predictive of gender. To validate this assumption, we are going to build a new model that uses the same features as our initial model, but predicts the gender variable instead of the Hired variable. This will allow us to observe all features in this dataset that are predictive of gender.
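A minimal sketch of this check, reusing the same hypothetical training file and the client calls shown earlier:

```python
import datarobot as dr

# Train a model that predicts Gender from the same historic data to surface proxy features.
proxy_project = dr.Project.start(
    sourcedata="megacorp_hiring_history.csv",  # same hypothetical file as the naive project
    target="Gender",
    project_name="Megacorp hiring - gender proxy check",
)
proxy_project.wait_for_autopilot()

# Any feature with high impact here leaks gender information into the hiring model.
impact = proxy_project.get_models()[0].get_or_request_feature_impact()
for row in sorted(impact, key=lambda r: r["impactNormalized"], reverse=True)[:5]:
    print(row["featureName"], round(row["impactNormalized"], 3))
```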

In the Feature Impact tab of the latest model, we see that the text feature ExtraCurriculars has the biggest influence on model performance compared to all other features (Figure 7). The Word Cloud for this text feature shows that terms such as mens, mens debating, football hockey, and rowing are highly associated with being male, while terms such as women, netball, softball, or womens tennis are highly associated with being female (Figure 8). This feature acts as a proxy for gender and needs to be either removed or modified in our original dataset.

Figure 7. Feature Impact tab of model with "Gender" as the target variable

Figure 8. Word Cloud of the "ExtraCurriculars" feature

At this point we have two features we need to remove from the original dataset in order to avoid building a biased model: Gender and ExtraCurriculars. The latter feature can still be useful if we transform it into new features that retain some of the information encoded in it. To this end, we will create three new features to replace it (a code sketch follows the list):

  • Total sports activities declared by each candidate (numSports)
  • Total debating activities declared by each candidate (numDebates)
  • Total of all activities that the candidate has declared (numActivities)
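Below is a minimal pandas sketch of this transformation. It assumes ExtraCurriculars is a comma-separated text field, and the keyword lists used to classify activities are purely illustrative:

```python
import pandas as pd

# Illustrative keyword lists; a real mapping would be built from the Word Cloud terms.
SPORT_TERMS = {"football", "hockey", "rowing", "netball", "softball", "tennis"}
DEBATE_TERMS = {"debating", "debate"}

def split_activities(text):
    """Split the free-text ExtraCurriculars field into individual activities."""
    if pd.isna(text) or not str(text).strip():
        return []
    return [a.strip().lower() for a in str(text).split(",") if a.strip()]

df = pd.read_csv("megacorp_hiring_history.csv")  # hypothetical file name
activities = df["ExtraCurriculars"].apply(split_activities)

df["numSports"] = activities.apply(lambda acts: sum(any(t in a for t in SPORT_TERMS) for a in acts))
df["numDebates"] = activities.apply(lambda acts: sum(any(t in a for t in DEBATE_TERMS) for a in acts))
df["numActivities"] = activities.apply(len)

# Drop the direct and proxy gender features before building the final model.
df.drop(columns=["Gender", "ExtraCurriculars"]).to_csv("megacorp_hiring_debiased.csv", index=False)
```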

After creating the new dataset, we use it to kick off a new project. Interrogating the best performing model shows that we now have a model free of gender bias. On top of that, the model successfully separates out the bottom 20% of job applicants, who are very unlikely to get the job, and successfully ranks the top 20% of job applicants, who are twice as likely to get the job as an average applicant (Figure 9).

Figure 9. The Lift Chart of the final model
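As before, this final step can also be scripted; the file and project names below are placeholders:

```python
import datarobot as dr

# Step 4: build the final model from the debiased dataset created above.
final_project = dr.Project.start(
    sourcedata="megacorp_hiring_debiased.csv",  # hypothetical output of the transformation sketch
    target="Hired",
    project_name="Megacorp hiring - final model",
)
final_project.wait_for_autopilot()

# Confirm that neither Gender nor ExtraCurriculars appears among the impactful features.
final_model = final_project.get_models()[0]
for row in final_model.get_or_request_feature_impact():
    print(row["featureName"], round(row["impactNormalized"], 3))
```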

The Feature Impact tab of this model shows that education level, number of internships, number of extracurricular activities, and distance from home to office are among the most important features for the final model (Figure 10); we also believe the importance of these features is reasonable for this model.

Figure 10. Feature Impact tab of the final model

Drilling down into the Feature Effects tab for EducationLevel, Internships, and numActivities, for instance, tells a story that makes logical sense. The higher an applicant’s level of education, the more likely they are to be hired, with the biggest jump occurring between a high school education and an undergraduate degree (Figure 11). This makes sense for a high-tech employer like Megacorp.

Figure 11. The partial dependence plot of the "Education" variable

On the other hand, while more internships raise an applicant’s chances of being hired, the key driver is simply whether the applicant has done any internship at all (Figure 12). Having extracurricular activities also makes an applicant more successful, but there is little benefit beyond the first activity, and it doesn’t matter whether that activity is a sport or the debate club (Figure 13).

Figure 12. The partial dependence plot of the "Internship" variable

Figure 13. The partial dependence plot of the "numActivities" variable

All of these insights suggest that the model is ready to be integrated into Megacorp’s business process.

More Information

Check out the community article, Understanding Models Overview. 

If you’re a licensed DataRobot customer, search the in-app Platform Documentation for Feature transformations, Feature Impact, or Feature Effects.
