Using AI to understand the student loan problem

Originally posted on 5/16/16


In March of 2016, the US Department of Education released its latest College Scorecard to “provide insights into the performance of schools and the outcomes of students at those schools.” Fortunately for us data-driven strategists (read: nerds), the government also released the raw data it used to drive its summary results and findings.

While Washington’s number-crunchers did a nice job increasing transparency about each college’s strengths and lifetime earnings ROI, there was one angle that was noticeably absent given the election cycle: a deep-dive into loan repayment rates. How flawed are current loan costs, if at all, and what leads to students being unable to pay them off? We took this unanswered question as an opportunity to feed this freely available data to an AI-powered modeling engine and see what meaningful insights we could derive from the resulting models.


We put artificial intelligence to the test to automatically sift through the College Scorecard dataset to highlight the most important relationships and predictive variables that influence loan repayment. It’s important to note that we only used the quantitative inputs available to us, and consequently, variables like motivational drive, professional network, etc. will not show up in our models, despite potentially playing a significant role in a student’s ability to repay their loans.

Our focus will be on the Scorecard variable “Repayment Rate 7 Years from Graduation,” or the percentage of students able to make any contribution to their loans 7 years after graduating college. The first step in using Eureqa is simply formulating a question. In this case, we’ll ask: “What causes low repayment rates?”

Using Eureqa, we built a model to predict the likelihood a school will have a repayment rate below 80% after 7 years. More interestingly, we were able to quickly identify a few of the drivers ("features") of repayment.
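To make the "below 80%" framing concrete, here is a minimal sketch of how such a target could be set up and modeled. It is not the Eureqa model: it swaps in a plain one-feature logistic regression fitted by gradient descent, and the school records, the variable names, and the `predict_low_repayment` helper are all made up for illustration rather than taken from the College Scorecard.

```python
import math

# Made-up (median family income in $1000s, 7-year repayment rate) pairs.
# These are illustrative values only, NOT College Scorecard data.
schools = [
    (25, 0.55), (32, 0.62), (40, 0.70), (48, 0.74),
    (55, 0.82), (63, 0.78), (72, 0.88), (85, 0.92),
]

THRESHOLD = 0.80  # the repayment-rate cutoff named in the text

# Binary target: 1 if the school's repayment rate falls below the threshold.
xs = [income for income, _ in schools]
ys = [1 if rate < THRESHOLD else 0 for _, rate in schools]

# Standardize the single feature so gradient descent is well behaved.
mean_x = sum(xs) / len(xs)
std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / len(xs))
zs = [(x - mean_x) / std_x for x in xs]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


# Fit P(low repayment) = sigmoid(w * z + b) by batch gradient descent on log loss.
w, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(2000):
    errors = [sigmoid(w * z + b) - y for z, y in zip(zs, ys)]
    w -= learning_rate * sum(e * z for e, z in zip(errors, zs)) / len(zs)
    b -= learning_rate * sum(errors) / len(zs)


def predict_low_repayment(income_thousands):
    """Estimated probability that a school falls below the 80% threshold."""
    z = (income_thousands - mean_x) / std_x
    return sigmoid(w * z + b)


print(f"P(low repayment | $30k income) = {predict_low_repayment(30):.2f}")
print(f"P(low repayment | $80k income) = {predict_low_repayment(80):.2f}")
```

A real analysis would use many more schools and features, and a tool like Eureqa searches over model forms rather than assuming a logistic one.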

After running Eureqa for five minutes, we found that repayment rate is:

  • Positively correlated with parent/guardian income: the higher the family’s income, the more likely the student is to repay their loans.
  • Negatively correlated with a school’s percentage of students on loans: the higher the proportion of a school’s students that are on loan programs, the less likely a student is to repay their loans.
  • Negatively correlated with a school’s percentage of non-white students: the higher a school’s proportion of non-white students, the less likely a student is to repay their loans.
  • Negatively correlated with a school’s acceptance rate: the higher a school’s acceptance rate, the less likely a student is to repay their loans.
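For readers who want to check the sign of relationships like these themselves, the sketch below computes a Pearson correlation coefficient between repayment rate and a subset of the listed drivers. The per-school values and the feature names are invented for illustration and are not taken from the Scorecard data; only the direction of each relationship mirrors the findings above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up values for six hypothetical schools (NOT College Scorecard data).
repayment_rate = [0.55, 0.62, 0.70, 0.78, 0.84, 0.90]
features = {
    "family_income_thousands": [28, 35, 44, 52, 66, 81],
    "share_of_students_on_loans": [0.85, 0.80, 0.72, 0.60, 0.55, 0.40],
    "acceptance_rate": [0.90, 0.82, 0.75, 0.70, 0.50, 0.30],
}

# A positive coefficient means the feature rises with repayment rate;
# a negative one means it falls as repayment rate rises.
correlations = {name: pearson(values, repayment_rate) for name, values in features.items()}
for name, r in correlations.items():
    direction = "positively" if r > 0 else "negatively"
    print(f"{name}: r = {r:+.2f} ({direction} correlated)")
```

A coefficient like this only describes how two variables move together; it says nothing about which one, if either, causes the other.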

Figure 1, below, shows the likelihood that students will default on their loans (y-axis) plotted against their family income (x-axis). The default rate is remarkably high until family income hits about $60,000, and then it plummets. Let’s think about that for a second. If a family is making less than $50,000 per year, it’s more likely than not that their child will default on a loan payment and incur even more expenses as a penalty. For a lower- or middle-class family hoping to send its child to school to climb the economic ladder, the system, to put it mildly, is not doing them any favors.

[Figure 1: likelihood of loan default (y-axis) versus family income (x-axis)]

Unfortunately, there’s a positive correlation between the percentage of a school’s students receiving Pell Grants and students’ likelihood of default. Schools with a higher ratio of Pell Grant recipients tend to experience higher rates of default on their loans, even though Pell Grants are intended as a direct subsidy to chip away at student expenses. This suggests that Pell Grants may not do enough to help students escape their debt.

Here is another interesting finding. What role does faculty quality play in student success? There’s a linear relationship between faculty salary and graduation rates: the higher the average monthly salary of a school’s professors, the higher the percentage of students that graduate within six years. This could indicate that the highest-paying schools draw the best professors, who pass on a stronger work ethic and deeper knowledge to their students. Or, of course, these variables could simply be correlated without one causing the other: students who were likely to graduate from the very beginning may tend to choose the schools with the pricier professors.
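A linear relationship of this kind can be summarized with an ordinary least-squares line. The sketch below fits one to made-up salary and graduation-rate pairs; the numbers and the `fit_line` helper are assumptions for illustration, not values or code from the original analysis.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up values for six hypothetical schools (NOT College Scorecard data).
monthly_faculty_salary_thousands = [5, 6, 7, 8, 9, 10]
six_year_graduation_rate = [0.40, 0.48, 0.55, 0.61, 0.70, 0.76]

slope, intercept = fit_line(monthly_faculty_salary_thousands, six_year_graduation_rate)

# Residuals show how far each school sits from the fitted line.
residuals = [
    y - (slope * x + intercept)
    for x, y in zip(monthly_faculty_salary_thousands, six_year_graduation_rate)
]

print(f"graduation_rate = {slope:.3f} * salary_thousands + {intercept:.3f}")
```

A positive slope here only restates the association; it cannot distinguish between better-paid professors improving outcomes and stronger students selecting better-funded schools.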

If the Department of Education collected more multi-dimensional data about loan rates, students, and their characteristics, we could go deeper into the causes of loan default, and potentially determine an “optimal” loan rate that balances the value students and society gain from an affordable education against the government’s ability to keep its lending sustainable.

Last update: 12-06-2019 03:43 PM