
Training with only positive data

mklarkc
Blue LED

I have a data set of 100k or so entrepreneurs and their past positions. I'm trying to use it to figure out which kinds of past jobs are most likely to lead to someone becoming an entrepreneur.

What's really challenging here is that there's no data set of people who don't become entrepreneurs, because anyone could become one at any time. So I'm looking for ways to train a model without any negatives. I can only show it what entrepreneurs' pasts looked like, and ideally it would then tell me the chances for another person when I feed it that person's past.

Anyone have any tips?

rshah
Data Scientist

One approach you can use is look-alike modeling. The approach also goes by other names, such as PU (positive-unlabeled) modeling or one-class classification. A short post using the technique is here: https://community.datarobot.com/t5/resources/predicting-covid-19-at-the-county-level/ta-p/2863

Also take a look at this lab in DataRobot University, which goes through a similar problem: https://university.datarobot.com/covid-lab

You should be able to use the same approach.
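To make the PU idea concrete, here is a minimal plain-Python sketch. It is not DataRobot's implementation. It assumes you can also collect an unlabeled background sample of job histories (people whose future is unknown, e.g. a random sample of profiles), since PU learning needs unlabeled examples even though it needs no negatives. It also assumes each person is encoded as a set of past job titles. The class name `PULookalike`, the toy data, and the choice of a Bernoulli naive Bayes classifier are all illustrative. The rescaling step follows Elkan & Noto's approach, which assumes the known entrepreneurs are a random sample of all entrepreneurs.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

Person = Set[str]  # hypothetical encoding: one person's past job titles


class PULookalike:
    """Positive-unlabeled (PU) look-alike scorer over past job titles (sketch).

    Step 1: a Bernoulli naive Bayes model learns to separate known
    entrepreneurs ("labeled") from an unlabeled background sample.
    Step 2: its output P(labeled | x) is divided by c = P(labeled | entrepreneur)
    (Elkan & Noto), assuming known entrepreneurs are a random sample of all.
    """

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace smoothing for per-title frequencies

    def _title_freqs(self, people: List[Person]) -> Dict[str, float]:
        # Smoothed fraction of people in the group who ever held each title.
        counts = Counter(t for person in people for t in person)
        n = len(people)
        return {t: (counts[t] + self.alpha) / (n + 2 * self.alpha) for t in self.vocab}

    def fit(self, positives: List[Person], unlabeled: List[Person],
            holdout_positives: List[Person]) -> "PULookalike":
        self.vocab = sorted(set().union(*positives, *unlabeled))
        self.p_pos = self._title_freqs(positives)
        self.p_unl = self._title_freqs(unlabeled)
        # Log prior odds of "labeled" vs "unlabeled" in the training mix.
        self.prior = math.log(len(positives) / len(unlabeled))
        # c = P(labeled | entrepreneur), estimated on held-out known entrepreneurs.
        self.c = sum(self._p_labeled(p) for p in holdout_positives) / len(holdout_positives)
        return self

    def _p_labeled(self, person: Person) -> float:
        log_odds = self.prior
        for t in self.vocab:  # titles outside the training vocabulary are ignored
            if t in person:
                log_odds += math.log(self.p_pos[t] / self.p_unl[t])
            else:
                log_odds += math.log((1 - self.p_pos[t]) / (1 - self.p_unl[t]))
        # Numerically stable logistic function.
        if log_odds >= 0:
            return 1.0 / (1.0 + math.exp(-log_odds))
        z = math.exp(log_odds)
        return z / (1.0 + z)

    def score(self, person: Person) -> float:
        """Rough estimate of P(entrepreneur | job history) under the assumptions above."""
        return min(1.0, self._p_labeled(person) / self.c)

    def title_lift(self) -> List[Tuple[str, float]]:
        """Titles ranked by log-ratio of frequency: entrepreneurs vs. background."""
        lifts = [(t, math.log(self.p_pos[t] / self.p_unl[t])) for t in self.vocab]
        return sorted(lifts, key=lambda item: item[1], reverse=True)


# Hypothetical toy data, for illustration only.
positives = [
    {"product manager", "engineer"},
    {"consultant", "product manager"},
    {"sales lead", "product manager"},
    {"engineer", "consultant"},
]
holdout = [{"product manager", "consultant"}, {"product manager"}]
unlabeled = [
    {"accountant"}, {"nurse", "accountant"}, {"engineer"}, {"teacher"},
    {"sales lead"}, {"teacher", "nurse"}, {"product manager", "engineer"},
    {"accountant", "consultant"},
]

model = PULookalike().fit(positives, unlabeled, holdout)
print(model.title_lift()[:3])
print(model.score({"product manager", "consultant"}), model.score({"teacher", "nurse"}))
```

`title_lift()` speaks to the "which past jobs" question, while `score()` is the per-person estimate. The background sample will contain some future entrepreneurs, so the lifts are best read as a ranking rather than exact odds. With real data you would likely swap the naive Bayes step for a stronger, reasonably calibrated classifier (e.g. logistic regression or gradient boosting); the division by `c` works the same way.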