
multi-label classification

Blue LED

How can I solve a multi-label classification problem (not multi-class) in DataRobot?

In scikit-learn, for example, you can pass a vector with all the labels, but I don't know how to pass that to DataRobot, which expects a single column such as "target". A classic example is movie categories: a single movie can be "action, adventure and suspense" at the same time.

Thanks in advance!!

7 Replies
Community Team

Hi @camilo -

Welcome to the community, you've brought your question to the right place!

The community team will try to source an answer for you, so stay tuned.

Data Scientist

Hi Camilo, 

 

As far as I know, there is currently only one way to do this in DataRobot: concatenate the multiple targets into one column. For example, instead of two target columns, “action” and “thriller”, you would have one column containing “action, thriller”.

Be sure to keep the label order consistent across rows so you don’t also end up with “thriller, action”. I would do this by alphabetizing the labels.
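
As a rough illustration of the concatenation step, here is a minimal Python sketch. The function name `combine_labels` and the sample rows are made up for this example, and lowercasing the labels is an extra assumption on top of the alphabetizing suggested above. It only prepares the single target column; it does not call DataRobot.

```python
def combine_labels(labels):
    """Join one row's labels into a single, order-independent target string."""
    # Strip whitespace, lowercase (an assumption), drop empties and duplicates,
    # then sort alphabetically so every row uses the same label order.
    cleaned = sorted({label.strip().lower() for label in labels if label.strip()})
    return ", ".join(cleaned)


# Hypothetical rows: each inner list holds the labels of one movie.
rows = [
    ["thriller", "action"],
    ["action", "thriller"],
    ["adventure", "action", "suspense"],
]

targets = [combine_labels(row) for row in rows]
print(targets)
```

With this normalization, the first two rows map to the same target value, so the same label combination is not split into two different classes.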

I hope this helps,

Emily 

Blue LED

Thanks @emily. I assumed something similar, but the result isn't what I need, since I can't access the probability for each label.

Thank you very much anyway.

Regards

DataRobot Employee

How many labels do you have? Would it be feasible to score your data through multiple binary classifiers?
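
To make the multiple-binary-classifiers idea concrete, here is a small Python sketch of the data preparation only. The helper `to_binary_targets` and the sample rows are hypothetical. The assumption is that each resulting 0/1 column would serve as the target of its own binary model, which in turn gives one probability per label. Training and scoring in DataRobot are not shown.

```python
def to_binary_targets(rows):
    """Turn per-row label lists into one 0/1 target column per label."""
    # Collect every distinct label, sorted for a stable column order.
    all_labels = sorted({label for labels in rows for label in labels})
    # For each label, mark 1 where the row carries it and 0 otherwise.
    return {
        label: [1 if label in labels else 0 for labels in rows]
        for label in all_labels
    }


# Hypothetical rows: each inner list holds the labels of one movie.
rows = [
    ["action", "thriller"],
    ["action", "adventure", "suspense"],
    ["thriller"],
]

binary_targets = to_binary_targets(rows)
for label, column in binary_targets.items():
    print(label, column)
```

The number of columns, and therefore the number of models, grows with the number of distinct labels.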

Blue LED

Hi @doyouevendata, I have also seen in forums that multi-label problems can be solved that way (with several binary models), but in this case I have about 14 labels in total. For now, I solved it with scikit-learn.

 

Thank you very much

DataRobot Employee

OK! I was going to note that with AutoML it is pretty easy to create many models and deployments. You may find that one approach works better than another for a particular case or label, and end up with more accurate models as a result. Separate models would likely also make accuracy easier to track if you receive actual values over time. That said, at some point the number of labels can make this approach rather onerous.

Blue LED

That's true, thank you very much for taking the time to reply.

Regards