multi-label classification


How can I solve a multi-label classification problem (not multi-class) in DataRobot?

In scikit-learn, for example, you can pass a vector with all the labels, but I don't know how to pass that to DataRobot, which expects a single column such as "target". A classic example is movie categories: one movie can be "action", "adventure" and "suspense" at the same time.

Thanks in advance!!


Hi @camilo -

Welcome to the community, you've brought your question to the right place!

The community team will try to source an answer for you - stay tuned 🙂

emily
DataRobot Alumni

Hi Camilo,

There is really only one way that I know of to do this in DataRobot currently. You must concatenate the multiple targets into one column. So, for example, instead of having two target columns, “action” and “thriller”, you would have one column containing “action, thriller”.

I would also be sure to keep the order consistent across rows so you don’t end up with “thriller, action” as a separate class. I would approach this by alphabetizing the labels.
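A minimal sketch of that alphabetizing step in plain Python, in case it helps. The helper name `normalize_label_set` and the sample rows are made up for illustration, and lowercasing the labels is an extra assumption on my part, not something DataRobot requires:

```python
def normalize_label_set(raw, sep=","):
    """Return a canonical 'a, b, c' string for one multi-label cell.

    Hypothetical helper: splits on `sep`, trims whitespace, lowercases
    (an assumption), removes duplicates, and sorts alphabetically.
    """
    labels = {part.strip().lower() for part in raw.split(sep) if part.strip()}
    return ", ".join(sorted(labels))


# Made-up example rows; the first two describe the same label set.
rows = ["thriller, action", "action, thriller", "Action,  adventure ,suspense"]
targets = [normalize_label_set(r) for r in rows]

for original, target in zip(rows, targets):
    print(f"{original!r} -> {target!r}")
```

After this step, rows with the same set of labels share one target value, so they are treated as the same class.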

I hope this helps,

Emily 

Thanks @emily, I assumed something similar, however the result isn't the desired one, since I can't access the probability for each label.

Thank you very much anyway.

Regards


How many labels do you have; would it be feasible to score your data through multiple binary classifiers?
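To make the idea concrete, here is a rough sketch of the data preparation for the multiple-binary-classifiers route: each label becomes its own 0/1 target column, and each column could then serve as the target of a separate binary project. The function name `to_binary_targets` and the sample data are hypothetical, and nothing here calls DataRobot itself:

```python
def to_binary_targets(label_cells, sep=","):
    """Turn multi-label cells into one 0/1 target list per label.

    Hypothetical helper: returns {label: [0 or 1 for each row]}.
    """
    parsed = [
        {part.strip() for part in cell.split(sep) if part.strip()}
        for cell in label_cells
    ]
    all_labels = sorted(set().union(*parsed))
    return {
        label: [int(label in row_labels) for row_labels in parsed]
        for label in all_labels
    }


# Made-up example: three movies with overlapping categories.
movies = ["action, adventure, suspense", "action", "suspense, drama"]
binary_targets = to_binary_targets(movies)

for label, column in binary_targets.items():
    print(label, column)
```

Each binary model trained on one of these columns gives a probability for its own label, which is what the single concatenated target cannot provide.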

Hi @doyouevendata, I have also seen in forums that multi-label problems can be solved that way (with several binary models), but in this case I have about 14 labels in total. For now, I solved it with scikit-learn.

Thank you very much


Ok! I was going to note that with AutoML it is pretty easy to create many models and deployments, and you may find that one approach works better than another for a particular case or label, ending up with more accurate models as a result. Separate models would likely also make accuracy easier to track if you receive actual values as time goes on. At some point, though, the number of labels can make this approach a bit more onerous.

That's true, thank you very much for taking the time to reply.

Regards

With the 7.0 release we have added multilabel classification. See the details here: https://community.datarobot.com/t5/blog/what-s-new-in-datarobot-release-7-0/ba-p/10939