
Changes to NLP language settings

AutoDR
DataRobot Employee

Why did the default language change when modeling Japanese text features?


Robot 1:

Hi team, this is a question from a customer:

 

When modeling with Japanese text features, "language" used to be set to "english" by default. However, when I recently ran modeling on the same data, the setting had changed to "language=japanese". Until now it has been "language=english" by default; from now on, will Japanese input automatically be set to "language=japanese"?

 

I was able to reproduce this behavior with my own data. A model created on July 19, 2022 had language=english, but a model I created today with the same settings had language=japanese. Was this setting updated when the default changed from "Word N-Gram" to "Char N-Gram"?


Robot 2:

 👋 Previously, we showed "english" for every dataset, which was incorrect. Now, after the NLP Heuristics Improvements, we dynamically detect the dataset's language and set it accordingly.
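
To make the idea of dynamic language detection concrete, here is a minimal sketch. It is not DataRobot's actual heuristic, which this thread does not describe. The function name `guess_language`, the kana-ratio rule, the 0.2 threshold, and the "english" fallback are all assumptions chosen for illustration.

```python
def guess_language(texts, threshold=0.2):
    """Guess "japanese" or "english" from a sample of text values.

    Hypothetical heuristic: count alphabetic characters and check what
    share of them are hiragana or katakana. Not DataRobot's method.
    """
    kana = 0
    letters = 0
    for text in texts:
        for ch in text:
            if not ch.isalpha():
                continue  # skip digits, punctuation, whitespace
            letters += 1
            # Hiragana is U+3040-U+309F, katakana is U+30A0-U+30FF.
            if "\u3040" <= ch <= "\u30ff":
                kana += 1
    if letters == 0:
        return "english"  # assumed fallback when there is no text to inspect
    return "japanese" if kana / letters >= threshold else "english"


sample_ja = ["この製品はとても良いです。"]
sample_en = ["This product is great."]
detected_ja = guess_language(sample_ja)
detected_en = guess_language(sample_en)
```

A production detector would cover many more languages and would likely use a trained model rather than a single script check.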

 

Additionally, we found that char-grams perform better than word-grams on Japanese datasets, so we switched to char-grams for better speed and accuracy. To keep the Text AI Word Cloud insights in good shape, we also train one word-gram-based blueprint, so you can inspect both char-gram and word-gram Word Clouds.
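
For readers unfamiliar with the two feature types, the sketch below contrasts them on a short Japanese phrase. It is illustrative only and does not reproduce DataRobot's blueprints. The helper names `char_ngrams` and `word_ngrams` are hypothetical, and the word tokenizer here simply splits on whitespace, which is an assumption; real word-gram pipelines for Japanese typically rely on a morphological tokenizer.

```python
def char_ngrams(text, n=2):
    """Return overlapping character n-grams, ignoring whitespace."""
    chars = "".join(text.split())
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]


def word_ngrams(text, n=1):
    """Return word n-grams using naive whitespace tokenization (assumption)."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


english = "the battery life is great"
japanese = "電池の持ちが良い"  # written without spaces, as is usual in Japanese

english_words = word_ngrams(english)        # five separate word tokens
japanese_words = word_ngrams(japanese)      # the whole phrase as one token
japanese_chars = char_ngrams(japanese, 2)   # overlapping two-character pieces
```

Because Japanese is normally written without spaces, naive word splitting returns the entire phrase as a single token, while character bigrams still yield several overlapping features. That is one plausible reason char-grams are a convenient default for such text; the thread itself only reports the observed speed and accuracy gains.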


Robot 2:

Let me know if you have more questions. Happy to help!


Robot 1:

Robot 2, thank you for the comment. I will tell the customer that the NLP heuristics have been improved and that the language is now set correctly. I was also able to confirm that the word-gram-based blueprint model was created, as you mentioned. Thanks!
