
DataRobot cannot recognize Summarized Categorical from CSV

evgeni
NiCd Battery

Dear community!

 

I'm playing around with summarized categorical features this time. When I upload my CSV file through the DataRobot UI, it does not recognize the column I prepared as Summarized Categorical.

 

According to documentation:

The following is an example of a valid summarized categorical column:

{"Book1": 100, "Book2": 13}

 

My file sample looks like this:

 

DaysSinceLastResponse,ResponseContactID,SumCat
86,b159390580d2385a663fb8f9b69de286,"{""test1"": 12, ""test2"": 276}"
358,f0655b4329ba69ea1494f2d71d5f86ab,"{""test1"": 34, ""test2"": 2}"
173,55dd2abf26a8e20059430cf5de0dfd2f,"{""test1"": 192, ""test2"": 0}"
83,0414a41c5978220fd65386ef9039a18b,"{""test1"": 224, ""test2"": 109}"
348,8bcd8d8bf276b943f864ff33a480c790,"{""test1"": 343, ""test2"": 333}"
34,4c08022a53a879254e1a233033807282,"{""test1"": 65, ""test2"": 19}"
316,8733faee82efb74740941aec5f1eccac,"{""test1"": 11, ""test2"": 280}"
215,6364ccddefbb437c63427dfb00a350b7,"{""test1"": 23, ""test2"": 233}"
218,f866a8201785ee1d46811c87e9bd28ea,"{""test1"": 12, ""test2"": 276}"
51,772f0dc5fad949abd9747e299ffad24c,"{""test1"": 98, ""test2"": 777}"
50,538690be409441ee431e54440f052748,"{""test1"": 12, ""test2"": 276}"

 

 

My best guess is that this happens because of the quote character escaping, but I have to escape the quotes to keep the CSV parsable. Also, if you look at the raw data in the DataRobot UI, it looks like valid JSON.
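The escaping guess can be checked locally. The following is a minimal sketch using Python's standard csv and json modules on two rows from the sample above. It only mirrors standard CSV unescaping and says nothing about how DataRobot itself ingests the file.

```python
import csv
import io
import json

# Two rows copied from the sample above, with CSV-style doubled quotes.
raw = (
    'DaysSinceLastResponse,ResponseContactID,SumCat\n'
    '86,b159390580d2385a663fb8f9b69de286,"{""test1"": 12, ""test2"": 276}"\n'
    '173,55dd2abf26a8e20059430cf5de0dfd2f,"{""test1"": 192, ""test2"": 0}"\n'
)

rows = list(csv.DictReader(io.StringIO(raw)))

# After CSV parsing, each doubled quote collapses to a single quote.
cells = [row["SumCat"] for row in rows]

# Each cell is now a plain JSON object string that json.loads accepts.
parsed = [json.loads(cell) for cell in cells]

print(cells[0])
print(parsed[1])
```

If the cells parse as JSON here, the doubled quotes are ordinary CSV escaping and are unlikely to be the cause on their own.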


 

Do you have any ideas about what could be wrong?

 

3 Replies
bdrosen
DataRobot Employee

The issue is the value below: summarized categorical only allows positive integers or floats, so zero is invalid.

 

"{""test1"": 192, ""test2"": 0}"
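To find which keys break that rule, a small check along these lines may help. It is only a sketch of the rule as stated in this thread (every value must be a number greater than zero). The helper name `invalid_entries` is made up for illustration, and DataRobot's real validation may apply further constraints.

```python
import json
from numbers import Real


def invalid_entries(cell):
    """Return the (key, value) pairs in a JSON cell that are not numbers > 0."""
    data = json.loads(cell)
    return [
        (key, value)
        for key, value in data.items()
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real) or value <= 0
    ]


print(invalid_entries('{"test1": 192, "test2": 0}'))
print(invalid_entries('{"test1": 12, "test2": 276}'))
```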

 

evgeni
NiCd Battery

Thanks a lot! I missed this part

BenjaminJonMiller
Data Scientist

According to the documentation, every value in a summarized categorical must be numeric and greater than zero. Your CSV contains zero values. Your format was correct, but the invalid rows cause the feature type to fall back to categorical.

 


Here is the code that I used to clean and test:

 

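# df is the CSV loaded as a pandas DataFrame; keep only rows whose SumCat string has no ' 0' (zero count)
df[[' 0' not in s for s in df.SumCat]].to_csv('community_q&a_5-7-21_v2.csv', index=False)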
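The substring test works for this sample, but it would also drop a value such as 0.5 and would miss a zero written without a leading space. A possible alternative, sketched here with only the standard library, parses each cell as JSON and keeps rows where every value is a number greater than zero. The names `all_positive` and `clean_csv` are hypothetical helpers, not part of DataRobot or pandas.

```python
import csv
import io
import json


def all_positive(cell):
    """True if the cell is a JSON object whose values are all numbers > 0."""
    try:
        data = json.loads(cell)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    return all(
        isinstance(v, (int, float)) and not isinstance(v, bool) and v > 0
        for v in data.values()
    )


def clean_csv(text, column="SumCat"):
    """Return CSV text without the rows whose `column` fails all_positive."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(row for row in reader if all_positive(row[column]))
    return out.getvalue()


# Two rows from the original sample: one valid, one with a zero count.
sample = (
    'DaysSinceLastResponse,ResponseContactID,SumCat\n'
    '86,b159390580d2385a663fb8f9b69de286,"{""test1"": 12, ""test2"": 276}"\n'
    '173,55dd2abf26a8e20059430cf5de0dfd2f,"{""test1"": 192, ""test2"": 0}"\n'
)
cleaned = clean_csv(sample)
print(cleaned)
```

With pandas, the same predicate could be used as `df[df.SumCat.map(all_positive)]` before calling `to_csv`.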