Dear community!
I'm playing around with the summarized categorical feature type this time. When I upload my CSV file through the DataRobot UI, it doesn't recognize the column I've prepared as Summarized Categorical.
According to the documentation:
The following is an example of a valid summarized categorical column:
{"Book1": 100, "Book2": 13}
My file sample looks like this:
DaysSinceLastResponse,ResponseContactID,SumCat
86,b159390580d2385a663fb8f9b69de286,"{""test1"": 12, ""test2"": 276}"
358,f0655b4329ba69ea1494f2d71d5f86ab,"{""test1"": 34, ""test2"": 2}"
173,55dd2abf26a8e20059430cf5de0dfd2f,"{""test1"": 192, ""test2"": 0}"
83,0414a41c5978220fd65386ef9039a18b,"{""test1"": 224, ""test2"": 109}"
348,8bcd8d8bf276b943f864ff33a480c790,"{""test1"": 343, ""test2"": 333}"
34,4c08022a53a879254e1a233033807282,"{""test1"": 65, ""test2"": 19}"
316,8733faee82efb74740941aec5f1eccac,"{""test1"": 11, ""test2"": 280}"
215,6364ccddefbb437c63427dfb00a350b7,"{""test1"": 23, ""test2"": 233}"
218,f866a8201785ee1d46811c87e9bd28ea,"{""test1"": 12, ""test2"": 276}"
51,772f0dc5fad949abd9747e299ffad24c,"{""test1"": 98, ""test2"": 777}"
50,538690be409441ee431e54440f052748,"{""test1"": 12, ""test2"": 276}"
My best guess is that the quote escaping causes it, but I have to escape the quotes to keep the CSV parsable. Also, if you look at the raw data in the DataRobot UI, it looks like valid JSON.
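One quick way to rule out the escaping is to round-trip a couple of the rows above through Python's standard `csv` and `json` modules. This is only a local sanity check I'm sketching, not anything DataRobot itself runs:

```python
import csv
import io
import json

# Two rows copied from the sample above, with the quotes doubled as in the file.
raw = '''DaysSinceLastResponse,ResponseContactID,SumCat
86,b159390580d2385a663fb8f9b69de286,"{""test1"": 12, ""test2"": 276}"
173,55dd2abf26a8e20059430cf5de0dfd2f,"{""test1"": 192, ""test2"": 0}"
'''

rows = list(csv.DictReader(io.StringIO(raw)))

# A standard CSV reader collapses "" back to ", so each SumCat cell is plain JSON.
parsed = [json.loads(row["SumCat"]) for row in rows]
print(parsed)

# Writing the dicts back with csv.writer reproduces the same doubled-quote escaping.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["SumCat"])
for counts in parsed:
    writer.writerow([json.dumps(counts)])
print(out.getvalue())
```

The doubled quotes (`""`) are the standard CSV escape for a quote inside a quoted field, so each cell decodes to ordinary JSON and the escaping on its own is not a problem.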
Do you have any ideas about what could be wrong?
The issue is this value. Summarized categorical only allows positive integers or floats, so zero is invalid:
"{""test1"": 192, ""test2"": 0}"
Thanks a lot! I missed this part.
According to the documentation, every value in a summarized categorical must be a number greater than zero. You have zero values in your CSV. Your format was correct; you just included some invalid rows, which caused the feature type to be detected as categorical instead.
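To check a file before uploading, the rule above can be encoded in a small function. This is a minimal sketch assuming only the rule quoted in this thread (a JSON object whose values are all numbers greater than zero); the function name is hypothetical, and DataRobot's actual type detection may check more than this:

```python
import json
from numbers import Real


def is_valid_summarized_categorical(cell: str) -> bool:
    """Check one cell against the rule quoted in this thread:
    a JSON object whose values are all numbers greater than zero."""
    try:
        parsed = json.loads(cell)
    except json.JSONDecodeError:
        return False  # not JSON at all
    if not isinstance(parsed, dict):
        return False  # e.g. a JSON list or a bare number
    for count in parsed.values():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(count, bool) or not isinstance(count, Real):
            return False
        if not count > 0:  # also rejects NaN, since NaN > 0 is False
            return False
    return True


print(is_valid_summarized_categorical('{"test1": 12, "test2": 276}'))  # True
print(is_valid_summarized_categorical('{"test1": 192, "test2": 0}'))   # False
```

Running it over the SumCat column flags the row with `"test2": 0`, which is the value pointed out in the answer.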
Here is the code I used to clean the file and test it:
df[[' 0' not in s for s in df.SumCat]].to_csv('community_q&a_5-7-21_v2.csv', index=False)
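The substring check `' 0' not in s` works for this sample, but it would also drop a row with a valid positive value such as `0.5`, because `": 0.5"` contains `" 0"`. A sketch of a variant that parses each cell as JSON instead (the helper name is my own, the rows are taken from the question, and in practice you would load your own file with `pd.read_csv`):

```python
import io
import json

import pandas as pd

# A few rows from the question's sample, standing in for the real file.
sample = io.StringIO(
    "DaysSinceLastResponse,ResponseContactID,SumCat\n"
    '86,b159390580d2385a663fb8f9b69de286,"{""test1"": 12, ""test2"": 276}"\n'
    '173,55dd2abf26a8e20059430cf5de0dfd2f,"{""test1"": 192, ""test2"": 0}"\n'
    '50,538690be409441ee431e54440f052748,"{""test1"": 12, ""test2"": 276}"\n'
)
df = pd.read_csv(sample)


def all_counts_positive(cell: str) -> bool:
    """True if every value in the JSON object is a number greater than zero."""
    counts = json.loads(cell)
    return all(
        isinstance(v, (int, float)) and not isinstance(v, bool) and v > 0
        for v in counts.values()
    )


# Keep only the rows whose SumCat cell satisfies the positivity rule.
clean = df[df["SumCat"].map(all_counts_positive)]
print(clean["DaysSinceLastResponse"].tolist())  # [86, 50]
```

From there, `clean.to_csv(..., index=False)` writes the filtered file the same way as the one-liner.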