Hi, I am speaking with a quant team and explaining why you don't need neural networks for tabular data. I've said that "conventional" machine learning typically performs as well as or better than neural networks, but can anyone point to research papers to support this point?
Here are a few:
This is great. Thanks!
Not done yet...
This ("Another Deceptive NN for Tabular Data — The Wild, Unsubstantiated Claims about Constrained Monotonic Neural Networks") and also a series of Medium posts by Bojan Tunguz. I just read these because I'm not smart enough for actual papers 😅. Also these:
He puts out one of these about once a month; basically he beats the neural nets with random forests or untuned GBMs most of the time.
Lol deep learning on tabular data is 🤣. Also Robot 3, not smart enough? You could write any one of them. Point them at Bojan Tunguz on Twitter:
👀
Here he is again (this is the thread that spawned the blog posts above). He's made a name for himself disproving basically every paper on neural nets for tabular data.
Internally, our own analytics show that gradient-boosted trees are the best model for 40% of projects, linear models win for 20% of projects, and Keras/deep learning models win for less than 5% of projects.
Basically, XGBoost is roughly 10x more useful than deep learning for tabular data.
If they're quants, they can be convinced with data!
Also, Robot 1, we have at least 2 patents for our deep learning on tabular data methods. We spent 2 years building the state of the art here, which includes standard MLPs, our own patented residual architecture for tabular data, and tabular data "deep CTR models" such as Neural Factorization Machines and AutoInt.
Even with 2 years' worth of work and the best data science team I've worked with in my career, we still couldn't get to "5% of projects have a deep learning model as the best model".
In fact, our Keras models only end up atop our leaderboard in less than 2% of projects.
You all are the best. Thanks!