Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

duncanrenfrow

Data Scientist

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Printer Friendly Page
- Email to a Friend
- Report Inappropriate Content

04-14-2020
01:55 PM

As a data scientist here at DataRobot, I've had several customers ask me about splines in our regression models. The DataRobot product documentation provides a concise and accurate definition: constant splines converts numeric features into piece-wise constant spline base expansion. A derived feature will equal to 1.0 if the original value x is within the interval: a < x <= b.

However,I always find I learn best from examples, so here is an example I came up with:

Let's say we're looking at home prices and distance to a popular, beautiful beach. I suspect what we'd see is a gradually increasing slope where the closer you are to the beach the higher the price of the home. However, at some point when you get close enough (e.g. easy walking distance) you would see home price increase sharply vs. distance. A typical linear regression model would struggle to accurately fit that relationship, because the slope is changing dramatically at the "walking distance" point. A spline works by creating bins that each can have their own "slope". In the simplest case, this would enable the linear model to have 2 separate slopes: one for the homes greater than the "walking distance" value and one for the homes less than walking distance. That way, we could model the fact that moving one block closer when you're within 5 blocks of the beach increases the home price dramatically, but moving 1 block closer when you're 2 miles away has less effect.

At the risk of complicating the example, in actuality splines can actually represent separate polynomial functions for each "piece" or "bin", in this case of distance to the beach. However, the nice thing about using splines is that you avoid the need to generate complex polynomials that might overfit the data and introduce illogical values. Instead each range of feature values gets their own, simpler polynomial function.

The blog here goes into more depth with some great visuals: https://www.analyticsvidhya.com/blog/2018/03/introduction-regression-splines-python-codes/

Regards,

Duncan

2 Replies

emily

Data Scientist

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Printer Friendly Page
- Email to a Friend
- Report Inappropriate Content

04-15-2020
09:30 PM

This is a wonderful explanation Duncan. Thank you for sharing it!

duncanrenfrow

Data Scientist

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Printer Friendly Page
- Email to a Friend
- Report Inappropriate Content

04-15-2020
09:35 PM

Thanks Emily!