Well today was a depressing day for data science at work

Blue LED

I work as an Analytics Manager and normally I don't really share the in-depth details of my work with my colleagues because nobody is technical. They just know I use SQL and Python to spit out a report and that's all they really understand.

Recently I had an opportunity to do something interesting: I was handed a very skewed data set and asked to make sense of it. Because the distribution was so skewed, I applied a log transformation to normalize the data, thinking it would make the data easier to interpret, and I was happy with the results.
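For readers who want to see what that step looks like, here is a minimal sketch of a log transform on skewed data. The data is simulated (log-normal draws with a fixed seed), not the data set from the post, and the `skewness` helper is a hypothetical name written just for this illustration.

```python
import math
import random
import statistics

def skewness(values):
    """Population skewness: mean cubed deviation divided by the cubed standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * std ** 3)

# Assumption: simulated right-skewed data, seeded so the run is repeatable.
rng = random.Random(0)
raw = [rng.lognormvariate(0, 1) for _ in range(10_000)]

# The log transform itself: take the natural log of each (strictly positive) value.
logged = [math.log(v) for v in raw]

raw_skew = skewness(raw)
log_skew = skewness(logged)
print(f"skewness before log transform: {raw_skew:.2f}")
print(f"skewness after log transform:  {log_skew:.2f}")
```

A skewness near zero means the values are spread roughly symmetrically around their average, which is what "normalizing" the distribution is aiming for here. Note that the log is only defined for positive values, so data containing zeros or negatives would need different handling.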

My colleagues wanted to know how I arrived at my conclusion for my recommendation so I walked them through the process. I provided a high-level overview, but they really wanted to get into the nitty gritty of exactly HOW I got to where I got. Again, I normally don't like to do this because they tend to ask very detailed questions like "Can you explain exactly how this works so I can understand?" and it gets tough to explain certain things (e.g., I would say something like "I got this raw data from the API" but then it's followed up by a question like "What is an API and why didn't you just use the vendor's dashboard? Why doesn't the vendor's interface provide the data? Why do you have to go through this step?").

Well anyway, I was explaining my logic and really tiptoed around the stats, essentially saying that I applied some logic that would distribute the data more evenly. I stupidly said the word "log" and it was over. Their faces froze for a good 10 seconds before someone spoke.

"I'm sorry... but what? What is log? Why do you need this log? I'm really confused. How am I supposed to explain this to the SVP? Can you just change it so you're indexing off the median or mean? This doesn't make any sense."

I'm dead.

So guys, how do you get away with explaining things without using technical jargon? Like I said, I normally just show people the output of my report and I don't get too many in-depth questions about it, but when people really pry, it gets tough. I don't know how to avoid saying "I ran a regression model", because it freaks people out: they get upset that they don't understand it and want me to do something that is easier to interpret. It's already frustrating that they think I'm overcomplicating things by using Python instead of Excel, but I can at least tell them certain things, like Excel can't handle 1+ million rows, and they can understand that.

EDIT: Hey all, appreciate all your feedback, both positive and negative. I think it's all constructive. There are a lot of replies in this thread so it's hard to answer everyone's questions. Some of you ask the same things and I do answer them, but you might have to dig through the comments to find the answers. In any case, let me post some frequently asked ones:

1) Did you use graphs and visuals?

Answer: I did. I actually stayed away from using the word log at first and just showed them a histogram of what the data looked like before the log transform and then after the log transform. What followed up was "How did you do this? What did you use to index the numbers?" And that's when I mentioned I used a log transformation.

2) How did you explain what a log transform was?

Answer: I didn't directly explain it in math terms. I described it in the form of an analogy or comparable example. This is what I said:

"Imagine we were looking at salaries for a company. Most people, for example, will make a salary between something like $40k and $100k, right? Well, what if the top execs in the company make like $10 million? Now imagine you tried to index using the mean. Wouldn't the mean say the average employee at this company makes a salary of $1 million or something? That's not right. Basically, a log transformation would keep everyone's differences in consideration, but minimize the importance of some of those execs who are not entirely representative of the rest of the company. That way you can better see the difference in salary among every worker and not have your vision obscured by some of these execs who are throwing off that balance."

Unfortunately, this proved to be even more confusing and my colleagues really zeroed in on exactly what a log was and how it's calculated and it was tough to navigate from there.
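The salary analogy can be checked with a few numbers. This is only a sketch: the ten salaries below are invented to match the analogy (nine staff between $40k and $100k, one exec at $10 million) and are not real data from the post.

```python
import math
import statistics

# Assumption: made-up salaries, nine staff plus one exec at $10 million.
salaries = [40_000, 45_000, 50_000, 55_000, 60_000,
            70_000, 80_000, 90_000, 100_000, 10_000_000]

mean_salary = statistics.fmean(salaries)      # dragged up by the single exec
median_salary = statistics.median(salaries)   # the middle of the sorted list

# Average on the log scale, then convert back to dollars (the geometric mean).
log_salaries = [math.log10(s) for s in salaries]
log_scale_average = 10 ** statistics.fmean(log_salaries)

# In raw dollars the exec is 100x the top staff salary;
# on the log10 scale the same gap is only about 2 units.
raw_ratio = salaries[-1] / salaries[-2]
log_gap = log_salaries[-1] - log_salaries[-2]

print(f"mean:              ${mean_salary:,.0f}")    # $1,059,000
print(f"median:            ${median_salary:,.0f}")  # $65,000
print(f"log-scale average: ${log_scale_average:,.0f}")
print(f"exec vs top staff: {raw_ratio:.0f}x raw, {log_gap:.1f} apart on log10 scale")
```

With these numbers the mean lands at about $1 million, as in the analogy, while the log-scale average stays close to what the regular staff actually earn. The exec is still the largest value after the transform, just no longer large enough to swamp everyone else.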

3) What is SVP?

Answer: It stands for Senior Vice President. It basically is a reference to our execs.


2 Replies
NiCd Battery

If anyone's wondering where this text is from, it's taken from a Reddit post with multiple comments, which explains why the poster looks like he's talking to himself.


It's actually an interesting discussion with some useful tips. Really hammers home that one of the most important skills for a data scientist is communication.


NiCd Battery

Personally, I explain things in a very basic but unpatronising way and gradually get more and more detailed until they stop. However, there's a time and a place, and I will provide sessions for anyone who is interested.

I would also say, perhaps don't tell them about the log in the first place. They probably don't need to know, and are asking questions to see why it's significant to what they need to do. Same with the APIs.

I know if someone was telling me they "flux capacitated" the data, I would think "oh, this is relevant, why don't I know what it is?" and then worry that I need to understand it.

Another point, especially if you are new to a job in a pressured environment and perhaps haven't built up trust, is to explain the detail to someone else who is trusted. This will also help you, in that you will have someone to bounce your ideas off, as well as gain credibility. They will know the culture and senior staff better and can help you prioritise the level of detail to include in your reports.

Tricky one!