The Forgotten Job of a Data Scientist: Editing (john-foreman.com)
22 points by rouli on May 11, 2014 | 7 comments


Data Scientist (noun): A statistician who lives in San Francisco.

(only half joking)


I feel like "data scientist" is a title that grew out of the Fundamental Theorem of Employment, which states that you're usually hired to do a job that either (1) the boss man can't do for himself, or (2) the boss doesn't want to do. Type 1 work gets you respect and autonomy. Type 2 work will have you commoditized.

Software companies are satisfied with the job they've done at commoditizing programming talent but, at least for now, having a half-decent grasp of any specialty (e.g. machine learning, information retrieval) requiring mathematical firepower puts one solidly into Type-1 employment, which is where one wants to be.

"Data scientist" seems to be a way of saying, "yes, I code but I also know math, so use me for Type-1 work only".


> "Data scientist" seems to be a way of saying, "yes, I code but I also know math, so use me for Type-1 work only".

You make it sound like a bad thing? Despite the rah-rah I hear from programmers about how they are unique snowflakes, being only a programmer is like being a janitor. A prime way to get discarded at the age of 40. If I can make sure that I am valuable because I bring other things to the table (Math, Product vision, people skills), why on earth wouldn't I rebrand myself to better reflect that?


> You make it sound like a bad thing?

Not at all. That's my attitude as well. I don't want to waste my life on Type-2 work.

"I only want to do interesting work" sounds entitled after being conditioned by corporate mediocrity, but I think it's a reasonable attitude. Companies frown on self-assertion, preferring agreeable mediocrity, and I hate that. I tend to be honest about things.

You can't say, "I leave bosses and companies that assign me crappy work" on a job interview. I wish people could be honest about such things, but it's just not socially acceptable to speak the truth about anything that matters (e.g. politics, religion, sex, money, power, careers). On HN, I try to be as honest as I can be. Sorry if it comes off as obnoxious.

> Despite the rah-rah I hear from programmers about how they are unique snowflakes, being only a programmer is like being a janitor. A prime way to get discarded at the age of 40.

Agree.

> If I can make sure that I am valuable because I bring other things to the table (Math, Product vision, people skills), why on earth wouldn't I rebrand myself to better reflect that?

That's absolutely what you should be doing. If it's not obvious, I'm on the same side with people who say "I know math, so use me for Type-1 work only". I am one of them.

The reason the job distinction is toxic in many companies, however, is that software engineering should also be respected rather than commoditized. To me, the rush of people like you and me to get "superior" titles on our resumes is a sign that the business world doesn't respect "regular old" software engineering. That sucks, because the skills of a truly good software engineer are also quite important.


> software engineering should also be respected rather than commoditized.

I am not so sure anymore. When anyone who has done six weeks in a boot camp can call themselves a software engineer, the semantics of the word are lost.

> That sucks, because the skills of a truly good software engineer are also quite important.

The best people? They are valued and known to be valuable. For example, there is a guy in my company who works remotely from the Midwest. He is truly amazing. When he interviewed me, I quickly got the feeling that his machine learning skills were top notch. BUT, when I work with him, I realize that he is truly phenomenal. He can write code up and down the abstraction ladder. Good, solid fucking code. Hell, he can double up as an SRE and fix shit when he wants to. Sure, he is a "machine learning engineer". But he is much much more than just that title.


"Data scientist" is a mess of a job title. It seems to be as much of a reaction against the commoditization of software engineering (which leaves the smartest, and by correlation, usually the most mathematically literate, 10% of programmers ill-suited for the average software job) as it is a real distinction.

There are plenty of "data scientists" who use canned tools and play around with parameters because that's all "the business" thinks it needs.

You want to trim complexity for a reason that any data scientist worth his salt (and there are plenty of celebrity engineers in SF making $500k who aren't worth their salt and don't know this) should already know: bias-variance tradeoff (see also: underfitting and overfitting). If your model is too flexible/complex, it will begin absorbing noise. That leads to a model that performs extremely well on training data but fails miserably on unseen data. There are well-studied techniques for preventing this, but I'd guess that fewer than 20% of self-described or titled "data scientists" are familiar with them.
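To make the overfitting point concrete, here is a minimal sketch, assuming synthetic data (a noisy sine curve) and plain polynomial least squares via NumPy. The data-generating function, the sample sizes, and the degrees compared are all illustrative choices, not anything from the article.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)

def sample(n):
    # Hypothetical ground truth: sin(3x) plus Gaussian noise (std 0.3).
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(3.0 * x) + rng.normal(0.0, 0.3, n)
    return x, y

def fit_and_score(degree, x_train, y_train, x_test, y_test):
    # Least-squares polynomial fit; returns (train_mse, test_mse).
    coeffs = P.polyfit(x_train, y_train, degree)
    train_mse = float(np.mean((P.polyval(x_train, coeffs) - y_train) ** 2))
    test_mse = float(np.mean((P.polyval(x_test, coeffs) - y_test) ** 2))
    return train_mse, test_mse

x_train, y_train = sample(15)    # small training set
x_test, y_test = sample(200)     # held-out data from the same process

# Degree 1 is rigid, degree 3 is moderate, degree 12 is very flexible.
results = {d: fit_and_score(d, x_train, y_train, x_test, y_test)
           for d in (1, 3, 12)}

for degree, (train_mse, test_mse) in results.items():
    print(f"degree={degree:2d}  train_mse={train_mse:.4f}  test_mse={test_mse:.4f}")
```

Training error can only go down as the degree grows, since each model contains the smaller ones. Held-out error typically gets worse once the polynomial starts fitting the noise, which is the gap the comment describes. Regularization and cross-validation are the usual examples of the "well-studied techniques" for choosing how much flexibility to allow.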


> There are plenty of "data scientists" who use canned tools and play around with parameters because that's all "the business" thinks it needs.

As with a software engineer, it is a role that is different in every place. Every place has its own definition of the role. This is not bad. It is a mere reflection of market conditions in which a lot of people are simultaneously bad at linear algebra, probability, and statistics, yet dangerous enough to write production code fast (your standard C.S. grad SWE).



