20 Questions to Detect Fake Data Scientists (2016 Gold Blog)

Hiring Data Scientists is no easy job, particularly when there are plenty of posers around. Here is a handy list of questions to help separate the wheat from the chaff.

By Andrew Fogg, Import.io

Check the answers from KDnuggets Editors to these questions (and one more):
21 Must-Know Data Science Interview Questions and Answers

Now that the Data Scientist is officially the sexiest job of the 21st century, everyone wants a piece of the pie.

That means there are a few data posers out there: people who call themselves Data Scientists but who don't actually have the right skill set.

This isn't always done out of a desire to deceive. The newness of data science and the lack of a widely understood job description mean that many people may think they are data scientists purely because they deal with data.

“Fake data scientists are often experts in one particular discipline and insist that their discipline is the one and only true data science. That belief misses the point that data science refers to the application of the full arsenal of scientific tools and techniques (mathematical, computational, visual, analytic, statistical, experimental, problem definition, model-building and validation, etc.) to derive discoveries, insights, and value from data collections.”
- Kirk Borne, Principal Data Scientist at Booz Allen Hamilton and founder of RocketDataScience.org

The first way to detect fake Data Scientists is to understand the skill set you should be looking for. Knowing the difference between a Data Scientist, a Data Analyst and a Data Engineer is important, especially if you're planning on hiring one of these rare specimens.

To help you sort the true data scientist from the fake (or misguided) one, we've compiled a list of 20 interview questions you can ask when interviewing data scientists.

  1. Explain what regularization is and why it is useful.
  2. Which data scientists do you admire most? Which startups?
  3. How would you validate a predictive model of a quantitative outcome variable that you built using multiple regression?
  4. Explain what precision and recall are. How do they relate to the ROC curve?
  5. How can you prove that one improvement you've brought to an algorithm is really an improvement over not doing anything?
  6. What is root cause analysis?
  7. Are you familiar with pricing optimization, price elasticity, inventory management, competitive intelligence? Give examples.
  8. What is statistical power?
  9. Explain what resampling methods are and why they are useful. Also explain their limitations.
  10. Is it better to have too many false positives, or too many false negatives? Explain.
  11. What is selection bias, why is it important and how can you avoid it?
  12. Give an example of how you would use experimental design to answer a question about user behavior.
  13. What is the difference between "long" and "wide" format data?
  14. What method do you use to determine whether the statistics published in an article (e.g. newspaper) are either wrong or presented to support the author's point of view, rather than correct, comprehensive factual information on a specific subject?
  15. Explain Edward Tufte's concept of "chartjunk."
  16. How would you screen for outliers and what should you do if you find one?
  17. How would you use extreme value theory, Monte Carlo simulations or mathematical statistics (or anything else) to correctly estimate the chance of a very rare event?
  18. What is a recommendation engine? How does it work?
  19. Explain what a false positive and a false negative are. Why is it important to differentiate these from each other?
  20. Which tools do you use for visualization? What do you think of Tableau, R and SAS for graphs? How would you efficiently represent 5 dimensions in a chart (or in a video)?
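
To give a flavour of the vocabulary behind questions 4, 10 and 19, here is a minimal Python sketch that counts false positives and false negatives and derives precision and recall from them. The labels and predictions are made-up toy data, and the function names `confusion_counts` and `precision_recall` are ours for illustration only; a strong candidate should be able to explain this without any code at all.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives.

    Labels are assumed to be binary: 1 = positive, 0 = negative.
    """
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1  # correctly flagged
        elif pred == 1 and truth == 0:
            fp += 1  # false positive: flagged, but should not have been
        elif pred == 0 and truth == 1:
            fn += 1  # false negative: missed a real positive
        else:
            tn += 1  # correctly left alone
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data, invented purely for this illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(y_true, y_pred)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Whether a false positive or a false negative hurts more depends on the application, which is exactly what question 10 is probing.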

“A “real” data scientist knows how to apply mathematics, statistics, how to build and validate models using proper experimental designs. Having IT skills without statistics skills makes you a data scientist as much as it makes you a surgeon to know how to build a scalpel.”
~ Lisa Winter, Senior Analyst at Towers Watson

How do you quantify a real data scientist?

Author Bio:

Andrew Fogg is Founder & CDO at import.io. He brings to import.io his passion for and belief in the future of the structured web. He believes strongly in helping data users and data providers transact more efficiently. He is an expert on data and the Structured Web. Prior to co-founding import.io, Andrew worked with data for Microsoft Research, Barclays Capital, Cambridge University and the Wellcome Trust before joining RBS as part of the Technology Innovation group. He recently sold his first startup, Kusiri, to PwC.
