Top Quora Data Science Writers and Their Best Advice, Updated
Get some insight into tips and tricks, the future of the field, career advice, code snippets, and more from the top data science writers on Quora.
This post is based on Most Viewed Writers in Data Science, the 10 writers with the most answer views in the last 30 days, as retrieved on June 29, 2017.
Just so there is no confusion, please note that this post is "authored" by me, but none of the information contained herein -- from the questions to the answers -- has anything to do with me. I simply edited these informative responses together.
The data science topic page at Quora.
Excerpt from answer to: What is a "full-stack" data scientist?
I haven’t heard the expression being used really, but here is my take on what it means:
Data scientists build predictive models. That’s the core of what they do. In addition, they need to know a little bit of:
- Data engineering
- Software engineering
- Business analysis
A full-stack data scientist would be able to seamlessly perform the role of a data engineer, software engineer, business analyst and data scientist. If you needed someone to develop an app, the FSDS could step in and do it. If you needed someone to set up a data warehouse, or to analyze the strategic management processes of a business, the FSDS could do it.
Excerpt from answer to: Is Python still relevant in data science given the rise of Scala (+Spark)?
Scala and Spark aren’t Python’s rivals; they are its friends.
I’ve been saying this for some time now. Python is and will be the gold standard for machine learning over the next ten years.
The only Python competitor is R and I’ll be honest, in the real world everyone’s using Python. You’ll see a lot of R at the college level but not in the applied space.
Python simply has too much of a head start.
Big Data is simply about getting any data (almost always unstructured data) into a format that can be modeled. Scala and Spark are just tools you can use to do that on very large data sets.
TensorFlow wasn’t written in Scala.
Don’t get caught up in one or two articles, even if they are written by Andrew Ng. Do your own research.
Excerpt from answer to: What will data scientists be working on in 5 to 10 years from now?
This brings me to the future. Over the next five years I expect to see lots of companies that are currently claiming to be involved actually trying to use it on serious projects. I expect a good chunk of those projects to fail and the whole industry to have generally matured, with far more understanding of what works and what doesn’t.
Look at the number of GUI tools that support machine learning now. Things like Excel add-ons that automatically cluster data. Give it five years and I expect most people to think only of them when they think about data science.
In ten years I think fashion will have well and truly moved on. Data Science will be a skill that is common and expected in other disciplines, and specialist data scientists will be looked at a little strangely. You will also have a situation where it is common and normal for the data that is captured by systems to be amenable to data science, as opposed to what is happening now, where most data is structured in a way that requires significant manipulation.
Excerpt from answer to: Why did you choose to work in data science over quantitative finance?
The summary of all of the reasons I’m about to list was that I chose data science since I was more passionate about it. Here are 5 of the more specific reasons that led to my passion for data science.
- Excitement over a new, emerging, and growing career path - This decision was made sometime between 2013 and 2014, when data science was even more new and uncertain than it is today. The idea of entering something where things were still developing and new appealed to me, and still does today. I try not to base my decisions on hype - so this bullet is more about how the data science field was growing and would have a place for me rather than how it was hot.
- Familiarity with data science - This is arguably the weakest reason on the list, but by the time that I had to choose what I would work on full-time, I already had two data-science-related internships under my belt: one at Etsy and one at Quora. I had great experiences at both of those internships, so choosing to work in data science full-time was a happy known quantity for me.
- Interest in working on a consumer internet product - I’ve had a longtime fascination with consumer internet products and have basically been excited to watch this whole space grow ever since I got access to dial-up. Working in data science was a unique opportunity for me to become a part of the consumer internet world I’ve been so fascinated by.
- Intrigue of working on a product that’s new and upcoming - Consumer internet products were always interesting to me since they live in the land of uncertainty and could potentially become really big (or just fail). The intrigue of working on a product that could potentially become really important and knowing that you had a small role in it was tempting.
- Commitment towards knowledge-sharing - I’ve always been committed to sharing thoughts and ideas, either through being a teaching fellow for Harvard Stat 110 or writing as much as I can on Quora. Tech in general has a culture of meetups and blog posts and Quora answers and panels and invited talks. The same is not true in the secretive world of quantitative finance.
Excerpt from answer to: In Python, how can I save data from a website to CSV using BeautifulSoup?
The lazy way would be to do something like this:
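A minimal sketch of that lazy approach, not the answer's original code. It assumes the page holds a simple HTML table; the inline `SAMPLE_HTML` string is a hypothetical stand-in for the downloaded page, and the helper name `table_to_dataframe` is illustrative only.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page; in practice the HTML would
# come from an HTTP request to the site being scraped.
SAMPLE_HTML = """
<table>
  <tr><th>name</th><th>views</th></tr>
  <tr><td>alice</td><td>120</td></tr>
  <tr><td>bob</td><td>95</td></tr>
</table>
"""


def table_to_dataframe(html):
    """Parse the first <table> in the HTML into a DataFrame (all values as text)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    # One list of cell texts per table row; the first row is treated as the header.
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]
    header, body = rows[0], rows[1:]
    return pd.DataFrame(body, columns=header)


df = table_to_dataframe(SAMPLE_HTML)
# With no path argument, to_csv returns the CSV as a string;
# pass a file path instead to write it to disk.
csv_text = df.to_csv(index=False)
print(csv_text)
```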
Once you have your data in the dataframe you can do whatever parsing/reformatting you want. Or, if you only need this once you can just do that with Excel or something.
I hope this helps!
Excerpt from answer to: As a data scientist, what tips would you have for a younger version of yourself?
First and foremost, is data science what you think it is?
9 out of 10 aspiring data scientists I come across equate machine learning with data science. “Data Science” is a loaded, catch-all term. Machine learning is a part of it, but at many major tech companies, product analytics is also an integral part of the data science team. Product analytics is a hidden gem. It is fun but doesn’t get talked about nearly as much. This includes:
- A/B test design
- Design metrics: Let’s take a video platform as an example. What is the best metric to optimize for that best represents user satisfaction? Should it be number of videos watched? Time spent watching videos? Percent of users that come back in a week to watch another video?
- Investigate why metrics change: Why is there suddenly a spike in activity in this cohort of users?
- Understand product mechanics: How do button X and feature Y improve the product? Should we redirect page A to B to C or go from A straight to C?
- Identify trends and offer strategic suggestions: Argue with data that the company should invest in ______ area to stay competitive in the field.
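To make the metric-design question concrete, here is a small sketch that computes the three candidate metrics from the video platform example on a toy watch log. The log format, the 7-day return window, and the function name `candidate_metrics` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical watch log: (user_id, day_number, seconds_watched).
WATCH_LOG = [
    ("u1", 0, 300), ("u1", 3, 120),
    ("u2", 0, 600),
    ("u3", 1, 60), ("u3", 2, 90), ("u3", 12, 30),
    ("u4", 5, 45),
]


def candidate_metrics(log, window_days=7):
    """Compute three candidate user-satisfaction metrics from a watch log."""
    days_by_user = defaultdict(list)
    seconds_by_user = defaultdict(int)
    for user, day, seconds in log:
        days_by_user[user].append(day)
        seconds_by_user[user] += seconds

    n_users = len(days_by_user)
    # A user counts as "returning" if they watch again on a later day
    # that falls within window_days of their first view.
    returned = 0
    for days in days_by_user.values():
        first = min(days)
        if any(0 < day - first <= window_days for day in days):
            returned += 1

    return {
        "videos_per_user": len(log) / n_users,
        "seconds_per_user": sum(seconds_by_user.values()) / n_users,
        "return_rate": returned / n_users,
    }


metrics = candidate_metrics(WATCH_LOG)
print(metrics)
```

Even on this tiny log the metrics rank users differently: u2 watches the longest but never returns, which is exactly the kind of tension metric design has to resolve.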
Excerpt from answer to: Is AI over-hyped in 2017?
Yes and no, depending upon which community you are talking about.
If you are talking about the academic research community, it’s not over-hyped. There have been major breakthroughs in AI over the past couple of years, and the celebration is certainly justified.
In my own area of object recognition, we went from ~35% accuracy (mean average precision on Pascal VOC) to above 65% in just 3–4 years. Previously, we were advancing by 1–2% per year, despite object recognition being the hottest area of computer vision with the largest fraction of papers appearing in top conferences every year. Deep learning also made major breakthroughs in reinforcement learning, which is what yielded successes in general Atari game playing, and beat a world grandmaster in Go decades ahead of expectations! It has finally enabled speech recognition to achieve usable levels of accuracy.
Fundamentally, I do not see data science dying off anytime soon. As long as:
- people want to make better decisions (always),
- people care about what the future holds (forever),
- people and companies who do it well benefit (always),
- data points available continue to increase (forever),
- the tools and techniques we have continue to improve (you get the idea)...
...analytics and data science are not going anywhere.
Disclaimer: extremely biased sample size of one.
Excerpt from answer to: How should a data scientist handle versioning, both for pipeline code and models?
To get the best from a version control system, it’s better to separate them.
Keeping code in a version control system just like any other code is the only logical way, because if you, as a DS, perform some heavy ETL or if your code makes decisions that can bring/cost a lot of money, there’s no way it’s going around code review. No. Way.
For some things that are more typical for data scientists, though, I don’t think that storing Jupyter notebooks in version control is a good practice. You can’t see a decent diff on them, they are not “production code” and in general, when you are finished with something, you want to push at least a “camera-ready” Python script. Jupyter notebooks are great for experiments and demonstrations, but outside of these cases there’s always something better.
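As a rough illustration of turning a finished notebook into a script that diffs cleanly, the sketch below pulls the code cells out of a notebook's JSON using only the standard library. It assumes the nbformat 4 layout (a top-level `cells` list whose entries carry `cell_type` and `source`); the function name `notebook_to_script` and the example notebook are hypothetical. In practice, `jupyter nbconvert --to script` handles this more thoroughly.

```python
import json


def notebook_to_script(notebook_json):
    """Concatenate the code cells of an nbformat-4 notebook into one script."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The notebook format allows source to be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip())
    # Separate cells with a blank line and end the script with a newline.
    return "\n\n".join(chunks) + "\n"


# Hypothetical notebook used only to exercise the function.
EXAMPLE = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Exploration"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": 1, "source": ["x = 2\n", "y = x * 21"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": 2, "source": "print(y)"},
    ],
})

script = notebook_to_script(EXAMPLE)
print(script)
```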
Excerpt from answer to: What essential knowledge and skills are required to start working as a data scientist?
The essential knowledge you need to get acquainted with falls under 3 categories, i.e. Programming, Maths and Science.
As a data scientist you will be expected to take a business problem and translate it into a data question, create predictive models to answer the question, and tell a story about the findings. Statisticians who focus on implementing statistical approaches to data, and data managers who focus on running data science teams, tend to fall into the data scientist role.
Data scientists are the bridge between the programming and implementation of data science, the theory of data science, and the business implications of data.