From Science to Data Science, a Comprehensive Guide for Transition

An in-depth, multifaceted, and all-around very helpful roadmap for making the switch from 'science' to 'data science,' yet generally useful for data science beginners or anyone looking to get into data science.



Practicing and building a showcase


Some people recommend Kaggle as a starting point, but I would take that advice with a grain of salt. Don’t get me wrong - it has great resources, it provides feedback (otherwise it is hard to tell whether your solution is any good), and some people find it really engaging. But if you start with the goal of winning, you will end up disappointed, with neither fame nor gold (prize competitions are not beginner-level). Moreover, beware that industrial problems rarely look like that (e.g. in all of mine data cleaning was a big thing, and in none did a 5% score improvement matter). More on that:

Personally, I most enjoy working on data I care about and find genuinely interesting. It drives my motivation much more than any competition could. Also, this way you practice complete data science - from asking questions and getting data to presenting the results in a meaningful form.

Making results public, including code, is a great way both to get feedback and to build a showcase. It can be an IPython Notebook, a website, or even just a plot (but then be sure to sign it - if it goes viral you want to get due recognition!). E.g. some of mine (see also Projects):

So, once again, be sure to get a GitHub account (for hosting code, notebooks and websites). Mine looks like this: github.com/stared. And don’t be afraid to publish unpolished code: if it is not good yet, no one will notice (or care) anyway. Also, some people like writing about things they have just learnt (e.g. How gzip uses Huffman coding - Julia Evans). If it is your thing - just do it (see my post on Jekyll)!


Data science boot camps


It’s totally fine to learn things on your own. But attending a boot camp may be a huge boost - in motivation, in access to tutors and experts, and in job opportunities. Here are some camps I am aware of:

Internships


If you are still a student - doing an internship may be a great way to get a lot of experience, feedback, confidence and contacts. I did mine during my PhD studies (in Europe it is not common to take a break, and a lot of people in academia tried to dissuade me, but I consider it a wonderful, life-changing experience) [3].

To search for offers, try googling "data science intern", "data scientist internship" and similar phrases, and visit some job listings (e.g. Indeed). Sometimes it makes sense to mail a company even if they don’t use the words "intern" or "internship" - smaller ones especially may be flexible. Some bigger tech companies (Facebook, Google, IBM, Microsoft) offer internships [4], see:

Aim at tech companies (to actually work in data science). In the [San Francisco] Bay Area (i.e. north of Silicon Valley) there are plenty of opportunities to learn data science - it should be your primary destination. To work in the US you need to get a J-1 visa (of course, only once a company wants you); it is relatively easy, but takes ~2-3 months.

Once on-site, start looking for various meetings and hackathons, especially via Meetup. Search for anything that may fit (data science, R communities, big data etc.) and try to visit a lot of events. In the Bay Area it is an advantage to be “bold”. So don’t be afraid to ask about or for anything, to start talking to people etc. - on average it will work out much better than taking a passive posture. See also:

Feed


Never stop learning. Some feeds:

And if you have a question, a good place to ask (and search for answers) is:


Advanced stuff


Since you are in maths, it may be possible for you to take a shortcut and get straight into advanced topics. Here is a random list of starting points I consider interesting:

About


This blog post started as a series of emails, and then went through a stage as an extract of those emails (shared on Google Docs). It took me way more time than I expected to present it in its current form.

There are many people who helped me with this post at its various stages (starting with asking me questions!). But I would especially like to thank Adam Goliński, Sebastian Jaszczur, Kasia Kulma and Robert Bogucki for their remarks on the final version.

I would love to hear your feedback! Did you find it useful? Or maybe you would recommend another learning strategy? Or additional links?

Or maybe your company needs data science training? I would be happy to provide it! See workshops.deepsense.io for the menu (we are also happy to make custom workshops), then fill in the form or contact me directly!

  1. For instance, if you don’t have a quantitative background, you need to focus on it (and it may be the hardest part). Since it was not my path, I can’t help.
  2. But if you come from a non-academic background (e.g. web dev), then from your perspective data science is science. Or, to be precise, it is engineering, but more like designing new engines than building a house.
  3. Great thanks to Adam Zadrożny for showing me this possibility (he interned at Facebook while doing his PhD in gravity waves) and to Jacek Migdał for convincing me to apply to the Bay Area, rather than somewhere else.
  4. If you have a background in computer science, it will be like playing on the easy level (that was not my case, though). It may be possible to apply as a software engineer expressing interest in data - and learn from that point.
  5. Hacker News is my best general-purpose non-personal feed, complemented by The Economist.
  6. In particular, hacking p-values is wrong. But you should be aware of what a p-value is and why it can be hacked (accidentally or purposefully).

Bio: Piotr Migdał is a data science freelancer with a PhD in quantum physics, based in Warsaw, Poland. He is active in gifted education, is developing a quantum game, and works as a data science instructor at deepsense.io.

Original. Reposted with permission.
