Silver Blog
My journey path from a Software Engineer to BI Specialist to a Data Scientist

The career path of the Data Scientist remains a hot target for many, given its continuing high demand. Becoming one requires developing a broad set of skills, including statistics, programming, and even business acumen. Learn more about one person's experience making this journey, and discover the many resources available to help you find your way into the world of data science.



By Pramod R.

I recently gave a talk on “Introduction to Neural Network and Deep Learning”, and many participants asked me one particular question afterwards: this introduction is good, but what next? How do I transition from my current role (which, for most of them, was development/engineering) to a Data Scientist role? This is a very good and relevant question, one I have asked myself many times after attending such introductory talks. Hence, I decided to write about the transition path that worked for me, in the hope that it may benefit others who are in the process of transitioning.

The first step is to psych yourself up: no matter what, you will end up being a full-fledged Data Scientist! Not joking! I’ve personally struggled with the tendency to drop out of a course/project/hackathon when things get tougher or murkier to comprehend, owing to the complexity of the subject. But be aware that a Data Scientist role requires a unique combination of skills: math/stats, programming, data, algorithms and, most importantly, business solutioning! Also, this is an ever-changing field: techniques invented a few months ago are quickly beaten by far superior ones (as claimed by their authors). So, in short, this requires consistent dedication and motivation to keep learning.

But I’m not writing this to scare you guys away from embarking on this journey. I’m writing this to share some sources of motivation that can help you keep going:

  1. The problems being solved have a huge impact and change the game of business (Examples: self-driving cars, Facebook, Google, Uber, Amazon).
  2. The constantly evolving nature of the field is itself interesting and exciting. Example: I’ve been closely monitoring the field of NLP since 2012, especially text featurization. First there was one-hot encoding, then came word2vec, doc2vec, fastText and a series of other embeddings. In parallel, with the rising popularity of deep learning, came the series of RNNs, CNNs, LSTMs, GRUs, Seq2Seq models and autoencoders. Then, combining the power of both, came transfer learning approaches like ELMo and InferSent. The recent dawn of transformers has just upped the ante with crazy multitask learners like BERT, OpenAI GPT, etc.
  3. The hackathons and community activities are so engaging that you feel the urge to keep abreast of everything that is happening.
  4. Last but not least, Data Scientist is undoubtedly the sexiest job of the century, with promising rewards.

Enough generic advice and high-level talk; now let's get down to business. Below, I’ve listed the best, free-of-cost, super-easy-to-understand courses that I’ve taken, which have helped strengthen my foundations (with links):

Linear Algebra:

  • Khan Academy: This is a classic school-like tutorial. It helps to refresh the math concepts and also has a few exercises.
  • 3Blue1Brown: Beautifully visualized and easy to comprehend. The concepts explained will be etched in your memory forever.
  • MIT course by Prof. Gilbert Strang: The Prof.’s articulation is wonderful (obviously, he teaches at MIT!).

Probability and Statistics:

  • Statistics 110: Again, a great explanation, with good examples and some problem sets to solve.

Machine Learning/Deep Learning:

  • CS231N: It is amazing that Stanford has published its classroom sessions online. I think Andrej Karpathy taught this in 2016, but the new 2017 material is great too!
  • CS224D: This is the NLP equivalent of the above, taught by Christopher Manning and Richard Socher, two of the best lecturers I’ve come across.
  • ML by Nando de Freitas: His explanation of loss functions and gradient descent is unbeatable.
  • Google ML: These crash courses on various topics are pretty handy and practical.
  • fast.ai: These folks have gained a lot of popularity recently with their easy-to-use, practical approach to teaching, along with the many notebooks they make available.
  • Kaggle Learn: Opened more recently and still adding content. What impressed me is that, with so little theory, they help us learn so much through practical exposure.
  • ML by Andrew Ng: It is customary to mention the most popular one! (Though this one is not free.)

Reinforcement Learning:

Python Programming:

  • Analytics Vidhya course: This is a practical-data-science oriented course.
  • Codecademy: An updated course is available for Python 3 (I took it for Python 2.7).
  • PyTorch: If you want to get started on PyTorch-specific projects.
  • TensorFlow: If you want to start using TensorFlow.

Coding Environment:

  • Anaconda: They provide distributions for Windows, Mac, and Linux that do not require admin access to install on your machine.
  • Google Colab: If you do not have a powerful machine, you can still run your code in Colab notebooks. They provide GPUs and TPUs too, free of cost.
  • Kaggle Kernels: Alongside providing wonderful challenges and datasets to work with, they have also started providing kernels. However, I suspect their resources are limited, as I have sometimes been unable to spawn a machine.
  • AWS free tier: For a limited time, AWS also offers free-tier access to services like SageMaker, which lets you build notebooks and deploy models off the shelf.
  • CoCalc: I haven’t used it, but I have heard good reviews of it, so I’m adding it too.

Blogs to follow:

  • Analytics Vidhya: Undoubtedly one of the biggest, most popular and best blogs today.
  • KDNuggets: I’ve been following this since 2011. Rich and diverse content. Also, check out their free open datasets.
  • DataScienceCentral: Also a good repository for DS.
  • TowardsDataScience: A Medium publication, with hundreds of ideas exchanged every week.
  • Karpathy blog: By Andrej Karpathy. The unreasonably effective blog! Period.
  • Colah’s blog: Especially look for the LSTM explanation.
  • Jay Alammar blog: Illustrations of everything, in such a visually appealing manner! Especially the Transformer!
  • Machine Learning Mastery: Thank you, Dr. Jason Brownlee; about 80% of my deep learning code has been inspired by your blog.
  • Yhat blog: A variety of data/ML related topics.
  • Sebastian Raschka Blog: I love his project illustrations and his thought process in solving them.
  • FastML: No one can decipher and simplify complex Machine Learning techniques better than FastML. It is so easy to understand.
  • No Free Hunch: Blog by Kaggle, containing interviews, discussions and trends in machine learning.
  • Domino Data Blog: Provides a good perspective on big data science applications in large companies.

Podcasts:

GitHub repos to follow:

Most of the awesome-<everything> repositories. These are curated lists of videos, code, podcasts, courses, blogs, and everything under the sun related to that topic. Some examples include:

Twitter handles:

Other noteworthy mentions:

Lastly, a few more that deserve a mention, having caught my attention recently:

I guess that is about it! Hope this helps you guys get started somewhere (with whichever channel inspires you most). Keep learning and keep contributing! May the force be with you!


Original. Reposted with permission.

Bio: Pramod is an AI solutioning expert, blogger and an avid speaker with 14 years of experience in building Data Science solutions in large organizations like Target and Fidelity using Reinforcement Learning, Recommendation Systems, NLP, Deep Learning, and massive data parallel computing.
