My journey from Software Engineer to BI Specialist to Data Scientist
The career path of the Data Scientist remains a hot target for many with its continuing high demand. Becoming one requires developing a broad set of skills including statistics, programming, and even business acumen. Learn more about one person's experience making this journey, and discover the many resources available to help you find your way into a world of data science.
By Pramod R.
I recently gave a talk on “Introduction to Neural Network and Deep Learning”, and many participants asked me one particular question after the talk: "This introduction is good, but what next? How do I transition from my current role (which happened to be mostly development/engineering) to a Data Scientist role?" This is a very good and relevant question, one I have asked myself many times after attending such introductory talks. Hence, I decided to write about the transition path that worked for me, and I hope it benefits others who are in the process of transitioning.
The first step is to psych yourself up: no matter what, you will end up being a full-fledged Data Scientist! Not joking! I’ve personally struggled with the tendency to drop out of a course/project/hackathon when things get tougher or murkier to comprehend, owing to the complexities or challenges of the subject. But be aware that a Data Scientist role requires a unique combination of skills in math/stats, programming, data, algorithms and, most importantly, business solutioning! Also, this is an ever-changing field, and techniques that were invented a few months ago get beaten in no time by far superior techniques (as claimed by the authors). So, in short, this requires consistent dedication and motivation to keep learning.
But I’m not writing this to scare you away from embarking on this journey. I’m writing this to show you some sources of motivation that can help you keep going:
- The problems being solved have a huge impact and change the game of business (Example: Self-driving cars, Facebook, Google, Uber, Amazon)
- The constantly evolving nature of the field is itself interesting and exciting. Example: I’ve been closely monitoring the field of NLP since 2012, especially text featurization. First there was one-hot encoding, then came word2vec, doc2vec, fastText, and a series of other embeddings. In parallel, owing to the rising popularity of deep learning, came the series of RNN, CNN, LSTM, GRU, and Seq2Seq autoencoders. Then, combining the power of both, came transfer learning approaches like ELMo and InferSent. The recent dawn of transformers has upped the ante with crazy multitask learners like BERT, OpenAI GPT, etc.
- The hackathons and the community activities are so engaging that you feel the urge to keep yourself abreast of all the happenings
- Last but not least, Data Science is undoubtedly the sexiest job of the century, with promising rewards
Enough of generic gyan/high-level talk; now let's get down to business. Below, I’ve listed the best free-of-cost, super-easy-to-understand courses that I’ve taken, which have helped strengthen my foundations (with links):
Linear Algebra:
- Khan Academy: This is a classic school-like tutorial. It helps to refresh the math concepts and also has a few exercises.
- 3Blue1Brown: Beautifully visualized — easy to comprehend. The concept explained will be etched in your memory forever.
- MIT course by Prof. Gilbert Strang: The Prof.’s articulation is so wonderful (Obviously, he teaches at MIT! 😀).
Probability and Statistics:
- Statistics 110: Again, a great explanation, with good examples and also some problem sets to solve.
Machine Learning/Deep Learning:
- CS231N: It is amazing the way Stanford has published their classroom sessions online. I think Andrej Karpathy used to teach this during 2016, but the new material of 2017 is great too!
- CS224D: This is the NLP equivalent of the above one, taught by Christopher Manning and Richard Socher, two of the best lecturers I’ve come across.
- ML by Nando de Freitas: His explanation of the loss function and the gradient descent is unbeatable.
- Google ML: These crash courses by various topics are pretty handy and practical.
- fast.ai: These folks have been gaining a lot of popularity recently, with their easy-to-use, practical approach to teaching, along with the various notebooks they make available.
- Kaggle Learn: More recently opened, yet to add more content. What impressed me is — with such little theory, they are able to help us learn so much with practical exposure.
- ML by Andrew Ng: It is customary to mention the most popular one!! (though this is not free).
Reinforcement Learning:
- David Silver Lectures: This is the classic 101 of reinforcement learning, right out of the Sutton & Barto textbook (although I think he references the examples from the first edition of this book).
- IIT Madras, Prof. Balaraman Ravindran: Another wonderful, complete picture of RL. I think the Prof. has worked with Andrew Barto, who is one of the authors of the Reinforcement Learning Bible (the Sutton & Barto textbook mentioned earlier).
- Deep RL bootcamp: Lots of fresh ideas and discussions.
- Analytics Vidhya course: This is a practical-data-science oriented course.
- Codecademy: Updated course available for Python 3 (I had taken it for Python 2.7).
- PyTorch: If you want to get started on PyTorch-specific projects.
- TensorFlow: If you want to start using TensorFlow.
- Anaconda: Provides distributions for Windows, Mac, and Linux, and does not require admin access to install on your machine.
- Google Colab: If you do not have a powerful machine, you can still run your work in Colab notebooks. They provide GPUs and TPUs too, free of cost.
- Kaggle Kernels: Alongside providing such wonderful challenges and datasets to work with, they have also started providing kernels. I guess they have a limited set of resources, though, as I’ve sometimes been unsuccessful in spawning a machine.
- AWS free tier: For a limited time, AWS also provides free tier applications like SageMaker, which lets you build and deploy notebooks off the shelf.
- CoCalc: I haven’t used it, but have heard good reviews on this, so adding this too.
Blogs to follow:
- Analytics Vidhya: Undoubtedly and undisputedly the biggest, most popular, and best blog today.
- KDNuggets: I’ve been following this since 2011. Rich and diverse content. Also, check out their free open datasets.
- DataScienceCentral: Also a good repository for DS.
- TowardsDataScience: A Medium blog channel. Hundreds of ideas exchanged every week.
- Karpathy blog: By Andrej Karpathy. The unreasonably effective blog! Period.
- Colah’s blog: Especially look for the LSTM explanation.
- Jay Alammar blog: Illustration of everything, in such a visually appealing manner! Especially the transformer!
- Machine Learning Mastery: Thank you, Dr. Jason Brownlee; about 80% of my deep learning code has been inspired by your blog
- Yhat blog: A variety of data/ML related topics
- Sebastian Raschka blog: I love his project illustrations and his thought process in solving them
- FastML: No one can decipher and simplify complex machine learning techniques better than FastML. It is so easy to understand
- No free hunch: Blog by Kaggle, contains interviews, discussions and trends in machine learning
- Domino Data Blog: Provides a good perspective of the Big Data Science applications from a large company perspective
Podcasts to follow:
- Data Skeptic
- Machine Learning Guide
- Talking Machines
- DataHack Radio
- O’Reilly Data Show
- Learning Machines 101
GitHub to follow:
Most of the awesome-<everything> repositories. These are curated lists of videos, code, podcasts, courses, blogs, and everything else under the sun related to a topic. Some examples include:
- Awesome Deep Learning
- Awesome Machine Learning
- Awesome NLP
- Awesome RL
- Awesome Deep Vision
- Awesome Text Summarization
- Awesome Recommender Systems
- Ryan Adams
- Andrej Karpathy
- Fei-Fei Li
- Nando de Freitas
- Ilya Sutskever
- Papers with code
- Rachel Thomas
- Chip Huyen
- Jeremy Howard
- Richard Socher
- Sebastian Ruder
- Pieter Abbeel
Other noteworthy mentions
Lastly, a few more that need to be mentioned and that have caught my attention recently:
- Kaggle Reading Group and Live Coding by Rachael Tatman
- GitHub by Hugging Face
- Arxiv Sanity: Search index for arXiv papers
- Full Stack Deep Learning: Deep learning with a focus on deployment
I guess that is about it! Hope this helps you guys get started somewhere (whichever channel inspires you the best). Keep learning and keep contributing! May the force be with you!
Original. Reposted with permission.
Bio: Pramod is an AI solutioning expert, blogger and an avid speaker with 14 years of experience in building Data Science solutions in large organizations like Target and Fidelity using Reinforcement Learning, Recommendation Systems, NLP, Deep Learning, and massive data parallel computing.