These days, I am sure 90% of LinkedIn traffic contains one of these terms: DS, ML or DL (acronyms for Data Science, Machine Learning and Deep Learning). Beware of the cliche though: “80% of all statistics are made up on the spot”. If you blinked at these acronyms, perhaps you need to google a bit and then continue reading the rest of this post. This post has 2 goals. First, it attempts to put all fellow Data Science learners at ease. Second, if you have just begun with Data Science, it may serve as a guide to your next step.
Here is an image I came across on the internet:
Quite overwhelming, isn’t it?
Where to start? How to start?:
I started my Data Science journey at the beginning of October 2017. I spent the first 15 days just trying to answer a single question, “What is Data Science?”, in a manner that was convincing to me. I browsed a variety of resources on the internet: Quora, Medium, Springboard blogs and e-books, Udacity blogs, Forbes, datascience.com, KDnuggets, datasciencecentral.com, analyticsvidhya and some random web-crawling, absorbing all the information with spoonfuls of salt (pinches of salt are not enough, don’t blame me for not warning you). I concluded that Data Science (in layman’s terms) is making a computer draw nice graphs from data and translating them into a story that makes sense and addresses business problems. Yes, it is that simple. No, it is not? Yes, it is. Well, there are two broad types of Data Science jobs, and here I am talking about Data Science for business. The end product of the other type is not a story, but a data-driven product. Let us not get into that, because then we digress into Machine Learning Engineering. Typically, companies like Google and Facebook have data-driven roles that fall into the second category. Much of academic research is of the second type too.
Back to the first type, let me give a slightly more advanced definition. Data Science is the process of coming up with answers to business questions with the help of historical data: cleaning and analysing it first, then fitting one machine learning model (or a combination of models) to it, and often forecasting and suggesting measures to prevent possible future issues. Ah, that is quite cool, isn’t it? Once I was convinced about the first question, I thought about the best way to learn it.
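To make that definition a bit more concrete, here is a minimal sketch of the clean, fit and forecast steps in plain Python. The sales figures, the helper names (`clean`, `fit_line`, `forecast`) and the choice of a straight-line model are all my own assumptions for illustration; a real project would use real data and most likely a library such as pandas or scikit-learn.

```python
# A toy version of "clean the data, fit a model, forecast".
# The numbers are hypothetical monthly sales; None marks a missing record.
raw_sales = [100.0, 110.0, None, 130.0, 140.0, 150.0]


def clean(series):
    """Pair each value with its month index and drop missing entries."""
    return [(month, value) for month, value in enumerate(series)
            if value is not None]


def fit_line(points):
    """Ordinary least-squares fit of value = slope * month + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(model, month):
    """Predict the value for a given month index from the fitted line."""
    slope, intercept = model
    return slope * month + intercept


points = clean(raw_sales)        # step 1: cleaning
model = fit_line(points)         # step 2: fitting a (very simple) model
next_month = len(raw_sales)      # the first month with no data yet
prediction = forecast(model, next_month)  # step 3: forecasting
print(f"Forecast for month {next_month}: {prediction:.1f}")
```

The straight line here stands in for whatever model suits the business question; the point is only the order of the steps, not the model itself.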
“The best way to learn it”?!:
Again I spent a few days looking up this phrase and lo, I found countless pieces of advice. This time there was no option but to try a few of them. I have Bachelor’s and Master’s degrees in Electronics and Communications Engineering and a decade of programming experience in languages like C/C++, Octave/Matlab, Verilog/SystemVerilog and Perl. Mathematics has been my favorite subject since childhood, and Probability was my favorite during my Master’s. Experience with programming and Probability is a definite plus in my case.
I was a bit afraid of the term “Machine Learning”, and I am the type who likes to face fear head on, so I ended up enrolling in Prof. Andrew Ng’s Coursera course. That is the first one I took and I am glad it worked well for me. I was literally scared of Python, both the snake and the language, but fortunately the exercises in Andrew’s course are in Octave. I tried learning Python fundamentals on Coursera, Udacity, edX and DataCamp, and chose Coursera and DataCamp. I knew that for a beginner in data science, R would perhaps be the better start. However, at that point in time I was not too confident about taking only the data science path, and Python is more general. I took several courses across several platforms at the same time. I tried Intro to Machine Learning, Statistics, CS Fundamentals, Intro to Data Science and others on Udacity. I could not continue with them for long, as I do not like interruptions while a concept is being absorbed by my neurons.
Non-Data Scientific, yet Scientific Courses:
Meanwhile, it had been a long time since I had taken any coursework. I found a nice Coursera course, “Learning How to Learn”, by UC San Diego. It reassured me that the learning techniques I apply are good enough. In addition, it removed my doubts about whether I am too old to take up something new, as recent research suggests that activities like exercise, meditation or just walking in nature (which I do anyway) give birth to new neurons in the brain and form new connections. The Pomodoro technique presented in the course is helpful. I also found it the right time to take up a course titled “A Life of Happiness and Fulfillment” by the Indian School of Business. The content of this course has made me pursue my passion for Data Science dispassionately. It reminds me to learn for the pure joy of learning and to focus on the process rather than the end product. Although these courses are non-technical, I find them super-helpful for fast and effective learning.
I attended a meetup in mid-October and found that it was organized by a local DS consulting company that also conducts training. I was not too impressed by their model: they train you with your own money and, if you do well, recruit you as an employee. The takeaway from the meetup was: “MOOCs will not get you a job; real projects, Kaggle competitions and having your own blog will. A Master’s from a reputed institute will matter, but MOOC certificates from the same institutes carry no value.”
Here’s my take on that: “It doesn’t matter which path you take to learn. All that matters is that you can do a real-world DS project”. If you have a way to prove that in an interview, why can’t you get a job? You don’t need to shell out thousands of dollars on bootcamps or get MOOC certificates either. You need a set of qualities, talents and skills to be a Data Scientist: a good understanding of the fundamentals of high-school math, probability and statistics, a lot of curiosity, inquisitiveness, an inclination to learn new things, familiarity with programming, the ability to document and present, and above all, you must know that you possess these. [If you have self-doubts, you need to get them out of the way first.] Learning the rest (like ML) follows. Companies, especially in smaller towns like mine, are in such need of a Data Scientist that they are actively looking for ways to hire a good one. Keep in mind, though, that it is essential to do a few real DS projects and showcase them to prospective employers through reports, presentations or a GitHub repo. If you are unsure how to do a real project, one way is to seek help from a mentor, an industry expert, for the technical parts. The job search process itself calls for another post and is out of scope for this one.
In summary, what is the best path to take to fill the gaps in the essentials? There are no shortcuts. Try out a few platforms and see what suits you best. Start with MOOCs, then go hands-on. Be sure to stay organized and document your journey. Learn first a topic that is not in your comfort zone. For example, if you already know C++, don’t jump into learning Python; instead, trust that you can pick it up eventually. It makes sense to try ML to see if you like it, because that is what separates a data scientist from a data analyst or a data engineer. If you do not enjoy learning any of these on your own, chances are that you won’t be able to learn them even with a bootcamp. Data Science is a field that requires learning every day: new tools, new concepts and algorithms, new business domains, or anything else on earth. There are only steps, and no end.
Bio: Aparna C Shastry is a Data Science Aspirant, Learner, Parent of 2 lovely kids (teachers).