Exploring the Real World of Data Science
An article highlighting things I’ve learned in the real world about data science.
By Dhilip Subramanian, Data Scientist and AI Enthusiast
Data science, machine learning, and artificial intelligence have been hot domains for a few years now. Many people want to work as data scientists and are putting immense effort into upgrading their skills through university programs, online courses, or self-study. However, working on and solving a business problem in the real world brings many challenges, and non-technical skills are just as important as technical ones. In this blog, I share what I have learned from my own work as a data scientist.
Understanding the business problem
There are many challenges in real-world problems that students don’t necessarily face at university. In school, they are given a structured problem and a popular dataset, and they eventually arrive at an exact solution. In industry, however, the problem is often unstructured and complex, and unchecked assumptions about it tend to backfire. It is better to understand the business problem completely before diving into the analysis. That means researching the problem and its domain, planning, asking the clients the right questions, and discussing with team members.
Data science is about logical thinking, generating ideas, and solving problems creatively. Hence, teamwork plays an important role. It is also necessary to look at a problem from several angles rather than just one. Team members may come from diverse backgrounds with different skill sets. Identify the strengths of each team member and distribute the work accordingly. Working this way helped me solve problems in different ways and learn new things.
The other key skill is being a good listener. Data science is about sharing and collaboration, so you need to understand the views of others in the team. Team members often come up with good and sometimes unique ideas, and you have to listen to and understand them in order to implement them successfully in the project. As I said above, data science is not a one-man show; it is always a team effort.
Data science and AI are fast-evolving fields, so there will always be something new and crucial to learn. It is very hard to remember everything, and documentation helped me overcome this challenge. It also helped me crystallize my own thought process. I document my learnings, analyses, modeling process, experiments, and code. I also write up my failed experiments and the reasons they failed in detail, which has sharpened my ideas over time. In addition, it improved my communication and my understanding of the concepts. Document even the small things you learn or come across; they make a big difference in the long run. Use whatever tools you find convenient.
Working in an agile environment gives me clear planning, prioritisation, and direction at the start of each sprint. Having an agile mindset helps in responding to change and handling uncertainty. When you come across uncertainty, try out options, collect feedback, and improve iteratively. Agile also gave me the opportunity to collaborate with different teams. Presenting a minimum viable product (MVP) in the form of a machine learning model to the stakeholders at the end of each sprint helped me shape my projects better. The feedback at the end of each sprint also helped me correct my mistakes and deliver the project efficiently.
Storytelling is an important part of data science. We crunch the data, build a model, and find insights. But what does the model say in business terms? In other words, how does it make money for the company or solve the problem? Stakeholders and management are not interested in p-values or other statistics. The main challenge is explaining the model in simple terms to a non-technical audience in an engaging manner. One way to do that is through a short story. This was one of my biggest lessons of the last year. Always include good visualizations; they help convey the message as a story. Storytelling is an art, and it takes time and a lot of practice.
Creativity in showing the output
We usually show our work to clients or stakeholders in a traditional PowerPoint (PPT) deck. Why not create a web app or dashboard to explain the model output instead? Building one shows commitment to the project and helps you connect with stakeholders and clients.
Always use version control
Version control should be part of everyone’s workflow. It lets you manage your code centrally rather than saving it on a PC, laptop, or external drive. This way, you can refer to the code or documents from any location whenever you are working on a new project.
I significantly improved my coding skills during the last 8 months. One thing I have learned in my work as well as in competitions is to write functional or object-oriented code for maximum reusability. This makes the code usable in future projects and saves time in the current one. Whenever I looked something up on Stack Overflow or Google, I documented the function in the code, and this helped me learn new things about coding. Always follow best practices and keep your code reader-friendly.
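To make the idea of reusable code concrete, here is a minimal sketch in Python. It is only an illustration, not code from any of my projects: the function name missing_value_report and the sample customers data are hypothetical. The point is that a check written once as a small, documented function can be called again in the next project instead of being rewritten.

```python
from typing import Any, Dict, Iterable, Mapping


def missing_value_report(rows: Iterable[Mapping[str, Any]]) -> Dict[str, float]:
    """Return the fraction of missing values for each column.

    Columns are the keys that appear in the rows, and a value of None
    counts as missing. Keeping this as one documented function means it
    can be reused across projects instead of being copied around.
    """
    missing_counts: Dict[str, int] = {}
    total_rows = 0
    for row in rows:
        total_rows += 1
        for column, value in row.items():
            # Register the column even if this particular value is present.
            missing_counts.setdefault(column, 0)
            if value is None:
                missing_counts[column] += 1
    if total_rows == 0:
        return {}
    return {column: count / total_rows for column, count in missing_counts.items()}


# Hypothetical sample data, made up for illustration only.
customers = [
    {"age": 34, "city": "Oslo"},
    {"age": None, "city": "Pune"},
    {"age": 51, "city": None},
    {"age": None, "city": "Lima"},
]

print(missing_value_report(customers))  # {'age': 0.5, 'city': 0.25}
```

The same function works on any list of dictionaries, so a later project only needs to import and call it. The type hints and docstring are what keep it reader-friendly for whoever picks it up next.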
Ask for help
Data science is a blend of computer science, statistics, machine learning, and domain expertise. It therefore requires skills for every step, from cleaning data to interpreting the final model and deploying it. Don’t be intimidated; you can’t master data science in one day. If you get into a difficult situation, feel free to ask for help. You will gain more knowledge and eventually become more confident in your approach.
AI is the new buzz in the IT industry, and let’s face it: no one can absorb all of it in a short period of time. Approach it strategically by investing one or two hours every day in learning new concepts and solving new problems, whether that means learning a new algorithm, coding, reading a blog, or doing personal projects. Apart from all this, I highly recommend reading non-technical books; they help a lot with flow and storytelling technique, which will be useful as you move on.
During my initial days, I was under the impression that everyone in this analytical world is a master of everything. Later I realized that my assumption was wrong: it is a continuous learning process for everyone here. What keeps you current in this game is passion, curiosity, and a thirst to learn more. Be it machine learning, deep learning, or NLP, it is always passion that solves complicated problems.
Disclaimer: This blog contains my personal experience. If any of this info helped you out, I’d love to hear about it.
Thanks for reading!
Bio: Dhilip Subramanian is a Mechanical Engineer and has completed his Master's in Analytics. He has 9 years of experience with specialization in various domains related to data including IT, marketing, banking, power, and manufacturing. He is passionate about NLP and machine learning. He is a contributor to the SAS community and loves to write technical articles on various aspects of data science on the Medium platform.
Original. Reposted with permission.