How To Defeat The Machine Learning Engineer Impostor Syndrome
How many times have you taken yet another online course on machine learning, or read yet another paper on an emerging topic to stay up to date in this crazy, fast-paced AI/ML world, only to keep feeling like an impostor as an ML engineer? These three personal tips can help you overcome the classic (and common) impostor syndrome that haunts so many emerging ML engineers who want to get better at what they do.
By Pau Labarta Bajo, mathematician and data scientist.
When I first applied to Toptal, I wanted to become both a freelancer and a “real ML engineer” at the same time.
Before that, I worked as a Machine Learning engineer at Nordeus, a top mobile gaming company famous for having Mourinho’s face on its flagship game: TopEleven. My Machine Learning adventure at Nordeus consisted of designing and implementing an intelligent system to help the customer support team resolve player issues faster. The essence of it was to build a text classifier from a ton of historical player tickets and agent resolutions.
I had the whole system in mind, the data (at least that is what I thought), and access to GPUs. On paper, everything seemed just right for me to shine and deliver a great model and an even greater solution.
But this never happened. To my despair, it took me more than a month to realize that the dataset I was using to train my supervised model was as bad as it could be. Before that realization, I had spent countless hours and Jupyter notebooks trying to make the whole thing work. I was so busy working that I never found time to look at the data. Let's just say my lack of experience didn't help.
Three months after this failed project, I decided to quit my job and start my freelance path at Toptal. After a couple of rounds of interviews and technical screenings, I made it to the last round. Guess what? I had to solve a machine learning assignment, almost identical to the one I had failed before. And I had a week to complete it.
It is hard to describe how much negative self-talk I had to battle that week. The long shadow of impostor syndrome was clouding my mind.
This chapter had a happy ending: I solved the problem well, and I got into Toptal. Three years and 10 projects later, I can say I handle impostor syndrome much better.
Tip 1. Be brave and try freelancing
Being brave is what will help you the most. And working freelance IS brave. Check out my previous article on how to become a freelance data scientist if you want to know more.
When you work as a freelancer or contractor, feedback on your work does not come in quarterly or yearly reviews. It comes every single day, and there is no way to hack that. Clients expect you to deliver quality work, and fast. By the way, this is the main reason you will be better paid than in your current job.
Once you feel you have the fundamentals of ML right, put yourself in the ring. Test yourself. You are smart, and you can do it. Taking more online courses does not make the impostor syndrome go away. Believe me.
The top 2 freelancing platforms, IMHO, are:
- Toptal. The number 1 top-talent freelance platform in the world. Their application process is rather harsh, but it is well worth going through. Great enterprises and rising startups use Toptal to implement ML solutions these days, so getting in will give you plenty of opportunities to shine.
- Braintrust. An emerging talent network inspired by a fresh economic model (heard of Ethereum?). They are growing fast, and I expect them to catch up with Toptal soon.
Tip 2. Never forget the data. NEVER
ML engineering is harder than traditional software engineering because of DATA (capital letters, yes).
Rarely are you given a clean and complete set of features and labels to build your ML model. More often, you need to generate the training data yourself. The most common problems I have faced here are:
- The training data has a bug. Usually, you generate this data with an SQL query plus some kind of Python script that automates the extraction of raw features and labels. Writing SQL queries is simple, but debugging them can get pretty hard. Working on your SQL skills is one of the best things you can do to become a better ML engineer.
- The training data is incomplete. After generating the first version of your training data, you jump to the next phase of ML development and build a quick baseline model. More often than not, this baseline is not good enough to solve the business problem, so you need to iterate. Inexperienced ML developers tend to focus too much on improving the model and forget about the dataset they created. This is a typical mistake that breeds frustration and impostor-syndrome feelings. Go back to the DATA. Extend your SQL query to add more relevant features. Talk to the domain experts around you (data engineers, business intelligence folks…) who can help you fetch the data that will move the needle and unblock you.
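To make the first problem concrete, here is a minimal sketch of one very common SQL bug in training data: joining a one-to-many table silently duplicates examples. The schema (tables `tickets` and `messages`), the queries, and the helper `check_training_rows` are all hypothetical, loosely inspired by the support-ticket project above rather than taken from it. The point is that a few cheap assertions catch the bug before any model sees the data.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with a hypothetical ticket schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tickets (ticket_id INTEGER PRIMARY KEY, resolution TEXT);
        CREATE TABLE messages (ticket_id INTEGER, body TEXT);
        INSERT INTO tickets VALUES (1, 'refund'), (2, 'bug_report'), (3, 'refund');
        INSERT INTO messages VALUES
            (1, 'I was charged twice'),
            (1, 'Any update?'),
            (2, 'Game crashes on start'),
            (3, 'Wrong purchase');
        """
    )
    return conn


# Buggy: one row per *message*, so tickets with follow-ups appear twice.
BUGGY_QUERY = """
SELECT t.ticket_id, m.body AS text, t.resolution AS label
FROM tickets t JOIN messages m ON m.ticket_id = t.ticket_id
"""

# Fixed: concatenate all messages so there is exactly one row per ticket.
FIXED_QUERY = """
SELECT t.ticket_id, group_concat(m.body, ' ') AS text, t.resolution AS label
FROM tickets t JOIN messages m ON m.ticket_id = t.ticket_id
GROUP BY t.ticket_id, t.resolution
"""


def check_training_rows(rows, expected_count):
    """Return a list of problems found in (ticket_id, text, label) rows."""
    ids = [row[0] for row in rows]
    problems = []
    if len(ids) != len(set(ids)):
        problems.append("duplicate ticket_id rows")
    if len(rows) != expected_count:
        problems.append(f"expected {expected_count} rows, got {len(rows)}")
    if any(row[2] is None for row in rows):
        problems.append("missing labels")
    return problems


if __name__ == "__main__":
    conn = build_db()
    n_tickets = conn.execute("SELECT COUNT(*) FROM tickets").fetchone()[0]
    print(check_training_rows(conn.execute(BUGGY_QUERY).fetchall(), n_tickets))
    print(check_training_rows(conn.execute(FIXED_QUERY).fetchall(), n_tickets))
```

On this toy data the buggy query returns 4 rows for 3 tickets and the checks flag it, while the fixed query passes with no problems. In a real project, the same idea applies: compare the row count of your training set against the number of entities you expect, before you start tuning the model.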
DATA is the magic ingredient that fuels all models, from simple linear regressions to colossal Transformer models. If the fuel is bad, it does not matter which car you are driving: you are not going anywhere.
It sounds so trivial, yet we ML engineers (myself included) have a surprising tendency to forget it. As you gain experience building ML solutions, you get better at remembering this and going back to the DATA whenever you hit a wall.
You cannot use Stack Overflow to debug your dataset. You are on your own there, and you need to change your mindset: you have to act as a problem-solver. You need to get to know the dataset, and the best way to do that is to visualize it. I personally love Tableau Desktop, but there are other options out there, like Power BI, Apache Superset, etc. There are even Python libraries, like sweetviz, if you prefer.
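Even before opening a visualization tool, a few lines of plain Python can surface the kind of problems that sank my first text classifier. The helper below, `profile_dataset`, is a hypothetical sketch and not a feature of Tableau, sweetviz, or any other tool named above. It assumes the training set is a list of `(text, label)` pairs and reports class balance, empty texts, and identical texts that carry conflicting labels.

```python
from collections import Counter, defaultdict


def profile_dataset(examples):
    """Summarize a list of (text, label) pairs before training a classifier."""
    label_counts = Counter(label for _, label in examples)
    empty_texts = sum(1 for text, _ in examples if not text or not text.strip())

    # Group labels by normalized text: the same text with different labels
    # usually points to inconsistent annotation or a bug in data generation.
    labels_by_text = defaultdict(set)
    for text, label in examples:
        labels_by_text[(text or "").strip().lower()].add(label)
    conflicting = sorted(t for t, labels in labels_by_text.items() if len(labels) > 1)

    total = len(examples)
    # Share of the most frequent label: the accuracy of a "always predict
    # the majority class" baseline.
    majority_share = max(label_counts.values()) / total if total else 0.0
    return {
        "n_examples": total,
        "label_counts": dict(label_counts),
        "majority_share": majority_share,
        "empty_texts": empty_texts,
        "conflicting_texts": conflicting,
    }


if __name__ == "__main__":
    data = [
        ("I was charged twice", "refund"),
        ("i was charged twice", "bug_report"),
        ("Game crashes", "bug_report"),
        ("", "refund"),
    ]
    print(profile_dataset(data))
```

A `majority_share` close to 1.0, or a long list of `conflicting_texts`, is a signal to go back to the DATA rather than to try yet another model architecture.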
No matter the tool you prefer, go back to the DATA every time you get stuck.
Tip 3. Do not expect to know everything (especially at the beginning)
Machine learning is a field that covers a wide spectrum of technical complexity: software development, operationalization (MLOps), classical ML, cutting-edge research on Deep Learning, hardware optimization…
If you try to cover everything, you will lose focus and stay on the surface. Knowing something in ML means you have implemented it yourself. Full stop.
It is fantastic to keep up to date with the latest advances in deep learning, for example. But do it in a principled way. Set yourself a clear goal (e.g., “I want to become an expert in Transformer models”) and build a path towards it, selecting relevant papers, libraries, webinars, and even conferences.
Jumping from one topic to another keeps you busy but not focused. Stay humble. Start small and focused. Once you get there, take the next step and conquer another field.
Conquering your fears is a daily (full-time) job. Not only in Machine Learning but in every aspect of your life in which you want to grow and be better tomorrow.
Original. Reposted with permission.
Bio: Pau Labarta Bajo (@paulabartabajo_) is a mathematician and AI/ML freelancer and speaker, with over 10 years of experience crunching numbers and models for different problems, including financial trading, mobile gaming, online shopping, and healthcare.