10 Simple Things to Try Before Neural Networks
Below are 10 simple things you should remember to try first before throwing in the towel and jumping straight to neural networks.
By Ngwa Bandolo Bobga Cyril, Data Analyst | Data Scientist
It is not always the big stuff or the latest packages that help improve the accuracy or performance of our machine learning models.
At times we overlook the basics of machine learning and rush to higher-order solutions when the solution is right there in front of us.
Below are 10 simple things you should remember to try first before throwing in the towel and jumping straight to RNNs and CNNs (of course, there are datasets that merit starting straight from LSTMs and BERT). Let us remind ourselves of our checklist before bringing out our calculus skills.
1. Domain knowledge
Try to understand as much about the domain as you can. This will greatly help you in your predictive models and in coming up with great features.
2. Get more data
You can simply ask for more data. The data you have might not be enough to give you an accurate model with a good bias-variance trade-off.
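One way to judge whether more data is likely to help is a learning curve. The sketch below assumes scikit-learn is available and uses a synthetic dataset as a stand-in for your own; the model and sizes are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic stand-in for a real dataset (an assumption for illustration).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=0
)

# Refit the model on growing subsets and score each with 5-fold CV.
train_sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X,
    y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
)

for size, score in zip(train_sizes, val_scores.mean(axis=1)):
    print(f"{int(size):4d} training rows -> mean validation accuracy {score:.3f}")
```

If the validation score is still climbing at the largest training size, more data is probably worth asking for; if it has flattened, the other items on this list are likely a better use of time.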
3. Treat outliers
When your loss function or metric is MSE or RMSE, leaving outliers untreated in your dataset can lead to very poor results, because squaring the errors lets a few extreme points dominate the fit.
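A small illustration using only the standard library: the mean is the constant prediction that minimizes MSE, so a single extreme value drags it far from the typical case. The data and the choice of Tukey's IQR fences for clipping are assumptions for the example; whether clipping, removing, or keeping an outlier is right depends on your domain.

```python
import statistics

def iqr_clip(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, low), high) for v in values]

# Hypothetical target values; the last one looks like a data-entry error.
y = [10, 12, 11, 13, 12, 11, 10, 12, 13, 500]
y_clipped = iqr_clip(y)

# The mean (MSE-optimal constant) is pulled toward the outlier,
# while the median (MAE-optimal constant) barely notices it.
print(f"mean with outlier:   {statistics.mean(y):.2f}")
print(f"median with outlier: {statistics.median(y):.2f}")
print(f"mean after clipping: {statistics.mean(y_clipped):.2f}")
```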
4. Try transforming your data
Simple transformations like “square” or “square root” can give your model “ideas” to better see patterns in your dataset. And of course, if you suspect a lognormal distribution, taking logs of your features can be very beneficial (especially when using linear models).
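As a quick check of the log idea, the sketch below draws a hypothetical lognormal feature with the standard library and compares its skewness before and after taking logs. The `skewness` helper is written here for illustration and is not from any particular package.

```python
import math
import random
import statistics

def skewness(values):
    """Population skewness: the mean cubed z-score (about 0 when symmetric)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)

random.seed(0)
# Hypothetical right-skewed feature, e.g. incomes or call durations.
feature = [random.lognormvariate(0.0, 1.0) for _ in range(5000)]
log_feature = [math.log(v) for v in feature]

print(f"skewness before log: {skewness(feature):.2f}")
print(f"skewness after log:  {skewness(log_feature):.2f}")
```

The logged feature should come out close to symmetric, which is the shape linear models tend to handle more comfortably.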
5. Do feature selection
The curse of dimensionality works against you. Selecting the most relevant features to include in your model not only helps reduce overfitting, it also helps your model run faster. So throw in some LASSO and see which features survive.
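Here is a minimal sketch of LASSO as a feature selector, assuming scikit-learn and NumPy. The data is synthetic: only the first two of ten features actually drive the target, and `alpha=0.1` is an arbitrary illustrative penalty that you would normally tune.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic data: 10 standard-normal features, only features 0 and 1 matter.
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# The L1 penalty shrinks uninformative coefficients to exactly zero.
model = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(model.coef_)

print("surviving feature indices:", selected.tolist())
print("coefficients:", np.round(model.coef_, 2).tolist())
```

On real data the features should be put on a comparable scale first (for example with `StandardScaler`), since the penalty treats all coefficients alike.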
6. Do cross-validation
Your test dataset should really be your last line of defense before taking your model to production. So use cross-validation to reduce the variance of your performance estimate and obtain a model that generalizes well to new data.
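The workflow can be sketched as follows, assuming scikit-learn: hold out a test set once, do all model assessment with cross-validation on the remainder, and touch the test set only at the end. The dataset and model are placeholders.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

# Set aside a final test set; it plays the "last defender" role.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation on the development data only.
scores = cross_val_score(model, X_dev, y_dev, cv=5)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# One final check on the untouched test set before shipping.
final_score = model.fit(X_dev, y_dev).score(X_test, y_test)
print(f"held-out test accuracy: {final_score:.3f}")
```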
7. Try many algorithms
In the beginning you are not very sure of the distribution of your data. So try a couple of models and see which one best meets your objectives or criteria. With time you will get better at knowing which model to use.
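Trying several models need not be much work. The sketch below, assuming scikit-learn, scores three arbitrary candidates with the same cross-validation splits; the dataset and the shortlist are illustrative, not a recommendation.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# Scale-sensitive models get a scaler inside the pipeline so that
# scaling is refit on each training fold.
candidates = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "k-nearest neighbours": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

results = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:22s} mean CV accuracy {score:.3f}")
```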
8. Hyperparameter tuning
Of course, you have to tune hyperparameters like the learning rate so that gradient descent converges well and is less likely to get trapped in a poor local minimum. You also need to prune those decision trees to avoid overfitting.
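For the decision-tree case, a grid search over depth and pruning strength might look like the sketch below, assuming scikit-learn. The grid values are arbitrary starting points chosen for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# max_depth limits tree growth; ccp_alpha controls cost-complexity pruning.
param_grid = {
    "max_depth": [2, 4, 8, None],
    "ccp_alpha": [0.0, 0.005, 0.02],
}

search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best mean CV accuracy: {search.best_score_:.3f}")
```

For larger grids, `RandomizedSearchCV` from the same module samples the space instead of trying every combination.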
9. Use Ensemble
Bagging and boosting have helped many win Kaggle competitions. Why not try the same with your dataset? The power of the crowd.
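A rough comparison of a single tree against a bagging-style and a boosting-style ensemble, assuming scikit-learn. A random forest is used here as the bagging example (it adds random feature subsetting on top of bagging), and all settings are defaults picked for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "bagging (random forest)": RandomForestClassifier(
        n_estimators=100, random_state=0
    ),
    "boosting (gradient boosting)": GradientBoostingClassifier(random_state=0),
}

# Same 5-fold protocol for every model so the scores are comparable.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name:30s} mean CV accuracy {score:.3f}")
```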
10. Reshuffling your data
Yes, you read that right. The best ideas are the simplest. Just try it. Merely reshuffling your data often helps improve performance, especially when the rows arrive in some order, such as sorted by class or by date. Who said machine learning models do not need our help to avoid bias?
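One situation where shuffling clearly matters is data stored in sorted order. In the sketch below, which assumes scikit-learn, the iris rows are grouped by class, so unshuffled 3-fold splitting tests each fold on a class the model never saw during training. This shows one mechanism only; it is not a claim that shuffling always helps.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# The 150 iris rows are stored as three consecutive blocks of 50, one per class.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Without shuffling, each test fold is exactly one class that is
# missing from the training folds.
ordered = cross_val_score(model, X, y, cv=KFold(n_splits=3))

# With shuffling, every fold contains a mix of all classes.
shuffled = cross_val_score(
    model, X, y, cv=KFold(n_splits=3, shuffle=True, random_state=0)
)

print(f"mean accuracy, ordered rows:  {ordered.mean():.3f}")
print(f"mean accuracy, shuffled rows: {shuffled.mean():.3f}")
```

For time-ordered data the opposite caution applies: shuffling can leak future information into training, so a time-aware split is usually the safer choice there.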
Hope it helps someone out there.
Wish you Good Data Luck!!!
Bio: Ngwa Bandolo Bobga Cyril is a Data Analyst/Data Scientist with more than 10 years of experience in analytics, working as Head of Data Analytics and Business Performance for a Telecom Company, Yoomee Mobile (Douala). Check out Ngwa's YouTube videos, teaching machine learning, data science, and visualisation.
Original. Reposted with permission.