-
What is the most important step in a machine learning project?
Business understanding is critical in any machine learning project, yet in practice it rarely gets enough attention. Here we explain which questions to ask.
-
The Rise of GPU Databases
The recent but noticeable shift from CPUs to GPUs is mainly due to the unique benefits GPUs bring to sectors like AdTech, finance, telco, retail, and security/IT. We examine where GPU databases shine.
-
Lessons Learned From Benchmarking Fast Machine Learning Algorithms
Boosted decision trees are responsible for more than half of the winning solutions in machine learning challenges hosted at Kaggle, and require minimal tuning. We evaluate two popular tree boosting software packages, XGBoost and LightGBM, and draw four important lessons.
-
Making Predictive Models Robust: Holdout vs Cross-Validation
The validation step helps you find the best parameters for your predictive model and prevent overfitting. We examine the pros and cons of two popular validation strategies: the hold-out strategy and k-fold cross-validation.
-
What Is Optimization And How Does It Benefit Business?
Here we explain what mathematical optimization is and discuss how it can be applied in business and finance to make decisions.
-
How Convolutional Neural Networks Accomplish Image Recognition?
Image recognition is a very interesting and challenging field of study. Here we explain the concepts, applications, and techniques of image recognition using Convolutional Neural Networks.
-
Going deeper with recurrent networks: Sequence to Bag of Words Model
Deep learning makes it possible to convert unstructured text to computable formats, incorporating semantic knowledge to train machine learning models. These digital data troves help us understand people on a new level.
-
Why Apache Arrow is the future for open source-columnar memory analytics
Apache Arrow is a de facto standard for columnar in-memory analytics. In the coming years we can expect all the big data platforms to adopt Apache Arrow as their columnar in-memory layer.
-
Insights from Data mining of Airbnb Listings
Airbnb has 2 million listings and operates in 65,000 cities. Here we look at insights into the vacation rental space in the sharing economy, using property listings data for Texas, US.
-
How to squeeze the most from your training data
In many cases, getting enough well-labelled training data is a huge hurdle for developing accurate prediction systems. Here is an innovative approach that uses SVM to get the most from training data.