A detailed explanation of random forests, with real life use cases, a discussion into when a random forest is a poor choice relative to other algorithms, and looking at some of the advantages of using random forest.
With a new year upon us, I thought it would be a good time to revisit the concept and put together a new learning path for mastering machine learning with Python. With these 7 steps you can master basic machine learning with Python!
The aim of this post, then, is to present the characteristic project flow that I have identified in the working process of both my colleagues and myself in recent years. Hopefully, this can help both data scientists and the people working with them to structure data science projects in a way that reflects their uniqueness.
We explain how to retrieve estimates of a model's performance using scoring metrics, before taking a look at finding and diagnosing the potential problems of a machine learning algorithm.
This is an overview of designing a computer program capable of developing predictive models without any manual intervention that are trained & evaluated in a lifelong machine learning setting in NeurIPS 2018 AutoML3 Challenge.
Logistic Regression is a Regression technique that is used when we have a categorical outcome (2 or more categories). Logistic Regression is one of the most easily interpretable classification techniques in a Data Scientist’s portfolio.
An organization can also reduce the cost of hiring many experts by applying AutoML in their data pipeline. AutoML also reduces the amount of time it would take to develop and test a machine learning model.
There are many algorithms to learn word embeddings. Here, we consider only one of them: word2vec, and only one version of word2vec called skip-gram, which works well in practice.
In simple words, one can say that ontology is the study of what there is. But there is another part to that definition that will help us in the following sections, and that is ontology is usually also taken to encompass problems about the most general features and relations of the entities which do exist.
Let’s imagine you are attempting to work on a machine learning project. This article will provide you with the step to step guide on the process that you can follow to implement a successful project.
Check out this series of articles on Apache Spark. Each part is a 10 minute tutorial on a particular Apache Spark topic. Read on to get up to speed using Spark.
I often have to loop over a set of objects to find the one with the greatest score. You can use an if statement and a placeholder, but there are more elegant ways!
When it comes to choosing the right book, you become immediately overwhelmed with the abundance of possibilities. In this review, we have collected our Top 10 NLP and Text Analysis Books of all time, ranging from beginners to experts.
Trying to keep up with advancements at the overlap of neural networks and natural language processing can be troublesome. That's where the today's spotlighted resource comes in.
There are many different approaches of how to compare two texts (strings of characters). Each has its own advantages and disadvantages and is good only for a range of specific use cases.
However, sometimes only a limited amount of data from the target distribution can be collected. It may not be sufficient to build the needed train/dev/test sets. What to do in such a case? Let us discuss some ideas!
A crucial aspect of machine learning is its ability to recognize error margins and to interpret data more precisely as rising numbers of datasets are fed through its neural network. Commonly referred to as backpropagation, it is a process that isn’t as complex as you might think.
This is a short collection of lessons learned using Colab as my main coding learning environment for the past few months. Some tricks are Colab specific, others as general Jupyter tips, and still more are filesystem related, but all have proven useful for me.