Most machine learning models on Spark are not straightforward to use, and they need a lot of feature engineering to work. That’s why we created the feature engineering section inside the Optimus Data Frame Transformer.
Feature selection is a very important technique in machine learning. In this post we discuss one of the most common optimization algorithms for multi-modal fitness landscapes - evolutionary algorithms.
Data science needs fast computation and transformation of data. NumPy objects in Python provide that advantage over regular programming constructs like the for-loop. How can you demonstrate it in a few easy lines of code?
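As a rough illustration of the kind of comparison that post describes, here is a minimal sketch that computes a sum of squares with a plain Python loop and with a vectorized NumPy call, timing both. The function names and the array size are assumptions chosen for illustration, not taken from the original post, and actual timings will vary by machine.

```python
import time

import numpy as np


def sum_of_squares_loop(values):
    """Sum of squares using an explicit Python for-loop."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_numpy(values):
    """Sum of squares using a single vectorized NumPy call."""
    return float(np.dot(values, values))


def time_call(fn, arg):
    """Return (result, elapsed seconds) for one call of fn(arg)."""
    start = time.perf_counter()
    result = fn(arg)
    return result, time.perf_counter() - start


# Hypothetical test data: 200,000 floats.
data = np.arange(200_000, dtype=np.float64)

loop_result, loop_seconds = time_call(sum_of_squares_loop, data.tolist())
numpy_result, numpy_seconds = time_call(sum_of_squares_numpy, data)

print(f"for-loop: {loop_seconds:.4f} s")
print(f"NumPy:    {numpy_seconds:.4f} s")
```

Both versions compute the same quantity; the only difference is whether the iteration happens in the Python interpreter or inside NumPy's compiled routines.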
Introducing the Natural Language Processing Library for Apache Spark - and yes, you can actually use it for free! This post will give you a great overview of John Snow Labs NLP Library for Apache Spark.
One of the main principles I learned during my time at Google Brain was that unit tests can make or break your algorithm and can save you weeks of debugging and training time.
This is a visualization of the inter- and intra-continental migration of scientific researchers based on ORCID (Open Researcher and Contributor ID) data. It is best seen as a directional sample of all researchers, and tracks their earliest/latest countries with research activities as well as their PhD countries.
I found all 3 courses extremely useful and learned an incredible amount of practical knowledge from the instructor, Andrew Ng. Ng does an excellent job of filtering out the buzzwords and explaining the concepts in a clear and concise manner.
The definitions of training, validation, and test sets can be fairly nuanced, and the terms are sometimes inconsistently used. In the deep learning community, “test-time inference” is often used to refer to evaluating on data in production, which is not the technical definition of a test set.
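To make the three terms concrete, here is a minimal sketch of a three-way split using only the standard library. The function name, the 70/15/15 proportions, and the fixed seed are assumptions for illustration; the original post may define or size the sets differently.

```python
import random


def split_dataset(items, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle items and split them into train, validation, and test lists.

    The training set is used to fit the model, the validation set to tune
    hyperparameters, and the test set only for the final evaluation.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility

    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)

    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

The three lists are disjoint by construction, which is the property that keeps the test set a fair estimate of performance on unseen data.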
This blog post is targeted towards people who have experience with machine learning, and want to get a better intuition on the different objective functions used to train neural networks.
Wikipedia is a rich source of well-organized textual data, and a vast collection of knowledge. What we will do here is build a corpus from the set of English Wikipedia articles, which is freely and conveniently available online.
Although NLP and text mining are not the same thing, they are closely related, deal with the same raw data type, and have some crossover in their uses. Let's discuss the steps in approaching these types of tasks.
We introduce a general framework for developing time series models, generating features, and preprocessing the data, and we explore the potential to automate this process in order to apply advanced machine learning algorithms to almost any time series problem.
Playlists, individual tutorials (not part of a playlist) and online courses on Deep Learning (DL) in Python using the Keras, Theano, TensorFlow and PyTorch libraries. Assumes no prior knowledge. These videos cover all skill levels and time constraints!
Two years. Two years is the maximum amount of time you should spend focused on your learning, education and training. That’s exactly why this guide is focused on honing the most beneficial skills in two years.
If you follow AI you might have heard about the advent of the potentially revolutionary Capsule Networks. I will show you how you can start using them today.
PySpark is the Python API for Spark, exposing the Spark programming model to Python. With it, you can speed up analytic applications. With Spark, you can get started with big data processing, as it has built-in modules for streaming, SQL, machine learning and graph processing.
Feature selection is a key part of data science but is it still relevant in the age of support vector machines (SVMs) and Deep Learning? Yes, absolutely. We explain why.
The author presents 10 statistical techniques which a data scientist needs to master. Build up your toolbox of data science tools by having a look at this great overview post.
The first comprehensive and objective survey of online Masters in Analytics / Data Science, including rankings, tuition, and duration of the education program.
This article will try to explain basic concepts and give some intuition for using different kinds of machine learning algorithms in different tasks. At the end of the article, you’ll find a structured overview of the main features of the described algorithms.
Kevin and Koen may buy the same brand for the same reasons. On the other hand, they may buy the same brand for different reasons, or buy different brands for the same reasons, or even different brands for different reasons. The brands they purchase and the reasons why may vary by occasion, too.
Learning the TensorFlow Core API, which is the lowest-level API in TensorFlow, is a very good first step in learning TensorFlow because it lets you understand the kernel of the library. Here is a very simple example of the TensorFlow Core API in which we create and train a linear regression model.
Ever had a great idea for a data science project or business, but in the end did not pursue it because you did not know how to make it a success? Today I am going to show you how to do it.
This post summarizes the contents of a recent O'Reilly article outlining a number of methods for interpreting machine learning models, beyond the usual go-to measures.
The advances in image classification, object detection, and semantic segmentation using deep Convolutional Neural Networks, which spawned the availability of open source tools such as Caffe and TensorFlow (to name a couple) to easily manipulate neural network graphs... made a very strong case in favor of CNNs for our classifier.
Once you’ve read this article, you will understand the basics of AI and ML. More importantly, you will understand how Deep Learning, the most popular type of ML, works.
In the past years, several niche tools have appeared to mine organizational business processes. In this article, we’ll show you that it is possible to get started with “process mining” using well-known data science programming languages as well.
In this extract from “Python Machine Learning”, top data scientist Sebastian Raschka explains the 3 main types of machine learning: Supervised, Unsupervised and Reinforcement Learning. Use code PML250KDN to save 50% on the cost of the book.
Here is a getting-started guide to machine learning which grew out of the author's notes for a one-hour talk on the subject. Hopefully you find the path helpful.