This post summarizes and links to the individual tutorials that make up an introductory look at data science for newbies. The series focuses mainly on the tools, takes a practical bent, and is written by a software engineer from a software engineering perspective.
Nuts-ml is a new data pre-processing library in Python for GPU-based deep learning in vision. It provides common pre-processing functions as independent, reusable units. These so-called ‘nuts’ can be freely arranged to build data flows that are efficient and easy to read and modify.
The influence of a Twitter user goes beyond the simple number of followers. We also want to examine how effective tweets are: how likely they are to be retweeted or favorited, or to have their links clicked. What exactly counts as an influential user depends on the definition.
This post is the first in a series of tutorials for implementing machine learning workflows in Python from scratch, covering the coding of algorithms and related tools from the ground up. The end result will be a handcrafted ML toolkit. This post starts things off with data preparation.
This post takes the concept of an ontology and presents it in a clear and simple manner, devoid of the complexities that often surround such explanations.
This post outlines an entire 6-part tutorial series on the MXNet deep learning library and its Python API. In-depth and descriptive, this is a great guide for anyone looking to start leveraging this powerful neural network library.
A meticulously compiled list, as extensive as possible, of every accelerator, incubator, or program the author has read about or bumped into over the past months. It looks like there are at least 29 of them. An interesting read for a wide variety of potentially interested parties, well beyond investors alone.
This post, the first in a series of ML tutorials, aims to make machine learning accessible to anyone willing to learn. We’ve designed it to give you a solid understanding of how ML algorithms work, as well as the knowledge to harness them in your projects.
This report is the second in a series analyzing data science related topics. This time around, specifically, we rank 15 top Python data science packages, hopefully with results of use to the data science community.
Propensity scores are used in quasi-experimental and non-experimental research when the researcher must make causal inferences, for example, that exposure to a chemical increases the risk of cancer.
Let's have a look at common quality issues facing Big Data in terms of the key characteristics of Big Data – Volume, Velocity, Variety, Veracity, and Value.
Hadoop Distributed File System (HDFS) and HBase (Hadoop database) are key components of the Big Data ecosystem. This blog explains the difference between HDFS and HBase, with real-life use cases where each fits best.
In short, you reach different resting places with different SGD algorithms. That is, different SGDs just give you differing convergence rates due to their different strategies, but we do expect that they all end up at the same results!
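To make the convergence-rate point concrete, here is a minimal sketch that is not taken from the linked post. It assumes a toy convex loss f(x) = (x - 3)^2 with exact rather than stochastic gradients, and the function names and hyperparameters are illustrative. It compares a plain gradient step with a momentum step.

```python
def grad(x):
    # Gradient of the toy convex loss f(x) = (x - 3)^2, minimized at x = 3.
    return 2.0 * (x - 3.0)


def plain_descent(x, lr=0.1, tol=1e-6, max_steps=10_000):
    # Vanilla update: step directly against the gradient.
    for step in range(max_steps):
        g = grad(x)
        if abs(g) < tol:
            return x, step
        x -= lr * g
    return x, max_steps


def momentum_descent(x, lr=0.1, beta=0.9, tol=1e-6, max_steps=10_000):
    # Momentum update: accumulate a running velocity of past gradients.
    v = 0.0
    for step in range(max_steps):
        g = grad(x)
        if abs(g) < tol:
            return x, step
        v = beta * v + g
        x -= lr * v
    return x, max_steps


x_plain, steps_plain = plain_descent(0.0)
x_momentum, steps_momentum = momentum_descent(0.0)
print("plain:   ", x_plain, "after", steps_plain, "steps")
print("momentum:", x_momentum, "after", steps_momentum, "steps")
```

On this convex toy problem both update rules settle at the same minimum and only the number of steps differs. On non-convex losses, such as those of neural networks, different optimizers can also end up at different points.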
We introduce a new library for doing distributed hyperparameter optimization with Scikit-Learn estimators. We compare it to the existing Scikit-Learn implementations, and discuss when it may be useful compared to other approaches.
Cloud computing is the next evolutionary step in Internet-based computing, providing the means for delivering ICT resources as a service. The Internet of Things can benefit from the scalability, performance, and pay-as-you-go nature of cloud computing infrastructures.
This post is a lean look at learning machine learning with R. It is a complete, if very short, course for the quick study hacker with no time (or patience) to spare.
In this month's installment of Machine Learning Projects You Can No Longer Overlook, we find some data preparation and exploration tools, a (the?) reinforcement learning "framework," a new automated machine learning library, and yet another distributed deep learning library.
We present a deep learning approach to extract knowledge from a large amount of data from the recruitment space. A learning-to-rank approach is followed to train a convolutional neural network to generate job title and job description embeddings.
Without knowing the ground truth of a dataset, then, how do we know what the optimal number of data clusters is? We will have a look at two particularly popular methods for attempting to answer this question: the elbow method and the silhouette method.
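As a rough illustration of the two methods named above, the sketch below runs a tiny hand-written 1-D k-means on made-up data. The elbow method looks at the within-cluster sum of squares (inertia) for each k, and the silhouette method looks at the mean silhouette score. The data, the quantile-based initialization, and all function names are assumptions for this example, not code from the linked post.

```python
def kmeans_1d(points, k, iters=20):
    # Deterministic init (an assumption for this sketch): evenly spaced quantiles.
    ordered = sorted(points)
    n = len(ordered)
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(p - centroids[c])) for p in points]
        # Move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids


def inertia(points, labels, centroids):
    # Within-cluster sum of squared distances, the quantity the elbow method plots.
    return sum((p - centroids[l]) ** 2 for p, l in zip(points, labels))


def mean_silhouette(points, labels, k):
    scores = []
    for i, p in enumerate(points):
        same = [abs(p - q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        other_means = []
        for c in range(k):
            if c == labels[i]:
                continue
            dists = [abs(p - q) for q, l in zip(points, labels) if l == c]
            if dists:
                other_means.append(sum(dists) / len(dists))
        b = min(other_means)  # mean distance to the nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Made-up data with three visible groups around 1, 5, and 9.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]

inertias = {}
silhouettes = {}
for k in range(1, 5):
    labels, centroids = kmeans_1d(data, k)
    inertias[k] = inertia(data, labels, centroids)
    if k >= 2:  # the silhouette score needs at least two clusters
        silhouettes[k] = mean_silhouette(data, labels, k)

best_k = max(silhouettes, key=silhouettes.get)
print("inertia by k:", inertias)
print("silhouette by k:", silhouettes)
print("best k by silhouette:", best_k)
```

With the elbow method you would look for the k after which inertia stops dropping sharply. With the silhouette method you pick the k with the highest mean score. On real data the two can disagree, so treat both as heuristics.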
Check out this Python deep learning virtual machine image, built on top of Ubuntu, which includes a number of machine learning tools and libraries, along with several projects to get up and running with right away.
Resampling is a very popular solution for dealing with class imbalance. Our research on churn prediction shows that balanced sampling is unnecessary.
The top machine learning videos on YouTube include lecture series from Stanford and Caltech, Google Tech Talks on deep learning, using machine learning to play Mario and Hearthstone, and detecting NHL goals from live streams.
Using TensorFlow from Python is like using Python to program another computer. Being thoughtful about the graphs you construct can help you avoid confusion and costly performance problems.
10 days may not seem like a lot of time, but with proper self-discipline and time management, 10 days can provide enough time to gain a survey of the basics of machine learning, and even allow a new practitioner to apply some of these skills to their own project.