Matt Mayo

Data Science for Newbies: An Introductory Tutorial Series for Software Engineers

By Matt Mayo on May 31, 2017 in Apache Spark, Data Science, Jupyter, Machine Learning, Pandas, Python, Reddit, Scala, SQL
This post summarizes and links to the individual tutorials that make up this introductory look at data science for newbies. The series focuses mainly on the tools, has a practical bent, and is written by a software engineer from a software engineering perspective.
Top Stories, May 22-28: Analytics, Data Science, Machine Learning Software Poll Results; Machine Learning Crash Course

By Matt Mayo on May 29, 2017 in Top stories
New Leader, Trends, and Surprises in Analytics, Data Science, Machine Learning Software Poll; Machine Learning Crash Course: Part 1; Text Mining 101: Mining Information From A Resume; Data science platforms are on the rise and IBM is leading the way; An Introduction to the MXNet Python API
What is an Ontology? The simplest definition you’ll find… or your money back*

By Matt Mayo on May 26, 2017 in GRAKN.AI, Graph, Ontology
This post takes the concept of an ontology and presents it in a clear and simple manner, devoid of the complexities that often surround such explanations.
Will Data Science Eliminate Data Science?

By Matt Mayo on May 25, 2017 in Automation, Data Science, Data Scientist
There are elements of what we do which are AI-complete. Eventually, Artificial General Intelligence will eliminate the data scientist, but it’s not around the corner.
Machine Learning Crash Course: Part 1

By Matt Mayo on May 24, 2017 in Classification, Cost Function, Gradient Descent, Machine Learning, Regression
This post, the first in a series of ML tutorials, aims to make machine learning accessible to anyone willing to learn. We’ve designed it to give you a solid understanding of how ML algorithms work, as well as the knowledge to harness them in your projects.
Simplifying Data Pipelines in Hadoop: Overcoming the Growing Pains

By Matt Mayo on May 18, 2017 in Data Management, Data Platform, Hadoop, SVDS
Moving to Hadoop is not without its challenges: the many options, from tools to approaches, can have a significant impact on the future success of a business’s strategy. Data management and data pipelining can be particularly difficult.
Teaching the Data Science Process

By Matt Mayo on May 17, 2017 in Data Science, Methodology, Process, Teaching
Understanding the process requires not only a wide technical background in machine learning but also basic notions of business administration; here I will share my experience of teaching the data science process.
Introducing Dask-SearchCV: Distributed hyperparameter optimization with Scikit-Learn

By Matt Mayo on May 12, 2017 in Dask, Distributed Computing, Distributed Systems, Machine Learning, Optimization, scikit-learn
We introduce a new library for doing distributed hyperparameter optimization with Scikit-Learn estimators. We compare it to the existing Scikit-Learn implementations, and discuss when it may be useful compared to other approaches.
The Internet of Things in the Cloud

By Matt Mayo on May 11, 2017 in Cloud, Cloud Computing, Internet of Things, IoT, Scalability
Cloud computing is the next evolutionary step in Internet-based computing, providing the means for delivering ICT resources as a service. The Internet of Things can benefit from the scalability, performance and pay-as-you-go nature of cloud computing infrastructures.
Using Deep Learning To Extract Knowledge From Job Descriptions

By Matt Mayo on May 9, 2017 in Convolutional Neural Networks, Deep Learning, Natural Language Processing, Neural Networks, NLP, Text Mining
We present a deep learning approach to extract knowledge from a large amount of data from the recruitment space. A learning-to-rank approach is followed to train a convolutional neural network to generate job title and job description embeddings.
