2018 Aug

All (50) | News, Features (1) | Opinions, Interviews (11) | Tutorials, Overviews (38)

AI Knowledge Map: How To Classify AI Technologies

What follows is then an effort to draw an architecture to access knowledge on AI and follow emergent dynamics, a gateway of pre-existing knowledge on the topic that will allow you to scout around for additional information and eventually create new knowledge on AI.

on Aug 31, 2018 in AI, Classification, Deep Learning, Machine Intelligence, Machine Learning, Neural Networks
Topic Modeling with LSA, PLSA, LDA & lda2Vec

This article is a comprehensive overview of Topic Modeling and its associated techniques.

on Aug 30, 2018 in LDA, NLP, Text Analytics, Topic Modeling
Skip the Interview! 9 Benefits of Career Fairs

Career fairs are a great way to get your feet wet if you’re just starting your data science career, or to be exposed to newer trends and emerging organizations if you’re already established. What other ways are career fairs beneficial?

on Aug 29, 2018 in Career, Interview, ODSC
Word Vectors in Natural Language Processing: Global Vectors (GloVe)

A well-known model that learns vectors for words from their co-occurrence information is Global Vectors (GloVe). While word2vec is a predictive model (a feed-forward neural network that learns vectors to improve its predictive ability), GloVe is a count-based model.

on Aug 29, 2018 in NLP, Sciforce, Text Analytics, word2vec
Linear Regression In Real Life

A helpful guide to Linear Regression, using the example of a friends' road trip to Las Vegas to highlight how it can be used in a real-life situation.

on Aug 28, 2018 in Beginners, Linear Regression
How to Make Your Machine Learning Models Robust to Outliers

In this blog, we’ll try to understand the different interpretations of this “distant” notion. We will also look into the outlier detection and treatment techniques while seeing their impact on different types of machine learning models.

on Aug 28, 2018 in Machine Learning, Modeling, Outliers
Are Vectorized Random Number Generators Actually Useful?

I reported that you can multiply the speed of common (fast) random number generators such as PCG and xorshift128+ by a factor of three or four by vectorizing them using SIMD instructions. Is this actually useful in practice?

on Aug 28, 2018 in Parallelism, Programming, Random, Randomization
Multi-Class Text Classification with Scikit-Learn

The vast majority of text classification articles and tutorials on the internet cover binary text classification, such as email spam filtering and sentiment analysis. Real-world problems are much more complicated than that.

on Aug 27, 2018 in NLP, Python, scikit-learn, Text Classification, Text Mining
Data Visualization Cheat Sheet

Core principles for successful data visualization, including tips on how to reduce clutter, preattentive processing and how to integrate text within the graph.

on Aug 24, 2018 in Cheat Sheet, Data Visualization
Emotion and Sentiment Analysis: A Practitioner’s Guide to NLP

Sentiment analysis is widely used, especially as a part of social media analysis for any domain, be it a business, a recent movie, or a product launch, to understand its reception by the people and what they think of it based on their opinions or, you guessed it, sentiment!

By Dipanjan Sarkar on Aug 24, 2018 in NLP, Text Analytics, Workflow
The 2018 Data Scientist Report is Here

Learn about the data and tools that data scientists are working with in 2018, ethical issues around AI, algorithmic bias, job satisfaction, and more.

on Aug 23, 2018 in Bias, Career, Data Science Platform, Data Science Tools, Data Scientist, Ethics, Figure Eight
DynamoDB vs. Cassandra: from “no idea” to “it’s a no-brainer”

DynamoDB vs. Cassandra: have they got anything in common? If yes, what? If no, what are the differences? We answer these questions and examine the performance of both databases.

on Aug 23, 2018 in Amazon, Apache, AWS, Cassandra, DynamoDB
Comparison of the Most Useful Text Processing APIs

There is a need to compare different APIs to understand key pros and cons they have and when it is better to use one API instead of the other. Let us proceed with the comparison.

on Aug 23, 2018 in NLP, Text Analytics, Text Mining
9 Things You Should Know About TensorFlow

A summary of the key points from the Google Cloud Next session in San Francisco, "What’s New with TensorFlow?", including neural networks, TensorFlow Lite, data pipelines and more.

on Aug 22, 2018 in Deep Learning, Google, Keras, Machine Learning, Python, TensorFlow
Leveraging Agent-based Models (ABM) and Digital Twins to Prevent Injuries

Both athletes and machines deal with intertwined complex systems (where the interactions of one complex system can have a ripple effect on others) that can have a significant impact on their operational effectiveness.

on Aug 22, 2018 in Health, IoT, Modeling, Sports
Docker Cheat Sheet

This comprehensive cheat sheet will assist Docker users, experienced and new, in getting containers up and running quickly. We list commands that will allow users to install, build, ship and run Docker containers.

on Aug 21, 2018 in Cheat Sheet, Docker
UX Design Guide for Data Scientists and AI Products

Realizing that there is a legitimate knowledge gap between UX Designers and Data Scientists, I have decided to attempt addressing the needs from the Data Scientist’s perspective.

on Aug 21, 2018 in AI, Data Science, Data Scientist, UI/UX
Basic Statistics in Python: Probability

At the most basic level, probability seeks to answer the question, "What is the chance of an event happening?" To calculate the chance of an event happening, we also need to consider all the other events that can occur.

on Aug 21, 2018 in Normal Distribution, Probability, Python, Statistics
Interpreting a data set, beginning to end

Detailed knowledge of your data is key to understanding it! We review several important methods for understanding the data, including summary statistics with visualization, embedding methods like PCA and t-SNE, and Topological Data Analysis.

on Aug 20, 2018 in Analytics, Big Data, Data Science, Data Visualization, Machine Learning, SAS, Statistics, t-SNE
Why Automated Feature Engineering Will Change the Way You Do Machine Learning

Automated feature engineering will save you time, build better predictive models, create meaningful features, and prevent data leakage.

on Aug 20, 2018 in Automated Machine Learning, Feature Engineering, Machine Learning, Python
Cartoon: Machine Learning takes a vacation

August is a popular time for vacation, and even hard-working AI may want to take a few epochs off from its training. KDnuggets Cartoon looks at how this might go.

on Aug 18, 2018 in Cartoon, Deep Learning, Humor, Machine Learning, Robots
Introduction to Fraud Detection Systems

Using the Python gradient boosting library LightGBM, this article introduces fraud detection systems, with code samples included to help you get started.

on Aug 17, 2018 in Fraud Detection, Gradient Boosting, Python
Auto-Keras, or How You can Create a Deep Learning Model in 4 Lines of Code

Auto-Keras is an open source software library for automated machine learning. Auto-Keras provides functions to automatically search for architecture and hyperparameters of deep learning models.

on Aug 17, 2018 in Automated Machine Learning, Keras, Neural Networks, Python
Named Entity Recognition: A Practitioner’s Guide to NLP

Named entity recognition (NER), also known as entity chunking/extraction, is a popular technique used in information extraction to identify and segment the named entities and classify or categorize them under various predefined classes.

on Aug 17, 2018 in NLP, Text Analytics, Workflow
Project Hydrogen, new initiative based on Apache Spark to support AI and Data Science

An introduction to Project Hydrogen: how it can assist machine learning and AI frameworks on Apache Spark and what distinguishes it from other open source projects.

on Aug 16, 2018 in AI, Apache Spark, Data Science, Databricks, Distributed Computing, Production
Reinforcement Learning: The Business Use Case, Part 2

In this post, I will explore the implementation of reinforcement learning in trading. The financial industry has been exploring applications of artificial intelligence and machine learning for its use cases, but the monetary risk has prompted reluctance.

on Aug 16, 2018 in Business, Finance, Machine Learning, Reinforcement Learning, Use Cases
A Crash Course in MXNet Tensor Basics & Simple Automatic Differentiation

This is an overview of some basic functionality of the MXNet ndarray package for creating tensor-like objects, and using the autograd package for performing automatic differentiation.

on Aug 16, 2018 in GPU, MXNet, Python, Tensor
An Introduction to t-SNE with Python Example

In this post we’ll give an introduction to the exploratory and visualization t-SNE algorithm. t-SNE is a powerful dimension reduction and visualization technique used on high dimensional data.

on Aug 15, 2018 in Clustering, Data Visualization, PCA, Python, t-SNE
Data Scientist guide for getting started with Docker

Docker is an increasingly popular way to create and deploy applications through virtualization, but can it be useful for data scientists? This guide should help you quickly get started.

on Aug 14, 2018 in Data Science, Data Scientist, Docker, Jupyter
Unveiling Mathematics Behind XGBoost

Follow me till the end, and I assure you that you will at least get a sense of what is happening underneath the revolutionary machine learning model.

on Aug 14, 2018 in Gradient Boosting, Mathematics, XGBoost
Setting up your AI Dev Environment in 5 Minutes

Whether you're a novice data science enthusiast setting up TensorFlow for the first time, or a seasoned AI engineer working with terabytes of data, getting your libraries, packages, and frameworks installed is always a struggle. Learn how datmo, an open source Python package, helps you get started in minutes.

on Aug 13, 2018 in AI, datmo, Development, Docker, Machine Learning, Python, TensorFlow
Unsupervised Learning Demystified

Unsupervised learning is a pattern-finding technique for mining inspiration from your data. Let's demystify!

on Aug 13, 2018 in Cassie Kozyrkov, Clustering, Machine Learning, Unsupervised Learning
Affordable online news archives for academic research

Many researchers need access to multi-year historical repositories of online news articles. We identified three companies that make such access affordable, and spoke with their CEOs.

on Aug 10, 2018 in API, Research, Text Analytics, Text Mining, Webhose
Understanding Language Syntax and Structure: A Practitioner’s Guide to NLP

Knowledge about the structure and syntax of language is helpful in many areas like text processing, annotation, and parsing for further operations such as text classification or summarization.

on Aug 10, 2018 in NLP, Text Analytics, Workflow
Top 10 roles in AI and data science

When you think of the perfect data science team, are you imagining 10 copies of the same professor of computer science and statistics, hands delicately stained with whiteboard marker? We hope not!

on Aug 9, 2018 in AI, Analyst, Analytics Manager, Career, Cassie Kozyrkov, Data Engineer, Data Scientist, Jobs, Machine Learning Engineer, Statistician
Building Reliable Machine Learning Models with Cross-validation

Cross-validation is frequently used to train, measure and finally select a machine learning model for a given dataset because it helps assess how the results of a model will generalize to an independent data set in practice.

on Aug 9, 2018 in Comet.ml, Cross-validation, Machine Learning, Modeling, scikit-learn
Reinforcement Learning: The Business Use Case, Part 1

At base, RL is a complex algorithm for mapping observed entities and measures into some set of actions, while optimizing for a long-term or short-term reward.

on Aug 9, 2018 in Business, Machine Learning, Reinforcement Learning, Use Cases
Optimization 101 for Data Scientists

We show how to use optimization strategies to make the best possible decision.

on Aug 8, 2018 in Football, Julia, Optimization, Python, R, Sports
How GOAT Taught a Machine to Love Sneakers

Embeddings are a fantastic tool to create reusable value with inherent properties similar to how humans interpret objects. GOAT uses deep learning to generate these for their entire sneaker catalogue.

on Aug 7, 2018 in Autoencoder, Deep Learning, Image Recognition, Word Embeddings
Programming Best Practices For Data Science

In this post, I'll go over the two mindsets most people switch between when doing programming work specifically for data science: the prototype mindset and the production mindset.

on Aug 7, 2018 in Best Practices, Data Science, Pandas, Programming, Python
Autoregressive Models in TensorFlow

This article investigates autoregressive models in TensorFlow, including autoregressive time series and predictions with the actual observations.

on Aug 6, 2018 in Regression, TensorFlow, Time Series
Only Numpy: Implementing GANs and Adam Optimizer using Numpy

This post is an implementation of GANs and the Adam optimizer using only Python and Numpy, with minimal focus on the underlying maths involved.

on Aug 6, 2018 in GANs, Generative Adversarial Network, Neural Networks, numpy, Optimization, Python
Eight iconic examples of data visualisation

A collection of exemplary data visualizations, including Napoleon's invasion of Russia and the iconic London Underground map.

on Aug 3, 2018 in Charts, Data, Data Visualization, Graphs, Maps
K-Means in Real Life: Clustering Workout Sessions

By using the within-cluster sum of squares as the cost function, data points in the same cluster will be similar to each other, whereas data points in different clusters will have a lower level of similarity.

on Aug 3, 2018 in Clustering, Health, K-means
Text Wrangling & Pre-processing: A Practitioner’s Guide to NLP

I will highlight some of the most important steps, which are used heavily in Natural Language Processing (NLP) pipelines and which I frequently use in my own NLP projects.

on Aug 3, 2018 in Data Preprocessing, Data Wrangling, NLP, Text Analytics, Workflow
Data Scientist Interviews Demystified

We look at typical questions in a data science interview, examine the rationale for such questions, and hope to demystify the interview process for recent graduates and aspiring data scientists.

on Aug 2, 2018 in Data Science Skills, Hiring, Interview Questions, P-value, random forests algorithm, XGBoost
WTF is TF-IDF?

Relevant words are not necessarily the most frequent words since stopwords like “the”, “of” or “a” tend to occur very often in many documents.

on Aug 2, 2018 in Information Retrieval, Python, Text Analytics, Text Mining, TF-IDF
From Data to Viz: how to select the right chart for your data

We offer an interactive, decision tree-style tool, which examines the data you have and proposes a set of potentially appropriate visualizations to represent your dataset.

on Aug 1, 2018 in Data, Data Visualization, ggplot2, GitHub, R, Tidyverse
Basic Statistics in Python: Descriptive Statistics

This article covers defining statistics, descriptive statistics, measures of central tendency, and measures of spread. This article assumes no prior knowledge of statistics, but does require at least a general knowledge of Python.

on Aug 1, 2018 in Descriptive Analytics, Python, Statistics
Selecting the Best Machine Learning Algorithm for Your Regression Problem

This post should then serve as a great aid in selecting the best ML algorithm for your regression problem!

on Aug 1, 2018 in Algorithms, Machine Learning, Regression
