2018 May

Overview of Dash Python Framework from Plotly for building dashboards

An introduction to the Dash framework from Plotly, a reactive framework for building dashboards in Python. The tech talk covers the basics and more advanced topics such as custom components and scaling.

on May 31, 2018 in Dashboard, Data Analytics, Data Visualization, Plotly, Python
On the contribution of neural networks and word embeddings in Natural Language Processing

In this post I will try to explain, in a very simplified way, how to apply neural networks and integrate word embeddings in text-based applications, and some of the main implicit benefits of using neural networks and word embeddings in NLP.

on May 31, 2018 in Neural Networks, NLP, Word Embeddings, word2vec
Cartoon: GDPR first effect on Privacy

New KDnuggets Cartoon examines the first unexpected effect of GDPR on Privacy.

on May 30, 2018 in Cartoon, GDPR, Privacy
Improving the Performance of a Neural Network

There are many techniques available that could help us improve the performance of a neural network. Follow along to get to know them and to build your own accurate neural network.

on May 30, 2018 in Ensemble Methods, Hyperparameter, Neural Networks, Overfitting, Tips
6 Tips for Effective Visualization with Tableau

We analyse principles for effective data visualization in Tableau, including color gradients, avoiding crowded dashboards, Tableau marks and more.

on May 29, 2018 in Advice, Data Visualization, Tableau, Visualization
Descriptive analytics, machine learning, and deep learning viewed via the lens of CRISP-DM

The CRISP-DM methodology is a must-teach for explaining the steps of an analytics project. This article's purpose is to complement it with specific flow charts that explain, as simply as possible, how it is most likely to be used in descriptive analytics, classic machine learning, or deep learning.

on May 29, 2018 in CRISP-DM, Deep Learning, Descriptive Analytics, Machine Learning
A Beginner’s Guide to the Data Science Pipeline

On one end was a pipe with an entrance and at the other end an exit. The pipe was also labeled with five distinct letters: "O.S.E.M.N."

on May 29, 2018 in Beginners, Data Science, Pipeline
10 More Free Must-Read Books for Machine Learning and Data Science

Summer, summer, summertime. Time to sit back and unwind. Or get your hands on some free machine learning and data science books and get your learn on. Check out this selection to get you started.

on May 28, 2018 in Books, Data Science, ebook, Free ebook, Machine Learning
Event Processing: Three Important Open Problems

This article summarizes the three most important problems to be solved in event processing. The facts in this article are supported by a recent survey and an analysis conducted on the industry trends.

on May 28, 2018 in Big Data, Data Analytics, Insights, Real-time, SQL, Streaming Analytics
Learn AI and Data Science rapidly based only on high school math – KDnuggets Offer

This 3-month program, created by Ajit Jaokar, who teaches at Oxford, is interactive and delivered by video. Coding examples are in Python. Places are limited; check the special KDnuggets rate.

on May 25, 2018 in AI, Ajit Jaokar, Data Science Education, Mathematics, Online Education, Python
Top 20 R Libraries for Data Science in 2018

We have prepared an infographic of the Top 20 R packages for data science, which covers the libraries' main features and GitHub activity, as all of the libraries are open source.

on May 25, 2018 in Data Science, Infographic, R
Modelling Time Series Processes using GARCH

ARCH models were developed to navigate the turbulent seas of volatile data and analyze it in a time-varying setting.

on May 25, 2018 in Modeling, R, Time Series
How to tackle common data cleaning issues in R

R is a great choice for manipulating, cleaning, summarizing, producing probability statistics, and so on. In addition, it's not going away anytime soon; it is platform independent, so what you create will run almost anywhere; and it has awesome help resources.

on May 24, 2018 in Book, Data Cleaning, ebook, Packt Publishing, R
Data Science: 4 Reasons Why Most Are Failing to Deliver

Data Science: Some see billions in returns, but most are failing to deliver. This article explores some of the reasons why this is the case.

on May 24, 2018 in Data Science, Deployment, Domino, Failure, Production
Scientific debt – what does it mean for Data Science?

This article analyses scientific debt - what it is and what it means for data science.

on May 23, 2018 in Business, Data Engineering, Data Science, DataCamp, Technical Debt
Why Data and Infrastructure are key to determining Customer Intent, May 31 Webinar

Join Yieldmo, an advertising technology company, and learn how Snowflake and Looker unleashed the potential of their mobile ad engagement data and drove more impactful marketing for their clients.

on May 22, 2018 in Advertising, Data Infrastructure, Data Warehouse, Looker, Privacy
Python eats away at R: Top Software for Analytics, Data Science, Machine Learning in 2018: Trends and Analysis

Python continues to eat away at R, RapidMiner gains, SQL is steady, TensorFlow advances pulling along Keras, Hadoop drops, Data Science platforms consolidate, and more.

on May 22, 2018 in Anaconda, Data Mining Software, Data Science Platform, Hadoop, Keras, Poll, Python, R, RapidMiner, SQL, TensorFlow, Trends
If chatbots are to succeed, they need this

Can logic be used to make chatbots intelligent? In the 1960s this was taken for granted. Now we have all but forgotten the logical approach. Is it time for a revival?

on May 22, 2018 in AI, AlphaGo, Chatbot, Logic, NLP
ETL vs ELT: Considering the Advancement of Data Warehouses

The traditional concept of ETL is changing towards ELT – when you’re running transformations right in the data warehouse. Let’s see why it’s happening, what it means to have ETL vs ELT, and what we can expect in the future.

on May 22, 2018 in BigQuery, Data Warehouse, ELT, ETL, Statsbot
YouTube videos on database management, SQL, Data Warehousing, Business Intelligence, OLAP, Big Data, NoSQL databases, data quality, data governance and Analytics – free

Watch over 20 hours of YouTube videos on databases and database design, Physical Data Storage, Transaction Management and Database Access, and Data Warehousing, Data Governance and (Big) Data Analytics - all free.

on May 18, 2018 in Analytics, Bart Baesens, Big Data, Business Intelligence, Data Governance, Data Quality, Data Warehousing, Databases, NoSQL, SQL, Youtube
Optimization Using R

Optimization is a technique for finding the best possible solution to a given problem among all the possible solutions. It uses a rigorous mathematical model to find the most efficient solution to the given problem.

on May 18, 2018 in Excel, Linear Programming, Optimization, R
9 Must-have skills you need to become a Data Scientist, updated

Check out this collection of 9 (plus some additional freebies) must-have skills for becoming a data scientist.

on May 17, 2018 in Burtch Works, Data Science Skills, Data Scientist, Simplilearn
An Introduction to Deep Learning for Tabular Data

This post will discuss a technique that many people don’t even realize is possible: the use of deep learning for tabular data, and in particular, the creation of embeddings for categorical variables.

on May 17, 2018 in Deep Learning, fast.ai, Kaggle, Neural Networks, Rachel Thomas, word2vec
How to Implement a YOLO (v3) Object Detector from Scratch in PyTorch: Part 1

The best way to go about learning object detection is to implement the algorithms by yourself, from scratch. This is exactly what we'll do in this tutorial.

on May 17, 2018 in Computer Vision, Image Recognition, Neural Networks, Object Detection, Python, PyTorch, YOLO
How to Organize Data Labeling for Machine Learning: Approaches and Tools

The main challenge for a data science team is to decide who will be responsible for labeling, estimate how much time it will take, and what tools are better to use.

on May 16, 2018 in Altexsoft, Crowdsourcing, Data Labeling, Data Preparation, Image Recognition, Machine Learning, Training Data
GANs in TensorFlow from the Command Line: Creating Your First GitHub Project

In this article I will present the steps to create your first GitHub Project. I will use as an example Generative Adversarial Networks.

on May 16, 2018 in GANs, Generative Adversarial Network, GitHub, Neural Networks, Python, Rubens Zimbres, TensorFlow
THE BOOK OF WHY: The New Science of Cause and Effect

A Turing Award-winning computer scientist and statistician shows how understanding causality has revolutionized science and will revolutionize AI.

on May 15, 2018 in AI, Bayesian Networks, Book, Causality, Causation, Judea Pearl
Beyond Data Lakes and Data Warehousing

We give a comprehensive review of data lakes and data warehouses, and look at what the future holds for total data integration.

on May 15, 2018 in Data Lakes, Data Warehousing
Complete Guide to Build ConvNet HTTP-Based Application using TensorFlow and Flask RESTful Python API

In this tutorial, a CNN is built, then trained and tested on the CIFAR10 dataset. To make the model remotely accessible, a Flask Web application is created using Python to receive an uploaded image and return its classification label using HTTP.

on May 15, 2018 in API, Convolutional Neural Networks, Dropout, Flask, Neural Networks, Python, RESTful API, TensorFlow
A Brief Introduction to Wikidata

Like Wikipedia, Wikidata stores all kinds of data. As such, when you are looking for a specific dataset or want to answer a curious question, a good start is to look for that data in Wikidata first.

on May 15, 2018 in RDF, SPARQL, Wikidata, Wikipedia
Data Engineer vs Data Scientist: the evolution of aggressive species

This article looks at how the two "species" - data scientists and data engineers - harmonise and coexist.

on May 14, 2018 in Career, Data Engineer, Data Science Education, Data Scientist, DSTI, indeed
Top Stories, May 7-13: 2018 KDnuggets Analytics, Data Mining, Data Science, Machine Learning Software Poll; WTF is a Tensor?!?

5 Reasons "Logistic Regression" should be the first thing you learn when becoming a Data Scientist; PyTorch Tensor Basics; Top 7 Data Science Use Cases in Finance; Detecting Breast Cancer with Deep Learning; To SQL or not To SQL: that is the question!

on May 14, 2018 in Top stories
Simple Derivatives with PyTorch

PyTorch includes an automatic differentiation package, autograd, which does the heavy lifting for finding derivatives. This post explores simple derivatives using autograd, outside of neural networks.

on May 14, 2018 in Python, PyTorch
Top SAS Courses Online

High quality SAS training for beginners is out there and I’ll help you find it.

on May 11, 2018 in Beginners, Coursera, Online Education, SAS, Udemy
PyTorch Tensor Basics

This is an introduction to PyTorch's Tensor class, which is reasonably analogous to Numpy's ndarray, and which forms the basis for building neural networks in PyTorch.

on May 11, 2018 in GPU, Python, PyTorch, Tensor
The Executive Guide to Data Science and Machine Learning

This article provides a short introductory guide for executives curious about data science or commonly used terms they may encounter when working with their data team. It may also be of interest to other business professionals who are collaborating with data teams or trying to learn data science within their unit.

on May 10, 2018 in Big Data, Business, Data Science, Machine Learning
Deep learning scaling is predictable, empirically

This study starts with a simple question: “how can we improve the state of the art in deep learning?”

on May 10, 2018 in Deep Learning, Machine Learning, Scalability
Top 7 Data Science Use Cases in Finance

We have prepared a list of data science use cases that have the highest impact on the finance sector. They cover very diverse business aspects from data management to trading strategies, but the common thing for them is the huge prospects to enhance financial solutions.

on May 10, 2018 in
Data Augmentation: How to use Deep Learning when you have Limited Data

This article is a comprehensive review of Data Augmentation techniques for Deep Learning, specific to images.

on May 9, 2018 in Data Preparation, Deep Learning
Detecting Breast Cancer with Deep Learning

Breast cancer is the most common invasive cancer in women, and the second main cause of cancer death in women, after lung cancer. In this article I will use Deep Learning Studio to build a WideResNet-based neural network that categorizes slide images into two classes: one that contains breast cancer and one that doesn’t.

on May 9, 2018 in Cancer Detection, Deep Learning, Healthcare, Neural Networks
Torus for Docker-First Data Science

To help data science teams adopt Docker and apply DevOps best practices to streamline machine learning delivery pipelines, we open-sourced a toolkit based on the popular cookiecutter project structure.

on May 8, 2018 in Data Science, DevOps, Docker, Machine Learning Engineer, Open Source, Python
7 Useful Suggestions from Andrew Ng's “Machine Learning Yearning”

Machine Learning Yearning is a book by AI and Deep Learning guru Andrew Ng, focusing on how to make machine learning algorithms work and how to structure machine learning projects. Here we present 7 very useful suggestions from the book.

on May 8, 2018 in Andrew Ng, Book, Data Cleaning, Data Preparation, Free ebook, Machine Learning, Metrics
Top Data Science, Machine Learning Courses from Udemy – May 2018

Learn Machine Learning, Data Science, Python, Azure Machine Learning, and more with Udemy Mother's Day $9.99 sale - get top courses from leading instructors.

on May 8, 2018 in Azure ML, Data Science, Machine Learning, Python, Udemy
5 Reasons Logistic Regression should be the first thing you learn when becoming a Data Scientist

Learn Logistic Regression first to become familiar with the pipeline and avoid being overwhelmed by fancy algorithms.

on May 8, 2018 in Data Scientist, Logistic Regression, Machine Learning
2018 KDnuggets Poll: What software you used for Analytics, Data Mining, Data Science, Machine Learning projects in the past 12 months?

Vote in KDnuggets 19th Annual Poll: What software you used for Analytics, Data Mining, Data Science, Machine Learning projects in the past 12 months?

on May 7, 2018 in Data Mining Software, Data Science, Machine Learning, Poll
Apache Spark: Python vs. Scala

When it comes to using the Apache Spark framework, the data science community is divided into two camps: one prefers Scala, the other Python. This article compares the two, listing their pros and cons.

on May 4, 2018 in Apache Spark, Java, Python, Scala
Skewness vs Kurtosis – The Robust Duo

Kurtosis and Skewness are very close relatives of the “data normalized statistical moment” family – Kurtosis being the fourth and Skewness the third moment, and yet they are often used to detect very different phenomena in data. At the same time, it is typically advisable to analyse the outputs of both together to gather more insight and understand the nature of the data better.

on May 4, 2018 in Data Science, Descriptive Analytics, Statistics
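
As a rough illustration of the entry above, the third and fourth standardized moments can be computed directly from their definitions. This is only a minimal sketch, assuming population (biased) estimates and plain Python lists; the helper names `standardized_moment`, `skewness`, and `kurtosis` and the sample data are hypothetical and are not taken from the linked article.

```python
import math


def standardized_moment(values, k):
    """Return the k-th standardized moment using population (biased) estimates."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    if variance == 0:
        raise ValueError("standardized moments are undefined for constant data")
    std = math.sqrt(variance)
    # Average of the k-th power of the z-scores.
    return sum(((x - mean) / std) ** k for x in values) / n


def skewness(values):
    """Third standardized moment: sign indicates the direction of asymmetry."""
    return standardized_moment(values, 3)


def kurtosis(values):
    """Fourth standardized moment (not excess kurtosis): sensitive to tail weight."""
    return standardized_moment(values, 4)


# Hypothetical sample data for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_tailed), kurtosis(right_tailed))
```

Reading the two numbers side by side is the point: a symmetric sample has skewness near zero whatever its kurtosis, while a single large value pushes both upward.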
AI is not set and forget

Just like a car, an AI-based system can tick along in decent shape for a while. But neglect it too long and you’re in trouble. Unfortunately, failing to maintain your AI will destroy the project.

on May 3, 2018 in AI, Failure, Maintenance
Boost your data science skills. Learn linear algebra.

The aim of these notebooks is to help beginners/advanced beginners to grasp linear algebra concepts underlying deep learning and machine learning. Acquiring these skills can boost your ability to understand and apply various data science algorithms.

on May 3, 2018 in Data Science, Linear Algebra, Mathematics, numpy, Python
Best Practices in Data Visualization

Do your data visualizations need a reboot? Though data visualizations may be designed to facilitate understanding, not all graphs are effective. In this webcast, viewers will learn how to use best practices to give a graph a makeover.

on May 2, 2018 in Best Practices, Data Visualization, JMP
Hands-on: Intro to Python for Data Analysis

Learn one of the top languages used in data science and machine learning with this new hands-on course by TDWI Online Learning.

on May 2, 2018 in Data Analysis, Online Education, Python, TDWI
To Kaggle Or Not

Kaggle is the most well known competition platform for predictive modeling and analytics. This article looks into the different aspects of Kaggle and the benefits it can bring to data scientists.

on May 2, 2018 in Advice, Competition, Data Science, Kaggle
Getting Started with spaCy for Natural Language Processing

spaCy is a Python natural language processing library designed specifically for implementing production-ready systems. It is particularly fast and intuitive, making it a top contender for NLP tasks.

on May 2, 2018 in Data Preparation, Data Preprocessing, NLP, Python, Text Analytics, Text Mining
50+ Useful Machine Learning & Prediction APIs, 2018 Edition

An extensive list of 50+ APIs for Face and Image Recognition, Text Analysis, NLP, Sentiment Analysis, Language Translation, Machine Learning and prediction.

on May 1, 2018 in API, Face Recognition, Image Recognition, Machine Learning, Natural Language Processing, Sentiment Analysis, Text Analytics
Data Science vs Machine Learning vs Data Analytics vs Business Analytics

This article gives a broad overview of data science and the various fields within it, including business analytics, data analytics, business intelligence, advanced analytics, machine learning, and AI.

on May 1, 2018 in AI, Business, Business Analytics, Data Analytics, Data Science, Machine Learning
Jupyter Notebook for Beginners: A Tutorial

The Jupyter Notebook is an incredibly powerful tool for interactively developing and presenting data science projects. Although it is possible to use many different programming languages within Jupyter Notebooks, this article will focus on Python as it is the most common use case.

on May 1, 2018 in Data Analysis, GitHub, Jupyter, Matplotlib, Python
Implementing Deep Learning Methods and Feature Engineering for Text Data: FastText

Overall, FastText is a framework for learning word representations and also performing robust, fast and accurate text classification. The framework is open-sourced by Facebook on GitHub.

on May 1, 2018 in Facebook, Feature Engineering, NLP, Python
