2020 Jul

All (65) | Events (1) | News, Education (2) | Opinions (12) | Tutorials, Overviews (50)

Fuzzy Joins in Python with d6tjoin

Combining different data sources is a time suck! d6tjoin is a Python library that lets you join pandas dataframes quickly and efficiently.

on Jul 31, 2020 in Data Processing, Pandas, Python
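To illustrate the idea behind a fuzzy join, here is a minimal sketch that uses only the standard library's `difflib` similarity matching. It is a swapped-in technique and not d6tjoin's API or matching algorithm; the helper name `fuzzy_join`, the `matched_` column, and the sample company data are hypothetical.

```python
import difflib

def fuzzy_join(left, right, key, cutoff=0.6):
    """Inner-join two lists of dicts on approximately equal `key` values.

    Hypothetical helper: each left key is matched to the most similar right key
    according to difflib's ratio, and rows scoring below `cutoff` are dropped.
    """
    right_by_key = {row[key]: row for row in right}
    candidates = list(right_by_key)
    joined = []
    for row in left:
        best = difflib.get_close_matches(row[key], candidates, n=1, cutoff=cutoff)
        if not best:
            continue  # no sufficiently similar key on the right-hand side
        match = right_by_key[best[0]]
        merged = {k: v for k, v in match.items() if k != key}
        merged.update(row)  # keep the left-hand spelling of the key
        merged["matched_" + key] = best[0]
        joined.append(merged)
    return joined

left = [
    {"company": "Apple Inc", "ticker": "AAPL"},
    {"company": "Microsoft Corp", "ticker": "MSFT"},
    {"company": "Zebra Holdings", "ticker": "ZBRA"},
]
right = [
    {"company": "Apple Inc.", "sector": "Hardware"},
    {"company": "Microsoft Corporation", "sector": "Software"},
]
for row in fuzzy_join(left, right, "company"):
    print(row)
```

"Zebra Holdings" has no close match and is dropped, mirroring an inner join; a real library would add smarter similarity measures and faster candidate search on large tables.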
R squared Does Not Measure Predictive Capacity or Statistical Adequacy

The fact that R-squared shouldn't be used to decide whether you have an adequate model is counter-intuitive and is rarely explained clearly. This demonstration overviews how R-squared goodness-of-fit works in regression analysis and correlations, and shows why it is not a measure of statistical adequacy and therefore says nothing about future predictive performance.

on Jul 31, 2020 in Predictive Analytics, Regression, Statistics
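A small worked example of the point, not taken from the article: fit a straight line to data that is actually quadratic. R-squared, computed as 1 - SS_res / SS_tot, comes out high even though the model is wrong, and its prediction outside the observed range is far off. The data is made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(xs, ys, a, b):
    """R^2 = 1 - SS_res / SS_tot for the fitted line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

xs = list(range(11))
ys = [x ** 2 for x in xs]  # the true relationship is quadratic
a, b = fit_line(xs, ys)
print(f"R^2 = {r_squared(xs, ys, a, b):.3f}")                  # R^2 = 0.928
print(f"prediction at x=20: {a + b * 20:.1f} (true value 400)")  # prediction at x=20: 185.0 (true value 400)
```

The residuals are positive at both ends and negative in the middle, a systematic pattern that R-squared alone does not reveal.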
Scaling Computer Vision Models with Dataflow

Scaling Machine Learning models is hard and expensive. We briefly introduce the Google Cloud service Dataflow and show how it can be used to run predictions on millions of images in a serverless way.

on Jul 31, 2020 in Computer Vision, Dataflow, Google, Python, Scalability
Awesome Machine Learning and AI Courses

Check out this list of awesome, free machine learning and artificial intelligence courses with video lectures.

on Jul 30, 2020 in AI, Courses, Machine Learning
A Complete Guide To Survival Analysis In Python, part 3

Concluding this three-part series covering a step-by-step review of statistical survival analysis, we look at a detailed example implementing the Kaplan-Meier fitter based on different groups, a Log-Rank test, and Cox Regression, all with examples and shared code.

on Jul 30, 2020 in Jupyter, Python, Regression, Statistics, Survival Analysis
Math for Programmers!

Math for Programmers teaches you the math you need to know for a career in programming, concentrating on what you need to know as a developer. Save 50% with code kdmath50.

on Jul 30, 2020 in Book, Manning, Mathematics, Programming
5 Big Trends in Data Analytics

Data analytics is the process by which data is deconstructed and examined for useful patterns and trends. Here we explore five trends making data analytics even more useful.

on Jul 30, 2020 in Analytics, Blockchain, Data Analytics, NLP, Trends
A Tour of End-to-End Machine Learning Platforms

An end-to-end machine learning platform needs a holistic approach. If you’re interested in learning more about a few well-known ML platforms, you’ve come to the right place!

on Jul 29, 2020 in AirBnB, Data Science Platform, Google, Machine Learning, MLOps, Netflix, Pipeline, Uber, Workflow
First Steps of a Data Science Project

Many data science projects are launched with good intentions, but fail to deliver because the correct process is not understood. To achieve good performance and results in this work, the first steps must include clearly defining goals and outcomes, collecting data, and preparing and exploring the data. This is all about solving problems, which requires a systematic process.

on Jul 29, 2020 in Beginners, Data Exploration, Data Preparation, Data Science
Why You Should Get Google’s New Machine Learning Certificate

Google is offering a new ML Engineer certificate, geared towards professionals who want to display their competency in topics like distributed model training and scaling to production. Is it worth it?

on Jul 29, 2020 in Certificate, Courses, Google, Machine Learning
5 Fantastic Natural Language Processing Books

This curated collection of 5 natural language processing books attempts to cover a number of different aspects of the field, balancing the practical and the theoretical. Check out these 5 fantastic selections now in order to improve your NLP skills.

on Jul 28, 2020 in Books, NLP
Essential Resources to Learn Bayesian Statistics

If you are interested in becoming better at statistics and machine learning, then some time should be invested in diving deeper into Bayesian Statistics. While the topic is more advanced, applying these fundamentals to your work will advance your understanding and success as an ML expert.

on Jul 28, 2020 in Bayesian, Machine Learning, Markov Chain, Statistics
Building a Content-Based Book Recommendation Engine

In this blog post, we will see how to build a simple content-based recommender system using Goodreads data.

on Jul 28, 2020 in Python, Recommendation Engine, Recommender Systems
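One common way to make a recommender "content-based" is to compare item descriptions directly. The sketch below, which is an assumption and not necessarily the post's method, scores books by cosine similarity between word-count vectors of their descriptions; the post may use TF-IDF or other features, and the titles and descriptions here are hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for a description."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(title, books, k=2):
    """Return the k books whose descriptions are most similar to `title`'s."""
    target = vectorize(books[title])
    scores = [(cosine(target, vectorize(desc)), other)
              for other, desc in books.items() if other != title]
    return [other for _, other in sorted(scores, reverse=True)[:k]]

books = {
    "Dune": "desert planet empire spice war",
    "Foundation": "galactic empire collapse science future",
    "Hyperion": "pilgrims planet future war time",
    "Emma": "matchmaking village romance society",
}
print(recommend("Dune", books))
```

"Hyperion" shares two words with "Dune" and "Foundation" shares one, so they rank first and second; "Emma" shares none.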
Deep Learning for Signal Processing: What You Need to Know

Signal Processing is a branch of electrical engineering that models and analyzes data representations of physical events. It is at the core of the digital world. And now, signal processing is starting to make some waves in deep learning.

on Jul 27, 2020 in Deep Learning, Neural Networks
Is depth useful for self-attention?

Learn about recent research that is the first to explain a surprising phenomenon: in BERT/Transformer-like architectures, deepening the network does not seem to be better than widening it (that is, increasing the representation dimension). This empirical observation contradicts a fundamental premise of deep learning.

on Jul 27, 2020 in Attention, BERT, Deep Learning, Research, Scalability, Transformer
Computational Linear Algebra for Coders: The Free Course

Interested in learning more about computational linear algebra? Check out this free course from fast.ai, structured with a top-down teaching method, and solidify your understanding of an important set of machine learning-related concepts.

on Jul 27, 2020 in Course, fast.ai, Linear Algebra
Labelling Data Using Snorkel

In this tutorial, we walk through the process of using Snorkel to generate labels for an unlabelled dataset, illustrating Snorkel's basic components with a real clinical application.

on Jul 24, 2020 in Data Labeling, Data Science, Deep Learning, Machine Learning, NLP, Python
Recommender Systems in a Nutshell

Marketing scientist Kevin Gray asks Dr. Anna Farzindar of the University of Southern California about recommender systems and the ways they are used.

on Jul 23, 2020 in Interview, Recommendation Engine, Recommender Systems
Monitoring Apache Spark – We’re building a better Spark UI

Data Mechanics is developing a free monitoring UI tool for Apache Spark to replace the Spark UI with a better UX, new metrics, and automated performance recommendations. Preview these high-level feedback features, and consider trying it out to support its first release.

on Jul 23, 2020 in Apache Spark, Monitoring, UI/UX
Powerful CSV processing with kdb+

This article provides a glimpse into the available tools to work with CSV files and describes how kdb+ and its query language q raise CSV processing to a new level of performance and simplicity.

on Jul 23, 2020 in Data Analysis, Data Processing, Python
Why would you put Scikit-learn in the browser?

Honestly? I don’t know. But I do think WebAssembly is a good target for ML/AI deployment (in the browser and beyond).

on Jul 22, 2020 in Deployment, Development, scikit-learn, Virtualization
10 Steps for Tackling Data Privacy and Security Laws in 2020

Data privacy laws, such as the CCPA, GDPR, and HIPAA, are here to stay and significantly impact everyone in the digital era. These steps will guide organizations to prepare for compliance and ensure they support the fundamental privacy rights of their customers and users.

on Jul 22, 2020 in Advice, Big Data, CCPA, GDPR, Privacy, Security
Apache Spark Cluster on Docker

Build your own Apache Spark cluster in standalone mode on Docker with a JupyterLab interface.

on Jul 22, 2020 in Apache Spark, Data Engineering, Docker, Jupyter, Python
Building a REST API with Tensorflow Serving (Part 2)

This post is the second part of a tutorial on using TensorFlow Serving to productionize TensorFlow objects and build a REST API for calling them.

on Jul 21, 2020 in API, Docker, Keras, Python, TensorFlow
What I learned from looking at 200 machine learning tools

While hundreds of machine learning tools are available today, the ML software landscape may still be underdeveloped with more room to mature. This review considers the state of ML tools, existing challenges, and which frameworks are addressing the future of machine learning software.

on Jul 21, 2020 in Data Science Platform, Data Science Tools, Machine Learning, MLOps, Open Source, Tools
Data Mining and Machine Learning: Fundamental Concepts and Algorithms: The Free eBook

The second edition of Data Mining and Machine Learning: Fundamental Concepts and Algorithms is available to read freely online, and includes a new part on regression with chapters on linear regression, logistic regression, neural networks, deep learning and regression assessment.

on Jul 21, 2020 in Algorithms, Data Mining, Free ebook, Machine Learning
Recurrent Neural Networks (RNN): Deep Learning for Sequential Data

Recurrent Neural Networks can be used in a number of ways, such as predicting the next word or letter, forecasting financial asset prices over time, modeling actions in sports, composing music, generating images, and more.

on Jul 20, 2020 in Deep Learning, Python, Recurrent Neural Networks, Sequences, TensorFlow
Data Science MOOCs are too Superficial

Most massive open online courses (MOOCs) are too superficial because they offer only introductory-level material. Once you have established a foundation, more is needed to build in-depth knowledge and expertise.

on Jul 20, 2020 in Data Science, MOOC, Online Education
Demystifying Statistical Significance

With more professionals from a wide range of less technical fields diving into statistical analysis and data modeling, these experimental techniques can seem daunting. To help with these hurdles, this article clarifies some misconceptions around p-values, hypothesis testing, and statistical significance.

on Jul 17, 2020 in P-value, Statistical Significance, Statistics
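A concrete way to see what a p-value measures is an exact permutation test, sketched below as an illustration that is not drawn from the article. Under the null hypothesis that the group labels do not matter, every relabelling of the pooled data is equally likely, and the p-value is the fraction of relabellings whose difference in means is at least as extreme as the one observed. It is not the probability that the null hypothesis is true. The measurements are hypothetical.

```python
from itertools import combinations

def permutation_p_value(a, b):
    """Exact two-sided permutation test for a difference in group means."""
    pooled = a + b
    observed = sum(a) / len(a) - sum(b) / len(b)
    extreme = total = 0
    # Enumerate every way of assigning len(a) of the pooled values to "group a".
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
        total += 1
        if abs(diff) >= abs(observed) - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return extreme / total

treatment = [12, 14, 15, 16]  # hypothetical measurements
control = [8, 9, 10, 11]
p = permutation_p_value(treatment, control)
print(round(p, 4))  # 0.0286
```

Only 2 of the 70 possible splits (the observed one and its mirror image) are this extreme, so p = 2/70.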
Wrapping Machine Learning Techniques Within AI-JACK Library in R

The article shows an approach to solving the problem of selecting the best technique in machine learning. This can be done in R using a single library called AI-JACK, and the article shows how to use this tool.

on Jul 17, 2020 in Automated Machine Learning, AutoML, Machine Learning, Modeling, R
Free From Stanford: Ethical and Social Issues in Natural Language Processing

Perhaps it's time to take a look at this relatively new offering from Stanford, Ethical and Social Issues in Natural Language Processing (CS384), an advanced seminar course covering ethical and social issues in NLP.

on Jul 17, 2020 in Bias, Ethics, NLP, Social Good
Before Probability Distributions

Why do we use probability distributions, and why do they matter?

on Jul 16, 2020 in Distribution, Probability, Statistics
3 Advanced Python Features You Should Know

As a Data Scientist, you already spend most of your time getting your data ready for prime time. Follow these real-world scenarios to learn how list comprehensions, lambda expressions, and the map function can help you get the job done faster.

on Jul 16, 2020 in Pandas, Programming, Python, Tips
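A quick sketch of the three features on hypothetical price strings: `map` with a `lambda` applies a small anonymous function to every element, a list comprehension transforms or filters in one expression, and a `lambda` also works as a sort key.

```python
prices = ["$1,200", "$850", "$2,400"]  # hypothetical raw values

# map + lambda: strip formatting and convert every string to a float
numbers = list(map(lambda s: float(s.replace("$", "").replace(",", "")), prices))

# list comprehension: keep only values above a threshold
expensive = [p for p in numbers if p > 1000]

# lambda as a sort key: order (name, price) pairs by price
items = sorted(zip(["desk", "chair", "laptop"], numbers), key=lambda pair: pair[1])

print(numbers)    # [1200.0, 850.0, 2400.0]
print(expensive)  # [1200.0, 2400.0]
print(items)      # [('chair', 850.0), ('desk', 1200.0), ('laptop', 2400.0)]
```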
Understanding How Neural Networks Think

A couple of years ago, Google published one of the most seminal papers in machine learning interpretability.

on Jul 16, 2020 in Google, Interpretability, Machine Learning
Math and Architectures of Deep Learning!

This hands-on book bridges the gap between theory and practice, showing you the math of deep learning algorithms side by side with an implementation in PyTorch. Save 50% with code kdarch50.

on Jul 15, 2020 in Architecture, Deep Learning, Manning, Mathematics, PyTorch
Apache Spark on Dataproc vs. Google BigQuery

This post looks at research undertaken to provide interactive business intelligence reports and visualizations for thousands of end users. It aims to help architects and engineers moving to Google Cloud Platform select the best technology stack for their requirements and process large volumes of data reliably and cost-effectively.

on Jul 15, 2020 in Apache Spark, BigQuery, Google
The Bitter Lesson of Machine Learning

Since the renowned Dartmouth College conference of 1956, AI research has gone through many crests and troughs of progress. Some of the lessons learned along the way have had to be re-learned repeatedly, and the most important of them has also been the hardest for many researchers to accept.

on Jul 15, 2020 in AI, AlphaGo, Chess, Machine Learning, Reinforcement Learning, Richard Sutton, Scalability, Trends
Building a REST API with Tensorflow Serving (Part 1)

Part one of a tutorial that teaches you how to build a REST API around functions or saved models created in TensorFlow. With TensorFlow Serving and Docker, defining endpoint URLs and sending HTTP requests is simple.

on Jul 15, 2020 in API, Keras, Python, TensorFlow
Clustering Uber Rideshare Data

This blog discusses clustering the Uber ridesharing dataset, with a focus on interpretation and understanding the concepts in the real world.

on Jul 14, 2020 in Clustering, Data Analysis, Uber
A Complete Guide To Survival Analysis In Python, part 2

Continuing with the second part of this three-part series covering a step-by-step review of statistical survival analysis, we look at detailed examples implementing the Kaplan-Meier fitter and the Nelson-Aalen fitter, both with shared code.

on Jul 14, 2020 in Python, Statistics, Survival Analysis
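Both estimators mentioned above can be computed from the same at-risk counts. The following is a from-scratch sketch of the textbook formulas, not the article's code (which may use a library such as lifelines): at each event time t with d events among n subjects still at risk, Kaplan-Meier multiplies survival by (1 - d/n) and Nelson-Aalen adds d/n to the cumulative hazard. The function name and sample data are hypothetical.

```python
from collections import Counter

def survival_estimates(durations, observed):
    """Kaplan-Meier survival and Nelson-Aalen cumulative hazard at each event time.

    durations: time until the event or censoring for each subject
    observed:  1 if the event happened, 0 if the subject was censored
    Returns a list of (time, survival, cumulative_hazard) tuples.
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    leaving = Counter(durations)  # events plus censorings at each time
    at_risk = len(durations)
    survival, cum_hazard = 1.0, 0.0
    rows = []
    for t in sorted(leaving):
        d = events[t]
        if d:
            survival *= 1 - d / at_risk  # Kaplan-Meier product-limit step
            cum_hazard += d / at_risk    # Nelson-Aalen increment
            rows.append((t, survival, cum_hazard))
        # Subjects censored at t still count as at risk at t, then leave.
        at_risk -= leaving[t]
    return rows

for t, s, h in survival_estimates([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]):
    print(f"t={t}: S(t)={s:.3f}, H(t)={h:.3f}")
```

The censored subject at t=4 produces no row but shrinks the risk set, which is why the single event at t=5 drops the survival estimate to zero.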
Auto Rotate Images Using Deep Learning

Follow these 5 simple steps to auto rotate images and get the right angle in human photos using computer vision.

on Jul 14, 2020 in Computer Vision, Deep Learning, Face Detection, Image Processing
Foundations of Data Science: The Free eBook

As has become tradition on KDnuggets, let's start a new week with a new eBook. This time we check out a survey style text with a variety of topics, Foundations of Data Science.

on Jul 13, 2020 in Data Science, Free ebook
7 Signs you are data literate

Understanding data is key to being a Data Scientist. But, how can you know if you might be a good fit for the field when you haven't worked with much data? These telltale signs will suggest you are competent to work with data, and that you might have a talent for being data literate.

on Jul 13, 2020 in Advice, Career, Communication, Data Scientist
PyTorch LSTM: Text Generation Tutorial

The key elements of an LSTM are its ability to work with sequences and its gating mechanism.

By Domas Bitvinskas on Jul 13, 2020 in LSTM, Natural Language Generation, NLP, Python, PyTorch
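For reference, the gating mechanism in the standard LSTM cell can be written as follows. The notation is ours, not the tutorial's: x_t is the input at step t, h_{t-1} the previous hidden state, c_{t-1} the previous cell state, σ the logistic sigmoid, ⊙ the elementwise product, and W, U, b learned parameters. PyTorch's `nn.LSTM` documents the same gate structure, with separate input and hidden bias terms.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{candidate cell state} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{cell state update} \\
h_t &= o_t \odot \tanh(c_t) && \text{hidden state}
\end{aligned}
```

The forget gate decides how much of the old cell state to keep, and the input gate decides how much of the new candidate to write. Because the cell state is updated additively, information can persist across many time steps.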
Deep Learning in Finance: Is This The Future of the Financial Industry?

Get a handle on how deep learning is affecting the finance industry, and identify resources to further this understanding and increase your knowledge of the various aspects.

on Jul 10, 2020 in Deep Learning, Finance
Why Learn Python? Here Are 8 Data-Driven Reasons

In this blog post, I list the 8 major data-driven reasons why you should learn Python.

on Jul 10, 2020 in Data Science, Programming, Programming Languages, Python
5 Things You Don’t Know About PyCaret

Compared with other open source machine learning libraries, PyCaret is an alternative low-code library that can replace hundreds of lines of code with only a few words.

on Jul 9, 2020 in Machine Learning, PyCaret, Python
Understanding Time Series with R

Time series analysis is so useful to essentially any business that data scientists entering the field should bring a solid foundation in the technique with them. Here, we use R to decompose a time series into its logical components and better understand the role each plays in this type of analysis.

on Jul 9, 2020 in Beginners, Business Analytics, Data Analysis, R, Time Series
Pull and Analyze Financial Data Using a Simple Python Package

We demonstrate a simple Python script/package to help you pull financial data (all the important metrics and ratios that you can think of) and plot them.

on Jul 9, 2020 in Finance, Pandas, Python
Spam Filter in Python: Naive Bayes from Scratch

In this blog post, learn how to build a spam filter using Python and the multinomial Naive Bayes algorithm, with the goal of classifying messages with greater than 80% accuracy.

on Jul 8, 2020 in Classification, Naive Bayes, Python, Text Classification
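The core of a multinomial Naive Bayes spam filter fits in a few lines. The sketch below is an illustration under our own assumptions, not the post's code: it estimates per-class word probabilities from word counts with additive (Laplace) smoothing, then labels a message with the class that has the highest log prior plus summed log likelihoods. Words never seen in training are ignored. The function names and the four training messages are hypothetical.

```python
import math
from collections import Counter

def train(messages, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with additive (Laplace) smoothing."""
    vocab = {w for m in messages for w in m.lower().split()}
    model = {}
    for label in set(labels):
        docs = [m for m, lab in zip(messages, labels) if lab == label]
        counts = Counter(w for m in docs for w in m.lower().split())
        total = sum(counts.values())
        model[label] = {
            "log_prior": math.log(len(docs) / len(messages)),
            # P(word | class) = (count + alpha) / (total words in class + alpha * |vocab|)
            "log_likelihood": {
                w: math.log((counts[w] + alpha) / (total + alpha * len(vocab))) for w in vocab
            },
        }
    return model

def classify(model, message):
    """Return the label with the highest posterior score for `message`."""
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for w in message.lower().split():
            if w in params["log_likelihood"]:  # skip words unseen in training
                score += params["log_likelihood"][w]
        scores[label] = score
    return max(scores, key=scores.get)

messages = ["win cash now", "claim your free prize now",
            "are we meeting today", "see you at lunch today"]
labels = ["spam", "spam", "ham", "ham"]
model = train(messages, labels)
print(classify(model, "free cash prize"))  # spam
print(classify(model, "lunch today"))      # ham
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.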
Some Things Uber Learned from Running Machine Learning at Scale

Uber's machine learning runtime, Michelangelo, has been in operation for a few years. What has the Uber team learned?

on Jul 7, 2020 in Machine Learning, Scalability, Uber
A Complete Guide To Survival Analysis In Python, part 1

This three-part series covers a review with step-by-step explanations and code for how to perform statistical survival analysis used to investigate the time some event takes to occur, such as patient survival during the COVID-19 pandemic, the time to failure of engineering products, or even the time to closing a sale after an initial customer contact.

on Jul 7, 2020 in Python, Statistics, Survival Analysis
5th International Summer School 2020 on Resource-aware Machine Learning (REAML)

The Resource-aware Machine Learning summer school provides lectures on the latest research in machine learning, with a focus on resource consumption and how it can be reduced. This year it will be held online from 31st August to 4th September, and is free of charge. Register now.

on Jul 7, 2020 in Machine Learning, Online Education, Resource-aware, Summer School, TU Dortmund
PyTorch for Deep Learning: The Free eBook

For this week's free eBook, check out the newly released Deep Learning with PyTorch from Manning, made freely available via PyTorch's website for a limited time. Grab it now!

on Jul 7, 2020 in Deep Learning, Free ebook, Neural Networks, PyTorch
Scope and Impact of AI in Agriculture

The major advantage of AI-based methods is that they tackle each of the challenges farmers face, from sowing seeds to harvesting crops, separately, providing customised solutions to specific problems rather than generalising.

on Jul 6, 2020 in Agriculture, AI
A Layman’s Guide to Data Science. Part 3: Data Science Workflow

Learn and appreciate the typical workflow for a data science project, including data preparation (extraction, cleaning, and understanding), analysis (modeling), reflection (finding new paths), and communication of the results to others.

on Jul 6, 2020 in Beginners, Data Science, Data Workflow, Sciforce, Workflow
Exploratory Data Analysis on Steroids

Exploratory data analysis is a central aspect of Data Science that sometimes gets overlooked. The first step of anything you do should be to know your data: understand it and get familiar with it. This becomes even more important as your data volume grows: imagine trying to parse thousands or millions of records and make sense of them.

on Jul 6, 2020 in Data Analysis, Data Exploration, Data Preparation, Pandas, Python
Deploy Machine Learning Pipeline on AWS Fargate

A step-by-step beginner’s guide to containerizing an ML pipeline and deploying it serverless on AWS Fargate.

on Jul 3, 2020 in AWS, Docker, Kubernetes, Machine Learning, Pipeline, PyCaret
Generating cooking recipes using TensorFlow and LSTM Recurrent Neural Network: A step-by-step guide

A character-level LSTM (Long Short-Term Memory) RNN (Recurrent Neural Network) is trained on a dataset of ~100k recipes using TensorFlow. The model suggested the recipes "Cream Soda with Onions", "Puff Pastry Strawberry Soup", "Zucchini flavor Tea", and "Salmon Mousse of Beef and Stilton Salad with Jalapenos". Yum!? Follow along with this detailed guide and its code to create your own recipe-generating chef.

on Jul 3, 2020 in Cooking, Deep Learning, Humor, LSTM, Natural Language Generation, TensorFlow
Feature Engineering in SQL and Python: A Hybrid Approach

Set up your workstation, reduce workplace clutter, maintain a clean namespace, and effortlessly keep your dataset up-to-date.

on Jul 2, 2020 in Feature Engineering, Python, SQL
Getting Started with TensorFlow 2

Learn about the latest version of TensorFlow with this hands-on walk-through of implementing a classification problem with deep learning, how to plot it, and how to improve its results.

on Jul 2, 2020 in Advice, Beginners, Deep Learning, Python, Regularization, TensorFlow
PyTorch Multi-GPU Metrics Library and More in New PyTorch Lightning Release

PyTorch Lightning, a very light-weight structure for PyTorch, recently released version 0.8.1, a major milestone. With incredible user adoption and growth, they are continuing to build tools to easily do AI research.

on Jul 2, 2020 in GPU, Metrics, Python, PyTorch, PyTorch Lightning
Speed up your Numpy and Pandas with NumExpr Package

We show how to significantly speed up your mathematical calculations in Numpy and Pandas using a small library.

on Jul 1, 2020 in numpy, Pandas, Python
Largest Dataset Analyzed – Poll Results and Trends

The results show that despite the deluge of Big Data, a large majority still works with Gigabyte- or Megabyte-size datasets. Data Scientists work with the largest datasets, followed by Data Engineers, Data Analysts, and Business Analysts. Read more for details.

on Jul 1, 2020 in Data Scientist, Dataset, Largest, Poll, Trends
Data Cleaning: The secret ingredient to the success of any Data Science Project

With an uncleaned dataset, no matter what type of algorithm you try, you will never get accurate results. That is why data scientists spend a considerable amount of time on data cleaning.

on Jul 1, 2020 in Data Cleaning, Data Preparation, Data Science, Outliers, Python
