2020 Jul
All (89) | Events (3) | News, Education (7) | Opinions (16) | Top Stories, Tweets (9) | Tutorials, Overviews (54)
- Fuzzy Joins in Python with d6tjoin
- Jul 31, 2020.
Combining different data sources is a time suck! d6tjoin is a Python library that lets you join pandas dataframes quickly and efficiently.
- R squared Does Not Measure Predictive Capacity or Statistical Adequacy
- Jul 31, 2020.
The fact that R-squared shouldn't be used to decide whether you have an adequate model is counter-intuitive and rarely explained clearly. This demonstration overviews how R-squared goodness-of-fit works in regression analysis and correlations, while showing why it is not a measure of statistical adequacy and so should not be taken to suggest anything about future predictive performance.
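As a minimal illustration of the point above (a toy example, not the article's own demonstration), the sketch below fits an ordinary least-squares line to data generated by the quadratic y = x², computes R² = 1 − SS_res/SS_tot, and inspects the residuals. The data and the helper names `fit_line` and `r_squared` are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(ys, preds):
    """R^2 = 1 - SS_res / SS_tot."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Toy data from a quadratic: a straight line is the wrong model for it.
xs = list(range(11))
ys = [x ** 2 for x in xs]

a, b = fit_line(xs, ys)
preds = [a + b * x for x in xs]
residuals = [y - p for y, p in zip(ys, preds)]

print(f"fit: y = {a:.1f} + {b:.1f}x, R^2 = {r_squared(ys, preds):.3f}")
print("residuals:", [round(r, 1) for r in residuals])
```

Here R² comes out at about 0.93, yet the residuals form a U shape (positive at both ends, negative in the middle), which signals a misspecified model. R² by itself says nothing about that systematic pattern.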
- Scaling Computer Vision Models with Dataflow
- Jul 31, 2020.
Scaling machine learning models is hard and expensive. We briefly introduce the Google Cloud service Dataflow and show how it can be used to run predictions on millions of images in a serverless way.
- Awesome Machine Learning and AI Courses
- Jul 30, 2020.
Check out this list of awesome, free machine learning and artificial intelligence courses with video lectures.
- A Complete Guide To Survival Analysis In Python, part 3
- Jul 30, 2020.
Concluding this three-part series covering a step-by-step review of statistical survival analysis, we look at a detailed example implementing the Kaplan-Meier fitter based on different groups, a Log-Rank test, and Cox Regression, all with examples and shared code.
- Math for Programmers!
- Jul 30, 2020.
Math for Programmers teaches you the math you need to know for a career in programming, concentrating on what you need to know as a developer. Save 50% with code kdmath50.
- 5 Big Trends in Data Analytics
- Jul 30, 2020.
Data analytics is the process by which data is deconstructed and examined for useful patterns and trends. Here we explore five trends making data analytics even more useful.
- Top KDnuggets tweets, Jul 22-28: Increase your expertise in machine learning with a foundational understanding of Bayesian Statistics
- Jul 29, 2020.
Also: Why You Should Get Google’s New #MachineLearning Certificate; Remote #DataScience Internships For Everyone.
- A Tour of End-to-End Machine Learning Platforms
- Jul 29, 2020.
An end-to-end machine learning platform needs a holistic approach. If you’re interested in learning more about a few well-known ML platforms, you’ve come to the right place!
- First Steps of a Data Science Project
- Jul 29, 2020.
Many data science projects are launched with good intentions, but fail to deliver because the correct process is not understood. To achieve good performance and results in this work, the first steps must include clearly defining goals and outcomes, collecting data, and preparing and exploring the data. This is all about solving problems, which requires a systematic process.
- Why You Should Get Google’s New Machine Learning Certificate
- Jul 29, 2020.
Google is offering a new ML Engineer certificate, geared towards professionals who want to display their competency in topics like distributed model training and scaling to production. Is it worth it?
- Automating Security & Privacy Controls for Data Science & BI – Webinar
- Jul 28, 2020.
Moving sensitive data to the Cloud introduces the possibility of exposing data teams to new levels of risk, making it challenging to manage and prepare sensitive data for data science and analytics. Join our live webinar, Automating Security & Privacy Controls for Data Science & BI, Aug 12 @ 1PM ET to learn how Immuta for Databricks enables you to maximize the value of your sensitive data.
- 5 Fantastic Natural Language Processing Books
- Jul 28, 2020.
This curated collection of 5 natural language processing books attempts to cover a number of different aspects of the field, balancing the practical and the theoretical. Check out these 5 fantastic selections now in order to improve your NLP skills.
- Essential Resources to Learn Bayesian Statistics
- Jul 28, 2020.
If you are interested in becoming better at statistics and machine learning, then some time should be invested in diving deeper into Bayesian Statistics. While the topic is more advanced, applying these fundamentals to your work will advance your understanding and success as an ML expert.
- Building a Content-Based Book Recommendation Engine
- Jul 28, 2020.
In this blog, we will see how we can build a simple content-based recommender system using Goodreads data.
- Deep Learning for Signal Processing: What You Need to Know
- Jul 27, 2020.
Signal Processing is a branch of electrical engineering that models and analyzes data representations of physical events. It is at the core of the digital world. And now, signal processing is starting to make some waves in deep learning.
- Is depth useful for self-attention?
- Jul 27, 2020.
Learn about recent research that is the first to explain a surprising phenomenon in BERT/Transformer-like architectures: deepening the network does not seem to be better than widening it (that is, increasing the representation dimension). This empirical observation contrasts with a fundamental premise of deep learning.
- Top Stories, Jul 20-26: Data Science MOOCs are too Superficial
- Jul 27, 2020.
Also: Easy Guide To Data Preprocessing In Python; Data Mining and Machine Learning: Fundamental Concepts and Algorithms: The Free eBook; Recurrent Neural Networks (RNN): Deep Learning for Sequential Data; How Much Math do you need in Data Science?
- Computational Linear Algebra for Coders: The Free Course
- Jul 27, 2020.
Interested in learning more about computational linear algebra? Check out this free course from fast.ai, structured with a top-down teaching method, and solidify your understanding of an important set of machine learning-related concepts.
- Labelling Data Using Snorkel
- Jul 24, 2020.
In this tutorial, we walk through the process of using Snorkel to generate labels for an unlabelled dataset. We provide examples of basic Snorkel components by guiding you through a real clinical application of Snorkel.
- Easy Guide To Data Preprocessing In Python
- Jul 24, 2020.
Preprocessing data for machine learning models is a core general skill for any Data Scientist or Machine Learning Engineer. Follow this guide using Pandas and Scikit-learn to improve your techniques and make sure your data leads to the best possible outcome.
- Better Blog Post Analysis with googleAnalyticsR
- Jul 24, 2020.
In this post, we'll walk through using googleAnalyticsR for better blog post analysis, so you can do the same analysis for yourself!
- Recommender Systems in a Nutshell
- Jul 23, 2020.
Marketing scientist Kevin Gray asks Dr. Anna Farzindar of the University of Southern California about recommender systems and the ways they are used.
- Monitoring Apache Spark – We’re building a better Spark UI
- Jul 23, 2020.
Data Mechanics is developing a free monitoring UI tool for Apache Spark to replace the Spark UI with a better UX, new metrics, and automated performance recommendations. Preview these high-level feedback features, and consider trying it out to support its first release.
- Powerful CSV processing with kdb+
- Jul 23, 2020.
This article provides a glimpse into the available tools to work with CSV files and describes how kdb+ and its query language q raise CSV processing to a new level of performance and simplicity.
- Top KDnuggets tweets, Jul 15-21: Big List of Awesome #MachineLearning and #AI Courses
- Jul 22, 2020.
Also: 5 Obscure #Python Libraries Every Data Scientist Should Know; 9 Skills That Separate Beginners From Intermediate #Python Programmers; Don't miss your copy of Learning Spark, 2nd Edition @databricks ; How Much Math do you need in Data Science?
- Why would you put Scikit-learn in the browser?
- Jul 22, 2020.
Honestly? I don’t know. But I do think WebAssembly is a good target for ML/AI deployment (in the browser and beyond).
- 10 Steps for Tackling Data Privacy and Security Laws in 2020
- Jul 22, 2020.
Data privacy laws, such as the CCPA, GDPR, and HIPAA, are here to stay and significantly impact everyone in the digital era. These steps will guide organizations to prepare for compliance and ensure they support the fundamental privacy rights of their customers and users.
- Apache Spark Cluster on Docker
- Jul 22, 2020.
Build your own Apache Spark cluster in standalone mode on Docker with a JupyterLab interface.
- Building a REST API with Tensorflow Serving (Part 2)
- Jul 21, 2020.
This post is the second part of a tutorial on using Tensorflow Serving to productionize Tensorflow objects and build a REST API to make calls to them.
- What I learned from looking at 200 machine learning tools
- Jul 21, 2020.
While hundreds of machine learning tools are available today, the ML software landscape may still be underdeveloped with more room to mature. This review considers the state of ML tools, existing challenges, and which frameworks are addressing the future of machine learning software.
- Data Mining and Machine Learning: Fundamental Concepts and Algorithms: The Free eBook
- Jul 21, 2020.
The second edition of Data Mining and Machine Learning: Fundamental Concepts and Algorithms is available to read freely online, and includes a new part on regression with chapters on linear regression, logistic regression, neural networks, deep learning and regression assessment.
- Discover The Good, The Bad And The Ugly Of Two-Dimensional Score Matrices
- Jul 20, 2020.
Two-dimensional score matrices are used in marketing, origination, or account management to make decisions, together with other variables or policy rules. Let’s examine the pros and cons of this approach.
- Recurrent Neural Networks (RNN): Deep Learning for Sequential Data
- Jul 20, 2020.
Recurrent Neural Networks can be used in a number of ways, such as predicting the next word or letter, forecasting financial asset prices over time, action modeling in sports, music composition, image generation, and more.
- Data Science MOOCs are too Superficial
- Jul 20, 2020.
Most massive open online courses are too superficial because they offer introductory-level courses. Once you have established a foundation, you need more than this to deepen your knowledge and expertise.
- Top Stories, Jul 13-19: The Bitter Lesson of Machine Learning
- Jul 20, 2020.
Also: 3 Advanced Python Features You Should Know; Understanding How Neural Networks Think; Free MIT Courses on Calculus: The Key to Understanding Deep Learning; How Much Math do you need in Data Science?
- How to Handle Dimensions in NumPy
- Jul 20, 2020.
Learn how to deal with Numpy matrix dimensionality using np.reshape, np.newaxis and np.expand_dims, illustrated with Python code.
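As a quick illustration of the three tools named above (a toy array is assumed; this is not the article's code), the sketch below shows how each one changes an array's shape, plus the broadcasting that an inserted axis makes possible:

```python
import numpy as np

a = np.arange(6)                  # 1-D array, shape (6,)

m = a.reshape(2, 3)               # same data viewed as 2 rows x 3 columns
inferred = a.reshape(3, -1)       # -1 lets NumPy infer the remaining size -> (3, 2)
row = a[np.newaxis, :]            # insert a leading axis -> shape (1, 6)
col = a[:, np.newaxis]            # insert a trailing axis -> shape (6, 1)
row2 = np.expand_dims(a, axis=0)  # function form of the same insertion -> (1, 6)

# A (6, 1) column broadcast against a (6,) row produces a (6, 6) table.
outer = col + a

for name, arr in [("a", a), ("m", m), ("inferred", inferred), ("row", row),
                  ("col", col), ("row2", row2), ("outer", outer)]:
    print(name, arr.shape)
```

`np.newaxis` is handy inline while indexing, whereas `np.expand_dims` reads more clearly when the axis position is stored in a variable.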
- Demystifying Statistical Significance
- Jul 17, 2020.
With more professionals from a wide range of less technical fields diving into statistical analysis and data modeling, these experimental techniques can seem daunting. To help with these hurdles, this article clarifies some misconceptions around p-values, hypothesis testing, and statistical significance.
- Wrapping Machine Learning Techniques Within AI-JACK Library in R
- Jul 17, 2020.
The article shows an approach to solving the problem of selecting the best machine learning technique. This can be done in R using a single library, AI-JACK, and the article shows how to use it.
- Free From Stanford: Ethical and Social Issues in Natural Language Processing
- Jul 17, 2020.
Perhaps it's time to take a look at this relatively new offering from Stanford, Ethical and Social Issues in Natural Language Processing (CS384), an advanced seminar course covering ethical and social issues in NLP.
- Scale sensitive data science and analytics with confidence
- Jul 16, 2020.
Listen to this on-demand webinar and hear how WorldQuant Predictive derives insights from building models on sensitive data while maximizing value and minimizing risk.
- Before Probability Distributions
- Jul 16, 2020.
Why do we use probability distributions, and why do they matter?
- 3 Advanced Python Features You Should Know
- Jul 16, 2020.
As a Data Scientist, you already spend most of your time getting your data ready for prime time. Follow these real-world scenarios to learn how to use Python's list comprehensions, lambda expressions, and the map function to get the job done faster.
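A small sketch of the three features, applied to a hypothetical list of raw text readings (the data and the cleaning rule are assumptions made for illustration):

```python
raw = ["12.5", " 7.0", "n/a", "3.25 ", ""]


def parses_as_float(s):
    """Return True if s can be converted with float()."""
    try:
        float(s)
        return True
    except ValueError:
        return False


# List comprehension: strip whitespace and keep only entries that parse as numbers.
values = [float(s.strip()) for s in raw if parses_as_float(s)]

# map() applies a function to every element; a lambda supplies that function inline.
# map() is lazy in Python 3, so list() materializes the result.
doubled = list(map(lambda x: x * 2, values))

# Lambdas are also common as sort keys: order values by distance from 5.
by_distance_from_5 = sorted(values, key=lambda x: abs(x - 5))

print(values)
print(doubled)
print(by_distance_from_5)
```

For simple transformations, a comprehension such as `[x * 2 for x in values]` is often considered more readable than `map` with a lambda. `map` pays off when you already have a named function to apply.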
- Understanding How Neural Networks Think
- Jul 16, 2020.
A couple of years ago, Google published one of the most seminal papers in machine learning interpretability.
- Top KDnuggets tweets, Jul 8-14: Free MIT Courses on Calculus: The Key to Understanding Deep Learning
- Jul 15, 2020.
Free MIT Courses on Calculus: The Key to Understanding Deep Learning; How Much Math do you need in Data Science?; My Biggest Career Mistake In Data Science; Mathematics for Machine Learning: The Free eBook
- Math and Architectures of Deep Learning!
- Jul 15, 2020.
This hands-on book bridges the gap between theory and practice, showing you the math of deep learning algorithms side by side with an implementation in PyTorch. Save 50% with code kdarch50.
- Apache Spark on Dataproc vs. Google BigQuery
- Jul 15, 2020.
This post looks at research undertaken to provide interactive business intelligence reports and visualizations for thousands of end users. It aims to help architects and engineers moving to Google Cloud Platform select the best technology stack for their requirements and process large volumes of data in a cost-effective yet reliable manner.
- The Bitter Lesson of Machine Learning
- Jul 15, 2020.
Since that renowned conference at Dartmouth College in 1956, AI research has experienced many crests and troughs of progress. Some of the many lessons learned during this time have needed to be re-learned, repeatedly, and the most important of them has also been the most difficult for many researchers to accept.
- Building a REST API with Tensorflow Serving (Part 1)
- Jul 15, 2020.
Part one of a tutorial to teach you how to build a REST API around functions or saved models created in Tensorflow. With Tensorflow Serving and Docker, defining endpoint URLs and sending HTTP requests is simple.
- eBook: Data Integration and the R&D Organization
- Jul 14, 2020.
In this ebook, we’re looking at data integration — the process of combining information from different sources — and why it’s a valuable approach across the enterprise.
- Clustering Uber Rideshare Data
- Jul 14, 2020.
This blog discusses clustering the Uber ridesharing dataset, with a focus on interpretation and understanding the concepts in the real world.
- A Complete Guide To Survival Analysis In Python, part 2
- Jul 14, 2020.
Continuing with the second of this three-part series covering a step-by-step review of statistical survival analysis, we look at a detailed example implementing the Kaplan-Meier fitter theory as well as the Nelson-Aalen fitter theory, both with examples and shared code.
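Both estimators named above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the series' own code: the toy durations, the `1` = event observed / `0` = censored encoding, and the function name `survival_tables` are all hypothetical. Kaplan-Meier multiplies (1 − dᵢ/nᵢ) over event times, while Nelson-Aalen sums dᵢ/nᵢ, where nᵢ is the number still at risk and dᵢ the number of events at time tᵢ.

```python
from collections import Counter


def survival_tables(durations, events):
    """Return [(t, S(t))] for Kaplan-Meier and [(t, H(t))] for Nelson-Aalen.

    durations: time until the event or censoring for each subject.
    events: 1 if the event was observed, 0 if the subject was censored.
    """
    deaths = Counter(t for t, e in zip(durations, events) if e == 1)
    km, na = [], []
    s, h = 1.0, 0.0
    for t in sorted(deaths):
        at_risk = sum(1 for d in durations if d >= t)  # n_i: still observed just before t
        d_i = deaths[t]                                 # d_i: events observed at t
        s *= 1 - d_i / at_risk                          # Kaplan-Meier product-limit step
        h += d_i / at_risk                              # Nelson-Aalen cumulative hazard step
        km.append((t, s))
        na.append((t, h))
    return km, na


durations = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]  # one subject at t=2 and the subject at t=4 are censored

km, na = survival_tables(durations, events)
for (t, s), (_, h) in zip(km, na):
    print(f"t={t}: S(t)={s:.3f}  H(t)={h:.3f}")
```

Censored subjects never trigger a step themselves, but they still count in the at-risk denominator up to their censoring time. That is how both estimators use partial information.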
- Auto Rotate Images Using Deep Learning
- Jul 14, 2020.
Follow these 5 simple steps to auto rotate images and get the right angle in human photos using computer vision.
- How I Solved Sudoku With a Business Rules Engine
- Jul 13, 2020.
"I set myself the challenge of using the optimized inference engine, along with a few other advanced features, of a decision rules management solution to solve Sudoku puzzles." Read the full post on how it was accomplished.
- Top Stories, Jul 6-12: A Layman’s Guide to Data Science Workflow; Free MIT Courses on Calculus: The Key to Understanding Deep Learning
- Jul 13, 2020.
Also: A Complete Guide To Survival Analysis In Python, part 1; PyTorch for Deep Learning: The Free eBook; Exploratory Data Analysis on Steroids
- Foundations of Data Science: The Free eBook
- Jul 13, 2020.
As has become tradition on KDnuggets, let's start a new week with a new eBook. This time we check out a survey style text with a variety of topics, Foundations of Data Science.
- 7 Signs you are data literate
- Jul 13, 2020.
Understanding data is key to being a Data Scientist. But, how can you know if you might be a good fit for the field when you haven't worked with much data? These telltale signs will suggest you are competent to work with data, and that you might have a talent for being data literate.
- PyTorch LSTM: Text Generation Tutorial
- Jul 13, 2020.
Key elements of the LSTM are its ability to work with sequences and its gating mechanism.
- Top June Stories: How Much Math do you need in Data Science? Easy Speech-to-Text with Python
- Jul 10, 2020.
Also: Deep Neural Networks and the Jennifer Aniston Neuron; Don't Democratize Data Science.
- Deep Learning in Finance: Is This The Future of the Financial Industry?
- Jul 10, 2020.
Get a handle on how deep learning is affecting the finance industry, and identify resources to further this understanding and increase your knowledge of the various aspects.
- What every Data Scientist needs to learn from Business Leaders
- Jul 10, 2020.
You've learned so much to become a Data Scientist. Now it's time to take it to the next level with advanced soft skills, because these matter to the businesses you help make better decisions. Learning from the business leaders you support will help you develop a broader set of skills that will boost the quality and output of your Data Science work.
- Why Learn Python? Here Are 8 Data-Driven Reasons
- Jul 10, 2020.
In this blog, I list the 8 major data-driven reasons why you should learn Python.
- 5 Things You Don’t Know About PyCaret
- Jul 9, 2020.
Compared with other open-source machine learning libraries, PyCaret is an alternative low-code library that can replace hundreds of lines of code with only a few words.
- Understanding Time Series with R
- Jul 9, 2020.
Time series analysis is such a useful resource for essentially any business that data scientists entering the field should bring a solid foundation in the technique with them. Here, we decompose a time series into its logical components using R to better understand the role each plays in this type of analysis.
- Learn Python, ML, Deep Learning, Data Visualization and more in Italy with BIG DIVE
- Jul 9, 2020.
Do you want to learn or upgrade your data proficiency and push your career forward? This year, under the umbrella of BIG DIVE, TOP-IX presents four full-time 1-week courses from beginner to advanced levels. Read more and register now.
- Pull and Analyze Financial Data Using a Simple Python Package
- Jul 9, 2020.
We demonstrate a simple Python script/package to help you pull financial data (all the important metrics and ratios that you can think of) and plot them.
- Top KDnuggets tweets, Jul 01-07: Top 20 Latest Research Problems in #BigData and #DataScience
- Jul 8, 2020.
Also: 5 Ways to Detect #Outliers That Every #DataScientist Should Know #Python Code; The State of AI and Machine Learning 2020 - Just Released; Top 20 Latest Research Problems in #BigData and #DataScience; Python Libraries for Interpretable #MachineLearning #KDN
- Math for Programmers
- Jul 8, 2020.
Math for Programmers teaches you the math you need to know for a career in programming, concentrating on what you need to know as a developer.
- Spam Filter in Python: Naive Bayes from Scratch
- Jul 8, 2020.
In this blog post, learn how to build a spam filter using Python and the multinomial Naive Bayes algorithm, with the goal of classifying messages with greater than 80% accuracy.
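The core of a multinomial Naive Bayes classifier fits in a few dozen lines of plain Python. The sketch below is an illustration under assumptions, not the post's code: the tokenizer, the Laplace smoothing parameter `alpha`, and the six toy messages are all hypothetical, and a real filter would be trained and evaluated on a labelled corpus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def train(messages, labels, alpha=1.0):
    """Estimate class priors and Laplace-smoothed word likelihoods P(word | class)."""
    vocab = {w for m in messages for w in tokenize(m)}
    counts = {c: Counter() for c in set(labels)}
    for m, c in zip(messages, labels):
        counts[c].update(tokenize(m))
    priors = {c: labels.count(c) / len(labels) for c in counts}
    likelihoods = {}
    for c, cnt in counts.items():
        total = sum(cnt.values())
        likelihoods[c] = {
            w: (cnt[w] + alpha) / (total + alpha * len(vocab)) for w in vocab
        }
    return priors, likelihoods


def classify(message, priors, likelihoods):
    """Return the class with the highest log posterior; unseen words are ignored."""
    scores = {}
    for c in priors:
        score = math.log(priors[c])
        for w in tokenize(message):
            if w in likelihoods[c]:
                score += math.log(likelihoods[c][w])
        scores[c] = score
    return max(scores, key=scores.get)


messages = [
    "win money now", "claim your free prize now", "free money win win",
    "are we meeting tomorrow", "see you at lunch", "can you send the report",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

priors, likelihoods = train(messages, labels)
print(classify("free prize money", priors, likelihoods))
print(classify("see you tomorrow at lunch", priors, likelihoods))
```

Summing log-probabilities instead of multiplying raw probabilities avoids numeric underflow on long messages. The smoothing term keeps a word seen in only one class from zeroing out the other class's score.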
- 5 Innovative AI Software Companies You Should Know
- Jul 8, 2020.
While machine learning is impacting organizations around the world, some are driving forward the real-world applications of innovative AI. Check out these interesting companies to watch for exciting new progress this year.
- Free MIT Courses on Calculus: The Key to Understanding Deep Learning
- Jul 8, 2020.
Calculus is the key to fully understanding how neural networks function. Go beyond a surface understanding of this mathematics discipline with these free course materials from MIT.
- Some Things Uber Learned from Running Machine Learning at Scale
- Jul 7, 2020.
Uber's machine learning runtime, Michelangelo, has been in operation for a few years. What has the Uber team learned?
- A Complete Guide To Survival Analysis In Python, part 1
- Jul 7, 2020.
This three-part series covers a review with step-by-step explanations and code for how to perform statistical survival analysis used to investigate the time some event takes to occur, such as patient survival during the COVID-19 pandemic, the time to failure of engineering products, or even the time to closing a sale after an initial customer contact.
- 5th International Summer School 2020 on Resource-aware Machine Learning (REAML)
- Jul 7, 2020.
The Resource-aware Machine Learning summer school provides lectures on the latest research in machine learning, with a focus on resource consumption and how it can be reduced. This year it will be held online from 31 August to 4 September, and it is free of charge. Register now.
- PyTorch for Deep Learning: The Free eBook
- Jul 7, 2020.
For this week's free eBook, check out the newly released Deep Learning with PyTorch from Manning, made freely available via PyTorch's website for a limited time. Grab it now!
- Scope and Impact of AI in Agriculture
- Jul 6, 2020.
The major advantage of focusing on AI-based methods is that they tackle each of the challenges farmers face, from seed sowing to harvesting, separately and, rather than generalising, provide customised solutions to specific problems.
- Top Stories, Jun 29 – Jul 5: Speed up your Numpy and Pandas with NumExpr Package; Deploy Machine Learning Pipeline on AWS Fargate
- Jul 6, 2020.
Also: Getting Started with TensorFlow 2; An Introduction to Statistical Learning: The Free eBook; How Much Math do you need in Data Science?; Data Cleaning: The secret ingredient to the success of any Data Science Project
- A Layman’s Guide to Data Science. Part 3: Data Science Workflow
- Jul 6, 2020.
Learn and appreciate the typical workflow for a data science project, including data preparation (extraction, cleaning, and understanding), analysis (modeling), reflection (finding new paths), and communication of the results to others.
- Exploratory Data Analysis on Steroids
- Jul 6, 2020.
Exploratory data analysis is a central aspect of Data Science that sometimes gets overlooked. The first step of anything you do should be to know your data: understand it, get familiar with it. This concept becomes even more important as your data volume grows: imagine trying to parse through thousands or millions of records and make sense of them.
- Deploy Machine Learning Pipeline on AWS Fargate
- Jul 3, 2020.
A step-by-step beginner’s guide to containerizing an ML pipeline and deploying it serverless on AWS Fargate.
- Generating cooking recipes using TensorFlow and LSTM Recurrent Neural Network: A step-by-step guide
- Jul 3, 2020.
A character-level LSTM (long short-term memory) RNN (recurrent neural network) is trained with TensorFlow on a dataset of ~100k recipes. The model suggested the recipes "Cream Soda with Onions", "Puff Pastry Strawberry Soup", "Zucchini flavor Tea", and "Salmon Mousse of Beef and Stilton Salad with Jalapenos". Yum!? Follow along with this detailed guide and code to create your own recipe-generating chef.
- Data Scientists Have Developed a Faster Way to Reduce Pollution, Cut Greenhouse Gas Emissions
- Jul 3, 2020.
Data science is helping with one of the world's most pressing issues. Read about an approach and specific steps being taken by data scientists to quickly reduce pollution and greenhouse gas emissions.
- Feature Engineering in SQL and Python: A Hybrid Approach
- Jul 2, 2020.
Set up your workstation, reduce workplace clutter, maintain a clean namespace, and effortlessly keep your dataset up-to-date.
- Getting Started with TensorFlow 2
- Jul 2, 2020.
Learn about the latest version of TensorFlow with this hands-on walk-through of implementing a classification problem with deep learning, how to plot it, and how to improve its results.
- PyTorch Multi-GPU Metrics Library and More in New PyTorch Lightning Release
- Jul 2, 2020.
PyTorch Lightning, a very light-weight structure for PyTorch, recently released version 0.8.1, a major milestone. With incredible user adoption and growth, they are continuing to build tools to easily do AI research.
- Speed up your Numpy and Pandas with NumExpr Package
- Jul 1, 2020.
We show how to significantly speed up your mathematical calculations in Numpy and Pandas using a small library.
- Largest Dataset Analyzed – Poll Results and Trends
- Jul 1, 2020.
The results show that despite the deluge of Big Data, a large majority still work with gigabyte- or megabyte-size datasets. Data Scientists work with the largest datasets, followed by Data Engineers, Data Analysts, and Business Analysts. Read more for details.
- How to Build Your Data Science Competency for Post-COVID Future
- Jul 1, 2020.
Data science is helping healthcare organizations and businesses navigate the current crisis more effectively. Find out how you can gain this in-demand qualification and help them address complex challenges.
- Data Cleaning: The secret ingredient to the success of any Data Science Project
- Jul 1, 2020.
With an uncleaned dataset, no matter what type of algorithm you try, you will never get accurate results. That is why data scientists spend a considerable amount of time on data cleaning.