2020 Apr
- KDD 2020 Invites Top Data Scientists To Compete in 24th Annual KDD Cup - Apr 30, 2020.
This year's KDD Cup features four distinct tracks that welcome participants to tackle challenges in e-commerce, generative adversarial networks, automatic graph representation learning (AutoGraph) and mobility-on-demand (MoD) platforms. Winners will be recognized at KDD 2020, the leading interdisciplinary conference in data science, in San Diego on August 23-27, 2020.
- Outbreak Analytics: Data Science Strategies for a Novel Problem - Apr 30, 2020.
You walk down one aisle of the grocery store to get your favorite cereal. On the dairy aisle, someone sick from COVID-19 coughs. Did your decision to grab your cereal before your milk possibly keep you healthy? How can these unpredictable, near-random choices be included in complex models?
- Exploring the Impact of Geographic Information Systems - Apr 30, 2020.
GIS has mostly been overshadowed by more popular buzzwords like machine learning and deep learning, yet it has always been around us in the background, used in government, business, medicine, real estate, transport, manufacturing, etc.
- Five Cool Python Libraries for Data Science - Apr 30, 2020.
Check out these 5 cool Python libraries that the author has come across during an NLP project, and which have made their life easier.
- Top KDnuggets tweets, Apr 22-28: 24 Best (and Free) Books To Understand Machine Learning - Apr 29, 2020.
Also: A Concise Course in Statistical Inference: The Free eBook; ML Ops: Machine Learning as an Engineering Discipline; Learning during a crisis (#DataScience 90-day learning challenge); Free High-Quality Machine Learning & Data Science Books & Courses: Quarantine Edition
- Introducing Brain Simulator II: A New Platform for AGI Experimentation - Apr 29, 2020.
A growing consensus of researchers contends that new algorithms are needed to transform narrow AI to AGI. Brain Simulator II is free software for new algorithm development targeted at AGI; you can experiment with it and participate in its development.
- Understanding the COVID-19 Pandemic Using Interactive Visualizations - Apr 29, 2020.
Interactive visualizations are an effective method for understanding the COVID-19 pandemic. This article presents a repository filled with just such insightful interactions.
- Coronavirus COVID-19 Genome Analysis using Biopython - Apr 29, 2020.
In this article, we will interpret and analyze the COVID-19 DNA sequence data and try to get as many insights as possible regarding the proteins that make it up. Later, we will compare COVID-19 DNA with MERS and SARS to understand the relationship among them.
- Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing - Apr 28, 2020.
The book Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing by Ron Kohavi (Microsoft, Airbnb), Diane Tang (Google) and Ya Xu (LinkedIn) is available for purchase, with the authors' proceeds from the book being donated to charity.
- How Data Scientists Can Train and Updates Models to Prepare for COVID-19 Recovery - Apr 28, 2020.
The COVID-19 pandemic has affected everything, and building predictions during this time is difficult. Data science teams need to update their models to prepare for the recovery, and know how to properly train 2020 data models to learn from the coronavirus anomaly.
- How AI Can Help Manage Infectious Diseases - Apr 28, 2020.
With the capability to analyze huge amounts of data, including medical information, human behavior patterns, and environmental conditions, big data tools can be invaluable in dealing with deadly outbreaks.
- 10 Best Machine Learning Textbooks that All Data Scientists Should Read, by Daniel Smith - Apr 28, 2020.
Check out these 10 books that can help data scientists and aspiring data scientists learn machine learning today.
- LSTM for time series prediction - Apr 27, 2020.
Learn how to develop an LSTM neural network with PyTorch on trading data to predict future prices by mimicking actual values of the time series data.
- A Concise Course in Statistical Inference: The Free eBook - Apr 27, 2020.
Check out this freely available book, All of Statistics: A Concise Course in Statistical Inference, and learn the probability and statistics needed for success in data science.
- Top Stories, Apr 20-26: The Super Duper NLP Repo; Free High-Quality Machine Learning & Data Science Books & Courses - Apr 27, 2020.
Also: Should Data Scientists Model COVID19 and other Biological Events; 5 Papers on CNNs Every Data Scientist Should Read; 24 Best (and Free) Books To Understand Machine Learning; Mathematics for Machine Learning: The Free eBook; Find Your Perfect Fit: A Quick Guide for Job Roles in the Data World
- Google Open Sources SimCLR, A Framework for Self-Supervised and Semi-Supervised Image Training - Apr 27, 2020.
The new framework uses contrastive learning to improve image analysis in unlabeled datasets.
- Learning during a crisis (Data Science 90-day learning challenge) - Apr 24, 2020.
How can you keep your focus and drive during a global crisis? Take on a 90-day learning challenge for data science and check out this list of books and courses to follow.
- The Super Duper NLP Repo: 100 Ready-to-Run Colab Notebooks - Apr 24, 2020.
Check out this repository of more than 100 freely-accessible NLP notebooks, curated from around the internet, and ready to launch in Colab with a single click.
- Data Transformation: Standardization vs Normalization, by Clare Liu - Apr 23, 2020.
Increasing accuracy in your models is often obtained through the first steps of data transformations. This guide explains the difference between the key feature scaling methods of standardization and normalization, and demonstrates when and how to apply each approach.
- 3 Reasons Why We Are Far From Achieving Artificial General Intelligence - Apr 23, 2020.
How far are we from achieving Artificial General Intelligence? We answer this through the study of three limitations of current machine learning.
- Find Your Perfect Fit: A Quick Guide for Job Roles in the Data World - Apr 23, 2020.
Data-related positions have been considered the hottest in the job market over the last couple of years. While everyone wants to join the party and enter this fascinating field, it is essential to first get an understanding of it. In this quick guide, I’ll do my best to dispel the confusion by crystalizing the essence of the different positions.
- Top KDnuggets tweets, Apr 15-21: 21 Techniques to Write Better #Python Code with #PyCharm examples - Apr 22, 2020.
Also: Math for Programmers!; If #Programming languages had honest slogans #humor; 5 Papers on CNNs Every Data Scientist Should Read; Why Understanding CVEs Is Critical for Data Scientists
- Data context and how to get started with understanding COVID-19 data - Apr 22, 2020.
If you are already applying your Data Science skills or getting ready to contribute to analyzing COVID-19 data, then be sure to take sufficient time to appreciate the context of the numbers to focus on what's most important as we collaborate on this global battle.
- Should Data Scientists Model COVID19 and other Biological Events - Apr 22, 2020.
Biostatisticians use statistical techniques that your current everyday data scientists have probably never heard of. This is a great example where lack of domain knowledge exposes you as someone who does not know what they are doing and is merely hopping on a trend.
- Fighting Coronavirus With AI: Improving Testing with Deep Learning and Computer Vision - Apr 22, 2020.
This post will cover how testing is done for the coronavirus, why it's important in battling the pandemic, and how deep learning tools for medical imaging can help us improve the quality of COVID-19 testing.
- Math and Architectures of Deep Learning - Apr 22, 2020.
This hands-on book bridges the gap between theory and practice, showing you the math of deep learning algorithms side by side with an implementation in PyTorch. You can save 40% off Math and Architectures of Deep Learning until May 13! Just enter the code nlkdarch40 at checkout when you buy from manning.com.
- Free High-Quality Machine Learning & Data Science Books & Courses: Quarantine Edition - Apr 22, 2020.
If you find yourself quarantined and looking for free learning materials in the way of books and courses to sharpen your data science and machine learning skills, this collection of articles I have previously written curating such things is for you.
- Fast Track Your Data Science Career - Apr 21, 2020.
Earn a Master of Professional Studies in Data Analytics online through Penn State World Campus – and you can add in-demand skills to your wheelhouse while you continue to work.
- 4 Realistic Career Options for Data Scientists - Apr 21, 2020.
It’s almost 10 years since "Data Science" became mainstream. We ask less about how to get into Data Science, but wonder "what’s next?" This article includes insights on four non-trivial, but practical, options and their pitfalls.
- Announcing PyCaret 1.0.0 - Apr 21, 2020.
An open source low-code machine learning library in Python. PyCaret is an alternate low-code library that can replace hundreds of lines of code with only a few words. This makes experiments much faster and more efficient.
- The Benefits & Examples of Using Apache Spark with PySpark - Apr 21, 2020.
Apache Spark runs fast, offers robust, distributed, fault-tolerant data objects, and integrates beautifully with the world of machine learning and graph analytics. Learn more here.
- Livestream Deep Learning World from your Home Office! - Apr 20, 2020.
Livestream Deep Learning World Munich 2020 from the comfort and safety of your home on 11-12 May 2020.
- A Key Missing Part of the Machine Learning Stack - Apr 20, 2020.
With many organizations having machine learning models running in production, some are discovering that inefficiencies exist in the first step of the process: feature definition and extraction. Robust feature management is now being realized as a key missing part of the ML stack, and improving it by applying standard software development practices is gaining attention.
- Top Stories, Apr 13-19: Can Java Be Used for Machine Learning and Data Science?; How Deep Learning is Accelerating Drug Discovery in Pharmaceuticals - Apr 20, 2020.
Also: Peer Reviewing Data Science Projects; Visualizing Decision Trees with Python (Scikit-learn, Graphviz, Matplotlib); Can Java Be Used for Machine Learning and Data Science?; Mathematics for Machine Learning: The Free eBook; 24 Best (and Free) Books To Understand Machine Learning
- 5 Papers on CNNs Every Data Scientist Should Read - Apr 20, 2020.
In this article, we introduce 5 papers on CNNs that represent both novel approaches and baselines in the field.
- The Double Descent Hypothesis: How Bigger Models and More Data Can Hurt Performance - Apr 20, 2020.
OpenAI research shows a phenomenon that challenges both traditional statistical learning theory and conventional wisdom among machine learning practitioners.
- 4 Steps to ensure your AI/Machine Learning system survives COVID-19 - Apr 17, 2020.
Many AI models rely on historical data to make predictions on future behavior. So, what happens when consumer behavior across the planet makes a 180 degree flip? Companies are quickly seeing less value from some AI systems as training data is no longer relevant when user behaviors and preferences change so drastically. Those who are flexible can make it through this crisis in data, and these four techniques will help you stay in front of the competition.
- Dockerize Jupyter with the Visual Debugger - Apr 17, 2020.
A step by step guide to enable and use visual debugging in Jupyter in a docker container.
- OpenAI Open Sources Microscope and the Lucid Library to Visualize Neurons in Deep Neural Networks - Apr 17, 2020.
The new tools show the potential of data visualizations for understanding features in a neural network.
- State of the Machine Learning and AI Industry - Apr 16, 2020.
Enterprises are struggling to launch machine learning models that encapsulate the optimization of business processes. These are now the essential components of data-driven applications and AI services that can improve legacy rule-based business processes, increase productivity, and deliver results. In the current state of the industry, many companies are turning to off-the-shelf platforms to increase expectations for success in applying machine learning.
- Dive Into Deep Learning: The Free eBook - Apr 16, 2020.
This freely available text on deep learning is fully interactive and incredibly thorough. Check out "Dive Into Deep Learning" now and increase your neural networks theoretical understanding and practical implementation skills.
- Better notebooks through CI: automatically testing documentation for graph machine learning - Apr 16, 2020.
In this article, we’ll walk through the detailed and helpful continuous integration (CI) that supports us in keeping StellarGraph’s demos current and informative.
- Top KDnuggets tweets, Apr 08-14: Mathematics for #MachineLearning: The Free eBook – KDnuggets - Apr 15, 2020.
Also: Exploratory Data Analysis for Natural Language Processing: A Complete Guide to Python Tools; A professor with 20 year experience to all high school seniors (and their parents). If you were planning to enroll in college next fall - don't.
- Pandas in action - Apr 15, 2020.
Pandas is instantly familiar to anyone who’s used spreadsheet software, whether that’s Google Sheets or good old Excel. It’s got columns, it’s got grids, it’s got rows; but pandas is far more powerful. Save 40% with code nlkdpandas40 on this book, and other Manning books and videos.
- Why and How to Use Dask with Big Data - Apr 15, 2020.
The Pandas library for Python is a game-changer for data preparation. But when the data gets big, really big, your computer needs more help to efficiently handle all that data. Learn more about how to use Dask and follow a demo to scale up your Pandas to work with Big Data.
- Federated Learning: An Introduction - Apr 15, 2020.
Improving machine learning models and making them more secure by training on decentralized data.
- Visualizing Decision Trees with Python (Scikit-learn, Graphviz, Matplotlib) - Apr 15, 2020.
Learn about how to visualize decision trees using matplotlib and Graphviz.
- Top Process Mining Software Companies, Updated - Apr 14, 2020.
Understanding the real business processes of a company through analysis of its information systems can guide digital transformations. Here, the top 10 process mining software companies are reviewed that can assist businesses in process optimizations through unique insights of business systems.
- Can Java Be Used for Machine Learning and Data Science? - Apr 14, 2020.
While Python and R have become favorites for building these programs, many organizations are turning to Java application development to meet their needs. Read on to see how, and why.
- Free Metis Corporate Training Series: Intro to Python, Continued - Apr 14, 2020.
Metis Corporate Training is offering Intro to Python, a free, live online training series specially created for business professionals, and an excellent way for a team to begin their Python journey. Classes are taught live, and participants will be able to ask questions in real time. Register now.
- Forecasting Stories 2: The Power of a Seasonality Index - Apr 14, 2020.
Read this second entry in a series on time series analysis and seasonality, and see how, through 2 simple use cases, the power of a seasonality index is uncovered.
- Free Workshop Preview: Data Thinking with Martin Szugat - Apr 13, 2020.
As anticipation grows for Predictive Analytics World’s virtual conferences (PAW for Industry 4.0, PAW for Healthcare and Deep Learning World on 11-12 May 2020) and virtual workshops (13 May 2020), here is a chance to start familiarising yourself with the quality of the content and of the virtual networking. Gain an insight into how to apply design thinking for data science & analytics. Reserve your spot.
- Peer Reviewing Data Science Projects - Apr 13, 2020.
In any technical development field, having other practitioners review your work before shipping code off to production is a valuable support tool to make sure your work is error-proof. Even through your preparation for the review, improvements might be discovered and then other issues that escaped your awareness can be spotted by outsiders. This peer scrutiny can also be applied to Data Science, and this article outlines a process that you can experiment with in your team.
- Top Stories, Apr 6-12: Mathematics for Machine Learning: The Free eBook; 10 Must-read Machine Learning Articles (March 2020) - Apr 13, 2020.
Also: Top KDnuggets tweets, Apr 01-07: How to change global policy on #coronavirus; 5 Ways Data Scientists Can Help Respond to COVID-19 and 5 Actions to Avoid; How to Do Hyperparameter Tuning on Any Python Script in 3 Easy Steps; COVID-19 Visualized: The power of effective visualizations for pandemic storytelling
- How Deep Learning is Accelerating Drug Discovery in Pharmaceuticals - Apr 13, 2020.
The goal of this essay is to discuss meaningful machine learning progress in the real-world application of drug discovery. There’s even a solid chance of the deep learning approach to drug discovery changing lives for the better and doing meaningful good in the world.
- DeepMind Unveils Agent57, the First AI Agents that Outperforms Human Benchmarks in 57 Atari Games - Apr 13, 2020.
The new reinforcement learning agent innovates over previous architectures achieving one of the most important milestones in the AI space.
- KNIME Spring Summit Online Edition - Apr 10, 2020.
The KNIME Summits, in spring and fall, have been taking place since 2008 in Europe and the US. In light of the coronavirus, this year’s KNIME Spring Summit moved online. Not too late to participate: KNIME Spring Summit continues online. Check out the extended summit program now.
- Upcoming Webinars and online events in AI, Data Science, Machine Learning - Apr 10, 2020.
Use the time at home productively and learn something new! We bring you a selection of upcoming interesting webinars and online events on AI, Data Science, Machine Learning, and related topics.
- Has AI Come Full Circle? A data science journey, or why I accepted a data science job - Apr 10, 2020.
Personal journeys in Data Science can vary greatly between individuals. Some are just getting started and wading into this vast ocean of opportunity, and others have been involved during its decades-long evolution as a professional field. This review of a longer journey can provide a broader perspective of how you might fit into this interesting career.
- Successful Use Cases of Artificial Intelligence for Businesses - Apr 10, 2020.
AI is contributing to businesses in a huge way. For specifics, check out these successful use cases of AI for business.
- How Data Science Is Being Used to Understand COVID-19 - Apr 10, 2020.
Read this overview to gain an understanding of how data scientists are working hard to learn as much about COVID-19 as they can.
- Statistical Thinking for Industrial Problem Solving – a free online statistics course - Apr 9, 2020.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
- 3 Best Sites to Find Datasets for your Data Science Projects - Apr 9, 2020.
When first learning data science, you will inevitably find yourself looking for more datasets to practice with. Here, we recommend the 3 best sites to find datasets to spark your next data science project.
- Build PyTorch Models Easily Using torchlayers - Apr 9, 2020.
torchlayers aims to do what Keras did for TensorFlow, providing a higher-level model-building API and some handy defaults and add-ons useful for crafting PyTorch neural networks.
- Top March stories: 24 Best (and Free) Books To Understand Machine Learning; COVID-19 Visualized: The power of effective visualizations; 20 AI, DS, ML terms you need to know - Apr 9, 2020.
Also: 20 AI, Data Science, Machine Learning Terms You Need to Know in 2020 (Part 2); Linear to Logistic Regression, Explained Step by Step.
- 10 Must-read Machine Learning Articles (March 2020) - Apr 9, 2020.
This list will feature some of the recent work and discoveries happening in machine learning, as well as guides and resources for both beginner and intermediate data scientists.
- Top KDnuggets tweets, Apr 01-07: How to change global policy on #coronavirus - Apr 8, 2020.
Also: 10 Must-read Machine Learning Articles (March 2020); Mathematics for Machine Learning: The Free eBook; Free Mathematics Courses for Data Science & Machine Learning; 9 Best YouTube Playlists and Videos — #Python for #MachineLearning
- How to Do Hyperparameter Tuning on Any Python Script in 3 Easy Steps - Apr 8, 2020.
With your machine learning model in Python just working, it's time to optimize it for performance. Follow this guide to set up automated tuning using any optimization library in three steps.
- TensorFlow Dev Summit 2020: Top 10 Tricks for TensorFlow and Google Colab Users - Apr 8, 2020.
In this piece, we’ll highlight some of the tips and tricks mentioned during this year’s TF summit. Specifically, these tips will help you in getting the best out of Google’s Colab.
- 3 Reasons to Use Random Forest® Over a Neural Network: Comparing Machine Learning versus Deep Learning - Apr 8, 2020.
Both the random forest algorithm and Neural Networks are different techniques that learn differently but can be used in similar domains. Why would you use one over the other?
- 2 Things You Need to Know about Reinforcement Learning – Computational Efficiency and Sample Efficiency - Apr 7, 2020.
Experimenting with different strategies for a reinforcement learning model is crucial to discovering the best approach for your application. However, where you land can have a significant impact on your system's energy consumption, which could cause you to think again about the efficiency of your computations.
- Simple Question Answering (QA) Systems That Use Text Similarity Detection in Python - Apr 7, 2020.
How exactly are smart algorithms able to engage and communicate with us like humans? The answer lies in Question Answering systems that are built on a foundation of Machine Learning and Natural Language Processing. Let's build one here.
- Build an app to generate photorealistic faces using TensorFlow and Streamlit - Apr 7, 2020.
We’ll show you how to quickly build a Streamlit app to synthesize celebrity faces using GANs, Tensorflow, and st.cache.
- 5 Ways Data Scientists Can Help Respond to COVID-19 and 5 Actions to Avoid - Apr 6, 2020.
How can data scientists help with the COVID-19 response within their organization and more broadly? While there are many valuable and interesting opportunities to apply your skills, there can be unintended consequences even from your best attempt. So, consider this general advice for data scientists who want to help with this and any disaster response.
- Uber Open Sourced Fiber, a Framework to Streamline Distributed Computing for Reinforcement Learning Models - Apr 6, 2020.
The new framework simplifies distributed and scalable training for reinforcement learning agents.
- Top Stories, Mar 30 – Apr 5: COVID-19 Visualized: The power of effective visualizations for pandemic storytelling; Introducing MIDAS: A New Baseline for Anomaly Detection in Graphs - Apr 6, 2020.
Also: How (not) to use Machine Learning for time series forecasting: The sequel; Best Free Epidemiology Courses for Data Scientists; Research into 1,001 Data Scientist LinkedIn Profiles, the latest; Stop Hurting Your Pandas!
- Mathematics for Machine Learning: The Free eBook - Apr 6, 2020.
Check out this free ebook covering the fundamentals of mathematics for machine learning, as well as its companion website of exercises and Jupyter notebooks.
- More Performance Evaluation Metrics for Classification Problems You Should Know - Apr 3, 2020.
When building and optimizing your classification model, measuring how accurately it predicts your expected outcome is crucial. However, this metric alone is never the entire story, as it can still offer misleading results. That's where these additional performance evaluations come into play to help tease out more meaning from your model.
- Best Free Epidemiology Courses for Data Scientists - Apr 3, 2020.
Are you interested in knowing more about epidemiology, the field which studies the spread and distribution of diseases? This article collects some free courses which are intended to help you do just that.
- Stop Hurting Your Pandas! - Apr 3, 2020.
This post will address the issues that can arise when Pandas slicing is used improperly. If you see the warning that reads "A value is trying to be set on a copy of a slice from a DataFrame", this post is for you.
- Free Metis Corporate Training Series: Intro to Python - Apr 2, 2020.
Metis Corporate Training is offering Intro to Python, a free, live online training series specially created for business professionals, and an excellent way for a team to begin their Python journey. Classes are taught live, and participants will be able to ask questions in real time. Register now.
- A Layman’s Guide to Data Science. Part 2: How to Build a Data Project - Apr 2, 2020.
As Part 2 in a Guide to Data Science, we outline the steps to build your first Data Science project, including how to ask good questions to understand the data first, how to prepare the data, how to develop an MVP, reiterate to build a good product, and, finally, present your project.
- Python for data analysis… is it really that simple?!? - Apr 2, 2020.
The article addresses a simple data analytics problem, comparing a Python and Pandas solution to an R solution (using plyr, dplyr, and data.table), as well as kdb+ and BigQuery solutions. Performance improvement tricks for these solutions are then covered, as are parallel/cluster computing approaches and their limitations.
- Why you should NOT use MS MARCO to evaluate semantic search - Apr 2, 2020.
If we want to investigate the power and limitations of semantic vectors (pre-trained or not), we should ideally prioritize datasets that are less biased towards term-matching signals. This piece shows that the MS MARCO dataset is more biased towards those signals than we expected and that the same issues are likely present in many other datasets due to similar data collection designs.
- Top KDnuggets tweets, Mar 25-31: COVID-19 Visualized: The power of effective visualizations for pandemic story telling - Apr 1, 2020.
Also: 20 Historical Twitter Datasets Available for download #DataScience; How to Optimize Your Jupyter Notebook; SQL Cheat Sheet; How to learn #DataScience on your own: a practical guide
- Cartoon: AI understanding of Coronavirus - Apr 1, 2020.
Here is a cartoon to distract you, showing a new level of understanding AI could reach.
- I Don’t Believe in Electrons - Apr 1, 2020.
What does it mean to believe in science? Does this notion of belief even make sense, or are scientists just supposed to be skeptics that question everything for all time, until we somehow arrive at some notion of Truth? And, what is science, anyway?
- Introduction to the K-nearest Neighbour Algorithm Using Examples - Apr 1, 2020.
Read this concise summary of KNN, a supervised learning algorithm for pattern classification that helps us find which class a new input belongs to, based on the k nearest neighbours chosen and the distances calculated between them.
- Introducing MIDAS: A New Baseline for Anomaly Detection in Graphs - Apr 1, 2020.
From network security to financial fraud, anomaly detection helps protect businesses, individuals, and online communities. To help improve anomaly detection, researchers have developed a new approach called MIDAS.