News / Blog
- Free Metis Corporate Training Series: Intro to Python - Apr 2, 2020.
Metis Corporate Training is offering Intro to Python, a free, live online training series specially created for business professionals, and an excellent way for a team to begin their Python journey. Classes are taught live, and participants will be able to ask questions in real time. Register now.
- A Layman’s Guide to Data Science. Part 2: How to Build a Data Project - Apr 2, 2020.
In Part 2 of this Guide to Data Science, we outline the steps to build your first Data Science project, including how to ask good questions to understand the data first, how to prepare the data, how to develop an MVP, how to iterate to build a good product, and, finally, how to present your project.
- Python for data analysis… is it really that simple?!? - Apr 2, 2020.
The article addresses a simple data analytics problem, comparing a Python and Pandas solution to an R solution (using plyr, dplyr, and data.table), as well as kdb+ and BigQuery solutions. Performance improvement tricks for these solutions are then covered, as are parallel/cluster computing approaches and their limitations.
- Why you should NOT use MS MARCO to evaluate semantic search - Apr 2, 2020.
If we want to investigate the power and limitations of semantic vectors (pre-trained or not), we should ideally prioritize datasets that are less biased towards term-matching signals. This piece shows that the MS MARCO dataset is more biased towards those signals than we expected and that the same issues are likely present in many other datasets due to similar data collection designs.
- Top KDnuggets tweets, Mar 25-31: COVID-19 Visualized: The power of effective visualizations for pandemic story telling - Apr 1, 2020.
Also: 20 Historical Twitter Datasets Available for download #DataScience; How to Optimize Your Jupyter Notebook; SQL Cheat Sheet; How to learn #DataScience on your own: a practical guide
- Cartoon: AI understanding of Coronavirus - Apr 1, 2020.
Here is a cartoon to distract you, showing a new level of understanding AI could reach.
- I Don’t Believe in Electrons - Apr 1, 2020.
What does it mean to believe in science? Does this notion of belief even make sense, or are scientists just supposed to be skeptics that question everything for all time, until we somehow arrive at some notion of Truth? And, what is science, anyway?
- Introduction to the K-nearest Neighbour Algorithm Using Examples - Apr 1, 2020.
Read this concise summary of KNN, a supervised learning algorithm for pattern classification that finds which class a new input belongs to by choosing its k nearest neighbours according to the distance calculated between them.
- Introducing MIDAS: A New Baseline for Anomaly Detection in Graphs - Apr 1, 2020.
From network security to financial fraud, anomaly detection helps protect businesses, individuals, and online communities. To help improve anomaly detection, researchers have developed a new approach called MIDAS.
- KDnuggets™ News 20:n13, Apr 1: Effective visualizations for pandemic storytelling; Machine learning for time series forecasting - Apr 1, 2020.
This week, read about the power of effective visualizations for pandemic storytelling; see how (not) to use machine learning for time series forecasting; learn about a deep learning breakthrough: a sub-linear deep learning algorithm that does not need a GPU?; familiarize yourself with how to painlessly analyze your time series; check out what we can learn from the latest coronavirus trends; and... KDnuggets topics?!? Also, much more.
- Research into 1,001 Data Scientist LinkedIn Profiles, the latest - Mar 31, 2020.
What makes a data scientist today? Consider this review of data collected from three years' worth of data scientist LinkedIn profiles to gain insight into how this important new career path is shaping up.
- Coronavirus Trends – what can we learn - Mar 31, 2020.
We examine the coronavirus trends, and look at death rates from Covid-19, including absolute numbers, adjusted for population, and daily change rates. The daily change rates are declining for almost all countries, including Italy and Spain, but remain alarmingly high for the US and especially New York State.
- 365 Data Science courses free until 15 April - Mar 31, 2020.
Be safe. Stay at home. Learn data science. 365 Data Science is making all of their courses free until Apr 15. Sign up now and learn for free!
- Microsoft Research Uses Transfer Learning to Train Real-World Autonomous Drones - Mar 31, 2020.
The new research applies policies learned in simulation to real-world drone environments.
- How (not) to use Machine Learning for time series forecasting: The sequel - Mar 30, 2020.
Developing machine learning predictive models from time series data is an important skill in Data Science. While the time element in the data provides valuable information for your model, it can also lead you down a path that could fool you into something that isn't real. Follow this example to learn how to spot trouble in time series data before it's too late.
- KDnuggets Topics: bringing together the latest and the most popular - Mar 30, 2020.
To help our readers better navigate rich KDnuggets content, we introduce topic pages for the most popular topics. Each topic page brings together the most recent posts on that topic as well as the most popular (badge-winning) previous posts.
- Brain Tumor Detection using Mask R-CNN - Mar 30, 2020.
Mask R-CNN has been the new state of the art in terms of instance segmentation. Here I want to share some simple understanding of it to give you a first look and then we can move ahead and build our model.
- Top Stories, Mar 23-29: 24 Best (and Free) Books To Understand Machine Learning; COVID-19 Visualized: The power of effective visualizations for pandemic storytelling - Mar 30, 2020.
Also: Coronavirus Data and Poll Analysis – yes, there is hope, if we act now; Making sense of ensemble learning techniques; Nine lessons learned during my first year as a Data Scientist; Deep Learning Breakthrough: a sub-linear deep learning algorithm that does not need a GPU?
- Advice for a Successful Data Science Career - Mar 30, 2020.
This blog is meant to show that most everyone has had to expend quite a bit of effort to get where they are. They have to work hard, sometimes experience failure, show discipline, be persistent, be dedicated to their goals, and sometimes sacrifice or take risks.
- SIGKDD Community Impact Program (Deadline Jun. 15) - Mar 27, 2020.
SIGKDD is announcing a funding opportunity through its Community Impact Program. The goal of the program is to support projects that promote data science and help the data science community grow, broaden, and diversify. Read more here.
- Predicting the President: Two Ways Election Forecasts Are Misunderstood - Mar 27, 2020.
With election cycles always seeming to be in season, predictions on outcomes remain intriguing content for the voting citizens. Misinterpretation of election forecasts also runs rampant, and can impact perceptions of candidates and those who post these predictions. A better fundamental understanding of probability can help improve our collective notion of futurism, and how we monitor elections.
- COVID-19 Visualized: The power of effective visualizations for pandemic storytelling - Mar 27, 2020.
Clear, succinct data visualizations can be powerful tools for telling stories and explaining phenomena. This article demonstrates this concept as it relates to the COVID-19 pandemic.
- Introduction to Kubeflow MPI Operator and Industry Adoption - Mar 27, 2020.
Kubeflow recently announced its first major release, 1.0. This post introduces the MPI Operator, one of the core components of Kubeflow, currently in alpha, which makes it easy to run synchronized, allreduce-style distributed training on Kubernetes.
- Free online statistics course – Improve your analytics knowledge - Mar 26, 2020.
This online course is available – for free – to anyone interested in using data to solve problems better.
- Deep Learning Breakthrough: a sub-linear deep learning algorithm that does not need a GPU? - Mar 26, 2020.
Deep Learning sits at the forefront of many important advances underway in machine learning. With backpropagation being a primary training method, its computational inefficiencies require sophisticated hardware, such as GPUs. Learn about this recent algorithmic breakthrough, with improvements to the backpropagation calculations on a CPU that outperform large neural network training with a GPU.
- How To Painlessly Analyze Your Time Series - Mar 26, 2020.
The Matrix Profile is a powerful tool to help solve the dual problem of anomaly detection and motif discovery. Matrix Profile is robust, scalable, and largely parameter-free: we’ve seen it work for a wide range of metrics including website user data, order volume and other business-critical applications.
- Making sense of ensemble learning techniques - Mar 26, 2020.
This article breaks down ensemble learning and how it can be used for problem solving.
- Top KDnuggets tweets, Mar 18-24: Advice to Data Scientists: don’t post a blog on #coronavirus based on your ad-hoc data analysis without reading something on epidemiology – here are some useful links - Mar 25, 2020.
#Coronavirus growth in Western countries: March 19 update - Spain and US cases; If you need a break from #coronavirus news, here is #AlphaGo - The Movie; Which Country Has Flattened the Curve for the #Coronavirus?; Excellent source of #Coronavirus info - FT
- Evaluating Ray: Distributed Python for Massive Scalability - Mar 25, 2020.
If your team has started using Ray and you’re wondering what it is, this post is for you. If you’re wondering if Ray should be part of your technical strategy for Python-based applications, especially ML and AI, this post is for you.
- Alternative Data, Text Analytics, and Sentiment Analysis in Trading and Investing - Mar 25, 2020.
Different types of data beyond your typical dollars and cents have been used in the finance industry for many years. By leveraging machine learning, sentiment data is expected to play an increasingly dominant role in the investment industry, and this article highlights some special challenges of its use in trading models.
- Want to Build an AI Model for Your Business? Read this - Mar 25, 2020.
The best approach for AI production is similar to what venture capitalists (VCs) do when they evaluate and invest in startups.
- Diffusion Map for Manifold Learning, Theory and Implementation - Mar 25, 2020.
This article aims to introduce one of the manifold learning techniques called Diffusion Map. This technique enables us to understand the underlying geometric structure of high dimensional data as well as to reduce the dimensions, if required, by neatly capturing the non-linear relationships between the original dimensions.
- KDnuggets™ News 20:n12, Mar 25: 24 Best (and Free) Books To Understand Machine Learning; Coronavirus Daily Change and Poll Analysis; 9 lessons learned during 1st year as a Data Scientist - Mar 25, 2020.
Read our analysis of coronavirus data and poll results; Use your time indoors to learn with 24 best and free books to understand Machine Learning; Study the 9 important lessons from the first year as a Data Scientist; Understand the SVM, a top ML algorithm; check a comprehensive list of AI resources for online learning; and more.
- Top AI Resources – Directory for Remote Learning - Mar 24, 2020.
Whether you are just learning Data Science, a current professional, or just interested, it's crucial to keep the mind stimulated and stay current. With conferences, schools, and travel largely canceled because of #coronavirus, these remote resources will help you stay engaged.
- Why BERT Fails in Commercial Environments - Mar 24, 2020.
The deployment of large transformer-based models in dynamic commercial environments often yields poor results. This is because commercial environments are usually dynamic, and contain continuous domain shifts between inference and training data.
- Graph Neural Network model calibration for trusted predictions - Mar 24, 2020.
In this article, we’ll talk about calibration in graph machine learning, and how it can help to build trust in these powerful new models.
- How to Make Remote Work Effective for Data Science Teams - Mar 23, 2020.
This post aims to highlight some work from home best practices, both general and data science-specific, in order to help data scientists and teams remain productive, connected and happy while working remotely.
- Coronavirus Data and Poll Analysis – yes, there is hope, if we act now - Mar 23, 2020.
We examine the growth of coronavirus daily cases in the most affected countries, and show evidence that social distancing works in reducing the rate of spread. We also analyze KDnuggets Poll results: the scale of the change to online, and how Data Science work is likely to increase or drop in different regions. Stay healthy and practice social distancing!
- Top Stories, Mar 16-22: 24 Best (and Free) Books To Understand Machine Learning - Mar 23, 2020.
Also: Time Series Classification Synthetic vs Real Financial Time Series; Nine lessons learned during my first year as a Data Scientist; What is the most effective policy response to the new coronavirus pandemic?; Five Interesting Data Engineering Projects
- Made With ML: Discover, build, and showcase machine learning projects - Mar 23, 2020.
This is a short introduction to Made With ML, a useful resource for machine learning engineers looking to get ideas for projects to build, and for those looking to share innovative portfolio projects once built.
- Exploring TensorFlow Quantum, Google’s New Framework for Creating Quantum Machine Learning Models - Mar 23, 2020.
TensorFlow Quantum allows data scientists to build machine learning models that work on quantum architectures.
- Top 20 ODSC 2020 Global Virtual Conference Sessions - Mar 20, 2020.
At ODSC 2020, we are unveiling our first ever 4-day Global Virtual Conference, an online and on-demand version of ODSC. Here are our picks for 20 talks that show how diverse and thorough the ODSC East Global Virtual Conference will be this April 14-17.
- Nine lessons learned during my first year as a Data Scientist - Mar 20, 2020.
What is it like to be a Data Scientist? There can be many hats to wear, and so many problems to solve that are fed with data, churned by data science, and guided by business results. Find out about lessons learned from one Data Scientist about how best to work and perform in the role.
- Build an Artificial Neural Network From Scratch: Part 2 - Mar 20, 2020.
The second article in this series focuses on building an Artificial Neural Network using the Numpy Python library.
- 24 Best (and Free) Books To Understand Machine Learning - Mar 20, 2020.
We have compiled a list of some of the best (and free) machine learning books that will prove helpful for everyone aspiring to build a career in the field.
- ModelDB 2.0 is here! - Mar 19, 2020.
We are excited to announce that ModelDB 2.0 is now available! We have learned a lot since building ModelDB 1.0, so we decided to rebuild from the ground up.
- A Comprehensive Data Repository for Fake Health News Detection - Mar 19, 2020.
We introduce the FakeHealth, a new data repository for fake health news detection. Following a preliminary analysis to demonstrate its features, we consider additional potential directions for better identifying fake news.
- What is the most effective policy response to the new coronavirus pandemic? - Mar 19, 2020.
Where Test/Trace/Quarantine are working, the number of cases/day has declined empirically. Furthermore, this appears to be a radically superior strategy where it can be deployed. I’ll review the evidence, discuss the other strategies and their consequences, and then discuss what can be done.
- The 4 Best Jupyter Notebook Environments for Deep Learning - Mar 19, 2020.
Many cloud providers and other third-party services see the value of a Jupyter notebook environment, which is why many companies now offer cloud-hosted notebooks. Let's have a look at 3 such environments.
- Top KDnuggets tweets, Mar 11-17: Most western countries are on the same #coronavirus trajectory - Mar 18, 2020.
Most western countries are on the same #coronavirus trajectory; The Workers Who Face the Greatest #Coronavirus Risk; #Coronavirus, a Visual Rundown; How to start building an automated NLP solution for processing customer feedback
- Improving the partnership between Data Science and IT - Mar 18, 2020.
Friction can quickly arise as a result of these separate workflows and priorities. Given their differences, how can data science and IT more seamlessly work together in building a model-driven organization?
- Exploring the Adoption of Python in the Workplace – Free Metis Corporate Training Webinar - Mar 18, 2020.
Metis will break down Python for data science and analytics, explain what is driving adoption in the field, and discuss how industries and companies are reacting to the shift.
- A Top Machine Learning Algorithm Explained: Support Vector Machines (SVM) - Mar 18, 2020.
Support Vector Machines (SVMs) are powerful for solving regression and classification problems. You should have this approach in your machine learning arsenal, and this article provides all the mathematics you need to know -- it's not as hard as you might think.
- Time Series Classification Synthetic vs Real Financial Time Series - Mar 18, 2020.
This article discusses distinguishing between real financial time series and synthetic time series using XGBoost.
- A Beginner’s Guide to Data Integration Approaches in Business Intelligence - Mar 18, 2020.
An integrated BI system has a trickle-down effect on all business processes, especially reporting and analytics. Find out how integration can help you leverage the power of BI.
- KDnuggets™ News 20:n11, Mar 18: Covid-19, your community, and you – a data science perspective; When Will AutoML replace Data Scientists? Poll Results and Analysis - Mar 18, 2020.
A Data Science perspective on Covid-19, the novel coronavirus; The results and analysis of a previous KDnuggets Poll: When Will AutoML replace Data Scientists? How to build a mature Machine Learning team; The Most Useful Machine Learning Tools of 2020; and more.
- Five Interesting Data Engineering Projects - Mar 17, 2020.
As the role of the data engineer continues to grow in the field of data science, so do the many tools being developed to support wrangling all that data. Five of these tools that you should pay attention to for your data pipeline work are reviewed here, along with a few bonus tools.
- Scaling Your Data Strategy - Mar 17, 2020.
This article presents a particular vision for a cohesive data strategy for addressing large-scale problems with data-driven solutions, based on prior professional experiences.
- Forecasting Stories: Is it seasonality or not? - Mar 17, 2020.
Kicking off a series of forecasting stories, this first article starts with seasonality and its business applications, and speaks of course corrections that were based on weather- and calendar-driven seasonality.
- Top Stories, Mar 9-15: New Poll: Coronavirus impact on Data Science community; Covid-19, your community, and you — a data science perspective - Mar 16, 2020.
Also: 50 Must-Read Free Books For Every Data Scientist in 2020; Decision Boundary for a Series of Machine Learning Models; 20 AI, Data Science, Machine Learning Terms You Need to Know in 2020 (Part 2)
- When Will AutoML replace Data Scientists? Poll Results and Analysis - Mar 16, 2020.
Will AI always be 5-10 years away? The majority of respondents to this poll think that AutoML will reach expert level in 5-10 years. Interestingly, it is about the same as 5 years ago. We examine the trends by AutoML experience, industry, and region.
- Skynet Is Real: The History and Future of Factories With No Workers - Mar 16, 2020.
Let’s see whether robots will become "grave diggers" of the proletariat, what we still lack to achieve total automation, and what compromises exist.
- Salesforce Open Sources a Framework for Open Domain Question Answering Using Wikipedia - Mar 16, 2020.
The framework uses a multi-hop QA method to answer complex questions by reasoning through Wikipedia’s datasets.
- Building a Mature Machine Learning Team - Mar 13, 2020.
After spending a lot of time thinking about the paths that software companies take toward ML maturity, the author created this framework for you to follow as you adopt ML and then mature as an organization. The framework covers every aspect of building a team, including product, process, technical, and organizational readiness, and recognizes the importance of cross-functional expertise and process improvements for bringing AI-driven products to market.
- The Most Useful Machine Learning Tools of 2020 - Mar 13, 2020.
This article outlines 5 sets of tools every lazy full-stack data scientist should use.
- Decision Boundary for a Series of Machine Learning Models - Mar 13, 2020.
I train a series of Machine Learning models using the iris dataset, construct synthetic data from the extreme points within the data, and test a number of models in order to draw the decision boundaries from which they make predictions in a 2D space. This is useful for illustrative purposes and for understanding how different Machine Learning models make predictions.
- How To Build Your Own Feedback Analysis Solution - Mar 12, 2020.
Automating the analysis of customer feedback will sound like a great idea after reading a couple hundred reviews. Building an NLP solution to provide in-depth analysis of what your customers are thinking is a serious undertaking, and this guide helps you scope out the entire project.
- Few-Shot Image Classification with Meta-Learning - Mar 12, 2020.
Here is how you can teach your model to learn quickly from a few examples.
- Google Open Sources TFCO to Help Build Fair Machine Learning Models - Mar 12, 2020.
A new optimization framework helps to incorporate fairness constraints in machine learning models.
- Top KDnuggets tweets, Mar 04-10: 10 Free Must-Read Books for Machine Learning and Data Science - Mar 11, 2020.
Also: The three phases of #COVID19 – and how we can make it manageable; 50 Must-Read Free Books For Every Data Scientist in 2020; Binary classification is a core machine learning technique, but is there a better way to evaluate its performance than ROC-AUC?
- Math for Programmers! - Mar 11, 2020.
Math for Programmers teaches you the math you need to know for a career in programming, concentrating on what you need to know as a developer.
- Software Interfaces for Machine Learning Deployment - Mar 11, 2020.
While building a machine learning model might be the fun part, it won't do much for anyone else unless it can be deployed into a production environment. How to implement machine learning deployments is a special challenge with differences from traditional software engineering, and this post examines a fundamental first step -- how to create software interfaces so you can develop deployments that are automated and repeatable.
- Covid-19, your community, and you — a data science perspective - Mar 11, 2020.
Let's talk about covid-19; the reality, the numbers, and the data science.
- KDnuggets™ News 20:n10, Mar 11: What impact is the coronavirus having on the AI/Data Science/Machine Learning community?; Recreating Fingerprints using Convolutional Autoencoders - Mar 11, 2020.
Also: Recreating Fingerprints using Convolutional Autoencoders; A simple and interpretable performance measure for a binary classifier; Resources for Women in AI, Data Science, and Machine Learning; Trends in Machine Learning in 2020; A Crash Course in Game Theory for Machine Learning; and much more
- New Poll: Coronavirus impact on AI/Data Science/Machine Learning community - Mar 10, 2020.
Has coronavirus impacted your conference or other travel plans, and do you anticipate it causing further professional or educational disruption in the near future? Take part in the new KDnuggets poll and have your say.
- Domino named a Visionary in Gartner Magic Quadrant for completeness of vision and ability to execute - Mar 10, 2020.
From a product perspective, we believe three aspects of the Domino platform, in particular, are foundational to earning this illustrious moniker: openness, collaboration, and reproducibility.
- Python Pandas For Data Discovery in 7 Simple Steps - Mar 10, 2020.
Just getting started with Python's Pandas library for data analysis? Or, ready for a quick refresher? These 7 steps will help you become familiar with its core features so you can begin exploring your data in no time.
- The Berlin Rent Freeze: How many illegal overpriced offers can I find online? - Mar 10, 2020.
This post presents an analysis of Berlin online real estate listings, investigating a controversial law capping rents in the state, which went into effect on February 23. Are current landlords already respecting the new rent cap?
- Generate Realistic Human Face using GAN - Mar 10, 2020.
This article contains a brief intro to Generative Adversarial Networks (GANs) and shows how to build a Human Face Generator.
- Unlocking the Potential of FAIR Data Using AI at Roche - Mar 9, 2020.
Learn from the head of the data science department in research and early development at Roche. Use the code KDNUGGETS for a 15% discount on your Predictive Analytics World ticket, 11-12 May in Munich.
- 20+ Machine Learning Datasets & Project Ideas - Mar 9, 2020.
Upgrading your machine learning, AI, and Data Science skills requires practice. To practice, you need to develop models with a large amount of data. Finding good datasets to work with can be challenging, so this article discusses more than 20 great datasets along with machine learning project ideas for you to tackle today.
- Top Stories, Mar 2-8: 20 AI, Data Science, Machine Learning Terms You Need to Know in 2020 (Part 2) - Mar 9, 2020.
Also: Linear to Logistic Regression, Explained Step by Step; Trends in Machine Learning in 2020; Tokenization and Text Data Preparation with TensorFlow & Keras; The Death of Data Scientists — will AutoML replace them?
- 50 Must-Read Free Books For Every Data Scientist in 2020 - Mar 9, 2020.
In this article, we list some excellent data science books that cover a wide variety of topics under Data Science.
- A Crash Course in Game Theory for Machine Learning: Classic and New Ideas - Mar 9, 2020.
Game theory is experiencing a renaissance driven by the evolution of AI. What are some classic and new ideas that data scientists should be aware of?
- Resources for Women in AI, Data Science, and Machine Learning - Mar 8, 2020.
For International Women's Day, we feature resources to help more women enter and succeed in AI, Big Data, Data Science, and Machine Learning fields.
- Top February Stories: The Death of Data Scientists – will AutoML replace them? - Mar 6, 2020.
Also: Learning from 3 big Data Science career mistakes; Leaders, Changes, and Trends in Gartner 2020 MQ Data Science and Machine Learning Platforms; Why Did I Reject a Data Scientist Job; Free Mathematics Courses for Data Science & Machine Learning.
- Analyzing GDPR Fines – who are largest violators? - Mar 6, 2020.
Fines under the GDPR have been rolling in since its inception in 2018. This article investigates who the largest penalty recipients are by country, the amounts involved, and penalties for private individuals.
- Tokenization and Text Data Preparation with TensorFlow & Keras - Mar 6, 2020.
This article will look at tokenizing and further preparing text data for feeding into a neural network using TensorFlow and Keras preprocessing tools.
- Phishytics – Machine Learning for Detecting Phishing Websites - Mar 6, 2020.
Since phishing is such a widespread problem in the cybersecurity domain, let us take a look at the application of machine learning for phishing website detection.
- Trends in Machine Learning in 2020 - Mar 5, 2020.
Many industries realize the potential of Machine Learning and are incorporating it as a core technology. Progress and new applications of these tools are moving quickly in the field, and we discuss expected upcoming trends in Machine Learning for 2020.