2020 May
- KDD-2020 – Virtual Only Conference, Aug 23-27 - May 29, 2020.
After much consideration, the General Chairs, Executive Committee and Organizing Committee for KDD 2020 have decided to take the conference fully virtual. Clear your calendar for August 23-27, 2020, and enjoy access to all the virtual content live and on demand the week of the event.
- Six Ways For Data Scientists to Succeed at a Startup - May 29, 2020.
Here are six tips that will help data scientists and other data professionals succeed if they have the opportunity to work at a startup.
- Privacy-preserving AI – Why do we need it? - May 29, 2020.
Various data privacy threats can arise in the usual process of building data and AI-based systems. State-of-the-art technologies in the domain of privacy-preserving AI can help avoid them.
- How to Think Like a Data Scientist - May 29, 2020.
So what does it take to become a data scientist? For some pointers on the skills for success, I interviewed Ben Chu, who is a Senior Data Scientist at Refinitiv Labs.
- Data Extraction Software Octoparse 8 vs Octoparse 7: What’s New - May 28, 2020.
Octoparse 8 was recently released. Get a better understanding of what the differences between OP 8 and 7 are by reading this overview.
- Model Evaluation Metrics in Machine Learning - May 28, 2020.
A detailed explanation of model evaluation metrics to evaluate a classification machine learning model.
- Taming Complexity in MLOps - May 28, 2020.
A greatly expanded v2.0 of the open-source Orbyter toolkit helps data science teams continue to streamline machine learning delivery pipelines, with an emphasis on seamless deployment to production.
- 5 Machine Learning Papers on Face Recognition - May 28, 2020.
This article will highlight some of that research and introduce five machine learning papers on face recognition.
- Top KDnuggets tweets, May 20-26: The Best NLP with Deep Learning Course is Free - May 27, 2020.
Also: #Linearalgebra and optimization and machine learning: A textbook; Everything you need to become a self-taught #MachineLearning Engineer ; #SQL Cheat Sheet (2020); Automated Machine Learning: The Free eBook - KDnuggets
- Are Tera Operations Per Second (TOPS) Just hype? Or Dark AI Silicon in Disguise? - May 27, 2020.
This article explains why TOPS isn’t as accurate a gauge as many people think, and discusses other criteria that should be considered when evaluating a solution to a real application.
- Deepmind’s Gaming Streak: The Rise of AI Dominance - May 27, 2020.
There is still a long way to go before machine agents match overall human gaming prowess, but Deepmind’s gaming research focus has shown clear and substantial progress.
- Best GIS Courses in 2020 - May 27, 2020.
Geographic Information Systems Analysis is the analysis of spatial relationships and patterns. Spatial components are being ingrained into society with the advent of the Internet of Things (IoT) in which more data can be connected and is likely to have a spatio-temporal component as well.
- Faster machine learning on larger graphs with NumPy and Pandas - May 27, 2020.
One of the most exciting features of StellarGraph 1.0 is a new graph data structure — built using NumPy and Pandas — that results in significantly lower memory usage and faster construction times.
- How to Rock a Virtual Data Interview - May 26, 2020.
To help you truly rock your next virtual data interview, we’ve pulled together a few tips that we recommend when conducting our online interviews for The Data Incubator’s Data Science Fellowship Program.
- Dataset Splitting Best Practices in Python - May 26, 2020.
If you are splitting your dataset into training and testing data you need to keep some things in mind. This discussion of 3 best practices to keep in mind when doing so includes demonstration of how to implement these particular considerations in Python.
- Interactive Machine Learning Experiments - May 26, 2020.
Dive into experimenting with machine learning techniques using this open-source collection of interactive demos built on multilayer perceptrons, convolutional neural networks, and recurrent neural networks. Each package consists of ready-to-try web browser interfaces and fully-developed notebooks for you to fine tune the training for better performance.
- Machine Fairness: How to assess AI system’s fairness and mitigate any observed unfairness issues - May 26, 2020.
Microsoft is bringing the latest research in responsible AI to Azure (both Azure Machine Learning and their open source toolkits), to empower data scientists and developers to understand machine learning models, protect people and their data, and control the end-to-end machine learning process.
- LinkedIn Open Sources a Small Component to Simplify the TensorFlow-Spark Interoperability - May 25, 2020.
Spark-TFRecord enables the processing of TensorFlow’s TFRecord structures in Apache Spark.
- 10 Useful Machine Learning Practices For Python Developers - May 25, 2020.
While you may be a data scientist, you are still a developer at the core. This means your code should be skillful. Follow these 10 tips to make sure you quickly deliver bug-free machine learning solutions.
- Python For Everybody: The Free eBook - May 25, 2020.
Get back to fundamentals with this free eBook, Python For Everybody, approaching the learning of programming from a data analysis perspective.
- Top Stories, May 18-24: The Best NLP with Deep Learning Course is Free - May 25, 2020.
Also: Automated Machine Learning: The Free eBook; Sparse Matrix Representation in Python; Build and deploy your first machine learning web app; Complex logic at breakneck speed: Try Julia for data science
- The Best NLP with Deep Learning Course is Free - May 22, 2020.
Stanford's Natural Language Processing with Deep Learning is one of the most respected courses on the topic that you will find anywhere, and the course materials are freely available online.
- Appropriately Handling Missing Values for Statistical Modelling and Prediction - May 22, 2020.
Many statisticians in industry agree that blindly imputing the missing values in your dataset is a dangerous move and should be avoided without first understanding why the data is missing in the first place.
- A Holistic Framework for Managing Data Analytics Projects - May 22, 2020.
Agile project management for Data Science development continues to be an effective framework that enables flexibility and productivity in a field that can experience continuous changes in data and evolving stakeholder expectations. Learn more about the leading approaches for developing Data Science models, and apply them to your next project.
- Build and deploy your first machine learning web app - May 22, 2020.
A beginner’s guide to train and deploy machine learning pipelines in Python using PyCaret.
- Top KDnuggets tweets, May 13-19: Linear algebra and optimization and machine learning: A textbook - May 21, 2020.
Also: Everything you need to become a self-taught #MachineLearning Engineer; SQL Cheat Sheet (2020) - a useful cheat sheet that documents some of the more commonly used elements of SQL
- Caserta Announces Pro-Bono Data and Analytics Workshop for Senior Leaders - May 21, 2020.
Caserta is offering a limited number of virtual pro-bono data and analytics workshops conducted by industry leaders Joe Caserta and Doug Laney exclusively for eligible senior leadership. Learn more and sign up now.
- Dimensionality Reduction with Principal Component Analysis (PCA) - May 21, 2020.
This article focuses on design principles of the PCA algorithm for dimensionality reduction and its implementation in Python from scratch.
- Spotting Controversy with NLP - May 21, 2020.
In this article, I’ll introduce you to a hot-topic in financial services and describe how a leading data provider is using data science and NLP to streamline how they find insights in unstructured data.
- Pandas in action! - May 20, 2020.
Pandas is instantly familiar to anyone who’s used spreadsheet software, whether that’s Google Sheets or good old Excel. It’s got columns, it’s got grids, it’s got rows; but pandas is far more powerful. Save 40% with code nlkdpandas40 on this book, and other Manning books and videos.
- Complex logic at breakneck speed: Try Julia for data science - May 20, 2020.
We show a comparative performance benchmarking of Julia with an equivalent Python code to show why Julia is great for data science and machine learning.
- 13 must-read papers from AI experts - May 20, 2020.
What research articles do top AI experts in the field recommend? Find out which ones and why, then be sure to add each to your reading to-do list.
- Looking Normal(ly Distributed) - May 20, 2020.
This article investigates when some probability distributions look normal "enough" for a statistical test.
- Google Unveils TAPAS, a BERT-Based Neural Network for Querying Tables Using Natural Language - May 19, 2020.
The new neural network extends BERT to interact with tabular datasets.
- What they do not tell you about machine learning - May 19, 2020.
There's a lot of excitement out there about machine learning jobs. So, it's always good to start off with a healthy dose of reality and proper expectations.
- Sparse Matrix Representation in Python - May 19, 2020.
Leveraging sparse matrix representations for your data when appropriate can spare you memory storage. Have a look at the reasons why, see how to create sparse matrices in Python using Scipy, and compare the memory requirements for standard and sparse representations of the same data.
- Linear algebra and optimization and machine learning: A textbook - May 18, 2020.
This book teaches linear algebra and optimization as the primary topics of interest, and solutions to machine learning problems as applications of these methods. Therefore, the book also provides significant exposure to machine learning.
- Easy Text-to-Speech with Python - May 18, 2020.
Python comes with a lot of handy and easily accessible libraries and we’re going to look at how we can deliver text-to-speech with Python in this article.
- Top Stories, May 11-17: Start Your Machine Learning Career in Quarantine; AI and Machine Learning for Healthcare - May 18, 2020.
Also: Satellite Image Analysis with fast.ai for Disaster Recovery; Machine Learning in Power BI using PyCaret; Deep Learning: The Free eBook; 24 Best (and Free) Books To Understand Machine Learning
- Evidence Counterfactuals for explaining predictive models on Big Data - May 18, 2020.
Big Data generated by people -- such as social media posts, mobile phone GPS locations, and browsing history -- provides enormous prediction value for AI systems. However, explaining how these models predict with the data remains challenging. This interesting explanation approach considers how a model would behave if it didn't have the original set of data to work with.
- Automated Machine Learning: The Free eBook - May 18, 2020.
There is a lot to learn about automated machine learning theory and practice. This free eBook can get you started the right way.
- Cartoon: The Worst Telemedicine? - May 16, 2020.
New KDnuggets cartoon examines what may be the worst example of telemedicine ...
- 5 Great New Features in Scikit-learn 0.23 - May 15, 2020.
Check out 5 new features of the latest Scikit-learn release, including the ability to visualize estimators in notebooks, improvements to both k-means and gradient boosting, some new linear model implementations, and sample weight support for a pair of existing regressors.
- AI Channels to Follow - May 15, 2020.
AI is certainly playing an important role in our global fight against the novel coronavirus. These YouTube channels are recommended to keep you covered with the latest advancements in the field and how it is impacting our world.
- Facebook Open Sources Blender, the Largest-Ever Open Domain Chatbot - May 15, 2020.
The new conversational agent exhibits human-like behavior in conversations about almost any topic.
- AI and Machine Learning for Healthcare - May 14, 2020.
Traditional business and technology sectors are not the only fields being impacted by AI. Healthcare is a field that is thought to be highly suitable for the applications of AI tools and techniques.
- Coding habits for data scientists - May 14, 2020.
While the core machine learning algorithms might only take up a few lines of code, it's the rest of your program that can get messy fast. Learn about some techniques for identifying bad coding habits in ML that add to complexity in code as well as start new habits that can help partition complexity.
- Satellite Image Analysis with fast.ai for Disaster Recovery - May 14, 2020.
We were asked to build ML models using the novel xBD dataset provided by the organizers to estimate damage to infrastructure with the goal of reducing the amount of human labour and time required to plan an appropriate response. This article will focus on the technical aspects of our solution and share our experiences.
- Top KDnuggets tweets, May 06-12: 24 Best (and Free) Books To Understand Machine Learning; We’ve Been Looking At The #Coronavirus Data Wrong - May 13, 2020.
Also: C passes Java and becomes number 1 programming language; This Professor Says We've Been Looking At The #Coronavirus Data Wrong; Some Common #DataScience Stacks
- DeepMind’s Suggestions for Learning #AtHomeWithAI - May 13, 2020.
DeepMind has been sharing resources for learning AI at home on their Twitter account. Check out a few of these suggestions here, and keep your eye on the #AtHomeWithAI hashtag for more.
- Math for Programmers! - May 13, 2020.
Math for Programmers teaches you the math you need to know for a career in programming, concentrating on what you need to know as a developer.
- I Designed My Own Machine Learning and AI Degree - May 13, 2020.
With so many pioneering online resources for open education, check out this organized collection of courses you can follow to become a well-rounded machine learning and AI engineer.
- Customer Churn Prediction: A Global Performance Study - May 13, 2020.
This article details an automated machine-learned approach to predict customer churn and its results across selected communication service providers around the globe.
- Analytic Professionals – Share your views: Participate in the 2020 Data Science Survey - May 12, 2020.
Analytics & data science professionals: take part now in the Rexer Analytics 2020 Data Science Survey and share your views.
- Machine Learning in Power BI using PyCaret - May 12, 2020.
Check out this step-by-step tutorial for implementing machine learning in Power BI within minutes.
- What You Need to Know About Deep Reinforcement Learning - May 12, 2020.
How does deep learning solve the challenges of scale and complexity in reinforcement learning? Learn how combining these approaches will make more progress toward the notion of Artificial General Intelligence.
- Text Mining in Python: Steps and Examples - May 12, 2020.
The majority of data exists in textual form, which is a highly unstructured format. To produce meaningful insights from text data, we need to follow a method called Text Analysis.
- The Elements of Statistical Learning: The Free eBook - May 11, 2020.
Check out this free ebook covering the elements of statistical learning, appropriately titled "The Elements of Statistical Learning."
- Start Your Machine Learning Career in Quarantine - May 11, 2020.
While this quarantine can last two months, make the most of it by starting your career in Machine Learning with this 60-day learning plan.
- Top Stories, May 4-10: Deep Learning: The Free eBook; Beginners Learning Path for Machine Learning - May 11, 2020.
Also: How use the Coronavirus crisis to kickstart your Data Science career; 5 Concepts You Should Know About Gradient Descent and Cost Function; Five Cool Python Libraries for Data Science; Natural Language Processing Recipes: Best Practices and Examples
- The Architecture Used at LinkedIn to Improve Feature Management in Machine Learning Models - May 11, 2020.
The new typed feature schema streamlined the reusability of features across thousands of machine learning models.
- Top April Stories: Mathematics for Machine Learning: The Free eBook; The Super Duper NLP Repo: 100 Ready-to-Run Colab Notebooks - May 8, 2020.
Also: Introducing MIDAS: A New Baseline for Anomaly Detection in Graphs; The Super Duper NLP Repo: 100 Ready-to-Run Colab Notebooks; Five Cool Python Libraries for Data Science.
- Will Machine Learning Engineers Exist in 10 Years? - May 8, 2020.
As can be common in many technical fields, the landscape of specialized roles is evolving quickly. With more people learning at least a little machine learning, this could eventually become a common skill set for every software engineer.
- Data Scientists, Corporate Fortune Tellers - May 8, 2020.
I realized that from a corporate perspective, “fortune teller” was not entirely off from the role of a “data scientist”.
- Forecasting Stories 3: Each Time-series Component Sings a Different Song - May 8, 2020.
With time-series decomposition, we were able to infer that the consumers were waiting for the highest sale of the year rather than buying up-front.
- 5 Concepts You Should Know About Gradient Descent and Cost Function - May 7, 2020.
Why is Gradient Descent so important in Machine Learning? Learn more about this iterative optimization algorithm and how it is used to minimize a loss function.
- Chatbots in a Nutshell - May 7, 2020.
Marketing scientist Kevin Gray asks Dr. Anna Farzindar of the University of Southern California about chatbots and the ways they are used.
- Hyperparameter Optimization for Machine Learning Models - May 7, 2020.
Check out this comprehensive guide to model optimization techniques.
- Top KDnuggets tweets, Apr 29 – May 5: 24 Best (and Free) Books To Understand Machine Learning - May 6, 2020.
What are Some 'Advanced ' #AI and #MachineLearning Online Courses?; 24 Best (and Free) Books To Understand Machine Learning; Top 5 must-have #DataScience skills for 2020
- Were 21% of New York City residents really infected with the novel coronavirus? - May 6, 2020.
Understanding the types of statistical bias that pop up in popular media and reporting is especially important during this pandemic where the data -- and our global response to the data -- directly impact people's lives.
- Explaining “Blackbox” Machine Learning Models: Practical Application of SHAP - May 6, 2020.
Train a "blackbox" GBM model on a real dataset and make it explainable with SHAP.
- Best Coronavirus Projections, Predictions, Dashboards and Data Resources - May 6, 2020.
Check out this curated collection of coronavirus-related projections, dashboards, visualizations, and data that we have encountered on the internet.
- Statistical Thinking for Industrial Problem Solving – a free online statistics course - May 5, 2020.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
- Beginners Learning Path for Machine Learning - May 5, 2020.
So, you are interested in machine learning? Here is your complete learning path to start your career in the field.
- Getting Started with Spectral Clustering - May 5, 2020.
This post will unravel a practical example to illustrate and motivate the intuition behind each step of the spectral clustering algorithm.
- Top 10 Data Visualization Tools for Every Data Scientist - May 5, 2020.
At present, the data scientist is one of the most sought after professions. That’s one of the main reasons why we decided to cover the latest data visualization tools that every data scientist can use to make their work more effective.
- How use the Coronavirus crisis to kickstart your Data Science career - May 4, 2020.
As the global economy dwindles, tech companies are hiring en masse. Now is the time to get yourself noticed as a Data Scientist and try to land your dream job.
- Microsoft Research Unveils Three Efforts to Advance Deep Generative Models - May 4, 2020.
Optimus, FQ-GAN and Prevalent bring new ideas to apply generative models at large scale.
- Deep Learning: The Free eBook - May 4, 2020.
"Deep Learning" is the quintessential book for understanding deep learning theory, and you can still read it freely online.
- Top Stories, Apr 27 – May 3: Five Cool Python Libraries for Data Science; Natural Language Processing Recipes: Best Practices and Examples - May 4, 2020.
Also: Coronavirus COVID-19 Genome Analysis using Biopython; LSTM for time series prediction; A Concise Course in Statistical Inference: The Free eBook; Exploring the Impact of Geographic Information Systems
- Demystifying the AI Infrastructure Stack - May 1, 2020.
AI tools and services are expanding at a rapid clip, and keeping a handle on this evolving ecosystem is crucial for the success of your AI projects. This framework will help you build your technical stack to deploy AI projects faster and at scale.
- Optimize Response Time of your Machine Learning API In Production - May 1, 2020.
This article demonstrates how building a smarter API serving Deep Learning models minimizes the response time.
- Which Face is Real? Applying StyleGAN to Create Fake People - May 1, 2020.
This post explains using a pre-trained GAN to generate human faces, and discusses the most common generative pitfalls associated with doing so.
- Natural Language Processing Recipes: Best Practices and Examples - May 1, 2020.
Here is an overview of another great natural language processing resource, this time from Microsoft, which demonstrates best practices and implementation guidelines for a variety of tasks and scenarios.