- Meet whale! The stupidly simple data discovery tool, by Robert Yi - Dec 31, 2020.
Finding data and understanding its meaning represents the traditional "daily grind" of a Data Scientist. With whale, the new lightweight data discovery, documentation, and quality engine for your data warehouse that is under development by Dataframe, your data science team will more efficiently search data and automate its data metrics.
- 15 Free Data Science, Machine Learning & Statistics eBooks for 2021, by Matthew Mayo - Dec 31, 2020.
We present a curated list of 15 free eBooks compiled in a single location to close out the year.
- Data Science as a Product – Why Is It So Hard?, by Tad Slaff - Dec 30, 2020.
Developing machine learning models as products that deliver business value remains a new field with uncharted paths toward success. Applying well-established software development approaches, such as agile, is not straightforward, but may still offer a solid foundation to guide success.
- Generating Beautiful Neural Network Visualizations, by Matthew Mayo - Dec 30, 2020.
If you are looking to easily generate visualizations of neural network architectures, PlotNeuralNet is a project you should check out.
- Key Data Science Algorithms Explained: From k-means to k-medoids clustering, by Arushi Prakash - Dec 29, 2020.
As a core method in the Data Scientist's toolbox, k-means clustering is valuable but can be limited based on the structure of the data. Can expanded methods like PAM (partitioning around medoids), CLARA, and CLARANS provide better solutions, and what is the future of these algorithms?
- Essential Math for Data Science: The Poisson Distribution, by Hadrien Jean - Dec 29, 2020.
The Poisson distribution, named after the French mathematician Siméon Denis Poisson, is a discrete distribution function describing the probability that an event will occur a certain number of times in a fixed time (or space) interval.
- 2020: A Year Full of Amazing AI Papers — A Review, by Louis (What's AI) Bouchard - Dec 28, 2020.
So much happened in the world during 2020 that it may have been easy to miss the great progress in the world of AI. To catch you up quickly, check out this curated list of the latest breakthroughs in AI by release date, along with a video explanation, link to an in-depth article, and code.
- Data Catalogs Are Dead; Long Live Data Discovery, by Debashis Saha & Barr Moses - Dec 28, 2020.
Why data catalogs aren’t meeting the needs of the modern data stack, and how a new approach – data discovery – is needed to better facilitate metadata management and data reliability.
- Monte Carlo integration in Python, by Tirthajyoti Sarkar - Dec 24, 2020.
A famous Casino-inspired trick for data science, statistics, and all of science. How to do it in Python?
- Top Stories, Dec 14-20: Crack SQL Interviews; State of Data Science and Machine Learning 2020: 3 Key Findings - Dec 24, 2020.
Also: A Rising Library Beating Pandas in Performance; 20 Core Data Science Concepts for Beginners; How to Create Custom Real-time Plots in Deep Learning; 10 Python Skills They Don’t Teach in Bootcamp
- How to easily check if your Machine Learning model is fair?, by Jakub Wisniewski - Dec 24, 2020.
Machine learning models deployed today, and many more to come, impact people and society directly. With that power and influence resting in the hands of Data Scientists and machine learning engineers, taking the time to evaluate and understand if model results are fair will become the linchpin for the future success of AI/ML solutions. These are critical considerations, and the recently developed fairness module in the dalex Python package offers a unified and accessible way to check that your models remain fair.
- Can you trust AutoML?, by Ioannis Tsamardinos - Dec 23, 2020.
Automated Machine Learning, or AutoML, tries hundreds or even thousands of different ML pipelines to deliver models that often beat the experts and win competitions. But, is this the ultimate goal? Can a model developed with this approach be trusted without guarantees of predictive performance? The issue of overfitting must be closely considered because these methods can lead to overestimation -- and the Winner's Curse.
- XGBoost: What it is, and when to use it, by Harish Krishna - Dec 23, 2020.
XGBoost is a tree-based ensemble machine learning algorithm and a scalable system for tree boosting. Read more for an overview of the parameters that make it work, and when you would use the algorithm.
- The Future of Cloud is Now, by Immuta - Dec 22, 2020.
Our recent survey of over 130 top data engineers, data architects, and executives uncovered details and trends of the current state of data engineering and DataOps. Read our survey report to learn more about these trends as well as our predictions for future obstacles and our recommendations for avoiding them.
- Resampling Imbalanced Data and Its Limits, by Maarit Widmann - Dec 22, 2020.
Can resampling tackle the problem of too few fraudulent transactions in credit card fraud detection?
- Feature Store vs Data Warehouse, by Jim Dowling - Dec 22, 2020.
A feature store is a data warehouse of features for machine learning. Unlike a data warehouse, it is a dual-database system: one database serves features at low latency to online applications, and the other stores large volumes of features. Learn how Data Scientists leverage this capability in production-deployed models.
- 5 strategies for enterprise machine learning for 2021, by Leah Kolben - Dec 22, 2020.
While it is important for enterprises to continue solving the past challenges in a machine learning pipeline (manage, monitor, track experiments and models), in 2021 enterprises should focus on strategies to achieve scalability, elasticity and operationalization of machine learning.
- Top 9 Data Science Courses to Learn Online, by Simplilearn - Dec 21, 2020.
Learn Data Science from these top courses. Details like cost and course duration are included.
- Production Machine Learning Monitoring: Outliers, Drift, Explainers & Statistical Performance, by Alejandro Saucedo - Dec 21, 2020.
A practical deep dive on production monitoring architectures for machine learning at scale using real-time metrics, outlier detectors, drift detectors, metrics servers and explainers.
- MLOps Is Changing How Machine Learning Models Are Developed, by Henrik Skogstrom - Dec 21, 2020.
Delivering machine learning solutions is so much more than the model. Three key concepts covering version control, testing, and pipelines are the foundation for machine learning operations (MLOps) that help data science teams ship models quicker and with more confidence.
- Fast and Intuitive Statistical Modeling with Pomegranate, by Tirthajyoti Sarkar - Dec 21, 2020.
Pomegranate is a delicious fruit. It can also be a super useful Python library for statistical analysis. We will show how in this article.
- Optimization Algorithms in Neural Networks, by Nagesh Singh Chauhan - Dec 18, 2020.
This article presents an overview of some of the most used optimizers while training a neural network.
- MLOps – “Why is it required?” and “What it is”?, by Bose & Aggarwal - Dec 18, 2020.
Creating a model that works well is only a small aspect of delivering real machine learning solutions. Learn about the motivation behind MLOps, the framework and its components that will help you get your ML model into production, and its relation to DevOps from the world of traditional software development.
- Navigate the road to Responsible AI, by Ben Lorica - Dec 18, 2020.
Deploying AI ethically and responsibly will involve cross-functional team collaboration, new tools and processes, and proper support from key stakeholders.
- Top 2020 Stories: 24 Best (and Free) Books To Understand Machine Learning; If I had to start learning Data Science again, how would I do it? - Dec 17, 2020.
Also: Know What Employers are Expecting for a Data Scientist Role in 2020; Top Python Libraries for Data Science, Data Visualization & Machine Learning.
- ebook: Fundamentals for Efficient ML Monitoring - Dec 17, 2020.
We've gathered best practices for data science and engineering teams to create an efficient framework to monitor ML models. This ebook provides a framework for anyone who has an interest in building, testing, and implementing a robust monitoring strategy in their organization or elsewhere.
- Undersampling Will Change the Base Rates of Your Model’s Predictions, by Bryan Shalloway - Dec 17, 2020.
In classification problems, the proportion of cases in each class largely determines the base rate of the predictions produced by the model. Therefore, if you use sampling techniques that change this proportion, there is a good chance you will want to rescale or calibrate your predictions before using them in the wild.
- Crack SQL Interviews, by Xinran Waibel - Dec 17, 2020.
SQL is an essential programming language for data analysis and processing. So, SQL questions are always part of the interview process for data science-related jobs, including data analysts, data scientists, and data engineers. Become familiar with these common patterns seen in SQL interview questions and follow our tips on how to neatly handle each with SQL queries.
- 8 Places for Data Professionals to Find Datasets, by Devin Partida - Dec 17, 2020.
Here is a curated list of sites and resources invaluable for data professionals to acquire practice datasets.
- Top KDnuggets tweets, Dec 09-15: Main 2020 Developments, Key 2021 Trends in #AI #DataScience #MachineLearning DL Technology from experts - Dec 16, 2020.
Also: Data Science and Machine Learning: The Free eBook; CatBoost vs. Light GBM vs. XGBoost; 10 Python Skills They Don’t Teach in Bootcamp; MIT @techreview read the paper that forced @TimnitGebru out of Google. It presents the history of #NLP and an overview of four main #risks of large language models - here are the details
- How to use Machine Learning for Anomaly Detection and Conditional Monitoring, by Michael Garbade - Dec 16, 2020.
This article explains the goals of anomaly detection and outlines the approaches used to solve specific use cases for anomaly detection and condition monitoring.
- Industry 2021 Predictions for AI, Analytics, Data Science, Machine Learning, by Gregory Piatetsky - Dec 16, 2020.
We bring you industry predictions from 12 innovative companies - what key trends they expect in 2021 in AI, Analytics, Data Science, and Machine Learning?
- How to Clean Text Data at the Command Line, by Ezz El Din Abdullah - Dec 16, 2020.
A basic tutorial about cleaning data using command-line tools: tr, grep, sort, uniq, awk, sed, and csvlook.
- Applications of Data Science and Business Analytics - Dec 15, 2020.
In recent times, a large number of businesses have begun realising the potential of Data Science. Applications of business analytics and data science range far and wide. So let us have a look at them in detail.
- Data Science and Machine Learning: The Free eBook, by Matthew Mayo - Dec 15, 2020.
Check out the newest addition to our free eBook collection, Data Science and Machine Learning: Mathematical and Statistical Methods, and start building your statistical learning foundation today.
- Covid or just a Cough? AI for detecting COVID-19 from Cough Sounds, by Ramesh & Teki - Dec 15, 2020.
Increased capabilities in screening and early testing for a disease can significantly support quelling its spread and impact. Recent progress in developing deep learning AI models to classify cough sounds as a prescreening tool for COVID-19 has demonstrated promising early success. Cough-based diagnosis is non-invasive, cost-effective, scalable, and, if approved, could be a potential game-changer in our fight against COVID-19.
- State of Data Science and Machine Learning 2020: 3 Key Findings, by Matthew Mayo - Dec 15, 2020.
Kaggle recently released its State of Data Science and Machine Learning report for 2020, based on compiled results of its annual survey. Read about 3 key findings in the report here.
- Top Stories, Dec 7-13: 20 Core Data Science Concepts for Beginners - Dec 14, 2020.
Also: A Rising Library Beating Pandas in Performance; Main 2020 Developments and Key 2021 Trends in AI, Data Science, Machine Learning Technology; R or Python? Why Not Both?; Artificial Intelligence in Modern Learning System : E-Learning; Essential Math for Data Science: Probability Density and Probability Mass Functions
- How The New World of AI is Driving a New World of Processor Development - Dec 14, 2020.
Blaize’s novel stream processor for Edge AI offers a case study of new opportunities for smaller companies to leverage semiconductor industry resources in pursuit of their goals.
- How to Create Custom Real-time Plots in Deep Learning, by Tirthajyoti Sarkar - Dec 14, 2020.
How to generate real-time visualizations of custom metrics while training a deep learning model using Keras callbacks.
- 6 Things About Data Science that Employers Don’t Want You to Know, by Terence Shin - Dec 14, 2020.
As is the potential for any "trending hot" career, the reality of a position in the field may not be all that you initially expected. Data Science is no exception, and being still a young field, its evolving definition can offer some surprises that you should know about before accepting that dream offer.
- Facebook Open Sources ReBeL, a New Reinforcement Learning Agent, by Jesus Rodriguez - Dec 14, 2020.
The new model tries to recreate the reinforcement learning and search methods used by AlphaZero in imperfect information scenarios.
- Matrix Decomposition Decoded, by Tanveer Sayyed - Dec 11, 2020.
This article covers matrix decomposition and the underlying concepts of eigenvalues (lambdas) and eigenvectors, and discusses the purpose behind using matrices and vectors in linear algebra.
- Data Science Volunteering: Ways to Help, by Susan Sivek - Dec 11, 2020.
No matter the field in which you hold some expertise, sharing your skills to benefit the lives of others or supporting non-profit organizations that try to make the world a better place is a noble and time-worthy personal pursuit. Many opportunities exist in data science to contribute to meaningful projects and crucial needs from your local community to a global scale.
- A Rising Library Beating Pandas in Performance, by Ezz El Din Abdullah - Dec 11, 2020.
This article compares the performance of the well-known pandas library with pypolars, a rising DataFrame library written in Rust. See how they compare.
- 10 Python Skills They Don’t Teach in Bootcamp - Dec 11, 2020.
Ascend to new heights in Data Science and Machine Learning with this thrilling list of coding tips.
- Building AI Models for High-Frequency Streaming Data – Part Two - Dec 10, 2020.
Many data scientists have implemented machine or deep learning algorithms on static data or in batch, but what considerations must you make when building models for a streaming environment? In this post, we will discuss these considerations.
- Implementing the AdaBoost Algorithm From Scratch - Dec 10, 2020.
The AdaBoost technique builds on decision trees with a depth of one, so it is a forest of stumps rather than trees. AdaBoost works by putting more weight on difficult-to-classify instances and less on those already handled well. The algorithm was developed to solve both classification and regression problems. Learn to build the algorithm from scratch here.
- Data Compression via Dimensionality Reduction: 3 Main Methods - Dec 10, 2020.
Lift the curse of dimensionality by mastering the application of three important techniques that will help you reduce the dimensionality of your data, even if it is not linearly separable.
- A Journey from Software to Machine Learning Engineer, by Guillermo Carrasco - Dec 10, 2020.
In this blog post, the author explains his journey from Software Engineer to Machine Learning Engineer. The focus of the blog post is on the areas that the author wishes he had focused on during his learning journey, and what you should look for outside of books and courses when pursuing your Machine Learning career.
- Top KDnuggets tweets, Dec 2-8: How to do visualization using #Python from scratch - Dec 9, 2020.
K-Means 8x faster, 27x lower error than Scikit-learn's in 25 lines; How to do visualization using #Python from scratch; Why the Future of ETL Is Not ELT, But EL(T); NoSQL for Beginners
- Artificial Intelligence in Modern Learning System : E-Learning - Dec 9, 2020.
There has been a considerable gap between the supply of and demand for AI professionals. If you are looking to learn AI or learn machine learning, you can opt for free online courses offered by Great Learning.
- Main 2020 Developments and Key 2021 Trends in AI, Data Science, Machine Learning Technology, by Gregory Piatetsky - Dec 9, 2020.
Our panel of leading experts reviews the main developments of 2020 and examines the key trends in AI, Data Science, Machine Learning, and Deep Learning Technology.
- AI registers: finally, a tool to increase transparency in AI/ML - Dec 9, 2020.
Transparency, explainability, and trust are pressing topics in AI/ML today. While much has been written about why they are important and what you need to do, no tools have existed until now.
- Deep Learning Design Patterns! - Dec 9, 2020.
New book, "Deep Learning Design Patterns" presents deep learning models in a unique-but-familiar new way: as extendable design patterns you can easily plug-and-play into your software projects. Use code kdmath50 to save 50% off.
- R or Python? Why Not Both? - Dec 9, 2020.
Do you use both R and Python, either in different projects or in the same? Check out prython, an IDE designed to handle your needs.
- Machine Learning: Cutting Edge Tech with Deep Roots in Other Fields - Dec 8, 2020.
Join the INFORMS community of data, analytics, operations research, and statistics professionals and tackle the future together. With nearly 13,000 members around the world, INFORMS is the largest international association for data science professionals.
- Top November Stories: Top Python Libraries for Data Science, Data Visualization & Machine Learning; The Best Data Science Certification You’ve Never Heard Of - Dec 8, 2020.
Also: TabPy: Combining Python and Tableau; How to Acquire the Most Wanted Data Science Skills.
- 20 Core Data Science Concepts for Beginners, by Benjamin Obi Tayo - Dec 8, 2020.
With so much to learn and so many advancements to follow in the field of data science, there is a core set of foundational concepts that remain essential. Twenty of these ideas are highlighted here that are key to review when preparing for a job interview or just to refresh your appreciation of the basics.
- 5 Free Books to Learn Statistics for Data Science - Dec 8, 2020.
Learn all the statistics you need for data science for free.
- Merging Pandas DataFrames in Python - Dec 8, 2020.
A quick how-to guide for merging Pandas DataFrames in Python.
- Top Stories, Nov 30 – Dec 6: Why the Future of ETL Is Not ELT, But EL(T) - Dec 7, 2020.
Also: AI, Analytics, Machine Learning, Data Science, Deep Learning Research Main Developments in 2020 and Key Trends for 2021; Introduction to Data Engineering; Data Science History and Overview; Object-Oriented Programming Explained Simply for Data Scientists
- Dark Data: Why What You Don’t Know Matters - Dec 7, 2020.
In his latest book, leading statistician Dr. David Hand explores how we can be blind to missing or unseen data and how, in our rush to be a data-driven society, we might be missing things that matter, leading to dangerous decisions that can sometimes have disastrous consequences. Download this free chapter now.
- Essential Math for Data Science: Probability Density and Probability Mass Functions - Dec 7, 2020.
In this article, we’ll cover probability mass and probability density functions. You’ll see how to understand and represent these distribution functions and their link with histograms.
- The Ultimate Guide to Data Engineer Interviews - Dec 7, 2020.
If you are preparing for data engineering interviews, then follow these technical recommendations regarding your resume, programming skills, SQL acumen, and system design problem-solving, as well as the non-technical aspects of your upcoming interview session.
- Change the Background of Any Video with 5 Lines of Code - Dec 7, 2020.
Learn to blur, color, grayscale and create a virtual background for a video with PixelLib.
- Why the Future of ETL Is Not ELT, But EL(T), by John Lafleur - Dec 4, 2020.
The well-established technologies and tools around ETL (Extract, Transform, Load) are undergoing a potential paradigm shift with new approaches to data storage and expanding cloud-based compute. Decoupling the EL from T could reconcile analytics and operational data management use cases, in a new landscape where data warehouses and data lakes are merging.
- Pruning Machine Learning Models in TensorFlow - Dec 4, 2020.
Read this overview to learn how to make your models smaller via pruning.
- Accelerate Your Career in Data Science - Dec 3, 2020.
Fast-track your promotion with a degree in data science. The part-time Master of Science in Analytics allows you to balance your personal and professional life while mastering the cutting-edge technology defining the industry today.
- AI, Analytics, Machine Learning, Data Science, Deep Learning Research Main Developments in 2020 and Key Trends for 2021, by Matthew Mayo - Dec 3, 2020.
2020 is finally coming to a close. While likely not to register as anyone's favorite year, 2020 did have some noteworthy advancements in our field, and 2021 promises some important key trends to look forward to. As has become a year-end tradition, our collection of experts have once again contributed their thoughts. Read on to find out more.
- Introduction to Data Engineering, by Xinran Waibel - Dec 3, 2020.
The Q&A for the most frequently asked questions about Data Engineering: What does a data engineer do? What is a data pipeline? What is a data warehouse? How is a data engineer different from a data scientist? What skills and programming languages do you need to learn to become a data engineer?
- 10 Python Skills for Beginners - Dec 3, 2020.
Python is the fastest growing, most-beloved programming language. Get started with these Data Science tips.
- Top KDnuggets tweets, Nov 25 – Dec 01: 5 Free Books to Learn #Statistics for #DataScience - Dec 2, 2020.
Also: Best #Python IDEs and Code Editors; Facebook Is Dead (It Just Doesn’t Know It Yet); Enhance your data science game with these portfolio-worthy projects; The Online Courses You Must Take to be a Better #DataScientist
- Building AI Models for High-Frequency Streaming Data - Dec 2, 2020.
This post is the first in a two-part series on AI for streaming data. Here, we’ll walk through strategies for aligning times and resampling the data.
- Simple & Intuitive Ensemble Learning in R - Dec 2, 2020.
Read about metaEnsembleR, an R package for heterogeneous ensemble meta-learning (classification and regression) that is fully-automated.
- Roadmaps to becoming a Full-Stack AI Developer, Data Scientist, Machine Learning Engineer, and more - Dec 2, 2020.
As the fields related to AI and Data Science expand, they are becoming complex with more options and specializations to consider. If you are beginning your journey toward becoming an expert in Artificial Intelligence, this roadmap will guide you to find your path along what to learn next while steering clear of the latest hype.
- NoSQL for Beginners - Dec 2, 2020.
NoSQL can offer an advantage to those who are entering Data Science and Analytics, as well as to applications with high-performance needs that aren’t met by traditional SQL databases.
- SQream Announces Massive Data Revolution Video Challenge - Dec 1, 2020.
Data professionals are invited to share their massive data challenges from their own unique perspectives. Learn more about the Massive Data Revolution Video Challenge, get a $50 Amazon gift card, and be sure to submit your entry by December 16th.
- Remembering Pluribus: The Techniques that Facebook Used to Master World’s Most Difficult Poker Game - Dec 1, 2020.
Pluribus used incredibly simple AI methods to set new records in six-player no-limit Texas Hold’em poker. How did it do it?
- 14 Data Science projects to improve your skills - Dec 1, 2020.
There's a lot of data out there and so many data science techniques to master or review. Check out these great project ideas from easy to advanced difficulty levels to develop new skills and strengthen your portfolio.
- Object-Oriented Programming Explained Simply for Data Scientists - Dec 1, 2020.
Read this simple but effective guide to start using Classes in Python 3.