- Managing Your Reusable Python Code as a Data Scientist, by Matthew Mayo - Feb 11, 2022.
Here are a few approaches that I have settled on for managing my own reusable Python code as a data scientist, presented from most to least general code use, and aimed at beginners.
- Ethics, Fairness, and Bias in AI, by Aditya Aggarwal - Jun 30, 2021.
As more AI-enhanced applications seep into our daily lives and expand their reach to larger swaths of populations around the world, we must clearly understand the vulnerabilities trained machine learning models can exhibit based on the data used during development. Such issues can negatively impact select groups of people, so addressing the ethical decisions made by AI--possibly unknowingly--is important to the long-term fairness and success of this new technology.
- From Scratch: Permutation Feature Importance for ML Interpretability, by Seth Billiau - Jun 30, 2021.
Use permutation feature importance to discover which features in your dataset are useful for prediction — implemented from scratch in Python.
- StreamSets DataOps Platform – Summer ‘21 Public Beta. Sign up today!, by StreamSets - Jun 29, 2021.
Introducing StreamSets DataOps Platform - Summer ‘21 Public Beta! Bringing DataOps to the Cloud for Enterprises.
- Computational Complexity of Deep Learning: Solution Approaches, by Dr. Vijay Srinivas Agneeswaran - Jun 29, 2021.
Why has deep learning been so successful? What is the fundamental reason that deep learning can learn from big data? Why can't traditional ML learn from the large data sets now available for different tasks as efficiently as deep learning can?
- Unleashing the Power of MLOps and DataOps in Data Science, by Yash Mehta - Jun 29, 2021.
Organizations trying to move forward with analytics and data science initiatives -- while floating in an ocean of data -- must enhance their overall approach and culture to embrace a foundation of DataOps and MLOps. Leveraging these operational frameworks is necessary to enable the data to generate real business value.
- 10 Mistakes You Should Avoid as a Data Science Beginner, by Isabelle Flückiger - Jun 29, 2021.
Read this article on how to gain a competitive advantage in the data science job market.
- Top Stories, Jun 21-27: Data Scientists Will be Extinct in 10 Years; Analytics Engineering Everywhere, by KDnuggets - Jun 28, 2021.
Also: Pandas vs SQL: When Data Scientists Should Use Each Tool; How to Land a Data Analytics Job in 6 Months; What will the demand for Data Scientists be in 10 years? Will Data Scientists be extinct?; How to create an interactive 3D chart and share it easily with anyone
- Add A New Dimension To Your Photos Using Python, by Dylan Roy - Jun 28, 2021.
Read this to learn how to breathe new life into your photos with a 3D Ken Burns Effect.
- Data Scientists are from Mars and Software Developers are from Venus, by Anand Rao - Jun 28, 2021.
Within the broad universe of IT in the business world, the approaches for deploying solutions by traditional software engineers and trendy, new data scientists couldn't be more different. However, appreciating these differences is incredibly important because great business value can be gained by integrating both worlds of development to drive more efficiency and effectiveness in an organization.
- How to Train a Joint Entities and Relation Extraction Classifier using BERT Transformer with spaCy 3, by Walid Amamou - Jun 28, 2021.
A step-by-step guide on how to train a relation extraction classifier using Transformer and spaCy3.
- Applied Language Technology: A No-Nonsense Approach, by Matthew Mayo - Jun 25, 2021.
Here is a free entry-level applied natural language processing course that can fit into any beginner's roadmap to understanding NLP. Check it out.
- High-Performance Deep Learning: How to train smaller, faster, and better models – Part 2, by Gaurav Menghani - Jun 25, 2021.
As your organization begins to consider building advanced deep learning models with efficiency in mind to improve the power delivered through your solutions, the software and hardware tools required for these implementations are foundational to achieving high performance.
- How to create an interactive 3D chart and share it easily with anyone, by Olga Chernytska - Jun 25, 2021.
This is a short tutorial on a great Plotly feature.
- Season 1 Of Data Science Perspectives Webcast Released, by Bill Franks - Jun 24, 2021.
Season 1 of Data Science Perspectives is now live and ready for viewing, where I interview many of the executives and professionals I’ve met to enable viewers to learn about how their careers unfolded, what skills they look for when hiring, what trends they think are coming next, and more.
- What will the demand for Data Scientists be in 10 years? Will Data Scientists be extinct?, by Matthew Mayo - Jun 24, 2021.
Participate in the latest KDnuggets survey and share your opinion: what does the next decade have in store for data scientist demand?
- In-Warehouse Machine Learning and the Modern Data Science Stack, by Nick Acosta - Jun 24, 2021.
As your organization matures its data science portfolio and capabilities, establishing a modern data stack is vital to enabling such growth. Here, we overview various in-data warehouse machine learning services, and discuss each of their benefits and requirements.
- 10 Python Code Snippets We Should All Know, by Pralabh Saxena - Jun 24, 2021.
Check out these Python code snippets and start using them to solve everyday problems.
- Workflow Orchestration with Prefect and Coiled, by Coiled.io - Jun 23, 2021.
Coiled helps data scientists use Python for ambitious problems, scaling to the cloud for computing power, ease, and speed—all tuned for the needs of teams and enterprises. In this demo example, see how to spin up a Coiled cluster to execute Prefect jobs during runtime.
- Create and Deploy Dashboards using Voila and Saturn Cloud, by Dhrumil Patel - Jun 23, 2021.
Working with and training large datasets, maintaining them all in one place, and deploying them to production is a challenging job. In this article, we covered what Saturn Cloud is and how it can speed up your end-to-end pipeline, how to create dashboards using Voila and Python and publish them to production in just a few easy steps.
- Data Careers in Demand: Crowd Solutions Architect Explained, by Daria Baidakova - Jun 23, 2021.
How can crowdsourcing support the applications of data teams at an organization? With an ever-increasing demand for more and higher quality data, a new role of the Crowd Solutions Architect (CSA) can leverage the potential of the masses to bring an advantage to a business's capability to deliver effective AI-driven solutions.
- Fine-Tuning Transformer Model for Invoice Recognition, by Walid Amamou - Jun 23, 2021.
The author presents a step-by-step guide from annotation to training.
- The Word “WORD” Has 13 Meanings, by Expert.ai - Jun 22, 2021.
Thoughts around Knowledge Graphs, the semantic nature of language, and the two main types of word ambiguity.
- Amazing Low-Code Machine Learning Capabilities with New Ludwig Update, by Jesus Rodriguez - Jun 22, 2021.
Integrations with Ray, MLflow and TabNet are among the top features of this release.
- Analytics Engineering Everywhere, by Jason Ganz - Jun 22, 2021.
Many new roles have appeared in the data world ever since the rise of the Data Scientist took the spotlight several years ago. Now, there is a new core player ready to take center stage, and we may see that, in five years, nearly every organization will have an Analytics Engineering team.
- What is Segmentation?, by Kevin Gray - Jun 22, 2021.
Segmentation refers to many things, and is one of the most frequently used words in marketing. This article looks at segmentation from a somewhat different-than-usual perspective.
- Top Stories, Jun 14-20: Data Scientists Will be Extinct in 10 Years, by KDnuggets - Jun 21, 2021.
Also: Get Interactive Plots Directly With Pandas; How to Generate Automated PDF Documents with Python; Top 10 Data Science Projects for Beginners; Five types of thinking for a high performing data scientist
- Using External Data to Accelerate Business in a Post-Vaccinated World, by Roidna - Jun 21, 2021.
Join this webinar, Jun 24, 2021, to learn how companies are developing insights to better prepare for growth opportunities, improve business performance and mitigate risk in a post-pandemic economy.
- Overview of AutoNLP from Hugging Face with Example Project, by Kevin Vu - Jun 21, 2021.
AutoNLP is a beta project from Hugging Face that builds on the company’s work with its Transformer project. With AutoNLP you can get a working model with just a few simple terminal commands.
- Pandas vs SQL: When Data Scientists Should Use Each Tool, by Matthew Przybyla - Jun 21, 2021.
Exploring data sets and understanding their structure, content, and relationships is a routine and core process for any Data Scientist. Multiple tools exist for performing such analysis, and we take a deep dive into the benefits and different approaches of two important tools, SQL and Pandas.
- How to troubleshoot memory problems in Python, by Freddy Boulton - Jun 21, 2021.
Memory problems are hard to diagnose and fix in Python. This post goes through a step-by-step process for how to pinpoint and fix memory leaks using popular open source Python packages.
- Major changes: Where Analytics, Data Science, Machine Learning were applied in 2020/21, by Gregory Piatetsky - Jun 18, 2021.
Our latest poll shows major change in where AI, Data Science, Machine Learning are being applied, with decline in interest in traditional fields like CRM/Consumer Analytics, and growth in applications to Computer Vision, COVID, Agriculture, and Education.
- High Performance Deep Learning, Part 1, by Gaurav Menghani - Jun 18, 2021.
Advancing deep learning techniques continue to demonstrate incredible potential to deliver exciting new AI-enhanced software and systems. But training the most powerful models is expensive--financially, computationally, and environmentally. Increasing the efficiency of such models will have profound impacts in many ways, so developing future models with this intention in mind will only help to further expand the reach, applicability, and value of what deep learning has to offer.
- Data Science is Not Becoming Extinct in 10 Years, Your Skills Might, by Ahmar Shah, PhD - Jun 18, 2021.
4 reasons why data science is here to stay and what you need to do to ensure that your skillset stays in demand.
- Submit Your Algorithm for a Chance to Win Prizes Totaling $700,000+, by U.S. National Institute of Justice - Jun 17, 2021.
Can your algorithm make fair and accurate #recidivism forecasts? Take part in US National Institute of Justice “Recidivism Forecasting Challenge” with prize money totaling over $700K.
- How to Land a Data Analytics Job in 6 Months, by Natassha Selvaraj - Jun 17, 2021.
Go from zero to hero in under six months. Data science has a very high barrier to entry. It is a very competitive field that people from many different educational backgrounds are looking to get into. Here is useful advice on how to proceed.
- Data storytelling: brains are built for visuals, but hearts turn on stories, by Hrvoje Smolic - Jun 17, 2021.
Today, we need much more than just numbers about our organization to understand, gain insights, and take relevant actions. While visualizations of the data are important, making an emotional connection with the stories behind the data is key. If you want to sell a story, send a missile to the heart.
- Dashboards for Interpreting & Comparing Machine Learning Models, by Himanshu Sharma - Jun 17, 2021.
This article discusses using Interpret to create dashboards for machine learning models.
- How a Polytechnic Helps You Make the Tech-Business Connection, by Worcester Polytechnic Institute - Jun 16, 2021.
WPI welcomes professionals of all levels to its 100% online MS in Business Analytics — no GRE or GMAT required. Get started here.
- The Best Way to Learn Practical NLP?, by Matthew Mayo - Jun 16, 2021.
Hugging Face has just released a course on using its libraries and ecosystem for practical NLP, and it appears to be very comprehensive. Have a look for yourself.
- An introduction to Explainable AI (XAI) and Explainable Boosting Machines (EBM), by Chaitanya Krishna Kasaraneni - Jun 16, 2021.
Understanding why your AI-based models make the decisions they do is crucial for deploying practical solutions in the real world. Here, we review some techniques in the field of Explainable AI (XAI), why explainability is important, example models of explainable AI using LIME and SHAP, and demonstrate how Explainable Boosting Machines (EBMs) can make explainability even easier.
- A Graph-based Text Similarity Method with Named Entity Information in NLP, by Prakhar Mishra - Jun 16, 2021.
In this article, the author summarizes the 2017 paper "A Graph-based Text Similarity Measure That Employs Named Entity Information" as per their understanding. Better understand the concepts by reading along.
- KDnuggets Top Blogs Rewards for May 2021, by Gregory Piatetsky - Jun 15, 2021.
We announce the winners of the first KDnuggets Top Blog Rewards Program.
- The Data Matters: Choosing the right data to analyze can make or break your analysis, by Nomad Data - Jun 15, 2021.
We started Nomad Data to help data scientists and business analysts quickly find the right commercial datasets to match their specific use case. We catalog use cases of data and use machine learning and AI to match analysis goals with datasets.
- 7 Data Security Best Practices for 2021, by Devin Partida - Jun 15, 2021.
Here are seven data security best practices to adopt this year.
- Beginners Guide to Debugging TensorFlow Models, by Ahmad Anis - Jun 15, 2021.
If you are new to working with a deep learning framework, such as TensorFlow, there are a variety of typical errors beginners face when building and training models. Here, we explore and solve some of the most common errors to help you develop a better intuition for debugging in TensorFlow.
- Facebook Launches One of the Toughest Reinforcement Learning Challenges in History, by Jesus Rodriguez - Jun 15, 2021.
The FAIR team just launched the NetHack Challenge as part of the upcoming NeurIPS 2021 competition. The objective is to test new RL ideas using one of the toughest game environments in the world.
- Top Stories, Jun 7-13: 5 Tasks To Automate With Python; Five types of thinking for a high performing data scientist, by KDnuggets - Jun 14, 2021.
Also: How to Generate Automated PDF Documents with Python; Five types of thinking for a high performing data scientist; How I Doubled My Income with Data Science and Machine Learning; Top 10 Data Science Projects for Beginners
- Data Scientists Will be Extinct in 10 Years, by Mikhail Mew - Jun 14, 2021.
And why it’s not a bad thing.
- Get Interactive Plots Directly With Pandas, by Parul Pandey - Jun 14, 2021.
Telling a story with data is a core function for any Data Scientist, and creating data visualizations that are simultaneously illuminating and appealing can be challenging. This tutorial reviews how to create Plotly and Bokeh plots directly through Pandas plotting syntax, which will help you convert static visualizations into interactive counterparts -- and take your analysis to the next level.
- Building a Knowledge Graph for Job Search Using BERT, by Walid Amamou - Jun 14, 2021.
A guide on how to create knowledge graphs using NER and Relation Extraction.
- Top 10 Data Science Projects for Beginners, by Natassha Selvaraj - Jun 11, 2021.
Check out these projects for ideas to strengthen your skills and build a portfolio that stands out.
- Five types of thinking for a high performing data scientist, by Anand Rao - Jun 11, 2021.
The way you think about a problem and the conceptual process you go through to find a solution may be guided by your personal skills or the type of problem at hand. Many mental models exist representing a variety of thinking patterns -- and as a Data Scientist, appreciating different approaches can help you more effectively model data in the business world and communicate your results to the decision-makers.
- 9 Deadly Sins of Machine Learning Dataset Selection, by Sandeep Uttamchandani - Jun 11, 2021.
Avoid endless pain in model debugging by focusing on datasets upfront.
- Top May Stories: A Guide On How To Become A Data Scientist; Data Scientist, Data Engineer & Other Data Careers, Explained, by Gregory Piatetsky - Jun 10, 2021.
A Guide On How To Become A Data Scientist; Data Scientist, Data Engineer & Other Data Careers, Explained; Vaex: Pandas but 1000x faster; Data Preparation in SQL, with Cheat Sheet
- Numerics V: Integrality – When Being Close Enough is not Always Good Enough, by FICO - Jun 10, 2021.
Wow, already the fifth blog in this series…What is left to tell about numerics? There is another place where a MIP solver can sneak in minor violations that we have not yet discussed: The integrality conditions.
- The Essential Guide to Transformers, the Key to Modern SOTA AI, by Matthew Mayo - Jun 10, 2021.
You likely know Transformers from their recent spate of success stories in natural language processing, computer vision, and other areas of artificial intelligence, but are you familiar with all of the X-formers? More importantly, do you know the differences, and why you might use one over another?
- Feature Selection – All You Ever Wanted To Know, by Danny Butvinik - Jun 10, 2021.
Although your data set may contain a lot of information about many different features, selecting only the "best" of these to be considered by a machine learning model can mean the difference between a model that performs well--with better performance, higher accuracy, and more computational efficiency--and one that falls flat. The process of feature selection guides you toward working with only the data that may be the most meaningful, and to accomplish this, a variety of feature selection types, methodologies, and techniques exist for you to explore.
- How to Generate Automated PDF Documents with Python, by Mohammad Khorasani - Jun 10, 2021.
Discover how to leverage automation to create dazzling PDF documents effortlessly.
- How to speed up a Deep Learning Language model by almost 50X at half the cost, by Determined AI - Jun 9, 2021.
In this blog post, we show how to accelerate fine-tuning the ALBERT language model while also reducing costs by using Determined’s built-in support for distributed training with AWS spot instances.
- Data Scientists, You Need to Know How to Code, by Tyler Folkman - Jun 9, 2021.
You need to know how to code — and not just code, but write good code.
- The 7 Best Open Source AI Libraries You May Not Have Heard Of, by Kevin Vu - Jun 9, 2021.
AI researchers today have many exciting options for working with specialized tools. Although starting original projects from scratch is often not necessary, knowing which existing library to leverage remains a challenge. This list of generally unknown yet awesome, open-source libraries offers an interesting collection to consider for state-of-the-art research that spans from automatic machine learning to differentiable quantum circuits.
- How a Single Mistake Wasted 3 Years of My Data Science Journey, by Pranjal Saxena - Jun 9, 2021.
Self-paced courses are just sleeping pills; Industry experts are the right choice.
- SAS® Visual Data Science Decisioning powered by SAS® Viya®: Free Trial, by SAS - Jun 8, 2021.
SAS® Visual Data Science Decisioning provides the ultimate analytics experience. Start your free trial and get access to the latest in data visualization, machine learning, forecasting, model deployment and more.
- This Data Visualization is the First Step for Effective Feature Selection, by Benjamin Obi Tayo - Jun 8, 2021.
Understanding the most important features to use is crucial for developing a model that performs well. Knowing which features to consider requires experimentation, and proper visualization of your data can help clarify your initial selections. The scatter pairplot is a great place to start.
- The only Jupyter Notebooks extension you truly need, by Olga Chernytska - Jun 8, 2021.
Now you don’t need to restart the kernel after editing the code in your custom imports.
- 5 Tips for Picking an Edge AI Platform, by Erik Ottem-Cachengo - Jun 8, 2021.
Edge Analytics isn’t just coding and tools. The different environment outside the datacenter or cloud means a purpose-built platform is the best way to deliver consistent results. We discuss 5 different considerations for an edge platform to support your training and deployment.
- 5 Data Science Open-source Projects You Should Consider Contributing to, by Sara Metwalli - Jun 7, 2021.
As you prepare to interview for a position in data science or are looking to jump to the next level, now is the time to enhance your skills and your resume by working on real, open-source projects. Here, we suggest a great selection of projects you can contribute to and help build something awesome, so all you need to do is choose one and tackle it head on.
- How to Fine-Tune BERT Transformer with spaCy 3, by Walid Amamou - Jun 7, 2021.
A step-by-step guide on how to create a knowledge graph using NER and Relation Extraction.
- Top Stories, May 31 – Jun 6: A Guide On How To Become A Data Scientist (Step By Step Approach); How I Doubled My Income with Data Science and Machine Learning, by KDnuggets - Jun 7, 2021.
Also: 5 Tasks To Automate With Python; How I Doubled My Income with Data Science and Machine Learning; Will There Be a Shortage of Data Science Jobs in the Next 5 Years?; How to Make Python Code Run Incredibly Fast
- PyCaret 101: An introduction for beginners, by Moez Ali - Jun 7, 2021.
This article is a great overview of how to get started with PyCaret for all your machine learning projects.
- 5 Tasks To Automate With Python, by Dylan Roy - Jun 4, 2021.
Here are 5 tasks you can automate with Python, and how to do it.
- Beyond Brainless AI with a Feature Store, by Jim Dowling - Jun 4, 2021.
An AI-powered product that is limited to the data available within its application is like a jellyfish: its autonomic system makes it functional, but it lacks a brain. However, you can evolve your models with data-enriched "brains" through the help of a feature store.
- 10 Deadly Sins of Machine Learning Model Training, by Sandeep Uttamchandani, Ph.D. - Jun 4, 2021.
These mistakes are easy to overlook but costly to redeem.
- BigQuery vs Snowflake: A Comparison of Data Warehouse Giants, by Anji Velagana - Jun 3, 2021.
In this article we are going to compare the two topmost data warehouses: BigQuery and Snowflake.
- How a Data Scientist Should Communicate with Stakeholders, by Nate Rosidi - Jun 3, 2021.
Effective and collaborative communication with stakeholders is a skill that can help you survive in your role as a Data Scientist at your organization. Learn how to master this interaction, and you will perform your job better, see improved outcomes from your projects, and grow in your capabilities and career.
- Will There Be a Shortage of Data Science Jobs in the Next 5 Years?, by Pranjal Saxena - Jun 3, 2021.
The data science workflow is getting automated day by day.
- Similarity Search: Euclid of Alexandria goes shoe shopping, by Pinecone - Jun 2, 2021.
Many applications can be improved with similarity search. Similarity search can provide more relevant results and therefore improve business outcomes such as conversion rates, engagement rates, detected threats, data quality, and customer satisfaction.
- Machine Learning Model Interpretation, by Himanshu Sharma - Jun 2, 2021.
Read this overview of using Skater to build machine learning visualizations.
- Stop (and Start) Hiring Data Scientists, by Ian Xiao - Jun 2, 2021.
Large companies are losing many data scientists to smaller companies, so what should executives and managers do? These three “stop & start” tactics can improve talent retention, and help define a new way of recruiting and working for the Data Science field.
- How to Make Python Code Run Incredibly Fast, by Pralabh Saxena - Jun 2, 2021.
In this article, I have explained some tips and tricks to optimize and speed up Python code.
- How to Create and Deploy a Simple Sentiment Analysis App via API, by Matthew Mayo - Jun 1, 2021.
In this article we will create a simple sentiment analysis app using the HuggingFace Transformers library, and deploy it using FastAPI.
- How I Doubled My Income with Data Science and Machine Learning, by Terence Shin - Jun 1, 2021.
Many career opportunities exist in the ever-expanding domain of data. Finding your place -- and finding your salary -- is largely up to your dedication, focus, and drive to learn. If you are an aspiring Data Scientist or have already started your professional journey, there are multiple strategies for maximizing your earning potential.
- Overcoming the Simplicity Illusion with Data Migration, by Yancy Blum - Jun 1, 2021.
What’s the key to a smooth data migration experience? It comes down to this primary issue: whether or not you can rapidly determine your dataset composition.