Search results for hive
-
Affordable online news archives for academic research
Many researchers need access to multi-year historical repositories of online news articles. We identified three companies that make such access affordable, and spoke with their CEOs.https://www.kdnuggets.com/2018/08/affordable-online-news-archives.html
-
KDnuggets News Archive
Past KDnuggets News issues for 2021 (48 issues), 2020 (48 issues), 2019 (49 issues), 2018 (48 issues), 2017 (48 issues), 2016 (46 issues), 2015 (42 Read more »https://www.kdnuggets.com/news/archive.html
-
How Big Data Is Saving Lives in Real Time: IoV Data Analytics Helps Prevent Accidents
This post talks about what needs to be taken care of in IoV data analysis, and shows the difference between a near real-time analytic platform and an actual real-time analytic platform with a real-world example.https://www.kdnuggets.com/how-big-data-is-saving-lives-in-real-time-iov-data-analytics-helps-prevent-accidents
-
A Comprehensive List of Resources to Master Large Language Models
Large Language Models (LLMs) have now become an integral part of various applications. This article provides an extensive list of resources for anyone interested in diving into the world of LLMs.https://www.kdnuggets.com/a-comprehensive-list-of-resources-to-master-large-language-models
-
5 Free University Courses on Data Analytics
Thinking about getting into the data analytical world but do not know where to start? Have a look at these 5 FREE university courses on data analytics.https://www.kdnuggets.com/5-free-university-courses-on-data-analytics
-
Best Practices for Building ETLs for ML
This article talks about several best practices for writing ETLs for building training datasets. It delves into several software engineering techniques and patterns applied to ML.https://www.kdnuggets.com/best-practices-for-building-etls-for-ml
-
Customer Segmentation in Python: A Practical Approach
So you want to understand your customer base better? Learn how to leverage RFM analysis and K-Means clustering in Python to perform customer segmentation.https://www.kdnuggets.com/customer-segmentation-in-python-a-practical-approach
-
7 Steps to Mastering Natural Language Processing
Want to learn all about Natural Language Processing (NLP)? Here is a 7 step guide to help you go from the fundamentals of machine learning and Python to Transformers, recent advances in NLP, and beyond.https://www.kdnuggets.com/7-steps-to-mastering-natural-language-processing
-
Getting Started with Google Cloud Platform in 5 Steps
Explore the essentials of Google Cloud Platform for data science and ML, from account setup to model deployment, with hands-on project examples.https://www.kdnuggets.com/5-steps-google-cloud-platform
-
30 Years of Data Science: A Review From a Data Science Practitioner
A review from a data science practitioner.https://www.kdnuggets.com/30-years-of-data-science-a-review-from-a-data-science-practitioner
-
Working with Big Data: Tools and Techniques
Where do you start in a field as vast as big data? Which tools and techniques to use? We explore this and talk about the most common tools in big data.https://www.kdnuggets.com/working-with-big-data-tools-and-techniques
-
Data Cleaning with Pandas
This step-by-step tutorial is for beginners to guide them through the process of data cleaning and preprocessing using the powerful Pandas library.https://www.kdnuggets.com/data-cleaning-with-pandas
-
How to Digest 15 Billion Logs Per Day and Keep Big Queries Within 1 Second
This article describes a large-scale data warehousing use case to provide reference for data engineers who are looking for log analytic solutions. It introduces the log processing architecture and real-case practice in data ingestion, storage, and queries.https://www.kdnuggets.com/how-to-digest-15-billion-logs-per-day-and-keep-big-queries-within-1-second
-
Creating A Simple Docker Data Science Image
This concise primer walks through setting up a Python data science environment using Docker, covering creating a Dockerfile, building an image, running a container, sharing and deploying images, and pushing to Docker Hub.https://www.kdnuggets.com/2023/08/simple-docker-data-science-image.html
-
Harnessing ChatGPT for Automated Data Cleaning and Preprocessing
A guide to using ChatGPT for the tasks of data cleaning and preprocessing on a real-world dataset.https://www.kdnuggets.com/2023/08/harnessing-chatgpt-automated-data-cleaning-preprocessing.html
-
GPT-4 Details Have Been Leaked!
What has OpenAI been keeping in the woodwork about GPT-4?https://www.kdnuggets.com/2023/07/gpt4-details-leaked.html
-
Advanced Feature Selection Techniques for Machine Learning Models
Mastering Feature Selection: An Exploration of Advanced Techniques for Supervised and Unsupervised Machine Learning Models.https://www.kdnuggets.com/2023/06/advanced-feature-selection-techniques-machine-learning-models.html
-
RedPajama Project: An Open-Source Initiative to Democratizing LLMs
Leading project to Empower the Community through Accessible Large Language Models.https://www.kdnuggets.com/2023/06/redpajama-project-opensource-initiative-democratizing-llms.html
-
Top 10 Tools for Detecting ChatGPT, GPT-4, Bard, and Claude
Top free tools for detecting theses, research papers, assignments, documentation, and blogs generated by AI models.
https://www.kdnuggets.com/2023/05/top-10-tools-detecting-chatgpt-gpt4-bard-llms.html
-
Data Engineering Landscape in the AI-Driven World
Generative AI has just started to capture the imagination of data engineers, so the impact thus far has been just a fraction of what it will be a year or two from now.https://www.kdnuggets.com/2023/05/data-engineering-landscape-aidriven-world.html
-
Clustering with scikit-learn: A Tutorial on Unsupervised Learning
Clustering in machine learning with Python: algorithms, evaluation metrics, real-life applications, and more.https://www.kdnuggets.com/2023/05/clustering-scikitlearn-tutorial-unsupervised-learning.html
-
Fine-Tuning OpenAI Language Models with Noisily Labeled Data
Reduce LLM prediction error by 37% via data-centric AI.https://www.kdnuggets.com/2023/04/finetuning-openai-language-models-noisily-labeled-data.html
-
What Is ChatGPT Doing and Why Does It Work?
In this article, we will explain how ChatGPT works and why it is able to produce coherent and diverse conversations.https://www.kdnuggets.com/2023/04/chatgpt-work.html
-
A Complete Collection of Data Science Free Courses – Part 2
The second part covers the list of Machine Learning, Deep Learning, Computer Vision, Natural Language Processing, Data Engineering, and MLOps.https://www.kdnuggets.com/2023/03/complete-collection-data-science-free-courses-part-2.html
-
How Data Science Can Transform Mobile App Development?
Data science is an intelligent and powerful technology. By knowing how to use data science in mobile app development you can achieve great results.https://www.kdnuggets.com/2023/03/data-science-transform-mobile-app-development.html
-
5 More Command Line Tools for Data Science
Use these tools to access APIs, manipulate CSV files, download datasets, and more from your terminal.https://www.kdnuggets.com/2023/03/5-command-line-tools-data-science.html
-
Top Free Courses on Large Language Models
Interested in learning how ChatGPT and other AI chatbots work under the hood? Look no further. Check out these free courses and resources on large language models from Stanford, Princeton, ETH, and more.https://www.kdnuggets.com/2023/03/top-free-courses-large-language-models.html
-
Making Intelligent Document Processing Smarter: Part 1
This article attempts to measure the effect of various noises present in scanned documents on the performance of various APIs in the OCR segment.https://www.kdnuggets.com/2023/02/making-intelligent-document-processing-smarter-part-1.html
-
The Complete Data Science Study Roadmap
This article will map out the things you need to do to become a data scientist.https://www.kdnuggets.com/2022/08/complete-data-science-study-roadmap.html
-
AI is Not Here to Replace Us
Is the fear of AI replacing humans justified? Here we have a look at what AI is good for and what it isn’t.https://www.kdnuggets.com/2023/02/ai-replace-us.html
-
10 Free Machine Learning Courses from Top Universities
Learn the basics of machine learning, including classification, SVM, decision tree learning, neural networks, convolutional neural networks, boosting, and K nearest neighbors.https://www.kdnuggets.com/2023/02/10-free-machine-learning-courses-top-universities.html
-
From Data Collection to Model Deployment: 6 Stages of a Data Science Project
Here are 6 stages of a novel Data Science Project; From Data Collection to Model in Production, backed by research and examples.https://www.kdnuggets.com/2023/01/data-collection-model-deployment-6-stages-data-science-project.html
-
5 Tasks To Automate With Python
Here are 5 tasks you can automate with Python, and how to do it.https://www.kdnuggets.com/2021/06/5-tasks-automate-python.html
-
The Complete Machine Learning Study Roadmap
Find out where you need to be to start your Machine Learning journey and what you need to do to succeed in the field.
https://www.kdnuggets.com/2022/12/complete-machine-learning-study-roadmap.html
-
10 Amazing Machine Learning Visualizations You Should Know in 2023
Yellowbrick for creating machine learning plots with less code.https://www.kdnuggets.com/2022/11/10-amazing-machine-learning-visualizations-know-2023.html
-
How LinkedIn Uses Machine Learning To Rank Your Feed
In this post, you will learn to clarify business problems & constraints, understand problem statements, select evaluation metrics, overcome technical challenges, and design high-level systems.https://www.kdnuggets.com/2022/11/linkedin-uses-machine-learning-rank-feed.html
-
Getting Started with PyCaret
An open-source low-code machine learning library for training and deploying the models in production.https://www.kdnuggets.com/2022/11/getting-started-pycaret.html
-
Explaining Explainable AI for Conversations
Something is missing in artificial intelligence – trust.https://www.kdnuggets.com/2022/10/explaining-explainable-ai-conversations.html
-
Data-centric AI and Tabular Data
DALL-E, LaMDA, and GPT-3 all had celebrity moments recently. So, where’s the glamorous, high-performance model that’s mastered tabular data?https://www.kdnuggets.com/2022/09/datacentric-ai-tabular-data.html
-
Everything You Need to Know About Data Lakehouses
Learn everything you need to know about data lakehouses.https://www.kdnuggets.com/2022/09/everything-need-know-data-lakehouses.html
-
How to Select Rows and Columns in Pandas Using [ ], .loc, iloc, .at and .iat
Subset selection is one of the most frequently performed tasks while manipulating data. Pandas provides different ways to efficiently select subsets of data from your DataFrame.
https://www.kdnuggets.com/2019/06/select-rows-columns-pandas.html
-
How to Package and Distribute Machine Learning Models with MLFlow
MLFlow is a tool to manage the end-to-end lifecycle of a Machine Learning model. Likewise, the installation and configuration of an MLFlow service is addressed and examples are added on how to generate and share projects with MLFlow.https://www.kdnuggets.com/2022/08/package-distribute-machine-learning-models-mlflow.html
-
How to land an ML job: Advice from engineers at Meta, Google Brain, and SAP
Check out this video, summary and transcript of a discussion between co:rise co-founder Jake Samuelson and three outstanding ML engineers — Kaushik Rangadurai, Shalvi Mahajan, and Frank Chen — to hear their advice on landing a job in machine learning.https://www.kdnuggets.com/2022/08/corise-land-ml-job-advice-engineers-meta-google-brain-sap.html
-
What is Text Classification?
We will define text classification, how it works, some of its most known algorithms, and provide data sets that might help start your text classification journey.https://www.kdnuggets.com/2022/07/text-classification.html
-
K-nearest Neighbors in Scikit-learn
Learn about the k-nearest neighbours algorithm, one of the most prominent workhorse machine learning algorithms there is, and how to implement it using Scikit-learn in Python.https://www.kdnuggets.com/2022/07/knearest-neighbors-scikitlearn.html
-
3 things you didn’t know about the SAS Academy for Data Science
The SAS Academy for Data Science is one of many paths to becoming a data scientist. It is designed for those who have a background in programming and mathematics and who want to upskill as part of a career change, or who want to gain the hands-on practical skills that can advance their professional growth and experience with SAS and data science.https://www.kdnuggets.com/2022/07/sas-3-things-didnt-know-sas-academy-data-science.html
-
Boosting Machine Learning Algorithms: An Overview
The combination of several machine learning algorithms is referred to as ensemble learning. There are several ensemble learning techniques. In this article, we will focus on boosting.https://www.kdnuggets.com/2022/07/boosting-machine-learning-algorithms-overview.html
-
Top 15 Books to Master Data Strategy
In this article, we outline 15 books on topics ranging from the technical to the non-technical, to help you improve your understanding of end-to-end best practices related to data.https://www.kdnuggets.com/2022/06/top-15-books-master-data-strategy.html
-
A Structured Approach To Building a Machine Learning Model
This article gives you a glimpse of how to approach a machine learning project with a clear outline of an easy-to-implement 5-step process.https://www.kdnuggets.com/2022/06/structured-approach-building-machine-learning-model.html
-
Free Data Engineering Courses
Get into the highly in-demand world of data engineering for free and earn a 6-figure salary.https://www.kdnuggets.com/2022/05/free-data-engineering-courses.html
-
Should The Data Warehouse Be Immutable?
Is the data warehouse broken? Is the "immutable data warehouse" the right path for your data team? Learn more here.https://www.kdnuggets.com/2022/05/data-warehouse-immutable.html
-
Machine Learning’s Sweet Spot: Pure Approaches in NLP and Document Analysis
While it is true that Machine Learning today isn’t ready for prime time in many business cases that revolve around Document Analysis, there are indeed scenarios where a pure ML approach can be considered.https://www.kdnuggets.com/2022/05/machine-learning-sweet-spot-pure-approaches-nlp-document-analysis.html
-
Free University Data Science Resources
This is a list of FREE data science resources and notes that are available online, some of which are provided by universities.
https://www.kdnuggets.com/2022/05/free-university-data-science-resources.html
-
SQL Notes for Professionals: The Free eBook Review
The free book is a combination of SQL cheat sheets and practical database examples. It provides bite-size information about every SQL function and attribute with coding samples.https://www.kdnuggets.com/2022/05/sql-notes-professionals-free-ebook-review.html
-
Data Scientist, Data Engineer & Other Data Careers, Explained
In this article, we will have a look at five distinct data careers, and hopefully provide some advice on how to get one's feet wet in this convoluted field.https://www.kdnuggets.com/2021/05/data-scientist-data-engineer-data-careers-explained.html
-
Top Data Science Projects to Build Your Skills
Check out this list of data science project ideas that you can use to boost your skills, organized by level of expertise.https://www.kdnuggets.com/2022/04/top-data-science-projects-build-skills.html
-
Building a Scalable ETL with SQL + Python
This post will look at building a modular ETL pipeline that transforms data with SQL and visualizes it with Python and R.https://www.kdnuggets.com/2022/04/building-scalable-etl-sql-python.html
-
Nearest Neighbors for Classification
Learn about the K-Nearest Neighbors machine learning algorithm for classification.https://www.kdnuggets.com/2022/04/nearest-neighbors-classification.html
-
The Complete Collection Of Data Repositories – Part 2
Check out the collection of the best data repositories on healthcare, natural language, neuroscience, physics, social network, sports, time series, transportation, miscellaneous, and super data repositories.https://www.kdnuggets.com/2022/04/complete-collection-data-repositories-part-2.html
-
The Range of NLP Applications in the Real World: A Different Solution To Each Problem
Most companies look at it like it’s one big technology, and assume the vendors’ offerings might differ in product quality and price but ultimately be largely the same. Truth is, NLP is not one thing; it’s not one tool, but rather a toolbox.https://www.kdnuggets.com/2022/03/different-solution-problem-range-nlp-applications-real-world.html
-
Using Data Science to Make Clean Energy More Equitable
Here are some lessons inspired by a recent panel the author moderated about how data scientists can help put equity into practice.https://www.kdnuggets.com/2022/03/data-science-make-clean-energy-equitable.html
-
Cloud Storage Adoption is the Need of the Hour for Business
The rush towards cloud storage means that the cloud has to offer a valuable proposition to businesses. Let’s explore why businesses regardless of their size should consider moving to the cloud.https://www.kdnuggets.com/2022/02/cloud-storage-adoption-need-hour-business.html
-
The Complete Collection of Data Science Cheat Sheets – Part 1
A collection of cheat sheets that will help you prepare for a technical interview, assessment tests, class presentation, and help you revise core data science concepts.
https://www.kdnuggets.com/2022/02/complete-collection-data-science-cheat-sheets-part-1.html
-
19 Data Science Project Ideas for Beginners
This article features 19 data science projects for beginners, categorized into 7 full project tutorials, 5 places to come up with your own data science projects using data, and 7 skills-based data science projects.https://www.kdnuggets.com/2021/11/19-data-science-project-ideas-beginners.html
-
Data Science Programming Languages and When To Use Them
Read this guide to the most common data science programming languages and when to use them.
https://www.kdnuggets.com/2022/02/data-science-programming-languages.html
-
Effective Testing for Machine Learning
Given how uncertain ML projects are, this is an incremental strategy that you can adopt as your project matures; it includes test examples to provide a clear idea of how these tests look in practice, and a complete project implementation is available on GitHub. By the end of the post, you’ll be able to develop more robust ML pipelines.https://www.kdnuggets.com/2022/01/effective-testing-machine-learning.html
-
What to Expect From Your Career Path as a Data Scientist
Learn about the roles between you and the Director of Data Science.https://www.kdnuggets.com/2022/01/expect-career-path-data-scientist.html
-
A (Much) Better Approach to Evaluate Your Machine Learning Model
Using one or two performance metrics seems sufficient to claim that your ML model is good — chances are that it’s not.https://www.kdnuggets.com/2022/01/much-better-approach-evaluate-machine-learning-model.html
-
The Story of the Women in Data Science (WiDS) Datathon
The author shares their experience of almost winning the competition and the things they have learned from the failures. Learn more about the WiDS Datathon and tips on winning the next challenge.https://www.kdnuggets.com/2022/01/story-women-data-science-wids-datathon.html
-
Federated Learning: Collaborative Machine Learning with a Tutorial on How to Get Started
Read on to learn more about the intricacies of federated learning and what it can do for machine learning on sensitive data.https://www.kdnuggets.com/2021/12/federated-learning-collaborative-machine-learning-tutorial-get-started.html
-
Data Labeling for Machine Learning: Market Overview, Approaches, and Tools
So much of data science and machine learning is founded on having clean and well-understood data sources that it is unsurprising that the data labeling market is growing faster than ever. Here, we highlight many of the top players in this industry and the techniques they use to help you consider which might make a good partner for your needs.https://www.kdnuggets.com/2021/12/data-labeling-ml-overview-and-tools.html
-
Introduction to Clustering in Python with PyCaret
A step-by-step, beginner-friendly tutorial for unsupervised clustering tasks in Python using PyCaret.https://www.kdnuggets.com/2021/12/introduction-clustering-python-pycaret.html
-
Introduction to Binary Classification with PyCaret
PyCaret is an alternate low-code library that can be used to replace hundreds of lines of code with only a few lines. See how to use it for binary classification.https://www.kdnuggets.com/2021/12/introduction-binary-classification-pycaret.html
-
Meta-Learning for Keyphrase Extraction
This article explores Meta-Learning for Key phrase Extraction, which delves into the how and why of KeyPhrase Extraction (KPE) - extracting phrases/groups of words from a document to best capture and represent its content. The article outlines what needs to be done to build a keyphrase extractor that performs well not only on in-domain data, but also in a zero-shot scenario where keyphrases need to be extracted from data that have a different distribution (either a different domain or a different type of documents).https://www.kdnuggets.com/2021/12/metalearning-keyphrase-extraction.html
-
Movie Recommendations with Spark Collaborative Filtering
Not sure what movie to watch? Ask your recommender system.https://www.kdnuggets.com/2021/12/movie-recommendations-spark-collaborative-filtering.html
-
KDnuggets: Personal History and Nuggets of Experience
After 28+ years of publishing and editing KDnuggets, I am retiring and transitioning KDnuggets to Matthew Mayo, who will become the new editor-in-chief. I want to share with you my story of KDnuggets and highlight some of the useful nuggets of experience I learned along this amazing journey.https://www.kdnuggets.com/2021/11/kdnuggets-history.html
-
Build a Serverless News Data Pipeline using ML on AWS Cloud
This is a guide on how to build a serverless data pipeline on AWS with a Machine Learning model deployed as a Sagemaker endpoint.https://www.kdnuggets.com/2021/11/build-serverless-news-data-pipeline-ml-aws-cloud.html
-
Book Metadata and Cover Retrieval Using OCR and Google Books API
With KNIME, extracting critical pieces of information from images becomes as easy as ABC.https://www.kdnuggets.com/2021/11/book-metadata-cover-retrieval-ocr-google-books-api.html
-
ORDAINED: The Python Project Template
Recently I decided to take the time to better understand the Python packaging ecosystem and create a project boilerplate template as an improvement over copying a directory tree and doing find and replace.
https://www.kdnuggets.com/2021/11/ordained-python-project-template.html
-
A Guide to 14 Different Data Science Jobs
The field of data science is growing into one that features a variety of job titles. This guide reviews different positions available for you to consider if you have a data science background.
https://www.kdnuggets.com/2021/10/guide-14-different-data-science-jobs.html
-
Machine Learning Model Development and Model Operations: Principles and Practices
ML model management and the delivery of high-performing models are as important as initially building the model with the right dataset. The concepts of model retraining, model versioning, model deployment, and model monitoring are the basis for machine learning operations (MLOps), which helps data science teams deliver high-performing models.
https://www.kdnuggets.com/2021/10/machine-learning-model-development-operations-principles-practice.html
-
Learn To Reproduce Papers: Beginner’s Guide
Step-by-step instructions on how to understand Deep Learning papers and implement the described approaches.
https://www.kdnuggets.com/2021/10/learn-reproduce-papers-beginners-guide.html
-
Data science SQL interview questions from top tech firms
As a data scientist, there is one thing you really need to understand and know how to handle: data. With SQL being a foundational technical approach for working with data, it should not be surprising that the top tech companies will ask about your SQL skills during an interview. Here, we cover the key concepts tested so you can best prepare for your next data science interview.
https://www.kdnuggets.com/2021/10/data-science-sql-interview-questions.html
-
Scale and Govern AI Initiatives with ModelOps
AI/ML model life cycle automation and orchestration ensures reliable model operations and governance at scale. The path to production for each enterprise model can vary, along with its monitoring, continuous improvement, and retirement needs. Organizations must now consider ModelOps as a fundamental capability towards operational excellence and immediate ROIs.https://www.kdnuggets.com/2021/09/scale-govern-ai-modelops.html
-
Computer Vision in Agriculture
Deep learning isn’t just for placing ads or identifying cats anymore. Instead, a slew of young startups have started to incorporate the advances in computer vision made possible through larger and larger neural networks to real working robots in the fields.https://www.kdnuggets.com/2021/09/computer-vision-agriculture.html
-
Data Analysis Using Scala
It is very important to choose the right tool for data analysis. On the Kaggle forums, where international Data Science competitions are held, people often ask which tool is better. R and Python are at the top of the list. In this article we will tell you about an alternative stack of data analysis technologies, based on Scala.https://www.kdnuggets.com/2021/09/data-analysis-scala.html
-
Real-Time Histogram Plots on Unbounded Data
Using histograms on real-time data is not possible in most of the popular data science libraries. In this article you will learn how to dynamically compute and display a histogram within a Python notebook.https://www.kdnuggets.com/2021/09/real-time-histogram-plots-unbounded-data.html
-
GitHub Copilot and the Rise of AI Language Models in Programming Automation
Read on to learn more about what makes Copilot different from previous autocomplete tools (including TabNine), and why this particular tool has been generating so much controversy.https://www.kdnuggets.com/2021/09/github-copilot-rise-ai-language-models-programming-automation.html
-
Data Engineering Technologies 2021
Emerging technologies supporting the field of data engineering are growing at a rapid clip. This curated list includes the most important offerings available in 2021.https://www.kdnuggets.com/2021/09/data-engineering-technologies-2021.html
-
Adventures in MLOps with Github Actions, Iterative.ai, Label Studio and NBDEV
This article documents the authors' experience building their custom MLOps approach.https://www.kdnuggets.com/2021/09/adventures-mlops-github-actions-iterative-ai-label-studio-and-nbdev.html
-
6 Cool Python Libraries That I Came Across Recently
Check out these awesome Python libraries for Machine Learning.https://www.kdnuggets.com/2021/09/6-cool-python-libraries-recently.html
-
5 Things That Make My Job as a Data Scientist Easier
After working as a Data Scientist for a year, I am here to share some things I learnt along the way that I feel are helpful and have increased my efficiency. Hopefully some of these tips can help you in your journey :)https://www.kdnuggets.com/2021/08/5-things-job-data-scientist-easier.html
-
Demystifying AI: The prejudices of Artificial Intelligence (and human beings)
AI models are necessarily trained on historical data from the real-world--data that is generated from the daily goings on of society. If social-based biases are inherent in the training data, then will the AI predictions highlight these same biases? If so, what should we do (or not do) about making AI fair?https://www.kdnuggets.com/2021/08/demystifying-ai-prejudices.html
-
Writing Your First Distributed Python Application with Ray
Using Ray, you can take Python code that runs sequentially and transform it into a distributed application with minimal code changes. Read on to find out why you should use Ray, and how to get started.https://www.kdnuggets.com/2021/08/distributed-python-application-ray.html
-
AI in Real Life
What do you need to get started on your AI journey? Putting together a combination of the right project, people and infrastructure is no easy task. SAS and MIT SMR have collaborated to provide a comprehensive set of resources to guide you from conception to implementation. Learn from experts that successfully launched AI projects.https://www.kdnuggets.com/2021/08/sas-ai-real-life.html
-
Mastering Clustering with a Segmentation Problem
The one stop shop for implementing the most widely used models in Python for unsupervised clustering.https://www.kdnuggets.com/2021/08/mastering-clustering-segmentation-problem.html
-
Development & Testing of ETL Pipelines for AWS Locally
Typically, ETL pipelines are developed and tested on real environments/clusters, which are time-consuming to set up and require maintenance. This article focuses on the development and testing of ETL pipelines locally with the help of Docker & LocalStack. The solution gives flexibility to test in a local environment without setting up any services on the cloud.https://www.kdnuggets.com/2021/08/development-testing-etl-pipelines-aws-locally.html
-
WHT: A Simpler Version of the fast Fourier Transform (FFT) you should know
The fast Walsh Hadamard transform is a simple and useful algorithm for machine learning that was popular in the 1960s and early 1970s. This useful approach should be more widely appreciated and applied for its efficiency.https://www.kdnuggets.com/2021/07/wht-simpler-fast-fourier-transform-fft.html
-
Computational Complexity of Deep Learning: Solution Approaches
Why has deep learning been so successful? What is the fundamental reason that deep learning can learn from big data? Why cannot traditional ML learn from the large data sets that are now available for different tasks as efficiently as deep learning can?https://www.kdnuggets.com/2021/06/computational-complexity-deep-learning-solution-approaches.html
-
Overcoming the Simplicity Illusion with Data Migration
What’s the key to a smooth data migration experience? It comes down to this primary issue: whether or not you can rapidly determine your dataset composition.https://www.kdnuggets.com/2021/06/overcoming-simplicity-illusion-data-migration.html
-
Supercharge Your Machine Learning Experiments with PyCaret and Gradio
A step-by-step tutorial to develop and interact with machine learning pipelines rapidly.https://www.kdnuggets.com/2021/05/supercharge-machine-learning-experiments-pycaret-gradio.html
-
Budgeting For Your AI Training Data: Consider These 3 Factors
Before you even plan to procure the data, one of the most important considerations is determining how much you should spend on your AI training data. In this article, we will give you insights to develop an effective budget for AI training data.https://www.kdnuggets.com/2021/05/shaip-budgeting-ai-training-data.html
-
Topic Modeling with Streamlit
What does it take to create and deploy a topic modeling web application quickly? Read this post to see how the author uses Python NLP packages for topic modeling, Streamlit for the web application framework, and Streamlit Sharing for deployment.https://www.kdnuggets.com/2021/05/topic-modeling-streamlit.html
-
Awesome list of datasets in 100+ categories
With an estimated 44 zettabytes of data in existence in our digital world today and approximately 2.5 quintillion bytes of new data generated daily, there is a lot of data out there you could tap into for your data science projects. It's pretty hard to curate through such a massive universe of data, but this collection is a great start. Here, you can find data from cancer genomes to UFO reports, as well as years of air quality data to 200,000 jokes. Dive into this ocean of data to explore as you learn how to apply data science techniques or leverage your expertise to discover something new.https://www.kdnuggets.com/2021/05/awesome-list-datasets.html
-
The Most In Demand Skills for Data Engineers in 2021
If you are preparing to make a career in data or are looking for opportunities to skill-up in your current data-centric role, then this analysis of in-demand skills for 2021, based on over 17,000 Data Engineer job postings, should offer you a good idea as to which programming languages and software tools are increasing and decreasing in importance.https://www.kdnuggets.com/2021/05/most-demand-skills-data-engineers-2021.html
-
The Most In-Demand Skills for Data Scientists in 2021
If you are preparing to make a career as a Data Scientist or are looking for opportunities to skill-up in your current role, this analysis of in-demand skills for 2021, based on over 15,000 Data Scientist job postings, should offer you a good idea as to which programming languages and software tools are increasing and decreasing in importance.
https://www.kdnuggets.com/2021/04/most-demand-skills-data-scientists.html
-
How to deploy Machine Learning/Deep Learning models to the web
The full value of your deep learning models comes from enabling others to use them. Learn how to deploy your model to the web and access it as a REST API, and begin to share the power of your machine learning development with the world.
https://www.kdnuggets.com/2021/04/deploy-machine-learning-models-to-web.html
-
DeepMind’s AlphaFold & the Protein Folding Problem
Recently, DeepMind's AlphaFold made impressive headway in the protein structure prediction problem. Read this for an overview and explanation.https://www.kdnuggets.com/2021/03/deepmind-alphafold-protein-folding-problem.html
-
6 Web Scraping Tools That Make Collecting Data A Breeze
The first step of any data science project is data collection. While it can be the most tedious and time-consuming step during your workflow, there will be no project without that data. If you are scraping information from the web, then several great tools exist that can save you a lot of time, money, and effort.https://www.kdnuggets.com/2021/02/6-web-scraping-tools.html
-
Feature Store as a Foundation for Machine Learning
With so many organizations now taking the leap into building production-level machine learning models, many lessons learned are coming to light about the supporting infrastructure. For a variety of important types of use cases, maintaining a centralized feature store is essential for higher ROI and faster delivery to market. In this review, the current feature store landscape is described, and you can learn how to architect one into your MLOps pipeline.https://www.kdnuggets.com/2021/02/feature-store-foundation-machine-learning.html
-
Machine learning adversarial attacks are a ticking time bomb
Software developers and cyber security experts have long fought the good fight against vulnerabilities in code to defend against hackers. A new, subtle approach to maliciously targeting machine learning models has been a recent hot topic in research, but its statistical nature makes it difficult to find and patch these so-called adversarial attacks. Such threats in the real world are becoming imminent as the adoption of machine learning spreads, and a systematic defense must be implemented.https://www.kdnuggets.com/2021/01/machine-learning-adversarial-attacks.html
-
Data Engineering — the Cousin of Data Science, is Troublesome
A Data Scientist must be a jack of many, many trades. Especially when working in broader teams, understanding the roles of others, such as data engineering, can help you validate progress and be aware of potential pitfalls. So, how can you convince your analysts to realize the importance of expanding their toolkit? Examples from real life often provide great insight.
https://www.kdnuggets.com/2021/01/data-engineering-troublesome.html
-
The Best Tool for Data Blending is KNIME
These are the lessons and best practices I learned in many years of experience in data blending, and the software that became my most important tool in my day-to-day work.https://www.kdnuggets.com/2021/01/best-tool-data-blending-knime.html
-
Model Experiments, Tracking and Registration using MLflow on Databricks
This post covers how StreamSets can help expedite operations at some of the most crucial stages of Machine Learning Lifecycle and MLOps, and demonstrates integration with Databricks and MLflow.https://www.kdnuggets.com/2021/01/model-experiments-tracking-registration-mlflow-databricks.html
-
How to easily check if your Machine Learning model is fair?
Machine learning models deployed today -- as will many more in the future -- impact people and society directly. With that power and influence resting in the hands of Data Scientists and machine learning engineers, taking the time to evaluate and understand if model results are fair will become the linchpin for the future success of AI/ML solutions. These are critical considerations, and using a recently developed fairness module in the dalex Python package is a unified and accessible way to ensure your models remain fair.https://www.kdnuggets.com/2020/12/machine-learning-model-fair.html
-
Resampling Imbalanced Data and Its Limits
Can resampling tackle the problem of too few fraudulent transactions in credit card fraud detection?https://www.kdnuggets.com/2020/12/resampling-imbalanced-data-limits.html
-
Feature Store vs Data Warehouse
A feature store is a data warehouse of features for machine learning. Unlike a data warehouse, it is dual-database: one database serves features at low latency to online applications, and another stores large volumes of features. Learn how Data Scientists leverage this capability in production-deployed models.https://www.kdnuggets.com/2020/12/feature-store-vs-data-warehouse.html
-
8 Places for Data Professionals to Find Datasets
Here is a curated list of sites and resources invaluable for data professionals to acquire practice datasets.https://www.kdnuggets.com/2020/12/8-places-data-professionals-find-datasets.html
-
Data Compression via Dimensionality Reduction: 3 Main Methods
Lift the curse of dimensionality by mastering the application of three important techniques that will help you reduce the dimensionality of your data, even if it is not linearly separable.https://www.kdnuggets.com/2020/12/data-compression-dimensionality-reduction.html
-
Introduction to Data Engineering
The Q&A for the most frequently asked questions about Data Engineering: What does a data engineer do? What is a data pipeline? What is a data warehouse? How is a data engineer different from a data scientist? What skills and programming languages do you need to learn to become a data engineer?
https://www.kdnuggets.com/2020/12/introduction-data-engineering.html
-
Essential Math for Data Science: Integrals And Area Under The Curve
In this article, you’ll learn about integrals and the area under the curve using the practical data science example of the area under the ROC curve used to compare the performances of two machine learning models.https://www.kdnuggets.com/2020/11/essential-math-data-science-integrals-area-under-curve.html
-
Compute Goes Brrr: Revisiting Sutton’s Bitter Lesson for AI
"It's just about having more compute." Wait, is that really all there is to AI? As Richard Sutton's 'bitter lesson' sinks in for more AI researchers, a debate has stirred that considers a potentially more subtle relationship between advancements in AI based on ever-more-clever algorithms and massively scaled computational power.https://www.kdnuggets.com/2020/11/revisiting-sutton-bitter-lesson-ai.html
-
My Data Science Online Learning Journey on Coursera
Check out the author's informative list of courses and specializations on Coursera taken to get started on their data science and machine learning journey.https://www.kdnuggets.com/2020/11/data-science-online-learning-journey-coursera.html
-
The Best Data Science Certification You’ve Never Heard Of
The CDMP is the best data strategy certification you’ve never heard of. (And honestly, when you consider the fact that you’re probably working a job that didn’t exist ten years ago, it’s not surprising that this certification isn’t widespread just yet.)
https://www.kdnuggets.com/2020/11/best-data-science-certification-never-heard.html
-
Text Mining with R: The Free eBook
This freely-available book will show you how to perform text analytics in R, using packages from the tidyverse.
https://www.kdnuggets.com/2020/10/text-mining-r-free-ebook.html
-
Missing Value Imputation – A Review
Detecting and handling missing values in the correct way is important, as they can impact the results of the analysis, and there are algorithms that can’t handle them. So what is the correct way?https://www.kdnuggets.com/2020/09/missing-value-imputation-review.html
-
Performance Testing on Big Data Applications
You can use performance testing in any application you’re working on but it’s especially useful for big data applications. Let’s see why.https://www.kdnuggets.com/2020/08/performance-testing-big-data-applications.html
-
Containerization of PySpark Using Kubernetes
This article demonstrates how to use Spark on Kubernetes. It also includes a brief comparison of the various cluster managers available for Spark.https://www.kdnuggets.com/2020/08/containerization-pyspark-kubernetes.html
-
Essential Data Science Tips: How to Use One-Vs-Rest and One-Vs-One for Multi-Class Classification
Classification, as a predictive model, involves aligning each class label to examples. Algorithms designed for binary classification cannot be applied to multi-class classification problems. For such situations, heuristic methods come in handy.https://www.kdnuggets.com/2020/08/one-vs-rest-one-multi-class-classification.html
-
Know What Employers are Expecting for a Data Scientist Role in 2020
This analysis is based on 1000+ recent Data Scientist job postings, extracted from job portals using web scraping.
https://www.kdnuggets.com/2020/08/employers-expecting-data-scientist-role-2020.html