Search results for "s3"

376 documents found out of 7194 total.

  • Analytics Patterns Every Data Scientist Should Master

    Learn the analytics patterns you can use in most business analytics tasks.

    https://www.kdnuggets.com/analytics-patterns-every-data-scientist-should-master

  • 5 Self-Hosted Alternatives for Data Scientists in 2026

    Save money & take control in 2026. Discover 5 powerful open-source, self-hosted tools to replace costly subscriptions for data scientists.

    https://www.kdnuggets.com/5-self-hosted-alternatives-for-data-scientists-in-2026

  • Data Lake vs Data Warehouse vs Lakehouse vs Data Mesh: What’s the Difference?

    Data Lake vs Data Warehouse vs Lakehouse vs Data Mesh explained simply. Learn the key differences and which architecture fits your data needs.

    https://www.kdnuggets.com/data-lake-vs-data-warehouse-vs-lakehouse-vs-data-mesh-whats-the-difference

  • All About Google Colab File Management

    This is the ultimate guide to uploading, downloading, and saving files in Colab.

    https://www.kdnuggets.com/all-about-google-colab-file-management

  • Is Your Machine Learning Pipeline as Efficient as it Could Be?

    Here are five critical pipeline areas to audit, with practical strategies to reclaim your team’s time.

    https://www.kdnuggets.com/is-your-machine-learning-pipeline-as-efficient-as-it-could-be

  • We Used 3 Feature Selection Techniques: This One Worked Best

    Let's look at three feature selection techniques and see which one worked best.

    https://www.kdnuggets.com/we-used-3-feature-selection-techniques-this-one-worked-best

  • Ray or Dask? A Practical Guide for Data Scientists

    Ray and Dask are tools that help data scientists work faster by performing multiple tasks at the same time. This article will show you the main differences and help you pick the right one for machine learning projects.

    https://www.kdnuggets.com/ray-or-dask-a-practical-guide-for-data-scientists

  • 5 Free AI Courses from Hugging Face

    Hands-on, community driven courses on LLMs, AI agents, MCPs, diffusion models, and reinforcement learning.

    https://www.kdnuggets.com/5-free-ai-courses-from-hugging-face

  • Creating Slick Data Dashboards with Python, Taipy & Google Sheets

    Develop simple yet powerful business intelligence tools tailored to meet your company's specific needs.

    https://www.kdnuggets.com/creating-slick-data-dashboards-with-python-taipy-google-sheets

  • Writing Your First GPU Kernel in Python with Numba and CUDA

    80x Faster Python? Discover How One Line Turns Your Code Into a GPU Beast!

    https://www.kdnuggets.com/writing-your-first-gpu-kernel-in-python-with-numba-and-cuda

  • Best Web Scraping Companies in 2025

    Whether you’re looking for simple point-and-click solutions or hardcore APIs for scraping the entire web, this list offers something for everyone.

    https://www.kdnuggets.com/2025/07/oxylabs/best-web-scraping-companies-in-2025

  • Building End-to-End Data Pipelines: From Data Ingestion to Analysis

    Check out this practical guide to designing scalable, reliable, and insight-driven data infrastructure.

    https://www.kdnuggets.com/building-end-to-end-data-pipelines-from-data-ingestion-to-analysis

  • Large Language Models: A Self-Study Roadmap

    A complete beginner’s roadmap to understanding and building with large language models explained simply and with hands-on resources.

    https://www.kdnuggets.com/large-language-models-a-self-study-roadmap

  • 5 Fun Python Projects for Absolute Beginners

    Bored of theory? These hands-on Python projects make learning interactive, practical, and actually enjoyable.

    https://www.kdnuggets.com/5-fun-python-projects-for-absolute-beginners

  • Top 5 Frameworks for Distributed Machine Learning

    Use these frameworks to optimize memory and compute resources, scale your machine learning workflow, speed up your processes, and reduce the overall cost.

    https://www.kdnuggets.com/top-5-frameworks-for-distributed-machine-learning

  • Agentic AI: A Self-Study Roadmap

    A comprehensive guide to building AI systems that can plan, reason, and act autonomously — from basic tool-using agents to sophisticated multi-agent collaborations.

    https://www.kdnuggets.com/agentic-ai-a-self-study-roadmap

  • 7 AWS Services for Machine Learning Projects

    Learn about the AWS machine learning services that help you build machine learning pipelines, from processing data to training and deploying models.

    https://www.kdnuggets.com/7-aws-services-for-machine-learning-projects

  • 7 Essential Ready-To-Use Data Engineering Docker Containers

    Ready to level up your data engineering game without wasting hours on setup? From ingestion to orchestration, these Docker containers handle it all.

    https://www.kdnuggets.com/7-essential-ready-to-use-data-engineering-docker-containers

  • Top 5 Career Paths in Data Science and How to Self-Learn for Each

    Are you a self-learner wanting to break into one of the top 5 data science career paths? If yes, this article is for you.

    https://www.kdnuggets.com/top-5-career-paths-in-data-science-and-how-to-self-learn-for-each

  • Using fsspec for Unified File Management in Your Python Projects

    Are you looking for an easier way to manage files across different storage systems? fsspec is a Python library that simplifies file handling by providing a unified interface for file management.

    https://www.kdnuggets.com/fsspec-unified-file-management-python-projects

  • 7 Projects to Master Data Engineering

    Learn to build, run, and manage data engineering pipelines both locally and in the cloud using popular tools.

    https://www.kdnuggets.com/7-projects-master-data-engineering

  • DIY AI: Building Your AI Apps on a Shoestring Budget

    Artificial Intelligence has become an integral part of modern software applications, as it is known to add extended functionalities to traditional applications. This tutorial will guide you on a simplistic approach to building an AI application.

    https://www.kdnuggets.com/diy-ai-building-ai-apps-shoestring-budget

  • Math Myths Busted: What Beginners Actually Need for Data Science

    Terrified of calculus but dream of being a data scientist? Breathe easy! Discover the surprising truth about math in data science and how you can succeed without being a math genius.

    https://www.kdnuggets.com/math-myths-busted-beginners-actually-need-data-science

  • How to Become a Software Engineer (Without a Degree)

    The fastest and simplest route to becoming a software engineer with little cost.

    https://www.kdnuggets.com/how-to-become-a-software-engineer-without-a-degree

  • Incogni Review 2025: Features, Pros & Cons

    Incogni is a privacy tool designed to automatically remove personal data from data brokers, reducing the risk of identity theft, spam, and misuse. It simplifies the complex task of managing data removal requests by automating the process and monitoring over 180 data brokers. Key benefits include saving time, providing legal backing through privacy laws like GDPR and CCPA, and offering an easy-to-use dashboard for tracking progress. It’s ideal for those concerned about protecting their online privacy without manual effort.

    https://www.kdnuggets.com/review/incogni-review-2024-features-pros-cons

  • Freshservice IT Service Management Review 2025: Features, Pros & Cons

    Freshservice, developed by Freshworks, is a cloud-based IT Service Management (ITSM) tool designed for managing IT operations and services. Its key features include incident management, change management, asset tracking, and automation. With an intuitive interface, it simplifies IT workflows, enhances service delivery, and ensures compliance through robust reporting. Freshservice is ideal for businesses of all sizes seeking a scalable, user-friendly ITSM solution with automation and integration capabilities.

    https://www.kdnuggets.com/review/freshservice-it-service-management-review-2024-features-pros-cons

  • Monday.com CRM Review 2025: Features, Pricing, Pros & Cons

    The customer relationship management (CRM) software landscape is vast, and choosing the best one for your business can seem like a daunting task.

    https://www.kdnuggets.com/review/monday-com-crm-review-2024-features-pricing-pros-cons

  • Scalability Challenges & Strategies in Data Science

    Scaling data science projects can be difficult. This article explores challenges and strategies for managing large-scale data.

    https://www.kdnuggets.com/scalability-challenges-strategies-in-data-science

  • Project Ideas to Master Data Engineering

    Data engineering is best learned by doing projects. But which ones? Here are six projects focusing on different data engineering skills to ensure you have it all covered.

    https://www.kdnuggets.com/project-ideas-to-master-data-engineering

  • How to Use NumPy to Solve Systems of Nonlinear Equations

    In this article, we’ll explore how to leverage NumPy to solve systems of nonlinear equations, turning complex mathematical challenges into manageable tasks.

    https://www.kdnuggets.com/how-to-use-numpy-to-solve-systems-of-nonlinear-equations

  • Tools Every AI Engineer Should Know: A Practical Guide

    Explore essential tools and skills for AI engineers: Python, R, big data frameworks, and cloud services essential for building and optimizing AI systems.

    https://www.kdnuggets.com/tools-every-ai-engineer-should-know-a-practical-guide

  • 10 GitHub Repositories to Master Data Engineering

    Learn data engineering through free courses, tutorials, books, tools, guides, roadmaps, practice exercises, projects, and other resources.

    https://www.kdnuggets.com/10-github-repositories-to-master-data-engineering

  • New Tech Courses That Have Just Landed

    Check out these 3 courses that have just landed, from front-end web development to project management.

    https://www.kdnuggets.com/new-tech-courses-that-have-just-landed

  • 2024 Reading List: 5 Essential Reads on Artificial Intelligence

    Transform your understanding of current and future tech with these top 5 AI reads to explore the minds shaping our future.

    https://www.kdnuggets.com/2024-reading-list-5-essential-reads-on-artificial-intelligence

  • Natural Language Processing: Bridging Human Communication with AI

    The post highlights real-world examples of NLP use cases across industries. It also covers NLP's objectives, challenges, and latest research developments.

    https://www.kdnuggets.com/natural-language-processing-bridging-human-communication-with-ai

  • Turn Your Laptop Into a Personal Analytics Engine with DuckDB and MotherDuck

    Bring the powerful tools to your laptop.

    https://www.kdnuggets.com/turn-your-laptop-into-a-personal-analytics-engine-with-duckdb-and-motherduck

  • Using Lightning AI Studio For Free

    Use Lightning AI Cloud IDE for free to experiment, train, and deploy your AI models.

    https://www.kdnuggets.com/using-lightning-ai-studio-for-free

  • The Top 5 Alternatives to GitHub for Data Science Projects

    The blog discusses five platforms designed for data scientists with specialized capabilities in managing large datasets, models, workflows, and collaboration beyond what GitHub offers.

    https://www.kdnuggets.com/the-top-5-alternatives-to-github-for-data-science-projects

  • A Comprehensive List of Resources to Master Large Language Models

    Large Language Models (LLMs) have now become an integral part of various applications. This article provides an extensive list of resources for anyone interested to dive into the world of LLMs.

    https://www.kdnuggets.com/a-comprehensive-list-of-resources-to-master-large-language-models

  • Optimizing Data Analytics: Integrating GitHub Copilot in Databricks

    Integrating AI-powered pair programming tools for data analytics in Databricks optimizes and streamlines the development process, freeing up developer time for innovation.

    https://www.kdnuggets.com/optimizing-data-analytics-integrating-github-copilot-in-databricks

  • Greening AI: 7 Strategies to Make Applications More Sustainable

    The article delves into a comprehensive methodology that sheds light on how to accurately estimate the carbon footprint associated with AI applications. It explains the environmental impact of AI, a crucial consideration in today's world.

    https://www.kdnuggets.com/greening-ai-7-strategies-to-make-applications-more-sustainable

  • 7 Best Cloud Database Platforms

    Cloud databases have made it easier and cheaper to develop enterprise-level applications, offering flexibility, convenience, and standard database functionality. See what KDnuggets recommends.

    https://www.kdnuggets.com/7-best-cloud-database-platforms

  • 7 Steps to Mastering Large Language Models (LLMs)

    Large Language Models (LLMs) have unlocked a new era in natural language processing. So why not learn more about them? Go from learning what large language models are to building and deploying LLM apps in 7 easy steps with this guide.

    https://www.kdnuggets.com/7-steps-to-mastering-large-language-models-llms

  • Best Practices for Building ETLs for ML

    This article talks about several best practices for writing ETLs for building training datasets. It delves into several software engineering techniques and patterns applied to ML.

    https://www.kdnuggets.com/best-practices-for-building-etls-for-ml

  • The Top 5 Data Management Tools For Your Projects

    See what KDnuggets is recommending for the top 5 cutting-edge tools for cloud, ETL, transformation, master data management, and visualization.

    https://www.kdnuggets.com/top-5-data-management-tools-for-your-projects

  • Deploying Your Machine Learning Model to Production in the Cloud

    Learn a simple way to have a live model hosted on AWS.

    https://www.kdnuggets.com/deploying-your-ml-model-to-production-in-the-cloud

  • Don’t Miss Out! Enroll in FREE Courses Before 2023 Ends

    Complete the last quarter of the year and improve your skills to get you kickstarted for 2024’s self-development plan with these FREE courses.

    https://www.kdnuggets.com/dont-miss-out-enroll-in-free-courses-before-2023-ends

  • KDnuggets Survey: Benchmark With Your Peers On Data Science Spend & Trends 2023 H2

    KDnuggets, along with The All Things Insights Survey Committee and its partners, have created a Spend & Trends survey to provide you and your colleagues in our community with much needed benchmarking information on mindset and focus trends as well as budget and technology spend.

    https://www.kdnuggets.com/kdnuggets-survey-benchmark-peers-data-science-spends-trends

  • Working with Big Data: Tools and Techniques

    Where do you start in a field as vast as big data? Which tools and techniques to use? We explore this and talk about the most common tools in big data.

    https://www.kdnuggets.com/working-with-big-data-tools-and-techniques

  • Who Will Make Money from the Generative AI Gold Rush?

    Buckle up for the Generative AI gold rush! Will BigTech rule with its picks and shovels? Which startups will strike it rich? Will “copilot for X” be the business strategy to hit pay dirt? How can startups dig moats to keep out other prospectors? And will the US once again have the richest gold seams?

    https://www.kdnuggets.com/2023/08/make-money-generative-ai-gold-rush.html

  • How to Ace Data Scientist Professional Certificate Exam

    Gain insights into the certification process and expert tips for passing the certificate exam.

    https://www.kdnuggets.com/2023/08/ace-data-scientist-professional-certificate.html

  • The Best Courses for AI from Universities with YouTube Playlists

    Kickstart a new career or develop your current one with these YouTube playlists by trusted universities!

    https://www.kdnuggets.com/2023/08/best-courses-ai-universities-youtube-playlists.html

  • A Comprehensive Guide to MLOps

    Machine Learning Operations (MLOps) is a relatively new discipline that provides the structure and support necessary for machine learning (ML) models to thrive in production environments.

    https://www.kdnuggets.com/2023/08/comprehensive-guide-mlops.html

  • Mastering GPUs: A Beginner’s Guide to GPU-Accelerated DataFrames in Python

    RAPIDS cuDF, with its pandas-like API, enables data scientists and engineers to quickly tap into the immense potential of parallel computing on GPUs–with just a few code line changes. Read on for more.

    https://www.kdnuggets.com/2023/07/mastering-gpus-beginners-guide-gpu-accelerated-dataframes-python.html

  • How to Build a Streaming Semi-structured Analytics Platform on Snowflake

    Building a data lake for semi-structured data such as JSON has always been challenging. When JSON documents stream continuously from healthcare vendors, we need a robust, modern architecture that can handle such a high volume. An analytics layer also needs to be created to generate value from the data.

    https://www.kdnuggets.com/2023/07/build-streaming-semistructured-analytics-platform-snowflake.html

  • How to Optimize SQL Queries for Faster Data Retrieval

    Today, we’ll talk about why SQL query optimization is important and which techniques can be used to optimize it.

    https://www.kdnuggets.com/2023/06/optimize-sql-queries-faster-data-retrieval.html

  • Advanced Feature Selection Techniques for Machine Learning Models

    Mastering Feature Selection: An Exploration of Advanced Techniques for Supervised and Unsupervised Machine Learning Models.

    https://www.kdnuggets.com/2023/06/advanced-feature-selection-techniques-machine-learning-models.html

  • RedPajama Project: An Open-Source Initiative to Democratizing LLMs

    A leading project to empower the community through accessible large language models.

    https://www.kdnuggets.com/2023/06/redpajama-project-opensource-initiative-democratizing-llms.html

  • Building and Training Your First Neural Network with TensorFlow and Keras

    Learn how to build and train your first Image Classification model with Keras and TensorFlow using Convolutional Neural Network.

    https://www.kdnuggets.com/2023/05/building-training-first-neural-network-tensorflow-keras.html

  • Schedule & Run ETLs with Jupysql and GitHub Actions

    This blog gives an overview of ETLs and JupySQL, then demonstrates how to schedule an example ETL notebook via GitHub Actions, which lets you automate running ETLs and JupySQL from Jupyter.

    https://www.kdnuggets.com/2023/05/schedule-run-etls-jupysql-github-actions.html

  • Fine-Tuning OpenAI Language Models with Noisily Labeled Data

    Reduce LLM prediction error by 37% via data-centric AI.

    https://www.kdnuggets.com/2023/04/finetuning-openai-language-models-noisily-labeled-data.html

  • 11 Best Practices of Cloud and Data Migration to AWS Cloud

    A list of best practices compiled from our learnings during our migration journey to the AWS cloud.

    https://www.kdnuggets.com/2023/04/11-best-practices-cloud-data-migration-aws-cloud.html

  • 8 Open-Source Alternative to ChatGPT and Bard

    Discover the widely-used open-source frameworks and models for creating your ChatGPT like chatbots, integrating LLMs, or launching your AI product.

    https://www.kdnuggets.com/2023/04/8-opensource-alternative-chatgpt-bard.html

  • Top Free Courses on Large Language Models

    Interested in learning how ChatGPT and other AI chatbots work under the hood? Look no further. Check out these free courses and resources on large language models from Stanford, Princeton, ETH, and more.

    https://www.kdnuggets.com/2023/03/top-free-courses-large-language-models.html

  • 5 Data Analysis Projects For Beginners

    Are you a data analyst newbie looking to boost your resume to land your first job? If yes, then up your game as a beginner with these 5 projects that you can’t afford to miss.

    https://www.kdnuggets.com/2023/02/5-data-analysis-projects-beginners.html

  • Learn Data Engineering From These GitHub Repositories

    Kickstart your Data Engineering career with these curated GitHub repositories.

    https://www.kdnuggets.com/2023/02/learn-data-engineering-github-repositories.html

  • KDnuggets Survey: Benchmark with your peers on industry spend and trends

    KDnuggets and its partners have just released a Spend & Trends survey to provide you the opportunity to benchmark with your peers on how folks are spending and the mindsets around current trends.

    https://www.kdnuggets.com/2023/02/kdnuggets-survey-industry-spend-trends.html

  • Scaling Data Management Through Apache Gobblin

    Software companies can manage big data at a hyper-scale on different infrastructure stacks using Apache Gobblin.

    https://www.kdnuggets.com/2023/01/scaling-data-management-apache-gobblin.html

  • Overcome Your Data Quality Issues with Great Expectations

    Bad data costs organizations money, reputation, and time. Hence it is very important to monitor and validate data quality continuously.

    https://www.kdnuggets.com/2023/01/overcome-data-quality-issues-great-expectations.html

  • Beginner’s Guide to Cloud Computing

    Learn how cloud computing works, different types of models, top cloud platforms, and applications.

    https://www.kdnuggets.com/2023/01/beginner-guide-cloud-computing.html

  • 7 Super Cheat Sheets You Need To Ace Machine Learning Interview

    Revise the concepts of machine learning algorithms, frameworks, and methodologies to ace the technical interview round.

    https://www.kdnuggets.com/2022/12/7-super-cheat-sheets-need-ace-machine-learning-interview.html

  • Top Data Analyst Certification Courses for 2022

    Top certification courses by IBM, Edureka, DataCamp, Udacity, and Google.

    https://www.kdnuggets.com/2022/11/top-data-analyst-certification-courses-2022.html

  • Top 10 MLOps Tools to Optimize & Manage Machine Learning Lifecycle

    As more businesses experiment with data, they realize that developing a machine learning (ML) model is only one of many steps in the ML lifecycle.

    https://www.kdnuggets.com/2022/10/top-10-mlops-tools-optimize-manage-machine-learning-lifecycle.html

  • Is OLAP Dead?

    OLAP enables citizen analysts to quickly, efficiently, and cost-effectively uncover new business insights at a reduced time-to-value.

    https://www.kdnuggets.com/2022/10/olap-dead.html

  • Essential Books You Need to Become a Data Engineer

    In this article, I will go through the roadmap of books you need to become a Data Engineer.

    https://www.kdnuggets.com/2022/10/essential-books-need-become-data-engineer.html

  • 10 Cheat Sheets You Need To Ace Data Science Interview

    The only cheat you need for a job interview and data professional life. It includes SQL, web scraping, statistics, data wrangling and visualization, business intelligence, machine learning, deep learning, NLP, and super cheat sheets.

    https://www.kdnuggets.com/2022/10/10-cheat-sheets-need-ace-data-science-interview.html

  • Free Algorithms in Python Course

    Algorithms are an often misunderstood concept. Leverage Python to learn what algorithms really are, and how to implement an array of basic computational algorithms in the language.

    https://www.kdnuggets.com/2022/09/free-algorithms-python-course.html

  • Python String Processing Cheatsheet

    Try this string processing primer cheatsheet to gain an understanding of using Python to manipulate and process strings at a basic level.

    https://www.kdnuggets.com/2020/01/python-string-processing-primer.html

  • Generate Synthetic Time-series Data with Open-source Tools

    An introduction to the generative adversarial network model DoppelGANger, and how you can use a new open-source PyTorch implementation of it to create high-quality synthetic time-series data.

    https://www.kdnuggets.com/2022/06/generate-synthetic-timeseries-data-opensource-tools.html

  • Top Data Science Podcasts for 2022

    Here are some data science related podcasts to help you either grow your interest in the field, increase your current knowledge, or help you develop yourself.

    https://www.kdnuggets.com/2022/06/top-data-science-podcasts-2022.html

  • How To Structure a Data Science Project: A Step-by-Step Guide

    Check out all the necessary steps to successfully structure your data science projects leveraging data science templates.

    https://www.kdnuggets.com/2022/05/structure-data-science-project-stepbystep-guide.html

  • MLOps Is a Mess But That’s to be Expected

    In this post, I want to focus the discussion on the state of machine learning operations (MLOps) today: where we are and where we are going.

    https://www.kdnuggets.com/2022/03/mlops-mess-expected.html

  • A New Way of Managing Deep Learning Datasets

    Create, version-control, query, and visualize image, audio, and video datasets using Hub 2.0 by Activeloop.

    https://www.kdnuggets.com/2022/03/new-way-managing-deep-learning-datasets.html

  • Feature Stores for Real-time AI & Machine Learning

    Real-time AI/ML is on the rise and feature stores are key to successfully deploying them. Read on to see how the choice of online store and the feature store architecture play important roles in determining its performance and cost.

    https://www.kdnuggets.com/2022/03/feature-stores-realtime-ai-machine-learning.html

  • Top 7 YouTube Courses on Data Analytics

    Learn data analytics by taking the best YouTube courses. These courses will cover data analysis with Python, R, SQL, PowerBI, Tableau, Excel, and SPSS.

    https://www.kdnuggets.com/2022/02/top-7-youtube-courses-data-analytics.html

  • Cloud Storage Adoption is the Need of the Hour for Business

    The rush towards cloud storage means that the cloud has to offer a valuable proposition to businesses. Let’s explore why businesses regardless of their size should consider moving to the cloud.

    https://www.kdnuggets.com/2022/02/cloud-storage-adoption-need-hour-business.html

  • Orchestrate a Data Science Project in Python With Prefect

    Learn how to optimize your data science workflow in a few lines of code.

    https://www.kdnuggets.com/2022/02/orchestrate-data-science-project-python-prefect.html

  • The Complete Collection of Data Science Cheat Sheets – Part 2

    A collection of cheat sheets that will help you prepare for a technical interview on Data Structures & Algorithms, Machine Learning, Deep Learning, Natural Language Processing, Data Engineering, and Web Frameworks.

    https://www.kdnuggets.com/2022/02/complete-collection-data-science-cheat-sheets-part-2.html

  • From Oracle to Databases for AI: The Evolution of Data Storage

    From Oracle, to NoSQL databases, and beyond, read about data management solutions from the early days of the RDBMS to those supporting AI applications.

    https://www.kdnuggets.com/2022/02/oracle-databases-ai-evolution-data-storage.html

  • The Complete Collection of Data Science Cheat Sheets – Part 1

    A collection of cheat sheets that will help you prepare for a technical interview, assessment tests, and class presentations, and help you revise core data science concepts.

    https://www.kdnuggets.com/2022/02/complete-collection-data-science-cheat-sheets-part-1.html

  • How to Set Up Your Data Science Stack on a Budget

    Whether you’re working independently or setting up a stack for a company, you need an affordable stack option. Here’s how you can set up your stack without spending too much.

    https://www.kdnuggets.com/2022/01/data-science-stack-budget.html

  • 6 Data Science Technologies You Need to Build Your Supply Chain Pipeline

    Here are some of the data science technologies needed to build a comprehensive and smooth supply chain pipeline.

    https://www.kdnuggets.com/2022/01/6-data-science-technologies-need-build-supply-chain-pipeline.html

  • How to Process a DataFrame with Millions of Rows in Seconds

    TL;DR: process it with a new Python data processing engine in the cloud.

    https://www.kdnuggets.com/2022/01/process-dataframe-millions-rows-seconds.html

  • Using Datawig, an AWS Deep Learning Library for Missing Value Imputation

    A lot of missing values in the dataset can affect the quality of prediction in the long run. Several methods can be used to fill the missing values and Datawig is one of the most efficient ones.

    https://www.kdnuggets.com/2021/12/datawig-aws-deep-learning-library-missing-value-imputation.html

  • A Beginner’s Guide to End to End Machine Learning

    Learn to train, tune, deploy and monitor machine learning models.

    https://www.kdnuggets.com/2021/12/beginner-guide-end-end-machine-learning.html

  • What Comes After HDF5? Seeking a Data Storage Format for Deep Learning

    HDF5 is one of the most popular and reliable formats for non-tabular, numerical data, but it is not optimized for deep learning work. This article suggests what an ML-native data format should look like to truly serve the needs of modern data scientists.

    https://www.kdnuggets.com/2021/11/after-hdf5-data-storage-format-deep-learning.html

  • Design Patterns for Machine Learning Pipelines

    ML pipeline design has undergone several evolutions in the past decade with advances in memory and processor performance, storage systems, and the increasing scale of data sets. We describe how these design patterns changed, what processes they went through, and their future direction.

    https://www.kdnuggets.com/2021/11/design-patterns-machine-learning-pipelines.html

  • Advanced PyTorch Lightning with TorchMetrics and Lightning Flash

    In this tutorial we will be diving deeper into two additional tools you should be using: TorchMetrics and Lightning Flash. TorchMetrics unsurprisingly provides a modular approach to define and track useful metrics across batches and devices, while Lightning Flash offers a suite of functionality facilitating more efficient transfer learning and data handling, and a recipe book of state-of-the-art approaches to typical deep learning problems.

    https://www.kdnuggets.com/2021/11/advanced-pytorch-lightning-torchmetrics-lightning-flash.html

  • Is the Modern Data Stack Leaving You Behind?

    The modern data stack narrative is largely dominated by analytics engineering. Where does that leave data engineers? Discover the difference between the MDS for data engineers & analytics engineers.

    https://www.kdnuggets.com/2021/11/modern-data-stack-leaving-behind.html

  • ETL and ELT: A Guide and Market Analysis

    ETL and related techniques remain a powerful and foundational tool in the data industry. We explain what ETL is and how ETL and ELT processes have evolved over the years, with a close eye toward how third-generation ETL tools are about to disrupt standard data processing practices.

    https://www.kdnuggets.com/2021/10/etl-elt-guide-market-analysis.html

  • Deploying Serverless spaCy Transformer Model with AWS Lambda

    A step-by-step guide on how to deploy a serverless NER transformer model.

    https://www.kdnuggets.com/2021/10/deploying-serverless-spacy-transformer-model-aws-lambda.html

  • Training BPE, WordPiece, and Unigram Tokenizers from Scratch using Hugging Face

    Comparing the tokens generated by SOTA tokenization algorithms using Hugging Face's tokenizers package.

    https://www.kdnuggets.com/2021/10/bpe-wordpiece-unigram-tokenizers-using-hugging-face.html

  • Serving ML Models in Production: Common Patterns

    Over the past couple years, we've seen 4 common patterns of machine learning in production: pipeline, ensemble, business logic, and online learning. In the ML serving space, implementing these patterns typically involves a tradeoff between ease of development and production readiness. Ray Serve was built to support these patterns by being both easy to develop and production ready.

    https://www.kdnuggets.com/2021/10/serving-ml-models-production-common-patterns.html

  • How our Obsession with Algorithms Broke Computer Vision: And how Synthetic Computer Vision can fix it

    Deep Learning radically improved Machine Learning as a whole. The Data-Centric revolution is about to do the same. In this post, we’ll take a look at the pitfalls of mainstream Computer Vision (CV) and discuss why Synthetic Computer Vision (SCV) is the future.

    https://www.kdnuggets.com/2021/10/obsession-algorithms-broke-computer-vision.html

  • Amazon Web Services Webinar: Leverage data sets to create a customer-centric strategy and improve business outcomes

    Register now for this webinar, Oct 28, to learn how using third-party data enhances applications to better prioritize your target customer - helping you build a more customer-centric business.

    https://www.kdnuggets.com/2021/10/roidna-aws-webinar-customer-centric-strategy.html

  • AutoML: An Introduction Using Auto-Sklearn and Auto-PyTorch

    AutoML is a broad category of techniques and tools for applying automated search to your automated search and learning to your learning. In addition to Auto-Sklearn, the Freiburg-Hannover AutoML group has also developed an Auto-PyTorch library. We’ll use both of these as our entry point into AutoML in the following simple tutorial.

    https://www.kdnuggets.com/2021/10/automl-introduction-auto-sklearn-auto-pytorch.html

  • The Evolution of Tokenization – Byte Pair Encoding in NLP

    Though we have SOTA algorithms for tokenization, it's always good practice to understand how they evolved and how we got here. Read this introduction to Byte Pair Encoding.

    https://www.kdnuggets.com/2021/10/evolution-tokenization-byte-pair-encoding-nlp.html

  • Surpassing Trillion Parameters and GPT-3 with Switch Transformers – a path to AGI?

    Ever larger models churning on increasingly faster machines suggest a potential path toward smarter AI, such as with the massive GPT-3 language model. However, new, more lean, approaches are being conceived and explored that may rival these super-models, which could lead to a future with more efficient implementations of advanced AI-driven systems.

    https://www.kdnuggets.com/2021/10/trillion-parameters-gpt-3-switch-transformers-path-agi.html

  • Important Statistics Data Scientists Need to Know

    Several fundamental statistical concepts must be well understood by every data scientist, from the enthusiast to the professional. Here, we provide Python code snippets to build understanding and give you key tools for gaining early insight into your data.

    https://www.kdnuggets.com/2021/09/important-statistics-data-scientists.html

  • Path to Full Stack Data Science

    Start your journey toward mastering all aspects of the field of Data Science with this focused list of in-depth self-learning resources. Curated with the beginner in mind, these recommendations will help you learn efficiently, and can also offer existing professionals useful highlights for review or help filling in any gaps in skills.

    https://www.kdnuggets.com/2021/09/path-full-stack-data-science.html

  • Adventures in MLOps with Github Actions, Iterative.ai, Label Studio and NBDEV

    This article documents the authors' experience building their custom MLOps approach.

    https://www.kdnuggets.com/2021/09/adventures-mlops-github-actions-iterative-ai-label-studio-and-nbdev.html

  • The Machine & Deep Learning Compendium Open Book

    After years in the making, this extensive and comprehensive ebook resource is now available and open for data scientists and ML engineers. Learn from and contribute to this tome of valuable information to support all your work in data science from engineering to strategy to management.

    https://www.kdnuggets.com/2021/09/machine-deep-learning-open-book.html

  • Amazon Web Services Webinar: Boost customer satisfaction and sales with consumer insights data

    Join this webinar, Sep 27, to learn how to leverage external data to understand market needs and consumer behavior – helping you build a more customer-centric business.

    https://www.kdnuggets.com/2021/09/roidna-aws-webinar-consumer-insights-data.html

  • Build a synthetic data pipeline using Gretel and Apache Airflow

    In this blog post, we build an ETL pipeline that generates synthetic data from a PostgreSQL database using Gretel’s Synthetic Data APIs and Apache Airflow.

    https://www.kdnuggets.com/2021/09/build-synthetic-data-pipeline-gretel-apache-airflow.html

  • Best Resources to Learn Natural Language Processing in 2021

    In this article, the author lists the best resources for learning natural language processing, including online courses, tutorials, books, and YouTube videos.

    https://www.kdnuggets.com/2021/09/best-resources-learn-natural-language-processing-2021.html

  • CSV Files for Storage? No Thanks. There’s a Better Option

    Saving data to CSV’s is costing you both money and disk space. It’s time to end it.

    https://www.kdnuggets.com/2021/08/csv-files-storage-better-option.html

  • Open Source Datasets for Computer Vision

    Access to high-quality, noise-free, large-scale datasets is crucial for training complex deep neural network models for computer vision applications. Many open-source datasets are developed for use in image classification, pose estimation, image captioning, autonomous driving, and object segmentation. These datasets must be paired with the appropriate hardware and benchmarking strategies to optimize performance.

    https://www.kdnuggets.com/2021/08/open-source-datasets-computer-vision.html

  • Writing Your First Distributed Python Application with Ray

    Using Ray, you can take Python code that runs sequentially and transform it into a distributed application with minimal code changes. Read on to find out why you should use Ray, and how to get started.

    https://www.kdnuggets.com/2021/08/distributed-python-application-ray.html

  • Development & Testing of ETL Pipelines for AWS Locally

    Typically, ETL pipelines are developed and tested on real environments or clusters, which are time-consuming to set up and require maintenance. This article focuses on developing and testing ETL pipelines locally with the help of Docker and LocalStack. The solution gives you the flexibility to test in a local environment without setting up any services in the cloud.

    https://www.kdnuggets.com/2021/08/development-testing-etl-pipelines-aws-locally.html

  • dbt for Data Transformation – Hands-on Tutorial

    The data build tool (dbt) is gaining in popularity and use, and this hands-on tutorial covers creating complex models, using variables and functions, running tests, generating docs, and many more features.

    https://www.kdnuggets.com/2021/07/dbt-data-transformation-tutorial.html

  • Not Only for Deep Learning: How GPUs Accelerate Data Science & Data Analytics

    Modern AI/ML systems’ success has been critically dependent on their ability to process massive amounts of raw data in a parallel fashion using task-optimized hardware. Can we leverage the power of GPU and distributed computing for regular data processing jobs too?

    https://www.kdnuggets.com/2021/07/deep-learning-gpu-accelerate-data-science-data-analytics.html

  • Why and how should you learn “Productive Data Science”?

    What is Productive Data Science and what are some of its components?

    https://www.kdnuggets.com/2021/07/learn-productive-data-science.html

  • How to Use Kafka Connect to Create an Open Source Data Pipeline for Processing Real-Time Data

    This article shows you how to create a real-time data pipeline using only pure open source technologies. These include Kafka Connect, Apache Kafka, Kibana and more.

    https://www.kdnuggets.com/2021/07/kafka-open-source-data-pipeline-processing-real-time-data.html

  • When to Retrain an Machine Learning Model? Run these 5 checks to decide on the schedule

    Machine learning models degrade with time, and need to be regularly updated. In the article, we suggest how to approach retraining and plan for it in advance.

    https://www.kdnuggets.com/2021/07/retrain-machine-learning-model-5-checks-decide-schedule.html

  • Geometric foundations of Deep Learning

    Geometric Deep Learning is an attempt for geometric unification of a broad class of machine learning problems from the perspectives of symmetry and invariance. These principles not only underlie the breakthrough performance of convolutional neural networks and the recent success of graph neural networks but also provide a principled way to construct new types of problem-specific inductive biases.

    https://www.kdnuggets.com/2021/07/geometric-foundations-deep-learning.html

  • A Lightning Fast Look at Single Line Exploratory Data Analysis

    Here's a very quick look at how you can perform EDA with a single line of code using D-Tale.

    https://www.kdnuggets.com/2021/07/single-line-exploratory-data-analysis.html

  • Learning Data Science Through Social Media

    Want your social media algorithms to show you actual algorithms? Spare a moment during your social media scrolling to learn a bit of data science. Here are suggestions for at-a-glance access to good ideas and tips on your favorite platforms.

    https://www.kdnuggets.com/2021/07/learning-data-science-through-social-media.html

  • Create and Deploy Dashboards using Voila and Saturn Cloud

    Working with large datasets, training on them, maintaining everything in one place, and deploying to production is a challenging job. This article covers what Saturn Cloud is and how it can speed up your end-to-end pipeline, and how to create dashboards using Voila and Python and publish them to production in just a few easy steps.

    https://www.kdnuggets.com/2021/06/create-deploy-dashboards-voila-saturn-cloud.html

  • An introduction to Explainable AI (XAI) and Explainable Boosting Machines (EBM)

    Understanding why your AI-based models make the decisions they do is crucial for deploying practical solutions in the real-world. Here, we review some techniques in the field of Explainable AI (XAI), why explainability is important, example models of explainable AI using LIME and SHAP, and demonstrate how Explainable Boosting Machines (EBMs) can make explainability even easier.

    https://www.kdnuggets.com/2021/06/explainable-ai-xai-explainable-boosting-machines-ebm.html

  • Facebook Launches One of the Toughest Reinforcement Learning Challenges in History

    The FAIR team just launched the NetHack Challenge as part of the upcoming NeurIPS 2021 competition. The objective is to test new RL ideas using one of the toughest game environments in the world.

    https://www.kdnuggets.com/2021/06/facebook-launches-toughest-reinforcement-learning-challenges.html
