Search results for s3

    Found 343 documents, 5930 searched:

  • 2024 Reading List: 5 Essential Reads on Artificial Intelligence

    Transform your understanding of current and future tech with these top 5 AI reads to explore the minds shaping our future.

    https://www.kdnuggets.com/2024-reading-list-5-essential-reads-on-artificial-intelligence

  • Natural Language Processing: Bridging Human Communication with AI

    The post highlights real-world examples of NLP use cases across industries. It also covers NLP's objectives, challenges, and latest research developments.

    https://www.kdnuggets.com/natural-language-processing-bridging-human-communication-with-ai

  • Turn Your Laptop Into a Personal Analytics Engine with DuckDB and MotherDuck

    Bring the powerful tools to your laptop.

    https://www.kdnuggets.com/turn-your-laptop-into-a-personal-analytics-engine-with-duckdb-and-motherduck

  • Using Lightning AI Studio For Free

    Use Lightning AI Cloud IDE for free to experiment, train, and deploy your AI models.

    https://www.kdnuggets.com/using-lightning-ai-studio-for-free

  • The Top 5 Alternatives to GitHub for Data Science Projects

    The blog discusses five platforms designed for data scientists with specialized capabilities in managing large datasets, models, workflows, and collaboration beyond what GitHub offers.

    https://www.kdnuggets.com/the-top-5-alternatives-to-github-for-data-science-projects

  • A Comprehensive List of Resources to Master Large Language Models

    Large Language Models (LLMs) have now become an integral part of various applications. This article provides an extensive list of resources for anyone interested in diving into the world of LLMs.

    https://www.kdnuggets.com/a-comprehensive-list-of-resources-to-master-large-language-models

  • Optimizing Data Analytics: Integrating GitHub Copilot in Databricks

    Integrating AI-powered pair programming tools for data analytics in Databricks optimizes and streamlines the development process, freeing up developer time for innovation.

    https://www.kdnuggets.com/optimizing-data-analytics-integrating-github-copilot-in-databricks

  • Greening AI: 7 Strategies to Make Applications More Sustainable

    The article delves into a comprehensive methodology that sheds light on how to accurately estimate the carbon footprint associated with AI applications. It explains the environmental impact of AI, a crucial consideration in today's world.

    https://www.kdnuggets.com/greening-ai-7-strategies-to-make-applications-more-sustainable

  • 7 Best Cloud Database Platforms

    Cloud databases have made it easier and cheaper to develop enterprise-level applications, offering flexibility, convenience, and standard database functionality. See what KDnuggets recommends.

    https://www.kdnuggets.com/7-best-cloud-database-platforms

  • 7 Steps to Mastering Large Language Models (LLMs)

    Large Language Models (LLMs) have unlocked a new era in natural language processing. So why not learn more about them? Go from learning what large language models are to building and deploying LLM apps in 7 easy steps with this guide.

    https://www.kdnuggets.com/7-steps-to-mastering-large-language-models-llms

  • Best Practices for Building ETLs for ML

    This article talks about several best practices for writing ETLs for building training datasets. It delves into several software engineering techniques and patterns applied to ML.

    https://www.kdnuggets.com/best-practices-for-building-etls-for-ml

  • The Top 5 Data Management Tools For Your Projects

    See what KDnuggets is recommending for the top 5 cutting-edge tools for cloud, ETL, transformation, master data management, and visualization.

    https://www.kdnuggets.com/top-5-data-management-tools-for-your-projects

  • Deploying Your Machine Learning Model to Production in the Cloud

    Learn a simple way to have a live model hosted on AWS.

    https://www.kdnuggets.com/deploying-your-ml-model-to-production-in-the-cloud

  • Don’t Miss Out! Enroll in FREE Courses Before 2023 Ends

    Complete the last quarter of the year and improve your skills to get you kickstarted for 2024’s self-development plan with these FREE courses.

    https://www.kdnuggets.com/dont-miss-out-enroll-in-free-courses-before-2023-ends

  • KDnuggets Survey: Benchmark With Your Peers On Data Science Spend & Trends 2023 H2

    KDnuggets, along with The All Things Insights Survey Committee and its partners, has created a Spend & Trends survey to provide you and your colleagues in our community with much-needed benchmarking information on mindset and focus trends as well as budget and technology spend.

    https://www.kdnuggets.com/kdnuggets-survey-benchmark-peers-data-science-spends-trends

  • Working with Big Data: Tools and Techniques

    Where do you start in a field as vast as big data? Which tools and techniques to use? We explore this and talk about the most common tools in big data.

    https://www.kdnuggets.com/working-with-big-data-tools-and-techniques

  • Who Will Make Money from the Generative AI Gold Rush?

    Buckle up for the Generative AI gold rush! Will BigTech rule with its picks and shovels? Which startups will strike it rich? Will “copilot for X” be the business strategy to hit pay dirt? How can startups dig moats to keep out other prospectors? And will the US once again have the richest gold seams?

    https://www.kdnuggets.com/2023/08/make-money-generative-ai-gold-rush.html

  • How to Ace Data Scientist Professional Certificate Exam

    Gain insights into the certification process and expert tips for passing the certificate exam.

    https://www.kdnuggets.com/2023/08/ace-data-scientist-professional-certificate.html

  • The Best Courses for AI from Universities with YouTube Playlists

    Kickstart a new career or develop your current one with these YouTube playlists by trusted universities!

    https://www.kdnuggets.com/2023/08/best-courses-ai-universities-youtube-playlists.html

  • A Comprehensive Guide to MLOps

    Machine Learning Operations (MLOps) is a relatively new discipline that provides the structure and support necessary for machine learning (ML) models to thrive in production environments.

    https://www.kdnuggets.com/2023/08/comprehensive-guide-mlops.html

  • Mastering GPUs: A Beginner’s Guide to GPU-Accelerated DataFrames in Python

    RAPIDS cuDF, with its pandas-like API, enables data scientists and engineers to quickly tap into the immense potential of parallel computing on GPUs–with just a few code line changes. Read on for more.

    https://www.kdnuggets.com/2023/07/mastering-gpus-beginners-guide-gpu-accelerated-dataframes-python.html

  • How to Build a Streaming Semi-structured Analytics Platform on Snowflake

    Building a data lake for semi-structured data such as JSON has always been challenging. When JSON documents stream in continuously from healthcare vendors, we need a robust modern architecture that can deal with such a high volume. At the same time, an analytics layer also needs to be created to generate value from the data.

    https://www.kdnuggets.com/2023/07/build-streaming-semistructured-analytics-platform-snowflake.html

  • How to Optimize SQL Queries for Faster Data Retrieval

    Today, we’ll talk about why SQL query optimization is important and which techniques can be used to optimize it.

    https://www.kdnuggets.com/2023/06/optimize-sql-queries-faster-data-retrieval.html

  • Advanced Feature Selection Techniques for Machine Learning Models

    Mastering Feature Selection: An Exploration of Advanced Techniques for Supervised and Unsupervised Machine Learning Models.

    https://www.kdnuggets.com/2023/06/advanced-feature-selection-techniques-machine-learning-models.html

  • RedPajama Project: An Open-Source Initiative to Democratizing LLMs

    Leading project to Empower the Community through Accessible Large Language Models.

    https://www.kdnuggets.com/2023/06/redpajama-project-opensource-initiative-democratizing-llms.html

  • Building and Training Your First Neural Network with TensorFlow and Keras

    Learn how to build and train your first Image Classification model with Keras and TensorFlow using Convolutional Neural Network.

    https://www.kdnuggets.com/2023/05/building-training-first-neural-network-tensorflow-keras.html

  • Schedule & Run ETLs with Jupysql and GitHub Actions

    This blog provides a brief introduction to ETLs and JupySQL. It also demonstrates how to schedule an example ETL notebook via GitHub Actions, which allows you to automate the process of executing ETLs and JupySQL from Jupyter.

    https://www.kdnuggets.com/2023/05/schedule-run-etls-jupysql-github-actions.html

  • Fine-Tuning OpenAI Language Models with Noisily Labeled Data

    Reduce LLM prediction error by 37% via data-centric AI.

    https://www.kdnuggets.com/2023/04/finetuning-openai-language-models-noisily-labeled-data.html

  • 11 Best Practices of Cloud and Data Migration to AWS Cloud

    A list of best practices compiled from our learnings during our migration journey to the AWS cloud.

    https://www.kdnuggets.com/2023/04/11-best-practices-cloud-data-migration-aws-cloud.html

  • 8 Open-Source Alternative to ChatGPT and Bard

    Discover the widely used open-source frameworks and models for creating your ChatGPT-like chatbots, integrating LLMs, or launching your AI product.

    https://www.kdnuggets.com/2023/04/8-opensource-alternative-chatgpt-bard.html

  • Top Free Courses on Large Language Models

    Interested in learning how ChatGPT and other AI chatbots work under the hood? Look no further. Check out these free courses and resources on large language models from Stanford, Princeton, ETH, and more.

    https://www.kdnuggets.com/2023/03/top-free-courses-large-language-models.html

  • 5 Data Analysis Projects For Beginners

    Are you a data analyst newbie looking to boost your resume to land your first job? If yes, then up your game as a beginner with these 5 projects that you can’t afford to miss.

    https://www.kdnuggets.com/2023/02/5-data-analysis-projects-beginners.html

  • Learn Data Engineering From These GitHub Repositories

    Kickstart your Data Engineering career with these curated GitHub repositories.

    https://www.kdnuggets.com/2023/02/learn-data-engineering-github-repositories.html

  • KDnuggets Survey: Benchmark with your peers on industry spend and trends

    KDnuggets and its partners have just released a Spend & Trends survey to provide you the opportunity to benchmark with your peers on how folks are spending and the mindsets around current trends.

    https://www.kdnuggets.com/2023/02/kdnuggets-survey-industry-spend-trends.html

  • Scaling Data Management Through Apache Gobblin

    Software companies can manage big data at a hyper-scale on different infrastructure stacks using Apache Gobblin.

    https://www.kdnuggets.com/2023/01/scaling-data-management-apache-gobblin.html

  • Overcome Your Data Quality Issues with Great Expectations

    Bad data costs organizations money, reputation, and time. Hence it is very important to monitor and validate data quality continuously.

    https://www.kdnuggets.com/2023/01/overcome-data-quality-issues-great-expectations.html

  • Beginner’s Guide to Cloud Computing

    Learn how cloud computing works, different types of models, top cloud platforms, and applications.

    https://www.kdnuggets.com/2023/01/beginner-guide-cloud-computing.html

  • 7 Super Cheat Sheets You Need To Ace Machine Learning Interview

    Revise the concepts of machine learning algorithms, frameworks, and methodologies to ace the technical interview round.

    https://www.kdnuggets.com/2022/12/7-super-cheat-sheets-need-ace-machine-learning-interview.html

  • Top Data Analyst Certification Courses for 2022

    Top certification courses by IBM, Edureka, DataCamp, Udacity, and Google.

    https://www.kdnuggets.com/2022/11/top-data-analyst-certification-courses-2022.html

  • Top 10 MLOps Tools to Optimize & Manage Machine Learning Lifecycle

    As more businesses experiment with data, they realize that developing a machine learning (ML) model is only one of many steps in the ML lifecycle.

    https://www.kdnuggets.com/2022/10/top-10-mlops-tools-optimize-manage-machine-learning-lifecycle.html

  • Is OLAP Dead?

    OLAP enables citizen analysts to quickly, efficiently, and cost-effectively uncover new business insights at a reduced time-to-value.

    https://www.kdnuggets.com/2022/10/olap-dead.html

  • Essential Books You Need to Become a Data Engineer

    In this article, I will go through the roadmap of books you need to become a Data Engineer.

    https://www.kdnuggets.com/2022/10/essential-books-need-become-data-engineer.html

  • 10 Cheat Sheets You Need To Ace Data Science Interview

    The only cheat you need for a job interview and data professional life. It includes SQL, web scraping, statistics, data wrangling and visualization, business intelligence, machine learning, deep learning, NLP, and super cheat sheets.

    https://www.kdnuggets.com/2022/10/10-cheat-sheets-need-ace-data-science-interview.html

  • Free Algorithms in Python Course

    Algorithms are an often misunderstood concept. Leverage Python to learn what algorithms really are, and how to implement an array of basic computational algorithms in the language.

    https://www.kdnuggets.com/2022/09/free-algorithms-python-course.html

  • Python String Processing Cheatsheet

    Try this string processing primer cheatsheet to gain an understanding of using Python to manipulate and process strings at a basic level.

    https://www.kdnuggets.com/2020/01/python-string-processing-primer.html

  • Generate Synthetic Time-series Data with Open-source Tools

    An introduction to the generative adversarial network model DoppelGANger, and how you can use a new open-source PyTorch implementation of it to create high-quality synthetic time-series data.

    https://www.kdnuggets.com/2022/06/generate-synthetic-timeseries-data-opensource-tools.html

  • Top Data Science Podcasts for 2022

    Here are some data science related podcasts to help you either grow your interest in the field, increase your current knowledge, or help you develop yourself.

    https://www.kdnuggets.com/2022/06/top-data-science-podcasts-2022.html

  • How To Structure a Data Science Project: A Step-by-Step Guide

    Check out all the necessary steps to successfully structure your data science projects leveraging data science templates.

    https://www.kdnuggets.com/2022/05/structure-data-science-project-stepbystep-guide.html

  • MLOps Is a Mess But That’s to be Expected

    In this post, I want to focus the discussion on the state of machine learning operations (MLOps) today: where we are and where we are going.

    https://www.kdnuggets.com/2022/03/mlops-mess-expected.html

  • A New Way of Managing Deep Learning Datasets

    Create, version-control, query, and visualize image, audio, and video datasets using Hub 2.0 by Activeloop.

    https://www.kdnuggets.com/2022/03/new-way-managing-deep-learning-datasets.html

  • Feature Stores for Real-time AI & Machine Learning

    Real-time AI/ML is on the rise and feature stores are key to successfully deploying them. Read on to see how the choice of online store and the feature store architecture play important roles in determining its performance and cost.

    https://www.kdnuggets.com/2022/03/feature-stores-realtime-ai-machine-learning.html

  • Top 7 YouTube Courses on Data Analytics

    Learn data analytics by taking the best YouTube courses. These courses will cover data analysis with Python, R, SQL, PowerBI, Tableau, Excel, and SPSS.

    https://www.kdnuggets.com/2022/02/top-7-youtube-courses-data-analytics.html

  • Cloud Storage Adoption is the Need of the Hour for Business

    The rush towards cloud storage means that the cloud has to offer a valuable proposition to businesses. Let’s explore why businesses regardless of their size should consider moving to the cloud.

    https://www.kdnuggets.com/2022/02/cloud-storage-adoption-need-hour-business.html

  • Orchestrate a Data Science Project in Python With Prefect

    Learn how to optimize your data science workflow in a few lines of code.

    https://www.kdnuggets.com/2022/02/orchestrate-data-science-project-python-prefect.html

  • The Complete Collection of Data Science Cheat Sheets – Part 2

    A collection of cheat sheets that will help you prepare for a technical interview on Data Structures & Algorithms, Machine learning, Deep Learning, Natural Language Processing, Data Engineering, Web Frameworks.

    https://www.kdnuggets.com/2022/02/complete-collection-data-science-cheat-sheets-part-2.html

  • From Oracle to Databases for AI: The Evolution of Data Storage

    From Oracle, to NoSQL databases, and beyond, read about data management solutions from the early days of the RDBMS to those supporting AI applications.

    https://www.kdnuggets.com/2022/02/oracle-databases-ai-evolution-data-storage.html

  • The Complete Collection of Data Science Cheat Sheets – Part 1

    A collection of cheat sheets that will help you prepare for a technical interview, assessment tests, class presentation, and help you revise core data science concepts.

    https://www.kdnuggets.com/2022/02/complete-collection-data-science-cheat-sheets-part-1.html

  • How to Set Up Your Data Science Stack on a Budget

    Whether you’re working independently or setting up a stack for a company, you need an affordable stack option. Here’s how you can set up your stack without spending too much.

    https://www.kdnuggets.com/2022/01/data-science-stack-budget.html

  • 6 Data Science Technologies You Need to Build Your Supply Chain Pipeline

    Here are some of the data science technologies needed to build a comprehensive and smooth supply chain pipeline.

    https://www.kdnuggets.com/2022/01/6-data-science-technologies-need-build-supply-chain-pipeline.html

  • How to Process a DataFrame with Millions of Rows in Seconds

    TLDR; process it with a new Python Data Processing Engine in the Cloud.

    https://www.kdnuggets.com/2022/01/process-dataframe-millions-rows-seconds.html

  • Using Datawig, an AWS Deep Learning Library for Missing Value Imputation

    A lot of missing values in the dataset can affect the quality of prediction in the long run. Several methods can be used to fill the missing values and Datawig is one of the most efficient ones.

    https://www.kdnuggets.com/2021/12/datawig-aws-deep-learning-library-missing-value-imputation.html

  • A Beginner’s Guide to End to End Machine Learning

    Learn to train, tune, deploy and monitor machine learning models.

    https://www.kdnuggets.com/2021/12/beginner-guide-end-end-machine-learning.html

  • What Comes After HDF5? Seeking a Data Storage Format for Deep Learning

    HDF5 is one of the most popular and reliable formats for non-tabular, numerical data, but it is not optimized for deep learning work. This article suggests what an ML-native data format should look like to truly serve the needs of modern data scientists.

    https://www.kdnuggets.com/2021/11/after-hdf5-data-storage-format-deep-learning.html

  • Design Patterns for Machine Learning Pipelines

    ML pipeline design has undergone several evolutions in the past decade with advances in memory and processor performance, storage systems, and the increasing scale of data sets. We describe how these design patterns changed, what processes they went through, and their future direction.

    https://www.kdnuggets.com/2021/11/design-patterns-machine-learning-pipelines.html

  • Advanced PyTorch Lightning with TorchMetrics and Lightning Flash

    In this tutorial we will be diving deeper into two additional tools you should be using: TorchMetrics and Lightning Flash. TorchMetrics unsurprisingly provides a modular approach to define and track useful metrics across batches and devices, while Lightning Flash offers a suite of functionality facilitating more efficient transfer learning and data handling, and a recipe book of state-of-the-art approaches to typical deep learning problems.

    https://www.kdnuggets.com/2021/11/advanced-pytorch-lightning-torchmetrics-lightning-flash.html

  • Is the Modern Data Stack Leaving You Behind?

    The modern data stack narrative is largely dominated by analytics engineering. Where does that leave data engineers? Discover the difference between the MDS for data engineers & analytics engineers.

    https://www.kdnuggets.com/2021/11/modern-data-stack-leaving-behind.html

  • ETL and ELT: A Guide and Market Analysis

    ETL and related techniques remain a powerful and foundational tool in the data industry. We explain what ETL is and how ETL and ELT processes have evolved over the years, with a close eye toward how third-generation ETL tools are about to disrupt standard data processing practices.

    https://www.kdnuggets.com/2021/10/etl-elt-guide-market-analysis.html

  • Deploying Serverless spaCy Transformer Model with AWS Lambda

    A step-by-step guide on how to deploy a serverless NER transformer model.

    https://www.kdnuggets.com/2021/10/deploying-serverless-spacy-transformer-model-aws-lambda.html

  • Training BPE, WordPiece, and Unigram Tokenizers from Scratch using Hugging Face

    Comparing the tokens generated by SOTA tokenization algorithms using Hugging Face's tokenizers package.

    https://www.kdnuggets.com/2021/10/bpe-wordpiece-unigram-tokenizers-using-hugging-face.html

  • Serving ML Models in Production: Common Patterns

    Over the past couple years, we've seen 4 common patterns of machine learning in production: pipeline, ensemble, business logic, and online learning. In the ML serving space, implementing these patterns typically involves a tradeoff between ease of development and production readiness. Ray Serve was built to support these patterns by being both easy to develop and production ready.

    https://www.kdnuggets.com/2021/10/serving-ml-models-production-common-patterns.html

  • How our Obsession with Algorithms Broke Computer Vision: And how Synthetic Computer Vision can fix it

    Deep Learning radically improved Machine Learning as a whole. The Data-Centric revolution is about to do the same. In this post, we’ll take a look at the pitfalls of mainstream Computer Vision (CV) and discuss why Synthetic Computer Vision (SCV) is the future.

    https://www.kdnuggets.com/2021/10/obsession-algorithms-broke-computer-vision.html

  • Amazon Web Services Webinar: Leverage data sets to create a customer-centric strategy and improve business outcomes

    Register now for this webinar, Oct 28, to learn how using third-party data enhances applications to better prioritize your target customer - helping you build a more customer-centric business.

    https://www.kdnuggets.com/2021/10/roidna-aws-webinar-customer-centric-strategy.html

  • AutoML: An Introduction Using Auto-Sklearn and Auto-PyTorch

    AutoML is a broad category of techniques and tools for applying automated search to your automated search and learning to your learning. In addition to Auto-Sklearn, the Freiburg-Hannover AutoML group has also developed an Auto-PyTorch library. We’ll use both of these as our entry point into AutoML in the following simple tutorial.

    https://www.kdnuggets.com/2021/10/automl-introduction-auto-sklearn-auto-pytorch.html

  • The Evolution of Tokenization – Byte Pair Encoding in NLP

    Though we have SOTA algorithms for tokenization, it's always good practice to understand how they evolved and how we got here. Read this introduction to Byte Pair Encoding.

    https://www.kdnuggets.com/2021/10/evolution-tokenization-byte-pair-encoding-nlp.html

  • Surpassing Trillion Parameters and GPT-3 with Switch Transformers – a path to AGI?

    Ever larger models churning on increasingly faster machines suggest a potential path toward smarter AI, such as with the massive GPT-3 language model. However, new, leaner approaches are being conceived and explored that may rival these super-models, which could lead to a future with more efficient implementations of advanced AI-driven systems.

    https://www.kdnuggets.com/2021/10/trillion-parameters-gpt-3-switch-transformers-path-agi.html

  • Important Statistics Data Scientists Need to Know

    Several fundamental statistical concepts must be well appreciated by every data scientist, from the enthusiast to the professional. Here, we provide Python code snippets to build understanding and give you key tools for gaining early insight into your data.

    https://www.kdnuggets.com/2021/09/important-statistics-data-scientists.html

  • Path to Full Stack Data Science

    Start your journey toward mastering all aspects of the field of Data Science with this focused list of in-depth self-learning resources. Curated with the beginner in mind, these recommendations will help you learn efficiently, and can also offer existing professionals useful highlights for review or help filling in any gaps in skills.

    https://www.kdnuggets.com/2021/09/path-full-stack-data-science.html

  • Adventures in MLOps with Github Actions, Iterative.ai, Label Studio and NBDEV

    This article documents the authors' experience building their custom MLOps approach.

    https://www.kdnuggets.com/2021/09/adventures-mlops-github-actions-iterative-ai-label-studio-and-nbdev.html

  • The Machine & Deep Learning Compendium Open Book

    After years in the making, this extensive and comprehensive ebook resource is now available and open for data scientists and ML engineers. Learn from and contribute to this tome of valuable information to support all your work in data science from engineering to strategy to management.

    https://www.kdnuggets.com/2021/09/machine-deep-learning-open-book.html

  • Amazon Web Services Webinar: Boost customer satisfaction and sales with consumer insights data

    Join this webinar, Sep 27, to learn how to leverage external data to understand market needs and consumer behavior – helping you build a more customer-centric business.

    https://www.kdnuggets.com/2021/09/roidna-aws-webinar-consumer-insights-data.html

  • Build a synthetic data pipeline using Gretel and Apache Airflow

    In this blog post, we build an ETL pipeline that generates synthetic data from a PostgreSQL database using Gretel’s Synthetic Data APIs and Apache Airflow.

    https://www.kdnuggets.com/2021/09/build-synthetic-data-pipeline-gretel-apache-airflow.html

  • Best Resources to Learn Natural Language Processing in 2021

    In this article, the author has listed the best resources to learn natural language processing, including online courses, tutorials, books, and YouTube videos.

    https://www.kdnuggets.com/2021/09/best-resources-learn-natural-language-processing-2021.html

  • CSV Files for Storage? No Thanks. There’s a Better Option

    Saving data to CSVs is costing you both money and disk space. It’s time to end it.

    https://www.kdnuggets.com/2021/08/csv-files-storage-better-option.html

  • Open Source Datasets for Computer Vision

    Access to high-quality, noise-free, large-scale datasets is crucial for training complex deep neural network models for computer vision applications. Many open-source datasets are developed for use in image classification, pose estimation, image captioning, autonomous driving, and object segmentation. These datasets must be paired with the appropriate hardware and benchmarking strategies to optimize performance.

    https://www.kdnuggets.com/2021/08/open-source-datasets-computer-vision.html

  • Writing Your First Distributed Python Application with Ray

    Using Ray, you can take Python code that runs sequentially and transform it into a distributed application with minimal code changes. Read on to find out why you should use Ray, and how to get started.

    https://www.kdnuggets.com/2021/08/distributed-python-application-ray.html

  • Development & Testing of ETL Pipelines for AWS Locally

    Typically, ETL pipelines are developed and tested on real environments or clusters, which are time-consuming to set up and require maintenance. This article focuses on the development and testing of ETL pipelines locally with the help of Docker & LocalStack. The solution gives flexibility to test in a local environment without setting up any services on the cloud.

    https://www.kdnuggets.com/2021/08/development-testing-etl-pipelines-aws-locally.html

  • dbt for Data Transformation – Hands-on Tutorial

    The data build tool (dbt) is gaining in popularity and use, and this hands-on tutorial covers creating complex models, using variables and functions, running tests, generating docs, and many more features.

    https://www.kdnuggets.com/2021/07/dbt-data-transformation-tutorial.html

  • Not Only for Deep Learning: How GPUs Accelerate Data Science & Data Analytics

    Modern AI/ML systems’ success has been critically dependent on their ability to process massive amounts of raw data in a parallel fashion using task-optimized hardware. Can we leverage the power of GPU and distributed computing for regular data processing jobs too?

    https://www.kdnuggets.com/2021/07/deep-learning-gpu-accelerate-data-science-data-analytics.html

  • Why and how should you learn “Productive Data Science”?

    What is Productive Data Science and what are some of its components?

    https://www.kdnuggets.com/2021/07/learn-productive-data-science.html

  • How to Use Kafka Connect to Create an Open Source Data Pipeline for Processing Real-Time Data

    This article shows you how to create a real-time data pipeline using only pure open source technologies. These include Kafka Connect, Apache Kafka, Kibana and more.

    https://www.kdnuggets.com/2021/07/kafka-open-source-data-pipeline-processing-real-time-data.html

  • When to Retrain an Machine Learning Model? Run these 5 checks to decide on the schedule

    Machine learning models degrade with time, and need to be regularly updated. In the article, we suggest how to approach retraining and plan for it in advance.

    https://www.kdnuggets.com/2021/07/retrain-machine-learning-model-5-checks-decide-schedule.html

  • Geometric foundations of Deep Learning

    Geometric Deep Learning is an attempt at geometric unification of a broad class of machine learning problems from the perspectives of symmetry and invariance. These principles not only underlie the breakthrough performance of convolutional neural networks and the recent success of graph neural networks but also provide a principled way to construct new types of problem-specific inductive biases.

    https://www.kdnuggets.com/2021/07/geometric-foundations-deep-learning.html

  • A Lightning Fast Look at Single Line Exploratory Data Analysis

    Here's a very quick look at how you can perform EDA with a single line of code using D-Tale.

    https://www.kdnuggets.com/2021/07/single-line-exploratory-data-analysis.html

  • Learning Data Science Through Social Media

    Want your social media algorithms to show you actual algorithms? Spare a moment during your social media scrolling to learn a bit of data science. Here are suggestions for at-a-glance access to good ideas and tips on your favorite platforms.

    https://www.kdnuggets.com/2021/07/learning-data-science-through-social-media.html

  • Create and Deploy Dashboards using Voila and Saturn Cloud

    Working with and training large datasets, maintaining them all in one place, and deploying them to production is a challenging job. In this article, we covered what Saturn Cloud is and how it can speed up your end-to-end pipeline, how to create dashboards using Voila and Python and publish them to production in just a few easy steps.

    https://www.kdnuggets.com/2021/06/create-deploy-dashboards-voila-saturn-cloud.html

  • An introduction to Explainable AI (XAI) and Explainable Boosting Machines (EBM)

    Understanding why your AI-based models make the decisions they do is crucial for deploying practical solutions in the real-world. Here, we review some techniques in the field of Explainable AI (XAI), why explainability is important, example models of explainable AI using LIME and SHAP, and demonstrate how Explainable Boosting Machines (EBMs) can make explainability even easier.

    https://www.kdnuggets.com/2021/06/explainable-ai-xai-explainable-boosting-machines-ebm.html

  • Facebook Launches One of the Toughest Reinforcement Learning Challenges in History

    The FAIR team just launched the NetHack Challenge as part of the upcoming NeurIPS 2021 competition. The objective is to test new RL ideas using one of the toughest game environments in the world.

    https://www.kdnuggets.com/2021/06/facebook-launches-toughest-reinforcement-learning-challenges.html

  • Get Interactive Plots Directly With Pandas

    Telling a story with data is a core function for any Data Scientist, and creating data visualizations that are simultaneously illuminating and appealing can be challenging. This tutorial reviews how to create Plotly and Bokeh plots directly through Pandas plotting syntax, which will help you convert static visualizations into interactive counterparts -- and take your analysis to the next level.

    https://www.kdnuggets.com/2021/06/interactive-plots-directly-pandas.html

  • How I Doubled My Income with Data Science and Machine Learning

    Many career opportunities exist in the ever-expanding domain of data. Finding your place -- and finding your salary -- is largely up to your dedication, focus, and drive to learn. If you are an aspiring Data Scientist or have already started your professional journey, there are multiple strategies for maximizing your earning potential.

    https://www.kdnuggets.com/2021/06/double-income-data-science-machine-learning.html

  • State of Mathematical Optimization Report, 2021

    Download your copy of Gurobi's first-ever "State of Mathematical Optimization Report," which is based on data from a survey of commercial mathematical optimization users. Get yours now.

    https://www.kdnuggets.com/2021/05/gurobi-state-mathematical-optimization-report-2021.html

  • Awesome list of datasets in 100+ categories

    With an estimated 44 zettabytes of data in existence in our digital world today and approximately 2.5 quintillion bytes of new data generated daily, there is a lot of data out there you could tap into for your data science projects. It's pretty hard to curate through such a massive universe of data, but this collection is a great start. Here, you can find data from cancer genomes to UFO reports, as well as years of air quality data to 200,000 jokes. Dive into this ocean of data to explore as you learn how to apply data science techniques or leverage your expertise to discover something new.

    https://www.kdnuggets.com/2021/05/awesome-list-datasets.html

  • Animated Bar Chart Races in Python

    A quick, step-by-step beginner's project to create an animated bar graph for an amazing Covid dataset.

    https://www.kdnuggets.com/2021/05/animated-race-bar-charts-python.html

  • Super Charge Python with Pandas on GPUs Using Saturn Cloud

    Saturn Cloud is a tool that gives you 10 hours of GPU computing and 3 hours of Dask cluster computing a month for free. In this tutorial, you will learn how to use these free resources to process data using Pandas on a GPU. The experiments show that Pandas is over 1,000,000% slower on a CPU than on a Dask cluster of GPUs.

    https://www.kdnuggets.com/2021/05/super-charge-python-pandas-gpus-saturn-cloud.html

  • Applying Python’s Explode Function to Pandas DataFrames

    Read about this applied Python method for accessing columns by date/year using the Pandas library and the functions lambda(), list(), map() & explode().

    https://www.kdnuggets.com/2021/05/applying-pythons-explode-function-pandas-dataframes.html

  • 10 Must-Know Statistical Concepts for Data Scientists

    Statistics is a building block of data science. If you are working or plan to work in this field, then you will encounter the fundamental concepts reviewed for you here. Certainly, there is much more to learn in statistics, but once you understand these basics, then you can steadily build your way up to advanced topics.

    https://www.kdnuggets.com/2021/04/10-statistical-concepts-data-scientists.html

  • Deep Learning Recommendation Models (DLRM): A Deep Dive

    The currency in the 21st century is no longer just data. It's the attention of people. This deep dive article presents the architecture and deployment issues experienced with the deep learning recommendation model, DLRM, which was open-sourced by Facebook in March 2019.

    https://www.kdnuggets.com/2021/04/deep-learning-recommendation-models-dlrm-deep-dive.html

  • Automated Text Classification with EvalML

    Learn how EvalML leverages Woodwork, Featuretools and the nlp-primitives library to process text data and create a machine learning model that can detect spam text messages.

    https://www.kdnuggets.com/2021/04/automated-text-classification-evalml.html

  • How to deploy Machine Learning/Deep Learning models to the web

    The full value of your deep learning models comes from enabling others to use them. Learn how to deploy your model to the web and access it as a REST API, and begin to share the power of your machine learning development with the world.

    https://www.kdnuggets.com/2021/04/deploy-machine-learning-models-to-web.html

  • Top YouTube Machine Learning Channels

    These are the top 15 YouTube channels for machine learning as determined by our stated criteria, along with some additional data on the channels to help you decide if they may have some content useful for you.

    https://www.kdnuggets.com/2021/03/top-youtube-machine-learning-channels.html

  • DeepMind’s AlphaFold & the Protein Folding Problem

    Recently, DeepMind's AlphaFold made impressive headway in the protein structure prediction problem. Read this for an overview and explanation.

    https://www.kdnuggets.com/2021/03/deepmind-alphafold-protein-folding-problem.html

  • Dask and Pandas: No Such Thing as Too Much Data

    Do you love pandas, but don't love it when you reach the limits of your memory or compute resources? Dask provides you with the option to use the pandas API with distributed data and computing. Learn how it works, how to use it, and why it’s worth the switch when you need it most.

    https://www.kdnuggets.com/2021/03/dask-pandas-data.html

  • Machine Learning Systems Design: A Free Stanford Course

    This freely-available course from Stanford should give you a toolkit for designing machine learning systems.

    https://www.kdnuggets.com/2021/02/machine-learning-systems-design-free-stanford-course.html

  • The Difficulty of Graph Anonymisation

    Lessons from network science and the difficulty of graph anonymization. A data scientist's take on the difficulty of striking a balance between privacy and utility in anonymizing connected data.

    https://www.kdnuggets.com/2021/02/difficulty-graph-anonymisation.html

  • Powerful Exploratory Data Analysis in just two lines of code

    EDA is a fundamental early process for any Data Science investigation. Typical approaches for visualization and exploration are powerful, but can be cumbersome for getting to the heart of your data. Now, you can get to know your data much faster with only a few lines of code... and it might even be fun!

    https://www.kdnuggets.com/2021/02/powerful-exploratory-data-analysis-sweetviz.html

  • Feature Store as a Foundation for Machine Learning

    With so many organizations now taking the leap into building production-level machine learning models, many lessons learned are coming to light about the supporting infrastructure. For a variety of important types of use cases, maintaining a centralized feature store is essential for higher ROI and faster delivery to market. In this review, the current feature store landscape is described, and you can learn how to architect one into your MLOps pipeline.

    https://www.kdnuggets.com/2021/02/feature-store-foundation-machine-learning.html

  • Past 2021 Meetings / Online Events on AI, Analytics, Big Data, Data Science, and Machine Learning

    https://www.kdnuggets.com/meetings/past-meetings-2021.html

  • What is Graph Theory, and Why Should You Care?

    Go from graph theory to path optimization.

    https://www.kdnuggets.com/2021/01/graph-theory-why-care.html

  • Machine learning is going real-time

    Extracting immediate predictions from machine learning algorithms on the spot based on brand-new data can offer a next level of interaction and potential value to its consumers. The infrastructure and tech stack required to implement such real-time systems is also next level, and many organizations -- especially in the US -- seem to be resisting. But, what even is real-time ML, and how can it deliver a better experience?

    https://www.kdnuggets.com/2021/01/machine-learning-real-time.html

  • The Ultimate Scikit-Learn Machine Learning Cheatsheet

    Given the power and popularity of scikit-learn for machine learning in Python, this library is a foundation of any practitioner's toolset. Preview its core methods with this review of predictive modelling, clustering, dimensionality reduction, feature importance, and data transformation.

    https://www.kdnuggets.com/2021/01/ultimate-scikit-learn-machine-learning-cheatsheet.html

  • 10 Underappreciated Python Packages for Machine Learning Practitioners

    Here are 10 underappreciated Python packages covering neural architecture design, calibration, UI creation and dissemination.

    https://www.kdnuggets.com/2021/01/10-underappreciated-python-packages-machine-learning-practitioners.html

  • How to Get a Job as a Data Engineer

    Data engineering skills are currently in high demand. If you are looking for career prospects in this fast-growing profession, then these 10 skills and key factors will help you prepare to land an entry-level position in this field.

    https://www.kdnuggets.com/2021/01/get-job-as-data-engineer.html

  • Model Experiments, Tracking and Registration using MLflow on Databricks

    This post covers how StreamSets can help expedite operations at some of the most crucial stages of Machine Learning Lifecycle and MLOps, and demonstrates integration with Databricks and MLflow.

    https://www.kdnuggets.com/2021/01/model-experiments-tracking-registration-mlflow-databricks.html

  • Six Tips on Building a Data Science Team at a Small Company

    When a company decides that they want to start leveraging their data for the first time, it can be a daunting task. Many businesses aren’t fully aware of all that goes into building a data science department. If you're the data scientist hired to make this happen, we have some tips to help you face the task head-on.

    https://www.kdnuggets.com/2021/01/six-tips-building-data-science-team-small-company.html

  • How to easily check if your Machine Learning model is fair?

    Machine learning models deployed today -- as will many more in the future -- impact people and society directly. With that power and influence resting in the hands of Data Scientists and machine learning engineers, taking the time to evaluate and understand if model results are fair will become the linchpin for the future success of AI/ML solutions. These are critical considerations, and using a recently developed fairness module in the dalex Python package is a unified and accessible way to ensure your models remain fair.

    https://www.kdnuggets.com/2020/12/machine-learning-model-fair.html

  • Production Machine Learning Monitoring: Outliers, Drift, Explainers & Statistical Performance

    A practical deep dive on production monitoring architectures for machine learning at scale using real-time metrics, outlier detectors, drift detectors, metrics servers and explainers.

    https://www.kdnuggets.com/2020/12/production-machine-learning-monitoring-outliers-drift-explainers-statistical-performance.html

  • Crack SQL Interviews

    SQL is an essential programming language for data analysis and processing. So, SQL questions are always part of the interview process for data science-related jobs, including data analysts, data scientists, and data engineers. Become familiar with these common patterns seen in SQL interview questions and follow our tips on how to neatly handle each with SQL queries.

    https://www.kdnuggets.com/2020/12/crack-sql-interviews.html

  • Industry 2021 Predictions for AI, Analytics, Data Science, Machine Learning

    We bring you industry predictions from 12 innovative companies: the key trends they expect in 2021 in AI, Analytics, Data Science, and Machine Learning.

    https://www.kdnuggets.com/2020/12/industry-2021-predictions-ai-data-science-machine-learning.html

  • Data Compression via Dimensionality Reduction: 3 Main Methods

    Lift the curse of dimensionality by mastering the application of three important techniques that will help you reduce the dimensionality of your data, even if it is not linearly separable.

    https://www.kdnuggets.com/2020/12/data-compression-dimensionality-reduction.html

  • Pruning Machine Learning Models in TensorFlow

    Read this overview to learn how to make your models smaller via pruning.

    https://www.kdnuggets.com/2020/12/pruning-machine-learning-models-tensorflow.html

  • 14 Data Science projects to improve your skills

    There's a lot of data out there and so many data science techniques to master or review. Check out these great project ideas from easy to advanced difficulty levels to develop new skills and strengthen your portfolio.

    https://www.kdnuggets.com/2020/12/14-data-science-projects-improve-skills.html
