Search results for metadata

    Found 236 documents, 5933 searched:

  • Machine Learning Metadata Store

    In this article, we will learn about metadata stores, the need for them, their components, and metadata store management.

    https://www.kdnuggets.com/2022/08/machine-learning-metadata-store.html

  • Metadata Store for Production ML!

    Add Layer to your existing ML code and quickly get a rich model and data registry with experiment tracking!

    https://www.kdnuggets.com/2022/05/layer-metadata-store-production-ml.html

  • How Metadata Improves Security, Quality, and Transparency

    Metadata is the data providing context about the data, more than what you see in the rows and columns. By managing your metadata, you're effectively creating an encyclopedia of your data assets.

    https://www.kdnuggets.com/2022/04/metadata-improves-security-quality-transparency.html

  • 7 End-to-End MLOps Platforms You Must Try in 2024

    List of top MLOps platforms that will help you with integration, training, tracking, deployment, monitoring, CI/CD, and optimizing the infrastructure.

    https://www.kdnuggets.com/7-end-to-end-mlops-platforms-you-must-try-in-2024

  • Retrieval Augmented Generation: Where Information Retrieval Meets Text Generation

    This article introduces retrieval augmented generation, which combines text generation with information retrieval in order to improve language model output.

    https://www.kdnuggets.com/retrieval-augmented-generation-where-information-retrieval-meets-text-generation

  • 5 Ways To Use LLMs On Your Laptop

    Run large language models on your local PC for customized AI capabilities with more control, privacy, and personalization.

    https://www.kdnuggets.com/5-ways-to-use-llms-on-your-laptop

  • What Is Data Lineage, And Why Does It Matter?

    If you’ve ever had conversations with data professionals, you’ve probably heard “data lineage” pop up quite a few times. So what is data lineage all about, and why is it important?

    https://www.kdnuggets.com/what-is-data-lineage-and-why-does-it-matter

  • A Data Lake, You Call It? It’s a Data Swamp

    How and why the data lake architecture often fails to meet its promises. And how better governance helps mitigate such challenges.

    https://www.kdnuggets.com/a-data-lake-you-call-it-it-a-data-swamp

  • OpenAI API for Beginners: Your Easy-to-Follow Starter Guide

    Learn how to use the OpenAI Python API for accessing language, embedding, audio, vision, and image generation models.

    https://www.kdnuggets.com/openai-api-for-beginners-your-easy-to-follow-starter-guide

  • 6 Reasons Why a Universal Semantic Layer is Beneficial to Your Data Stack

    Looking to understand the universal semantic layer and how it can improve your data stack? This GigaOm Sonar report on Semantic Layers can help you delve deeper.

    https://www.kdnuggets.com/2024/01/cube-6-reasons-why-a-universal-semantic-layer-is-beneficial

  • 10 GitHub Repositories to Master Machine Learning

    The blog covers machine learning courses, bootcamps, books, tools, interview questions, cheat sheets, MLOps platforms, and more to master ML and secure your dream job.

    https://www.kdnuggets.com/10-github-repositories-to-master-machine-learning

  • How to Make Large Language Models Play Nice with Your Software Using LangChain

    Beyond simply chatting with an AI model and how LangChain elevates LLM interactions with humans.

    https://www.kdnuggets.com/how-to-make-large-language-models-play-nice-with-your-software-using-langchain

  • 5 Step Blueprint to Your Next Data Science Problem

    Ever sat there and thought about what steps you need to take to tackle your data science problem?

    https://www.kdnuggets.com/5-step-blueprint-to-your-next-data-science-problem

  • An Honest Comparison of Open Source Vector Databases

    We will explore their use cases, key features, performance metrics, supported programming languages, and more to provide a comprehensive and unbiased overview of each database.

    https://www.kdnuggets.com/an-honest-comparison-of-open-source-vector-databases

  • Building Data Pipelines to Create Apps with Large Language Models

    For production grade LLM apps, you need a robust data pipeline. This article talks about the different stages of building a Gen AI data pipeline and what is included in these stages.

    https://www.kdnuggets.com/building-data-pipelines-to-create-apps-with-large-language-models

  • 5 Free Books to Master SQL

    Use this knowledge to upskill yourselves.

    https://www.kdnuggets.com/5-free-books-to-master-sql

  • Data Warehouses vs. Data Lakes vs. Data Marts: Need Help Deciding?

    A comparative overview of data warehouses, data lakes, and data marts to help you make informed decisions on data storage solutions for your data architecture.

    https://www.kdnuggets.com/data-warehouses-vs-data-lakes-vs-data-marts-need-help-deciding

  • Exploring Data Mesh: A Paradigm Shift in Data Architecture

    Let’s explore Data Mesh, a modern approach to data architecture that decentralizes data ownership and management.

    https://www.kdnuggets.com/exploring-data-mesh-a-paradigm-shift-in-data-architecture

  • Best Practices for Building ETLs for ML

    This article talks about several best practices for writing ETLs for building training datasets. It delves into several software engineering techniques and patterns applied to ML.

    https://www.kdnuggets.com/best-practices-for-building-etls-for-ml

  • AI and Open Source Software: Separated at Birth?

    In this article, Luis shares with readers his thoughts on the intersection of open source software and machine learning and what the future might bring. Many articles cover how open source software is used by the machine learning community but this post focuses on the similarities between the two areas of practice and what machine learning can and can’t learn from open source software.

    https://www.kdnuggets.com/ai-and-open-source-software-separated-at-birth

  • Job Trends in Data Analytics: NLP for Job Trend Analysis

    Perform job trend analysis and check the results using NLP.

    https://www.kdnuggets.com/job-trends-in-data-analytics-nlp-for-job-trend-analysis

  • Data Management Principles for Data Science

    Back to Basics: Understanding key data management principles that data scientists should know.

    https://www.kdnuggets.com/data-management-principles-for-data-science

  • Build Your Own PandasAI with LlamaIndex

    Learn how to leverage LlamaIndex and GPT-3.5-Turbo to easily add natural language capabilities to Pandas for intuitive data analysis and conversation.

    https://www.kdnuggets.com/build-your-own-pandasai-with-llamaindex

  • Data Validation for PySpark Applications using Pandera

    New features and concepts.

    https://www.kdnuggets.com/2023/08/data-validation-pyspark-applications-pandera.html

  • Beyond Numpy and Pandas: Unlocking the Potential of Lesser-Known Python Libraries

    3 Python libraries for scientific computation you should know as a data professional.

    https://www.kdnuggets.com/2023/08/beyond-numpy-pandas-unlocking-potential-lesserknown-python-libraries.html

  • Python Vector Databases and Vector Indexes: Architecting LLM Apps

    Vector databases enable fast similarity search and scale across data points. For LLM apps, vector indexes can simplify architecture over full vector databases by attaching vectors to existing storage. Choosing indexes vs databases depends on specialized needs, existing infrastructure, and broader enterprise requirements.

    https://www.kdnuggets.com/2023/08/python-vector-databases-vector-indexes-architecting-llm-apps.html

  • A Comprehensive Guide to MLOps

    Machine Learning Operations (MLOps) is a relatively new discipline that provides the structure and support necessary for machine learning (ML) models to thrive in production environments.

    https://www.kdnuggets.com/2023/08/comprehensive-guide-mlops.html

  • Forget PIP, Conda, and requirements.txt! Use Poetry Instead And Thank Me Later

    Pain-free dependency management is finally here.

    https://www.kdnuggets.com/2023/07/forget-pip-conda-requirementstxt-poetry-instead-thank-later.html

  • A Beginner’s Guide to Data Engineering

    So you want to break into data engineering? Start today by learning more about data engineering and the fundamental concepts.

    https://www.kdnuggets.com/2023/07/beginner-guide-data-engineering.html

  • Evolution of the Data Landscape

    The article follows the story of evolution in the data space through the lens of evolutionary patterns. It talks of the state of significant milestones in the evolutionary journey, their achievements, challenges, and the next milestone that solved those challenges. The article comes from both a business and technical perspective, owing to the persona of the authors.

    https://www.kdnuggets.com/2023/06/evolution-data-landscape.html

  • The Importance of Reproducibility in Machine Learning

    And how approaches to better data management, version control, and experiment tracking can help build reproducible ML pipelines.

    https://www.kdnuggets.com/2023/06/importance-reproducibility-machine-learning.html

  • Big Data Analytics: Why Is It So Crucial For Business Intelligence?

    Understand the relationship between big data and business intelligence.

    https://www.kdnuggets.com/2023/06/big-data-analytics-crucial-business-intelligence.html

  • GPT4All is the Local ChatGPT for your Documents and it is Free!

    How to install GPT4All on your laptop and ask AI about your own domain knowledge (your documents)… and it runs on CPU only!

    https://www.kdnuggets.com/2023/06/gpt4all-local-chatgpt-documents-free.html

  • A Playbook to Scale MLOps

    MLOps teams are pressured to advance their capabilities to scale AI. We teamed up with Ford Motors to explore how to scale MLOps within an organization and how to get started.

    https://www.kdnuggets.com/2023/06/playbook-scale-mlops.html

  • DINOv2: Self-Supervised Computer Vision Models by Meta AI

    Unleashing the Potential of Computer Vision with DINOv2: A Groundbreaking Self-Supervised Model by Meta AI.

    https://www.kdnuggets.com/2023/05/dinov2-selfsupervised-computer-vision-models-meta-ai.html

  • StarCoder: The Coding Assistant That You Always Wanted

    Let advanced AI take care of code completion, formatting, translation, and bug fixing. You can also chat with a StarChat and use VSCode extensions for work.

    https://www.kdnuggets.com/2023/05/starcoder-coding-assistant-always-wanted.html

  • A Step-by-Step Guide to Web Scraping with Python and Beautiful Soup

    Learn the basics of web scraping and its Python implementation. Also, get to know the various methods of the Beautiful Soup library.

    https://www.kdnuggets.com/2023/04/stepbystep-guide-web-scraping-python-beautiful-soup.html

  • How to Build a Scalable Data Architecture with Apache Kafka

    Learn about Apache Kafka architecture and its implementation using a real-world use case of a taxi booking app.

    https://www.kdnuggets.com/2023/04/build-scalable-data-architecture-apache-kafka.html

  • Multimodal Models Explained

    Unlocking the Power of Multimodal Learning: Techniques, Challenges, and Applications.

    https://www.kdnuggets.com/2023/03/multimodal-models-explained.html

  • 5 More Command Line Tools for Data Science

    Use these tools to access APIs, manipulate CSV files, download datasets, and more from your terminal.

    https://www.kdnuggets.com/2023/03/5-command-line-tools-data-science.html

  • New ChatGPT and Whisper APIs from OpenAI

    A quick overview of ChatGPT and Whisper models API.

    https://www.kdnuggets.com/2023/03/new-chatgpt-whisper-apis-openai.html

  • ChatGPT vs Google Bard: A Comparison of the Technical Differences

    The Biggest Rivalry: ChatGPT vs Google Bard! Here's a comparison of the technical differences between the two AI engines.

    https://www.kdnuggets.com/2023/03/chatgpt-google-bard-comparison-technical-differences.html

  • Building a Recommender System for Amazon Products with Python

    I built a recommender system for Amazon’s electronics category.

    https://www.kdnuggets.com/2023/02/building-recommender-system-amazon-products-python.html

  • skops: A New Library to Improve Scikit-learn in Production

    There are various challenges in MLOps and model sharing, including, security and reproducibility. To tackle these for scikit-learn models, we've developed a new open-source library: skops. In this article, I will walk you through how it works and how to use it with an end-to-end example.

    https://www.kdnuggets.com/2023/02/skops-new-library-improve-scikitlearn-production.html

  • Tapping into the Potential of Data Products in 2023

    Learn how data can be treated as a product and how it can be used to derive value.

    https://www.kdnuggets.com/2023/01/tapping-potential-data-products-2023.html

  • Key Data Science, Machine Learning, AI and Analytics Developments of 2022

    It's the end of the year, and so it's time for KDnuggets to assemble a team of experts and get to the bottom of what the most important data science, machine learning, AI and analytics developments of 2022 were.

    https://www.kdnuggets.com/2022/12/key-data-science-machine-learning-ai-analytics-developments-2022.html

  • The Complete MLOps Study Roadmap

    Kickstart your career as an MLOps Engineer with this study roadmap.

    https://www.kdnuggets.com/2022/12/complete-mlops-study-roadmap.html

  • What are Moment-Generating Functions?

    A brief overview of what moment-generating functions are and how they are used in probability and statistics.

    https://www.kdnuggets.com/2022/12/momentgenerating-functions.html

  • The ABCs of NLP, From A to Z

    There is no shortage of tools today that can help you through the steps of natural language processing, but if you want to get a handle on the basics this is a good place to start. Read about the ABCs of NLP, all the way from A to Z.

    https://www.kdnuggets.com/2022/10/abcs-nlp-a-to-z.html

  • Top 10 MLOps Tools to Optimize & Manage Machine Learning Lifecycle

    As more businesses experiment with data, they realize that developing a machine learning (ML) model is only one of many steps in the ML lifecycle.

    https://www.kdnuggets.com/2022/10/top-10-mlops-tools-optimize-manage-machine-learning-lifecycle.html

  • 11 Questions About Data Engineers: What’s the profession about, and where’s it heading?

    I hope my answers will be useful to novice data engineers and anyone interested in data engineering.

    https://www.kdnuggets.com/2022/10/11-questions-data-engineers-profession-heading.html

  • The Machine Learning Lifecycle

    Learn about the standard process for building sustainable machine learning applications.

    https://www.kdnuggets.com/2022/06/making-sense-crispmlq-machine-learning-lifecycle-process.html

  • KDnuggets News, September 14: Free Python for Data Science Course • Everything You’ve Ever Wanted to Know About Machine Learning

    Free Python for Data Science Course • Everything You’ve Ever Wanted to Know About Machine Learning • Progress Bars in Python with tqdm for Fun and Profit • 7 Tips for Python Beginners • 7 Data Analytics Interview Questions & Answers

    https://www.kdnuggets.com/2022/n36.html

  • Everything You Need to Know About Data Lakehouses

    Learn everything you need to know about data lakehouses.

    https://www.kdnuggets.com/2022/09/everything-need-know-data-lakehouses.html

  • 7 Things You Didn’t Know You Could do with a Low Code Tool

    Surprisingly easy solutions for complex data problems.

    https://www.kdnuggets.com/2022/09/7-things-didnt-know-could-low-code-tool.html

  • Data Governance and Observability, Explained

    Let’s dive in and understand the ins and outs of data observability and data governance - the two keys to a more robust data foundation.

    https://www.kdnuggets.com/2022/08/data-governance-observability-explained.html

  • How to Package and Distribute Machine Learning Models with MLFlow

    MLFlow is a tool to manage the end-to-end lifecycle of a Machine Learning model. This article covers the installation and configuration of an MLFlow service, with examples of how to generate and share projects with MLFlow.

    https://www.kdnuggets.com/2022/08/package-distribute-machine-learning-models-mlflow.html

  • The Data Quality Hierarchy of Needs

    Just as Maslow identified a hierarchy of needs for people, data teams have a hierarchy of needs, beginning with data freshness; including volumes, schemas, and values; and culminating with lineage.

    https://www.kdnuggets.com/2022/08/data-quality-hierarchy-needs.html

  • 5 Project Ideas to Stay Up-To-Date as a Data Scientist

    The skills you have need maintenance and occasional updates. Doing an interesting data science project is what will keep you from getting rusty.

    https://www.kdnuggets.com/2022/07/5-project-ideas-stay-uptodate-data-scientist.html

  • MLOps: The Key To Pushing AI Into The Mainstream

    In this blog, we will aim at discussing the reasons that make MLOps an essential aspect of pushing AI mainstream. Besides, we will highlight the capabilities of MLOps as a catalyst for AI implementation.

    https://www.kdnuggets.com/2022/07/mlops-key-pushing-ai-mainstream.html

  • 16 Essential DVC Commands for Data Science

    Learn essential DVC commands to version large datasets and track and manage machine learning experiments.

    https://www.kdnuggets.com/2022/07/16-essential-dvc-commands-data-science.html

  • Machine Learning Model Management

    The tools used in the development cycle for Machine Learning and the managing of the models require MLOps - Machine Learning Operations.

    https://www.kdnuggets.com/2022/07/machine-learning-model-management.html

  • 12 Essential VSCode Extensions for Data Science

    Learn about the data science VSCode extensions for super productivity and better user experience.

    https://www.kdnuggets.com/2022/07/12-essential-vscode-extensions-data-science.html

  • Top 5 Data Management Platforms

    This article presents the top 5 data management platforms, in order to help you choose which might be best for you.

    https://www.kdnuggets.com/2022/06/top-5-data-management-platforms.html

  • Top 15 Books to Master Data Strategy

    In this article, we outline 15 books on topics ranging from the technical to the non-technical, to help you improve your understanding of end-to-end best practices related to data.

    https://www.kdnuggets.com/2022/06/top-15-books-master-data-strategy.html

  • KDnuggets News, June 1: The Complete Collection of Data Science Books; Projects That Will Land You The Job in 2022

    The Complete Collection of Data Science Books - Part 2; Data Science Projects That Will Land You The Job in 2022; How to Become a Machine Learning Engineer; Dynamic Time Warping Algorithm in Time Series, Explained; Free Data Engineering Courses

    https://www.kdnuggets.com/2022/n22.html

  • 6 Things You Need To Know About Data Management And Why It Matters For Computer Vision

    This article will explore a few areas that we feel are essential when assessing data management solutions for computer vision.

    https://www.kdnuggets.com/2022/05/6-things-need-know-data-management-matters-computer-vision.html

  • Database Key Terms, Explained

    Interested in a survey of important database concepts and terminology? This post concisely defines 16 essential database key terms.

    https://www.kdnuggets.com/2016/07/database-key-terms-explained.html

  • Should The Data Warehouse Be Immutable?

    Is the data warehouse broken? Is the "immutable data warehouse" the right path for your data team? Learn more here.

    https://www.kdnuggets.com/2022/05/data-warehouse-immutable.html

  • Natural Language Processing Key Terms, Explained

    This post provides a concise overview of 18 natural language processing terms, intended as an entry point for the beginner looking for some orientation on the topic.

    https://www.kdnuggets.com/2017/02/natural-language-processing-key-terms-explained.html

  • Create Efficient Combined Data Sources with Tableau

    Save time and effort with this guide, which will show you how to do data join operations in Tableau.

    https://www.kdnuggets.com/2022/05/create-efficient-combined-data-sources-tableau.html

  • 5 Key Components of a Data Sharing Platform

    Read this article for an overview of what the components of a data-sharing platform are.

    https://www.kdnuggets.com/2022/05/5-key-components-data-sharing-platform.html

  • Data Management: How to Stay on Top of Your Customer’s Mind?

    Extract, profile, and manage your customer data in a flash with customer data management solutions, and achieve a customer-centric culture.

    https://www.kdnuggets.com/2022/04/data-management-stay-top-customer-mind.html

  • MLOps: The Best Practices and How To Apply Them

    Here are some of the best practices for implementing MLOps successfully.

    https://www.kdnuggets.com/2022/04/mlops-best-practices-apply.html

  • KDnuggets News, April 27: A Brief Introduction to Papers With Code; Machine Learning Books You Need To Read In 2022

    A Brief Introduction to Papers With Code; Machine Learning Books You Need To Read In 2022; Building a Scalable ETL with SQL + Python; 7 Steps to Mastering SQL for Data Science; Top Data Science Projects to Build Your Skills

    https://www.kdnuggets.com/2022/n17.html

  • The Complete Collection Of Data Repositories – Part 2

    Check out the collection of the best data repositories on healthcare, natural language, neuroscience, physics, social network, sports, time series, transportation, miscellaneous, and super data repositories.

    https://www.kdnuggets.com/2022/04/complete-collection-data-repositories-part-2.html

  • Cloud Storage Adoption is the Need of the Hour for Business

    The rush towards cloud storage means that the cloud has to offer a valuable proposition to businesses. Let’s explore why businesses regardless of their size should consider moving to the cloud.

    https://www.kdnuggets.com/2022/02/cloud-storage-adoption-need-hour-business.html

  • Top 5 Free Machine Learning Courses

    Give a boost to your career and learn job-ready machine learning skills by taking the best free online courses.

    https://www.kdnuggets.com/2022/02/top-5-free-machine-learning-courses.html

  • 19 Data Science Project Ideas for Beginners

    This article features 19 data science projects for beginners, categorized into 7 full project tutorials, 5 places to come up with your own data science projects using data, and 7 skills-based data science projects.

    https://www.kdnuggets.com/2021/11/19-data-science-project-ideas-beginners.html

  • Classifying Long Text Documents Using BERT

    Transformer based language models such as BERT are really good at understanding the semantic context because they were designed specifically for that purpose. BERT outperforms all NLP baselines, but as we say in the scientific community, “no free lunch”. How can we use BERT to classify long text documents?

    https://www.kdnuggets.com/2022/02/classifying-long-text-documents-bert.html

  • How to Write SQL in Native Python

    If the idea of being able to link with SQL databases and define, manipulate, and query using Python sounds appealing, check out the SQLModel library.

    https://www.kdnuggets.com/2022/02/easy-sql-native-python.html

  • Feature Selection: Where Science Meets Art

    From heuristic to algorithmic feature selection techniques for data science projects.

    https://www.kdnuggets.com/2021/12/feature-selection-science-meets-art.html

  • KDnuggets™ News 21:n45, Dec 1: Most Common SQL Mistakes on Data Science Interviews; Why Machine Learning Engineers are Replacing Data Scientists

    Most Common SQL Mistakes on Data Science Interviews; Why Machine Learning Engineers are Replacing Data Scientists; Vote in new KDnuggets Poll: What Percentage of Your Machine Learning Models Have Been Deployed? KDnuggets: Personal History and Nuggets of Experience.

    https://www.kdnuggets.com/2021/n45.html

  • On-Device Deep Learning: PyTorch Mobile and TensorFlow Lite

    PyTorch and TensorFlow are the two leading AI/ML Frameworks. In this article, we take a look at their on-device counterparts PyTorch Mobile and TensorFlow Lite and examine them more deeply from the perspective of someone who wishes to develop and deploy models for use on mobile platforms.

    https://www.kdnuggets.com/2021/11/on-device-deep-learning-pytorch-mobile-tensorflow-lite.html

  • How I Redesigned over 100 ETL into ELT Data Pipelines

    Learn how to level up your Data Pipelines!

    https://www.kdnuggets.com/2021/11/redesigned-over-100-etl-elt-data-pipelines.html

  • What Comes After HDF5? Seeking a Data Storage Format for Deep Learning

    HDF5 is one of the most popular and reliable formats for non-tabular, numerical data, but it is not optimized for deep learning work. This article suggests what an ML-native data format should look like to truly serve the needs of modern data scientists.

    https://www.kdnuggets.com/2021/11/after-hdf5-data-storage-format-deep-learning.html

  • Simple Text Scraping, Parsing, and Processing with this Python Library

    Scraping, parsing, and processing text data from the web can be difficult. But it can also be easy, using Newspaper3k.

    https://www.kdnuggets.com/2021/10/simple-text-scraping-parsing-processing-python-library.html

  • Machine Learning Model Development and Model Operations: Principles and Practices

    ML model management and the delivery of a highly performing model are as important as the initial build of the model by choosing the right dataset. The concepts around model retraining, model versioning, model deployment and model monitoring are the basis for machine learning operations (MLOps), which helps data science teams deliver highly performing models.

    https://www.kdnuggets.com/2021/10/machine-learning-model-development-operations-principles-practice.html

  • Knowledge Graph Forum: Technology Ecosystem and Business Applications

    Ontotext is thrilled to invite you to the Ontotext & partners virtual Knowledge Graph Forum, Oct 26 & 27, 2021. This event is shaped by Ontotext’s vision that knowledge graphs serve as a hub for data, metadata and content. 35+ speakers from around the globe will share their experiences through real-life cases and platforms demonstrations. Save your spot now.

    https://www.kdnuggets.com/2021/10/ontotext-knowledge-graph-forum.html

  • Use These Unique Data Sets to Sharpen Your Data Science Skills

    Want to get your hands on some real-world data sets right now? Kick off your bootcamp prep with this list of hot-button data sets curated to help you hone different data science skills.

    https://www.kdnuggets.com/2021/09/springboard-unique-data-sets-data-science-skills.html

  • Building a Structured Financial Newsfeed Using Python, SpaCy and Streamlit

    Getting started with NLP by building a Named Entity Recognition (NER) application.

    https://www.kdnuggets.com/2021/09/-structured-financial-newsfeed-using-python-spacy-and-streamlit.html

  • Data Engineering Technologies 2021

    Emerging technologies supporting the field of data engineering are growing at a rapid clip. This curated list includes the most important offerings available in 2021.

    https://www.kdnuggets.com/2021/09/data-engineering-technologies-2021.html

  • MLOps And Machine Learning Roadmap

    A 16–20 week roadmap to review machine learning and learn MLOps.

    https://www.kdnuggets.com/2021/08/mlops-machine-learning-roadmap.html

  • MLOps Best Practices

    Many technical challenges must be overcome to achieve successful delivery of machine learning solutions at scale. This article shares best practices we encountered while architecting and applying a model deployment platform within a large organization, including required functionality, the recommendation for a scalable deployment pattern, and techniques for testing and performance tuning models to maximize platform throughput.

    https://www.kdnuggets.com/2021/07/mlops-best-practices.html

  • How to Use Kafka Connect to Create an Open Source Data Pipeline for Processing Real-Time Data

    This article shows you how to create a real-time data pipeline using only pure open source technologies. These include Kafka Connect, Apache Kafka, Kibana and more.

    https://www.kdnuggets.com/2021/07/kafka-open-source-data-pipeline-processing-real-time-data.html

  • High-Performance Deep Learning: How to train smaller, faster, and better models – Part 5

    Training efficient deep learning models with any software tool is nothing without an infrastructure of robust and performant compute power. Here, current software and hardware ecosystems are reviewed that you might consider in your development when the highest performance possible is needed.

    https://www.kdnuggets.com/2021/07/high-performance-deep-learning-part5.html

  • Unleashing the Power of MLOps and DataOps in Data Science

    Organizations trying to move forward with analytics and data science initiatives -- while floating in an ocean of data -- must enhance their overall approach and culture to embrace a foundation on DataOps and MLOps. Leveraging these operational frameworks is necessary to enable the data to generate real business value.

    https://www.kdnuggets.com/2021/06/power-mlops-dataops-data-science.html

  • PyCaret 101: An introduction for beginners

    This article is a great overview of how to get started with PyCaret for all your machine learning projects.

    https://www.kdnuggets.com/2021/06/pycaret-101-introduction-beginners.html

  • DataOps: 5 things that you need to know

    DataOps (Data Operations) has assumed a critical role in the age of big data to drive definitive impact on business outcomes. This process-oriented and agile methodology synergizes the components of DevOps and the capabilities of data engineers and data scientists to support data-focused workloads in enterprises. Here is a detailed look at DataOps.

    https://www.kdnuggets.com/2021/05/dataops-5-things-need-know.html

  • Awesome list of datasets in 100+ categories

    With an estimated 44 zettabytes of data in existence in our digital world today and approximately 2.5 quintillion bytes of new data generated daily, there is a lot of data out there you could tap into for your data science projects. It's pretty hard to curate through such a massive universe of data, but this collection is a great start. Here, you can find data from cancer genomes to UFO reports, as well as years of air quality data to 200,000 jokes. Dive into this ocean of data to explore as you learn how to apply data science techniques or leverage your expertise to discover something new.

    https://www.kdnuggets.com/2021/05/awesome-list-datasets.html

  • A checklist to track your Data Science progress

    Whether you are just starting out in data science or already a gainfully-employed professional, always learning more to advance through state-of-the-art techniques is part of the adventure. But it can be challenging to keep track of your progress and keep an eye on what's next. Follow this checklist to help you scale your expertise from entry-level to advanced.

    https://www.kdnuggets.com/2021/05/checklist-data-science-progress.html

  • Easy MLOps with PyCaret + MLflow

    A beginner-friendly, step-by-step tutorial on integrating MLOps in your Machine Learning experiments using PyCaret.

    https://www.kdnuggets.com/2021/05/easy-mlops-pycaret-mlflow.html

  • Feature stores – how to avoid feeling that every day is Groundhog Day

    Feature stores stop the duplication of each task in the ML lifecycle. You can reuse features and pipelines for different models, monitor models consistently, and sidestep data leakage with this MLOps technology that everyone is talking about.

    https://www.kdnuggets.com/2021/05/feature-stores-how-avoid-feeling-every-day-is-groundhog-day.html

  • Top 3 Statistical Paradoxes in Data Science

    Observation bias and sub-group differences generate statistical paradoxes.

    https://www.kdnuggets.com/2021/04/top-3-statistical-paradoxes-data-science.html

  • A Simple Way to Time Code in Python

    Read on to find out how to use a decorator to time your functions.

    https://www.kdnuggets.com/2021/03/simple-way-time-code-python.html

  • Data Observability, Part II: How to Build Your Own Data Quality Monitors Using SQL

    Using schema and lineage to understand the root cause of your data anomalies.

    https://www.kdnuggets.com/2021/02/data-observability-part-2-build-data-quality-monitors-sql.html

  • Inside the Architecture Powering Data Quality Management at Uber

    Data Quality Monitor implements novel statistical methods for anomaly detection and quality management in large data infrastructures.

    https://www.kdnuggets.com/2021/02/inside-architecture-powering-data-quality-management-uber.html

  • Feature Store as a Foundation for Machine Learning

    With so many organizations now taking the leap into building production-level machine learning models, many lessons learned are coming to light about the supporting infrastructure. For a variety of important types of use cases, maintaining a centralized feature store is essential for higher ROI and faster delivery to market. In this review, the current feature store landscape is described, and you can learn how to architect one into your MLOps pipeline.

    https://www.kdnuggets.com/2021/02/feature-store-foundation-machine-learning.html

  • Data Observability: Building Data Quality Monitors Using SQL

    To trigger an alert when data breaks, data teams can leverage a tried and true tactic from our friends in software engineering: monitoring and observability. In this article, we walk through how you can create your own data quality monitors for freshness and distribution from scratch using SQL.

    https://www.kdnuggets.com/2021/02/data-observability-building-data-quality-monitors-using-sql.html

  • Column-Oriented Databases, Explained

    NoSQL databases come in four distinct types: key-value stores, document stores, graph databases, and column-oriented databases. In this article, we’ll explore column-oriented databases, also known simply as “NoSQL columns”.

    https://www.kdnuggets.com/2021/02/understanding-nosql-database-types-column-oriented-databases.html

  • How to Deploy a Flask API in Kubernetes and Connect it with Other Micro-services

    A hands-on tutorial on how to implement your micro-service architecture using the powerful container orchestration tool Kubernetes.

    https://www.kdnuggets.com/2021/02/deploy-flask-api-kubernetes-connect-micro-services.html

  • Getting Started with 5 Essential Natural Language Processing Libraries

    This article is an overview of how to get started with 5 popular Python NLP libraries, from those for linguistic data visualization, to data preprocessing, to multi-task functionality, to state of the art language modeling, and beyond.

    https://www.kdnuggets.com/2021/02/getting-started-5-essential-nlp-libraries.html

  • Saving and loading models in TensorFlow — why it is important and how to do it

    So much time and effort can go into training your machine learning models. But, shut down the notebook or system, and all those trained weights and more vanish with the memory flush. Saving your models to maximize reusability is key for efficient productivity.

    https://www.kdnuggets.com/2021/02/saving-loading-models-tensorflow.html

  • Building a Deep Learning Based Reverse Image Search

    Following the journey from unstructured data to content based image retrieval.

    https://www.kdnuggets.com/2021/01/deep-learning-based-reverse-image-search.html

  • 5 Tools for Effortless Data Science

    The sixth tool is coffee.

    https://www.kdnuggets.com/2021/01/5-tools-effortless-data-science.html

  • Meet whale! The stupidly simple data discovery tool

    Finding data and understanding its meaning represents the traditional "daily grind" of a Data Scientist. With whale, the new lightweight data discovery, documentation, and quality engine for your data warehouse that is under development by Dataframe, your data science team will more efficiently search data and automate its data metrics.

    https://www.kdnuggets.com/2020/12/whale-data-discovery-tool.html

  • Data Catalogs Are Dead; Long Live Data Discovery

    Why data catalogs aren’t meeting the needs of the modern data stack, and how a new approach – data discovery – is needed to better facilitate metadata management and data reliability.

    https://www.kdnuggets.com/2020/12/data-catalogs-dead-long-live-data-discovery.html

  • Feature Store vs Data Warehouse

    A feature store is a data warehouse of features for machine learning. Unlike a data warehouse, it is dual-database: one database serves features at low latency to online applications and another stores large volumes of features. Learn how Data Scientists leverage this capability in production-deployed models.

    https://www.kdnuggets.com/2020/12/feature-store-vs-data-warehouse.html

  • Production Machine Learning Monitoring: Outliers, Drift, Explainers & Statistical Performance

    A practical deep dive on production monitoring architectures for machine learning at scale using real-time metrics, outlier detectors, drift detectors, metrics servers and explainers.

    https://www.kdnuggets.com/2020/12/production-machine-learning-monitoring-outliers-drift-explainers-statistical-performance.html

  • MLOps Is Changing How Machine Learning Models Are Developed

    Delivering machine learning solutions is so much more than the model. Three key concepts covering version control, testing, and pipelines are the foundation for machine learning operations (MLOps) that help data science teams ship models quicker and with more confidence.

    https://www.kdnuggets.com/2020/12/mlops-changing-machine-learning-developed.html

  • Covid or just a Cough? AI for detecting COVID-19 from Cough Sounds

    Increased capabilities in screening and early testing for a disease can significantly support quelling its spread and impact. Recent progress in developing deep learning AI models to classify cough sounds as a prescreening tool for COVID-19 has demonstrated promising early success. Cough-based diagnosis is non-invasive, cost-effective, scalable, and, if approved, could be a potential game-changer in our fight against COVID-19.

    https://www.kdnuggets.com/2020/12/covid-cough-ai-detecting-sounds.html

  • AI registers: finally, a tool to increase transparency in AI/ML

    Transparency, explainability, and trust are pressing topics in AI/ML today. While much has been written about why they are important and what you need to do, no tools have existed until now.

    https://www.kdnuggets.com/2020/12/ai-registers-transparency-ml.html

  • How to Incorporate Tabular Data with HuggingFace Transformers

    In real-world scenarios, we often encounter data that includes text and tabular features. Leveraging the latest advances for transformers, effectively handling situations with both data structures can increase performance in your models.

    https://www.kdnuggets.com/2020/11/tabular-data-huggingface-transformers.html

  • The Best Data Science Certification You’ve Never Heard Of

    The CDMP is the best data strategy certification you’ve never heard of. (And honestly, when you consider the fact that you’re probably working a job that didn’t exist ten years ago, it’s not surprising that this certification isn’t widespread just yet.)

    https://www.kdnuggets.com/2020/11/best-data-science-certification-never-heard.html

  • 10 Underrated Python Skills

    Tips for feature analysis, hyperparameter tuning, data visualization and more.

    https://www.kdnuggets.com/2020/10/10-underrated-python-skills.html

  • Text Mining with R: The Free eBook

    This freely-available book will show you how to perform text analytics in R, using packages from the tidyverse.

    https://www.kdnuggets.com/2020/10/text-mining-r-free-ebook.html

  • LinkedIn’s Pro-ML Architecture Summarizes Best Practices for Building Machine Learning at Scale

    The reference architecture is powering mission critical machine learning workflows within LinkedIn.

    https://www.kdnuggets.com/2020/09/linkedin-pro-ml-architecture-best-practices-building-machine-learning-scale.html

  • Data Science Meets Devops: MLOps with Jupyter, Git, and Kubernetes

    An end-to-end example of deploying a machine learning product using Jupyter, Papermill, Tekton, GitOps and Kubeflow.

    https://www.kdnuggets.com/2020/08/data-science-meets-devops-mlops-jupyter-git-kubernetes.html

  • GitHub is the Best AutoML You Will Ever Need

    This article uses PyCaret 2.0, an open source, low-code machine learning library in Python to develop a simple AutoML solution and deploy it as a Docker container using GitHub actions.

    https://www.kdnuggets.com/2020/08/github-best-automl-ever-need.html

  • Containerization of PySpark Using Kubernetes

    This article demonstrates the approach of how to use Spark on Kubernetes. It also includes a brief comparison between various cluster managers available for Spark.

    https://www.kdnuggets.com/2020/08/containerization-pyspark-kubernetes.html
