Search results for key value store

    Found 411 documents, 5946 searched:

  • Data Scientist, Data Engineer & Other Data Careers, Explained

    In this article, we will have a look at five distinct data careers, and hopefully provide some advice on how to get one's feet wet in this convoluted field.

    https://www.kdnuggets.com/2021/05/data-scientist-data-engineer-data-careers-explained.html

  • How to Ace Data Science Assessment Test by Using Automatic EDA Tools

    By using a few lines of code, you can understand key aspects of a given dataset. These tools have helped me answer business-related questions during the data assessment test by Alooba.

    https://www.kdnuggets.com/2022/04/ace-data-science-assessment-test-automatic-eda-tools.html

  • Top 13 Skills That Every Data Scientist Should Have

    Let me walk you through the top 13 data science skills that you should have to become a successful data scientist. Following this outline, you’ll have a great path of digestible steps to educate yourself and be prepared to apply for data scientist positions.

    https://www.kdnuggets.com/2022/03/top-13-skills-every-data-scientist.html

  • MLOps Is a Mess But That’s to be Expected

    In this post, I want to focus the discussion about the state of machine learning operations (MLOps) today, where we are, where we are going.

    https://www.kdnuggets.com/2022/03/mlops-mess-expected.html

  • The Most Popular Intro to Programming Course From Harvard is Free!

    CS50's Introduction to Computer Science has the highest enrollment on Harvard's campus... and is free to anyone interested in taking it!

    https://www.kdnuggets.com/2022/03/popular-intro-programming-course-harvard-free.html

  • Building a Geospatial Application in Python with Google Earth Engine and Greppo

    In this blog, you will see how to build a web-application with Greppo and Google Earth using Python.

    https://www.kdnuggets.com/2022/03/building-geospatial-application-python-google-earth-engine-greppo.html

  • Building a Tractable, Feature Engineering Pipeline for Multivariate Time Series

    A time series feature engineering pipeline requires different transformations such as imputation and window aggregation, which follows a sequence of stages. This article demonstrates the building of a pipeline to derive multivariate time series features such that the features can then be easily tracked and validated.

    https://www.kdnuggets.com/2022/03/building-tractable-feature-engineering-pipeline-multivariate-time-series.html

  • Data: The Most Valuable Commodity for Businesses

    Many companies have been capturing customer data in some form or another for decades. Petabytes of data are traversing networks worldwide every day, and all of that data means big money. Here's how companies can best utilize this data to influence positive outcomes.

    https://www.kdnuggets.com/2022/03/data-valuable-commodity-businesses.html

  • How to Become a Successful Data Science Freelancer in 2022

    In this article, I will walk you through how you can use your data science skills to land freelance gigs.

    https://www.kdnuggets.com/2022/02/become-successful-data-science-freelancer-2022.html

  • The Not-so-Sexy SQL Concepts to Make You Stand Out

    Databases are the houses of our data and data scientists HAVE TO HAVE A KEY! In this article, I discuss some lesser known concepts of SQL that data scientists do not familiarize themselves with.

    https://www.kdnuggets.com/2022/02/not-so-sexy-sql-concepts-stand-out.html

  • 5 Ways To Use AI For Supply Chain Management

    Using AI to help optimize supply chain management is becoming more prevalent across industries. Early adopters are more resilient and prepared for the inevitable future of artificial intelligence within the supply chain management industry.

    https://www.kdnuggets.com/2022/02/5-ways-ai-supply-chain-management.html

  • Celebrating Awareness of the Importance of Data Privacy

    January 28 is Data Privacy Day, bringing awareness of the basic foundation and principles of data protection. Read about the day itself, why data privacy is important, and best practices you can adhere to in order to help ensure the privacy of your data.

    https://www.kdnuggets.com/2022/01/celebrating-awareness-importance-data-privacy.html

  • How to Answer Data Science Coding Interview Questions

    Use this checklist to make sure your answer to the data science coding interview questions is on the right track.

    https://www.kdnuggets.com/2022/01/answer-data-science-coding-interview-questions.html

  • Automate Microsoft Excel and Word Using Python

    Integrate Excel with Word to generate automated reports seamlessly.

    https://www.kdnuggets.com/2021/08/automate-microsoft-excel-word-python.html

  • Analyzing Scientific Articles with fine-tuned SciBERT NER Model and Neo4j

    In this article, we will be analyzing a dataset of scientific abstracts using the Neo4j Graph database and a fine-tuned SciBERT model.

    https://www.kdnuggets.com/2021/12/analyzing-scientific-articles-finetuned-scibert-ner-model-neo4j.html

  • Build a Serverless News Data Pipeline using ML on AWS Cloud

    This is the guide on how to build a serverless data pipeline on AWS with a Machine Learning model deployed as a Sagemaker endpoint.

    https://www.kdnuggets.com/2021/11/build-serverless-news-data-pipeline-ml-aws-cloud.html

  • Inside recommendations: how a recommender system recommends

    We describe types of recommender systems, more specifically, algorithms and methods for content-based systems, collaborative filtering, and hybrid systems.

    https://www.kdnuggets.com/2021/11/recommendations-recommender-system.html

  • Machine Learning Model Development and Model Operations: Principles and Practices

    ML model management and the delivery of a high-performing model are as important as the initial build of the model with the right dataset. The concepts around model retraining, model versioning, model deployment and model monitoring are the basis for machine learning operations (MLOps), which helps data science teams deliver high-performing models.

    https://www.kdnuggets.com/2021/10/machine-learning-model-development-operations-principles-practice.html

  • Training BPE, WordPiece, and Unigram Tokenizers from Scratch using Hugging Face

    Comparing the tokens generated by SOTA tokenization algorithms using Hugging Face's tokenizers package.

    https://www.kdnuggets.com/2021/10/bpe-wordpiece-unigram-tokenizers-using-hugging-face.html

  • How I Built A Perfect Model And Got Into Trouble

    Data-driven decisions, actionable insights, business impact—you've seen these buzzwords in data science jobs descriptions. But, just focusing on these terms doesn't automatically lead to the best results. Learn from this real-world scenario that followed data-driven indecisiveness, found misleading insights, and initially created a negative business impact.

    https://www.kdnuggets.com/2021/10/perfect-model-trouble.html

  • Data science SQL interview questions from top tech firms

    As a data scientist, there is one thing you really need to understand and know how to handle: data. With SQL being a foundational technical approach for working with data, it should not be surprising that the top tech companies will ask about your SQL skills during an interview. Here, we cover the key concepts tested so you can best prepare for your next data science interview.

    https://www.kdnuggets.com/2021/10/data-science-sql-interview-questions.html

  • 20 Machine Learning Projects That Will Get You Hired

    If you want to break into the machine learning and data science job market, then you will need to demonstrate the proficiency of your skills, especially if you are self-taught through online courses and bootcamps. A project portfolio is a great way to practice your new craft and offer convincing evidence that an employer should hire you over the competition.

    https://www.kdnuggets.com/2021/09/20-machine-learning-projects-hired.html

  • An Introduction to Reinforcement Learning with OpenAI Gym, RLlib, and Google Colab

    Get an Introduction to Reinforcement Learning by attempting to balance a virtual CartPole with OpenAI Gym, RLlib, and Google Colab.

    https://www.kdnuggets.com/2021/09/intro-reinforcement-learning-openai-gym-rllib-colab.html

  • The Prefect Way to Automate & Orchestrate Data Pipelines

    I am migrating all my ETL work from Airflow to this super-cool framework.

    https://www.kdnuggets.com/2021/09/prefect-way-automate-orchestrate-data-pipelines.html

  • Working with Python APIs For Data Science Project

    In this article, we will work with YouTube Python API to collect video statistics from our channel using the requests python library to make an API call and save it as a Pandas DataFrame.

    https://www.kdnuggets.com/2021/09/python-apis-data-science-project.html

  • Smart Ingestion: Using ontology-driven AI

    Imagine data that organizes itself to power your decision-making.

    https://www.kdnuggets.com/2021/09/smart-ingestion-ontology-driven-ai.html

  • Fast AutoML with FLAML + Ray Tune

    Microsoft Researchers have developed FLAML (Fast Lightweight AutoML) which can now utilize Ray Tune for distributed hyperparameter tuning to scale up FLAML’s resource-efficient & easily parallelizable algorithms across a cluster.

    https://www.kdnuggets.com/2021/09/fast-automl-flaml-ray-tune.html

  • How to solve machine learning problems in the real world

    Becoming a machine learning engineer pro is your goal? Sure, online ML courses and Kaggle-style competitions are great resources to learn the basics. However, the daily job of a ML engineer requires an additional layer of skills that you won’t master through these approaches.

    https://www.kdnuggets.com/2021/09/solve-machine-learning-problems-real-world.html

  • Multilabel Document Categorization, step by step example

    This detailed guide explores an unsupervised and supervised learning two-stage approach with LDA and BERT to develop a domain-specific document categorizer on unlabeled documents.

    https://www.kdnuggets.com/2021/08/multilabel-document-categorization.html

  • A Python Data Processing Script Template

    Here's a skeleton general purpose template for getting a Python command line script fleshed out as quickly as possible.

    https://www.kdnuggets.com/2021/08/python-data-processing-script-template.html

  • Data Scientist’s Guide to Efficient Coding in Python

    Read this fantastic collection of tips and tricks the author uses for writing clean code on a day-to-day basis.

    https://www.kdnuggets.com/2021/08/data-scientist-guide-efficient-coding-python.html

  • Querying the Most Granular Demographics Dataset

    Having access to broad and detailed population data can potentially offer enormous value to any organization looking to interact with specific demographics. However, access alone is not sufficient without being able to leverage advanced techniques to explore and visualize the data.

    https://www.kdnuggets.com/2021/08/querying-granular-demographic-dataset.html

  • Using Twitter to Understand Pizza Delivery Apprehension During COVID

    Analyzing customer sentiment and capturing any specific differences in emotion about ordering Domino's pizza in India during lockdown.

    https://www.kdnuggets.com/2021/08/twitter-understand-pizza-delivery-covid.html

  • Most Common Data Science Interview Questions and Answers

    After analyzing 900+ data science interview questions from companies over the past few years, the most common data science interview question categories are reviewed in this guide, each explained with an example.

    https://www.kdnuggets.com/2021/08/common-data-science-interview-questions-answers.html

  • Mastering Clustering with a Segmentation Problem

    The one stop shop for implementing the most widely used models in Python for unsupervised clustering.

    https://www.kdnuggets.com/2021/08/mastering-clustering-segmentation-problem.html

  • 30 Most Asked Machine Learning Questions Answered

    There is always a lot to learn in machine learning. Whether you are new to the field or a seasoned practitioner and ready for a refresher, understanding these key concepts will keep your skills honed in the right direction.

    https://www.kdnuggets.com/2021/08/30-machine-learning-questions-answered.html

  • Building Machine Learning Pipelines using Snowflake and Dask

    In this post, I want to share some of the tools that I have been exploring recently and show you how I use them and how they helped improve the efficiency of my workflow. The two I will talk about in particular are Snowflake and Dask. Two very different tools but ones that complement each other well especially as part of the ML Lifecycle.

    https://www.kdnuggets.com/2021/07/building-machine-learning-pipelines-snowflake-dask.html

  • Python Data Structures Compared

    Let's take a look at 5 different Python data structures and see how they could be used to store data we might be processing in our everyday tasks, as well as the relative memory they use for storage and time they take to create and access.

    https://www.kdnuggets.com/2021/07/python-data-structures-compared.html

  • Streamlit Tips, Tricks, and Hacks for Data Scientists

    Today, I am going to talk about a few tips that I learned within more than a year of using Streamlit, that you can also use to unleash your powerful DS/AI/ML (whatever they may be) applications.

    https://www.kdnuggets.com/2021/07/streamlit-tips-tricks-hacks-data-scientists.html

  • 5 Python Data Processing Tips & Code Snippets

    This is a small collection of Python code snippets that a beginner might find useful for data processing.

    https://www.kdnuggets.com/2021/07/python-tips-snippets-data-processing.html

  • 10 Mistakes You Should Avoid as a Data Science Beginner

    Read this article on how to gain a competitive advantage in the data science job market.

    https://www.kdnuggets.com/2021/06/10-mistakes-avoid-data-science-beginner.html

  • Building a Knowledge Graph for Job Search Using BERT

    A guide on how to create knowledge graphs using NER and Relation Extraction.

    https://www.kdnuggets.com/2021/06/knowledge-graph-job-search-bert.html

  • PyCaret 101: An introduction for beginners

    This article is a great overview of how to get started with PyCaret for all your machine learning projects.

    https://www.kdnuggets.com/2021/06/pycaret-101-introduction-beginners.html

  • BigQuery vs Snowflake: A Comparison of Data Warehouse Giants

    In this article we are going to compare the two topmost data warehouses: BigQuery and Snowflake.

    https://www.kdnuggets.com/2021/06/bigquery-snowflake-comparison-data-warehouse-giants.html

  • Supercharge Your Machine Learning Experiments with PyCaret and Gradio

    A step-by-step tutorial to develop and interact with machine learning pipelines rapidly.

    https://www.kdnuggets.com/2021/05/supercharge-machine-learning-experiments-pycaret-gradio.html

  • Essential Math for Data Science: Basis and Change of Basis

    In this article, you will learn what the basis of a vector space is, see that any vectors of the space are linear combinations of the basis vectors, and see how to change the basis using change of basis matrices.

    https://www.kdnuggets.com/2021/05/essential-math-data-science-basis-change-basis.html

  • These Soft Skills Can Make or Break Your Data Science Career

    In an industry long ruled by hard skills, the future career success of tomorrow’s data scientists might well depend on their ability to deploy a variety of soft skills into the workplace.

    https://www.kdnuggets.com/2021/05/soft-skills-data-science-career.html

  • Awesome list of datasets in 100+ categories

    With an estimated 44 zettabytes of data in existence in our digital world today and approximately 2.5 quintillion bytes of new data generated daily, there is a lot of data out there you could tap into for your data science projects. It's pretty hard to curate through such a massive universe of data, but this collection is a great start. Here, you can find data from cancer genomes to UFO reports, as well as years of air quality data to 200,000 jokes. Dive into this ocean of data to explore as you learn how to apply data science techniques or leverage your expertise to discover something new.

    https://www.kdnuggets.com/2021/05/awesome-list-datasets.html

  • Vaex: Pandas but 1000x faster

    If you are working with big data, especially on your local machine, then learning the basics of Vaex, a Python library that enables the fast processing of large datasets, will provide you with a productive alternative to Pandas.

    https://www.kdnuggets.com/2021/05/vaex-pandas-1000x-faster.html

  • The NoSQL Know-It-All Compendium

    Are you a NoSQL beginner, but want to become a NoSQL Know-It-All? Well, this is the place for you. Get up to speed on NoSQL technologies from a beginner's point of view, with this collection of related progressive posts on the subject. NoSQL? No problem!

    https://www.kdnuggets.com/2021/05/nosql-know-it-all-compendium.html

  • FluDemic – using AI and Machine Learning to get ahead of disease

    We are amidst a healthcare data explosion. AI/ML will be more vital than ever in the prevention and handling of future pandemics. Here, we walk you through the different facets of modeling infectious diseases, focusing on influenza and COVID-19.

    https://www.kdnuggets.com/2021/04/fludemic-ai-machine-learning-disease.html

  • How to ace A/B Testing Data Science Interviews

    Understanding the process of A/B testing and knowing how to discuss this approach during data science job interviews can give you a leg up over other candidates. This mock interview provides a step-by-step guide through how to demonstrate your mastery of the key concepts and logical considerations.

    https://www.kdnuggets.com/2021/04/ab-testing-data-science-interviews.html

  • How to organize your data science project in 2021

    Maintaining proper organization of all your data science projects will increase your productivity, minimize errors, and increase your development efficiency. This tutorial will guide you through a framework on how to keep everything in order on your local machine and in the cloud.

    https://www.kdnuggets.com/2021/04/how-organize-your-data-science-project-2021.html

  • How to Apply Transformers to Any Length of Text

    Read on to find how to restore the power of NLP for long sequences.

    https://www.kdnuggets.com/2021/04/apply-transformers-any-length-text.html

  • The Best Machine Learning Frameworks & Extensions for Scikit-learn

    Learn how to use a selection of packages to extend the functionality of Scikit-learn estimators.

    https://www.kdnuggets.com/2021/03/best-machine-learning-frameworks-extensions-scikit-learn.html

  • Must Know for Data Scientists and Data Analysts: Causal Design Patterns

    Industry is a prime setting for observational causal inference, but many companies are blind to causal measurement beyond A/B tests. This formula-free primer illustrates analysis design patterns for measuring causal effects from observational data.

    https://www.kdnuggets.com/2021/03/causal-design-patterns.html

  • Document Databases, Explained

    Out of all the NoSQL database types, document-stores are considered the most sophisticated ones. They store data in a JSON format, as opposed to a classic rows and columns structure.

    https://www.kdnuggets.com/2021/03/understanding-nosql-database-types-document.html

  • Top YouTube Channels for Data Science

    Have a look at the top 15 YouTube channels for data science by number of subscribers, along with some additional data on the channels to help you decide if they may have some content useful for you.

    https://www.kdnuggets.com/2021/03/top-youtube-channels-data-science.html

  • Graph Databases, Explained

    Among the four main NoSQL database types, graph databases are widely appreciated for their application in handling large sets of unstructured data coming from various sources. Let’s talk about how graph databases work and what their practical uses are.

    https://www.kdnuggets.com/2021/02/understanding-nosql-database-types-graph.html

  • Data Science Learning Roadmap for 2021

    Venturing into the world of Data Science is an exciting, interesting, and rewarding path to consider. There is a great deal to master, and this self-learning recommendation plan will guide you toward establishing a solid understanding of all that is foundational to data science as well as a solid portfolio to showcase your developed expertise.

    https://www.kdnuggets.com/2021/02/data-science-learning-roadmap-2021.html

  • Data Observability, Part II: How to Build Your Own Data Quality Monitors Using SQL

    Using schema and lineage to understand the root cause of your data anomalies.

    https://www.kdnuggets.com/2021/02/data-observability-part-2-build-data-quality-monitors-sql.html

  • An overview of synthetic data types and generation methods

    Synthetic data can be used to test new products and services, validate models, or test performances because it mimics the statistical property of production data. Today you'll find different types of structured and unstructured synthetic data.

    https://www.kdnuggets.com/2021/02/overview-synthetic-data-types-generation-methods.html

  • Data Observability: Building Data Quality Monitors Using SQL

    To trigger an alert when data breaks, data teams can leverage a tried and true tactic from our friends in software engineering: monitoring and observability. In this article, we walk through how you can create your own data quality monitors for freshness and distribution from scratch using SQL.

    https://www.kdnuggets.com/2021/02/data-observability-building-data-quality-monitors-using-sql.html

  • Column-Oriented Databases, Explained

    NoSQL databases come in four distinct types: key-value stores, document-stores, graph databases, and column-oriented databases. In this article, we’ll explore column-oriented databases, also known simply as “NoSQL columns”.

    https://www.kdnuggets.com/2021/02/understanding-nosql-database-types-column-oriented-databases.html

  • How to Deploy a Flask API in Kubernetes and Connect it with Other Micro-services

    A hands-on tutorial on how to implement your micro-service architecture using the powerful container orchestration tool Kubernetes.

    https://www.kdnuggets.com/2021/02/deploy-flask-api-kubernetes-connect-micro-services.html

  • Essential Math for Data Science: Introduction to Matrices and the Matrix Product

    Like vectors, matrices are data structures allowing you to organize numbers. They are square or rectangular arrays containing values organized in two dimensions: as rows and columns. You can think of them as a spreadsheet. Learn more here.

    https://www.kdnuggets.com/2021/02/essential-math-data-science-matrices-matrix-product.html

  • Mastering TensorFlow Variables in 5 Easy Steps

    Learn how to use TensorFlow Variables, their differences from plain Tensor objects, and when they are preferred over these Tensor objects | Deep Learning with TensorFlow 2.x.

    https://www.kdnuggets.com/2021/01/mastering-tensorflow-variables-5-easy-steps.html

  • 8 New Tools I Learned as a Data Scientist in 2020

    The author shares the data science tools learned while making the move from Docker to Live Deployments.

    https://www.kdnuggets.com/2021/01/8-new-tools-learned-data-scientist-2020.html

  • Data Catalogs Are Dead; Long Live Data Discovery

    Why data catalogs aren’t meeting the needs of the modern data stack, and how a new approach – data discovery – is needed to better facilitate metadata management and data reliability.

    https://www.kdnuggets.com/2020/12/data-catalogs-dead-long-live-data-discovery.html

  • Production Machine Learning Monitoring: Outliers, Drift, Explainers & Statistical Performance

    A practical deep dive on production monitoring architectures for machine learning at scale using real-time metrics, outlier detectors, drift detectors, metrics servers and explainers.

    https://www.kdnuggets.com/2020/12/production-machine-learning-monitoring-outliers-drift-explainers-statistical-performance.html

  • Industry 2021 Predictions for AI, Analytics, Data Science, Machine Learning

    We bring you industry predictions from 12 innovative companies - what key trends they expect in 2021 in AI, Analytics, Data Science, and Machine Learning?

    https://www.kdnuggets.com/2020/12/industry-2021-predictions-ai-data-science-machine-learning.html

  • 20 Core Data Science Concepts for Beginners

    With so much to learn and so many advancements to follow in the field of data science, there are a core set of foundational concepts that remain essential. Twenty of these ideas are highlighted here that are key to review when preparing for a job interview or just to refresh your appreciation of the basics.

    https://www.kdnuggets.com/2020/12/20-core-data-science-concepts-beginners.html

  • NoSQL for Beginners

    NoSQL can offer an advantage to those who are entering Data Science and Analytics, as well as to applications with high-performance needs that aren’t met by traditional SQL databases.

    https://www.kdnuggets.com/2020/12/nosql-beginners.html

  • Object-Oriented Programming Explained Simply for Data Scientists

    Read this simple but effective guide to start using Classes in Python 3.

    https://www.kdnuggets.com/2020/12/object-oriented-programming-explained-simply-data-scientists.html

  • Data Science History and Overview

    In this era of big data that is only getting bigger, a huge amount of information from different fields is gathered and stored. Its analysis and extraction of value have become one of the most attractive tasks for companies and society in general, which is harnessed by the new professional role of the Data Scientist.

    https://www.kdnuggets.com/2020/11/data-science-history-overview.html

  • Facebook Open Sourced New Frameworks to Advance Deep Learning Research

    Polygames, PyTorch3D and HiPlot are the new additions to Facebook’s open source deep learning stack.

    https://www.kdnuggets.com/2020/11/facebook-open-source-frameworks-advance-deep-learning-research.html

  • How to Acquire the Most Wanted Data Science Skills

    We recently surveyed KDnuggets readers to determine the "most wanted" data science skills. Since they seem to be those most in demand from practitioners, here is a collection of resources for getting started with this learning.

    https://www.kdnuggets.com/2020/11/acquire-most-wanted-data-science-skills.html

  • Mastering TensorFlow Tensors in 5 Easy Steps

    Discover how the building blocks of TensorFlow work at the lower level and learn how to make the most of Tensor objects.

    https://www.kdnuggets.com/2020/11/mastering-tensorflow-tensors-5-easy-steps.html

  • 2 Coding-free Ways to Extract Content From Websites to Boost Web Traffic

    There are 2 main coding-free solutions for extracting content from websites to build your content base: use web scraping tools and use content aggregation tools. We review top choices.

    https://www.kdnuggets.com/2020/11/octoparse-coding-free-extract-content.html

  • How to Build a Football Dataset with Web Scraping

    This article covers using Selenium to scrape JavaScript rendered content.

    https://www.kdnuggets.com/2020/11/build-football-dataset-web-scraping.html

  • The Best Data Science Certification You’ve Never Heard Of

    The CDMP is the best data strategy certification you’ve never heard of. (And honestly, when you consider the fact that you’re probably working a job that didn’t exist ten years ago, it’s not surprising that this certification isn’t widespread just yet.)

    https://www.kdnuggets.com/2020/11/best-data-science-certification-never-heard.html

  • Building Deep Learning Projects with fastai — From Model Training to Deployment

    A getting-started guide to developing computer vision applications with fastai.

    https://www.kdnuggets.com/2020/11/building-deep-learning-projects-fastai-model-training-deployment.html

  • How to become a Data Scientist: a step-by-step guide

    Data science is everywhere. But what are the best ways to learn the field well enough to enter the profession? Read on for some tips and steps on doing so, and some great courses to help you get there.

    https://www.kdnuggets.com/2020/10/greatlearning-become-data-scientist-guide.html

  • Software 2.0 takes shape

    Software developers remain in very high demand as many organizations continue to experience workloads that far exceed available talent. AI-enhanced approaches that automate more areas of the software development lifecycle are in development with interesting potentials for how machine learning and natural language processing can significantly impact how software is designed, developed, tested, and deployed in the future.

    https://www.kdnuggets.com/2020/10/software-20-takes-shape.html

  • 10 Underrated Python Skills

    Tips for feature analysis, hyperparameter tuning, data visualization and more.

    https://www.kdnuggets.com/2020/10/10-underrated-python-skills.html

  • fastcore: An Underrated Python Library

    A unique python library that extends the python programming language and provides utilities that enhance productivity.

    https://www.kdnuggets.com/2020/10/fastcore-underrated-python-library.html

  • Strategies of Docker Images Optimization

    Large Docker images lengthen the time it takes to build and share images between clusters and cloud providers. When creating applications, it’s therefore worth optimizing Docker Images and Dockerfiles to help teams share smaller images, improve performance, and debug problems.

    https://www.kdnuggets.com/2020/10/strategies-docker-images-optimization.html

  • The Insiders’ Guide to Generative and Discriminative Machine Learning Models

    In this article, we will look at the difference between generative and discriminative models and how they contrast with one another.

    https://www.kdnuggets.com/2020/09/insiders-guide-generative-discriminative-machine-learning-models.html

  • How to Effectively Obtain Consumer Insights in a Data Overload Era

    Everybody knows how important it is to understand your customer, but how do you do that in an era of information overload?

    https://www.kdnuggets.com/2020/09/effectively-obtain-consumer-insights-data-overload-era.html

  • 8 AI/Machine Learning Projects To Make Your Portfolio Stand Out

    If you are just starting down a path toward a career in Data Science, or you are already a seasoned practitioner, then keeping active to advance your experience through side projects is invaluable to take you to the next professional level. These eight interesting project ideas with source code and reference articles will jump start you to thinking outside of the box.

    https://www.kdnuggets.com/2020/09/8-ml-ai-projects-stand-out.html

  • 10 Use Cases for Privacy-Preserving Synthetic Data

    This article presents 10 use-cases for synthetic data, showing how enterprises today can use this artificially generated information to train machine learning models or share data externally without violating individuals' privacy.

    https://www.kdnuggets.com/2020/08/10-use-cases-privacy-preserving-synthetic-data.html

  • How A Single Source of Truth Can Benefit Your Organization

    A single source of truth provides stakeholders with a clear picture of the enterprise assets and the potential complications that can disrupt the data strategy. Find out how you can implement this single source of truth in your enterprise ecosystem.

    https://www.kdnuggets.com/2020/08/single-source-truth-benefit-organization.html

  • Fuzzy Joins in Python with d6tjoin

    Combining different data sources is a time suck! d6tjoin is a Python library that lets you join pandas dataframes quickly and efficiently.

    https://www.kdnuggets.com/2020/07/fuzzy-joins-python-d6tjoin.html

  • A Tour of End-to-End Machine Learning Platforms

    An end-to-end machine learning platform needs a holistic approach. If you’re interested in learning more about a few well-known ML platforms, you’ve come to the right place!

    https://www.kdnuggets.com/2020/07/tour-end-to-end-machine-learning-platforms.html

  • Recommender Systems in a Nutshell

    Marketing scientist Kevin Gray asks Dr. Anna Farzindar of the University of Southern California about recommender systems and the ways they are used.

    https://www.kdnuggets.com/2020/07/recommender-systems-nutshell.html

  • Powerful CSV processing with kdb+

    This article provides a glimpse into the available tools to work with CSV files and describes how kdb+ and its query language q raise CSV processing to a new level of performance and simplicity.

    https://www.kdnuggets.com/2020/07/powerful-csv-processing-kdb.html

  • Building a REST API with Tensorflow Serving (Part 1)

    Part one of a tutorial to teach you how to build a REST API around functions or saved models created in Tensorflow. With Tensorflow Serving and Docker, defining endpoint URLs and sending HTTP requests is simple.

    https://www.kdnuggets.com/2020/07/building-rest-api-tensorflow-serving-part-1.html

  • 7 Signs you are data literate

    Understanding data is key to being a Data Scientist. But, how can you know if you might be a good fit for the field when you haven't worked with much data? These telltale signs will suggest you are competent to work with data, and that you might have a talent for being data literate.

    https://www.kdnuggets.com/2020/07/7-signs-data-literate.html

  • A Layman’s Guide to Data Science. Part 3: Data Science Workflow

    Learn and appreciate the typical workflow for a data science project, including data preparation (extraction, cleaning, and understanding), analysis (modeling), reflection (finding new paths), and communication of the results to others.

    https://www.kdnuggets.com/2020/07/laymans-guide-data-science-workflow.html

  • Generating cooking recipes using TensorFlow and LSTM Recurrent Neural Network: A step-by-step guide

    A character-level LSTM (Long short-term memory) RNN (Recurrent Neural Network) is trained on ~100k recipes dataset using TensorFlow. The model suggested the recipes "Cream Soda with Onions", "Puff Pastry Strawberry Soup", "Zucchini flavor Tea", and "Salmon Mousse of Beef and Stilton Salad with Jalapenos". Yum!? Follow along this detailed guide with code to create your own recipe-generating chef.

    https://www.kdnuggets.com/2020/07/generating-cooking-recipes-using-tensorflow.html

  • How to Prepare Your Data

    This is an overview of structuring, cleaning, and enriching raw data.

    https://www.kdnuggets.com/2020/06/how-prepare-your-data.html

  • The Unreasonable Progress of Deep Neural Networks in Natural Language Processing (NLP)

    Natural language processing has made incredible advances through advanced techniques in deep learning. Learn about these powerful models, and find how close (or far away) these approaches are to human-level understanding.

    https://www.kdnuggets.com/2020/06/unreasonable-progress-deep-neural-networks-nlp.html

  • The Most Important Fundamentals of PyTorch you Should Know

    PyTorch is a constantly developing deep learning framework with many exciting additions and features. We review its basic elements and show an example of building a simple Deep Neural Network (DNN) step-by-step.

    https://www.kdnuggets.com/2020/06/fundamentals-pytorch.html

  • Introduction to Convolutional Neural Networks

    The article explains the key components of a CNN and their implementation using the Keras Python library.

    https://www.kdnuggets.com/2020/06/introduction-convolutional-neural-networks.html

  • Deepmind’s Gaming Streak: The Rise of AI Dominance

    There is still a long way to go before machine agents match overall human gaming prowess, but Deepmind’s gaming research has shown clear and substantial progress.

    https://www.kdnuggets.com/2020/05/deepmind-gaming-ai-dominance.html

  • Faster machine learning on larger graphs with NumPy and Pandas

    One of the most exciting features of StellarGraph 1.0 is a new graph data structure — built using NumPy and Pandas — that results in significantly lower memory usage and faster construction times.

    https://www.kdnuggets.com/2020/05/faster-machine-learning-larger-graphs-numpy-pandas.html

  • AI and Machine Learning for Healthcare

    Traditional business and technology sectors are not the only fields being impacted by AI. Healthcare is a field that is thought to be highly suitable for the applications of AI tools and techniques.

    https://www.kdnuggets.com/2020/05/ai-machine-learning-healthcare.html

  • I Designed My Own Machine Learning and AI Degree

    With so many pioneering online resources for open education, check out this organized collection of courses you can follow to become a well-rounded machine learning and AI engineer.

    https://www.kdnuggets.com/2020/05/designed-machine-learning-ai-degree.html

  • What You Need to Know About Deep Reinforcement Learning

    How does deep learning solve the challenges of scale and complexity in reinforcement learning? Learn how combining these approaches will make more progress toward the notion of Artificial General Intelligence.

    https://www.kdnuggets.com/2020/05/deep-reinforcement-learning.html

  • The Architecture Used at LinkedIn to Improve Feature Management in Machine Learning Models

    The new typed feature schema streamlined the reusability of features across thousands of machine learning models.

    https://www.kdnuggets.com/2020/05/architecture-linkedin-feature-management-machine-learning-models.html

  • Getting Started with Spectral Clustering

    This post will unravel a practical example to illustrate and motivate the intuition behind each step of the spectral clustering algorithm.

    https://www.kdnuggets.com/2020/05/getting-started-spectral-clustering.html

  • Announcing PyCaret 1.0.0

    An open source low-code machine learning library in Python. PyCaret is an alternative low-code library that can replace hundreds of lines of code with only a few words. This makes experiments much faster and more efficient.

    https://www.kdnuggets.com/2020/04/announcing-pycaret.html

  • The Benefits & Examples of Using Apache Spark with PySpark

    Apache Spark runs fast, offers robust, distributed, fault-tolerant data objects, and integrates beautifully with the world of machine learning and graph analytics. Learn more here.

    https://www.kdnuggets.com/2020/04/benefits-apache-spark-pyspark.html

  • Top Process Mining Software Companies, Updated

    Understanding the real business processes of a company through analysis of its information systems can guide digital transformations. Here, we review the top 10 process mining software companies that can help businesses optimize their processes through unique insights into business systems.

    https://www.kdnuggets.com/2020/04/process-mining-software-companies.html

  • A Layman’s Guide to Data Science. Part 2: How to Build a Data Project

    As Part 2 in a Guide to Data Science, we outline the steps to build your first Data Science project, including how to ask good questions to understand the data first, how to prepare the data, how to develop an MVP, reiterate to build a good product, and, finally, present your project.

    https://www.kdnuggets.com/2020/04/guide-data-science-build-data-project.html

  • Python for data analysis… is it really that simple?!?

    The article addresses a simple data analytics problem, comparing a Python and Pandas solution to an R solution (using plyr, dplyr, and data.table), as well as kdb+ and BigQuery solutions. Performance improvement tricks for these solutions are then covered, as are parallel/cluster computing approaches and their limitations.

    https://www.kdnuggets.com/2020/04/python-data-analysis-really-that-simple.html

  • Brain Tumor Detection using Mask R-CNN

    Mask R-CNN has been the new state of the art in terms of instance segmentation. Here I want to share some simple understanding of it to give you a first look and then we can move ahead and build our model.

    https://www.kdnuggets.com/2020/03/brain-tumor-detection-mask-r-cnn.html

  • Deep Learning Breakthrough: a sub-linear deep learning algorithm that does not need a GPU?

    Deep Learning sits at the forefront of many important advances underway in machine learning. With backpropagation being a primary training method, its computational inefficiencies require sophisticated hardware, such as GPUs. Learn about this recent breakthrough algorithmic advancement with improvements to the backpropagation calculations on a CPU that outperforms large neural network training with a GPU.

    https://www.kdnuggets.com/2020/03/deep-learning-breakthrough-sub-linear-algorithm-no-gpu.html

  • Evaluating Ray: Distributed Python for Massive Scalability

    If your team has started using Ray and you’re wondering what it is, this post is for you. If you’re wondering if Ray should be part of your technical strategy for Python-based applications, especially ML and AI, this post is for you.

    https://www.kdnuggets.com/2020/03/domino-ray-distributed-python-massive-scalability.html

  • Scaling Your Data Strategy

    This article presents a particular vision for a cohesive data strategy for addressing large-scale problems with data-driven solutions, based on prior professional experiences.

    https://www.kdnuggets.com/2020/03/scaling-data-strategy.html

  • Building a Mature Machine Learning Team

    After spending a lot of time thinking about the paths that software companies take toward ML maturity, this framework was created to follow as you adopt ML and then mature as an organization. The framework covers every aspect of building a team including product, process, technical, and organizational readiness, as well as recognizes the importance of cross-functional expertise and process improvements for bringing AI-driven products to market.

    https://www.kdnuggets.com/2020/03/mature-machine-learning-team.html

  • How To Build Your Own Feedback Analysis Solution

    Automating the analysis of customer feedback will sound like a great idea after reading a couple hundred reviews. Building an NLP solution to provide in-depth analysis of what your customers are thinking is a serious undertaking, and this guide helps you scope out the entire project.

    https://www.kdnuggets.com/2020/03/build-feedback-analysis-solution.html

  • Software Interfaces for Machine Learning Deployment

    While building a machine learning model might be the fun part, it won't do much for anyone else unless it can be deployed into a production environment. How to implement machine learning deployments is a special challenge with differences from traditional software engineering, and this post examines a fundamental first step -- how to create software interfaces so you can develop deployments that are automated and repeatable.

    https://www.kdnuggets.com/2020/03/software-interfaces-machine-learning-deployment.html

  • Can Edge Analytics Become a Game Changer?

    Edge analytics is considered to be the future of sensor handling, and this article discusses its benefits and the architecture of modern edge devices, gateways, and sensors. Deep Learning for edge analytics is also considered, along with a review of experiments in human and chess figure detection using edge devices.

    https://www.kdnuggets.com/2020/02/edge-analytics-game-changer.html

  • What Does it Mean to Deploy a Machine Learning Model?

    You are a Data Scientist who knows how to develop machine learning models. You might also be a Data Scientist who is too afraid to ask how to deploy your machine learning models. The answer isn't entirely straightforward, and so is a major pain point of the community. This article will help you take a step in the right direction for production deployments that are automated, reproducible, and auditable.

    https://www.kdnuggets.com/2020/02/deploy-machine-learning-model.html

  • Introduction to Geographical Time Series Prediction with Crime Data in R, SQL, and Tableau

    When reviewing geographical data, it can be difficult to prepare the data for an analysis. This article helps by covering importing data into a SQL Server database; cleansing and grouping data into a map grid; adding time data points to the set of grid data and filling in the gaps where no crimes occurred; importing the data into R; and running an XGBoost model to determine where crimes will occur on a specific day.

    https://www.kdnuggets.com/2020/02/introduction-geographical-time-series-crime-r-sql-tableau.html

  • Illustrating the Reformer

    In this post, we will try to dive into the Reformer model and try to understand it with some visual guides.

    https://www.kdnuggets.com/2020/02/illustrating-reformer.html

  • Schema Evolution in Data Lakes

    Whereas a data warehouse will need rigid data modeling and definitions, a data lake can store different types and shapes of data. In a data lake, the schema of the data can be inferred when it’s read, providing the aforementioned flexibility. However, this flexibility is a double-edged sword.

    https://www.kdnuggets.com/2020/01/schema-evolution-data-lakes.html

  • Automated Machine Learning: How do teams work together on an AutoML project?

    In this use case, available to the public on GitHub, we’ll see how a data scientist, project manager, and business lead at a retail grocer can leverage automated machine learning and Azure Machine Learning service to reduce product overstock.

    https://www.kdnuggets.com/2020/01/teams-work-together-automl-project.html

  • Market Basket Analysis: A Tutorial

    This article is about Market Basket Analysis & the Apriori algorithm that works behind it.

    https://www.kdnuggets.com/2019/12/market-basket-analysis.html
