Search results for dataframe

    Found 520 documents, 5944 searched:

  • Data Science Books You Should Start Reading in 2021

    Check out this curated list of the best data science books for any level.

    https://www.kdnuggets.com/2021/04/data-science-books-start-reading-2021.html

  • 10 Must-Know Statistical Concepts for Data Scientists

    Statistics is a building block of data science. If you are working or plan to work in this field, then you will encounter the fundamental concepts reviewed for you here. Certainly, there is much more to learn in statistics, but once you understand these basics, then you can steadily build your way up to advanced topics.

    https://www.kdnuggets.com/2021/04/10-statistical-concepts-data-scientists.html

  • Time Series Forecasting with PyCaret Regression Module

    PyCaret is an alternative low-code library that can replace hundreds of lines of code with only a few lines. See how to use PyCaret's Regression Module for Time Series Forecasting.

    https://www.kdnuggets.com/2021/04/time-series-forecasting-pycaret-regression-module.html

  • E-commerce Data Analysis for Sales Strategy Using Python

    Check out this informative and concise case study applying data analysis using Python to a well-defined e-commerce scenario.

    https://www.kdnuggets.com/2021/04/e-commerce-data-analysis-sales-strategy-python.html

  • Awesome Tricks And Best Practices From Kaggle

    Easily learn what usually takes hours of searching and exploring to discover.

    https://www.kdnuggets.com/2021/04/awesome-tricks-best-practices-kaggle.html

  • Shapash: Making Machine Learning Models Understandable

    Establishing an expectation for trust around AI technologies may soon become one of the most important skills provided by Data Scientists. Significant research investments are underway in this area, and new tools are being developed, such as Shapash, an open-source Python library that helps Data Scientists make machine learning models more transparent and understandable.

    https://www.kdnuggets.com/2021/04/shapash-machine-learning-models-understandable.html

  • Top 10 Python Libraries Data Scientists should know in 2021

    So many Python libraries exist that offer powerful and efficient foundations for supporting your data science work and machine learning model development. While the list may seem overwhelming, there are certain libraries you should focus your time on, as they are some of the most commonly used today.

    https://www.kdnuggets.com/2021/03/top-10-python-libraries-2021.html

  • The Best Machine Learning Frameworks & Extensions for Scikit-learn

    Learn how to use a selection of packages to extend the functionality of Scikit-learn estimators.

    https://www.kdnuggets.com/2021/03/best-machine-learning-frameworks-extensions-scikit-learn.html

  • Learning from machine learning mistakes

    Read this article and discover how to find the weak spots in a regression model.

    https://www.kdnuggets.com/2021/03/learning-from-machine-learning-mistakes.html

  • Know your data much faster with the new Sweetviz Python library

    One of the latest exploratory data analysis libraries is Sweetviz, a new open-source Python library built for quickly finding out data types, missing information, value distributions, correlations, and more. Find out more about the library and how to use it here.

    https://www.kdnuggets.com/2021/03/know-your-data-much-faster-sweetviz-python-library.html

  • How to Speed Up Pandas with Modin

    The Modin library can scale your pandas workflows with a one-line code change, and it integrates with the Python ecosystem and Ray clusters. This tutorial goes over how to get started with Modin and how it can speed up your pandas workflows.

    https://www.kdnuggets.com/2021/03/speed-up-pandas-modin.html

  • 11 Essential Code Blocks for Complete EDA (Exploratory Data Analysis)

    This article is a practical guide to exploring the data in any data science project and gaining valuable insights.

    https://www.kdnuggets.com/2021/03/11-essential-code-blocks-exploratory-data-analysis.html

  • Dask and Pandas: No Such Thing as Too Much Data

    Do you love pandas, but don't love it when you reach the limits of your memory or compute resources? Dask provides you with the option to use the pandas API with distributed data and computing. Learn how it works, how to use it, and why it’s worth the switch when you need it most.

    https://www.kdnuggets.com/2021/03/dask-pandas-data.html

  • 15 common mistakes data scientists make in Python (and how to fix them)

    Writing Python code that works for your data science project and performs the task you expect is one thing. Ensuring that your code is readable by others (including your future self), reproducible, and efficient is an entirely different set of challenges, which you can address by minimizing common bad practices in your development.

    https://www.kdnuggets.com/2021/03/15-common-mistakes-python.html

  • Are You Still Using Pandas to Process Big Data in 2021? Here are two better options

    When it's time to handle a lot of data -- so much that you are in the realm of Big Data -- what tools can you use to wrangle it, especially in a notebook environment? Pandas doesn’t handle really Big Data very well, but two other libraries do. So, which one is better and faster?

    https://www.kdnuggets.com/2021/03/pandas-big-data-better-options.html

  • Data Science Learning Roadmap for 2021

    Venturing into the world of Data Science is an exciting, interesting, and rewarding path to consider. There is a great deal to master, and this self-learning recommendation plan will guide you toward establishing a solid understanding of all that is foundational to data science as well as a solid portfolio to showcase your developed expertise.

    https://www.kdnuggets.com/2021/02/data-science-learning-roadmap-2021.html

  • Pandas Profiling: One-Line Magical Code for EDA

    EDA can be automated using a Python library called Pandas Profiling. Let’s explore Pandas Profiling to do EDA in a very short time and with just a single line of code.

    https://www.kdnuggets.com/2021/02/pandas-profiling-one-line-magical-code-eda.html

  • Powerful Exploratory Data Analysis in just two lines of code

    EDA is a fundamental early process for any Data Science investigation. Typical approaches for visualization and exploration are powerful, but can be cumbersome for getting to the heart of your data. Now, you can get to know your data much faster with only a few lines of code... and it might even be fun!

    https://www.kdnuggets.com/2021/02/powerful-exploratory-data-analysis-sweetviz.html

  • 7 Most Recommended Skills to Learn to be a Data Scientist

    The data scientist has emerged as a truly interdisciplinary role that spans a variety of skills, both theoretical and practical. Many critical requirements of the core, day-to-day activities that deliver real business value reach well outside the realm of machine learning, and should be mastered by those aspiring to the field.

    https://www.kdnuggets.com/2021/02/7-most-recommended-skills-data-scientist.html

  • The Ultimate Scikit-Learn Machine Learning Cheatsheet

    With the power and popularity of scikit-learn for machine learning in Python, this library is a foundation of any practitioner's toolset. Preview its core methods with this review of predictive modelling, clustering, dimensionality reduction, feature importance, and data transformation.

    https://www.kdnuggets.com/2021/01/ultimate-scikit-learn-machine-learning-cheatsheet.html

  • Can Data Science Be Agile? Implementing Best Agile Practices to Your Data Science Process

    Agile is not reserved for software developers only -- that's a myth. While these effective strategies are not commonly used by data scientists today and some aspects of data science make Agile a bit tricky, the methodology offers plenty of benefits to data science projects that can increase the effectiveness of your process and bring more success to your outcomes.

    https://www.kdnuggets.com/2021/01/data-science-agile-best-practices.html

  • Cleaner Data Analysis with Pandas Using Pipes

    Check out this practical guide on Pandas pipes.

    https://www.kdnuggets.com/2021/01/cleaner-data-analysis-pandas-pipes.html

  • Model Experiments, Tracking and Registration using MLflow on Databricks

    This post covers how StreamSets can help expedite operations at some of the most crucial stages of the Machine Learning Lifecycle and MLOps, and demonstrates integration with Databricks and MLflow.

    https://www.kdnuggets.com/2021/01/model-experiments-tracking-registration-mlflow-databricks.html

  • Meet whale! The stupidly simple data discovery tool

    Finding data and understanding its meaning represents the traditional "daily grind" of a Data Scientist. With whale, the new lightweight data discovery, documentation, and quality engine for your data warehouse that is under development by Dataframe, your data science team will more efficiently search data and automate its data metrics.

    https://www.kdnuggets.com/2020/12/whale-data-discovery-tool.html

  • How to easily check if your Machine Learning model is fair?

    Machine learning models deployed today, and the many more to come, impact people and society directly. With that power and influence resting in the hands of Data Scientists and machine learning engineers, taking the time to evaluate and understand whether model results are fair will become the linchpin for the future success of AI/ML solutions. These are critical considerations, and the recently developed fairness module in the dalex Python package offers a unified and accessible way to ensure your models remain fair.

    https://www.kdnuggets.com/2020/12/machine-learning-model-fair.html

  • A Rising Library Beating Pandas in Performance

    This article compares the performance of the well-known pandas library with pypolars, a rising DataFrame library written in Rust. See how they compare.

    https://www.kdnuggets.com/2020/12/rising-library-beating-pandas-performance.html

  • 10 Python Skills They Don’t Teach in Bootcamp

    Ascend to new heights in Data Science and Machine Learning with this thrilling list of coding tips.

    https://www.kdnuggets.com/2020/12/10-python-skills-dont-teach-bootcamp.html

  • Data Compression via Dimensionality Reduction: 3 Main Methods

    Lift the curse of dimensionality by mastering the application of three important techniques that will help you reduce the dimensionality of your data, even if it is not linearly separable.

    https://www.kdnuggets.com/2020/12/data-compression-dimensionality-reduction.html

  • R or Python? Why Not Both?

    Do you use both R and Python, either in different projects or in the same one? Check out prython, an IDE designed to handle your needs.

    https://www.kdnuggets.com/2020/12/r-python-both-prython.html

  • Top November Stories: Top Python Libraries for Data Science, Data Visualization & Machine Learning; The Best Data Science Certification You’ve Never Heard Of

    Also: TabPy: Combining Python and Tableau; How to Acquire the Most Wanted Data Science Skills.

    https://www.kdnuggets.com/2020/12/top-stories-2020-nov.html

  • 10 Python Skills for Beginners

    Python is the fastest-growing, most-beloved programming language. Get started with these Data Science tips.

    https://www.kdnuggets.com/2020/12/10-python-skills-beginners.html

  • Simple & Intuitive Ensemble Learning in R

    Read about metaEnsembleR, an R package for heterogeneous ensemble meta-learning (classification and regression) that is fully automated.

    https://www.kdnuggets.com/2020/12/simple-intuitive-meta-learning-r.html

  • How to Incorporate Tabular Data with HuggingFace Transformers

    In real-world scenarios, we often encounter data that includes both text and tabular features. Leveraging the latest advances in transformers to handle both data structures effectively can increase the performance of your models.

    https://www.kdnuggets.com/2020/11/tabular-data-huggingface-transformers.html

  • 5 Things You Are Doing Wrong in PyCaret

    PyCaret is an alternative low-code library that can replace hundreds of lines of code with only a few words. This makes experiments exponentially faster and more efficient. Find out 5 ways to improve your usage of the library.

    https://www.kdnuggets.com/2020/11/5-things-doing-wrong-pycaret.html

  • Mastering TensorFlow Tensors in 5 Easy Steps

    Discover how the building blocks of TensorFlow work at the lower level and learn how to make the most of Tensor objects.

    https://www.kdnuggets.com/2020/11/mastering-tensorflow-tensors-5-easy-steps.html

  • How to Build a Football Dataset with Web Scraping

    This article covers using Selenium to scrape JavaScript-rendered content.

    https://www.kdnuggets.com/2020/11/build-football-dataset-web-scraping.html

  • 10 Underrated Python Skills

    Tips for feature analysis, hyperparameter tuning, data visualization and more.

    https://www.kdnuggets.com/2020/10/10-underrated-python-skills.html

  • Data Science in the Cloud with Dask

    Scaling large data analyses for data science and machine learning is growing in importance. Dask and Coiled are making it easy and fast for folks to do just that. Read on to find out how.

    https://www.kdnuggets.com/2020/10/data-science-cloud-dask.html

  • Feature Ranking with Recursive Feature Elimination in Scikit-Learn

    This article covers using scikit-learn to obtain the optimal number of features for your machine learning project.

    https://www.kdnuggets.com/2020/10/feature-ranking-recursive-feature-elimination-scikit-learn.html

  • fastcore: An Underrated Python Library

    A unique Python library that extends the Python programming language and provides utilities that enhance productivity.

    https://www.kdnuggets.com/2020/10/fastcore-underrated-python-library.html

  • Getting Started with PyTorch

    A practical walkthrough on how to use PyTorch for data analysis and inference.

    https://www.kdnuggets.com/2020/10/getting-started-pytorch.html

  • Uber Open Sources the Third Release of Ludwig, its Code-Free Machine Learning Platform

    The new release makes Ludwig one of the most complete open source AutoML stacks in the market.

    https://www.kdnuggets.com/2020/10/uber-open-source-ludwig-code-free-machine-learning-platform.html

  • Your Guide to Linear Regression Models

    This article explains linear regression and how to program linear regression models in Python.

    https://www.kdnuggets.com/2020/10/guide-linear-regression-models.html

  • Geographical Plots with Python

    When your data includes geographical information, rich map visualizations can offer significant value for you to understand your data and for the end user when interpreting analytical results.

    https://www.kdnuggets.com/2020/09/geographical-plots-python.html

  • Introduction to Time Series Analysis in Python

    Data that is updated in real-time requires additional handling and special care to prepare it for machine learning models. The important Python library, Pandas, can be used for most of this work, and this tutorial guides you through this process for analyzing time-series data.

    https://www.kdnuggets.com/2020/09/introduction-time-series-analysis-python.html

  • Statistical and Visual Exploratory Data Analysis with One Line of Code

    If EDA is not executed correctly, it can cause us to start modeling with “unclean” data. See how to use Pandas Profiling to perform EDA with a single line of code.

    https://www.kdnuggets.com/2020/09/statistical-visual-exploratory-data-analysis-one-line-code.html

  • Visualization Of COVID-19 New Cases Over Time In Python

    Inspired by another concise data visualization, the author of this article has crafted and shared the code for a heatmap which visualizes the COVID-19 pandemic in the United States over time.

    https://www.kdnuggets.com/2020/09/visualization-covid-19-new-cases-over-time-python.html

  • Feature Engineering for Numerical Data

    Data feeds machine learning models, and the more the better, right? Well, sometimes numerical data isn't quite right for ingestion, so a variety of methods, detailed in this article, are available to transform raw numbers into something a bit more palatable.

    https://www.kdnuggets.com/2020/09/feature-engineering-numerical-data.html

  • An Introduction to NLP and 5 Tips for Raising Your Game

    This article is a collection of things the author would like to have known when they started out in NLP. Perhaps it will be useful for you.

    https://www.kdnuggets.com/2020/09/introduction-nlp-5-tips-raising-your-game.html

  • Microsoft’s DoWhy is a Cool Framework for Causal Inference

    Inspired by Judea Pearl’s do-calculus for causal inference, the open source framework provides a programmatic interface for popular causal inference methods.

    https://www.kdnuggets.com/2020/08/microsoft-dowhy-framework-causal-inference.html

  • Explainable and Reproducible Machine Learning Model Development with DALEX and Neptune

    With ML models serving real people, misclassified cases (which are a natural consequence of using ML) are affecting people’s lives and sometimes treating them very unfairly. This makes the ability to explain your models’ predictions a requirement rather than just a nice-to-have.

    https://www.kdnuggets.com/2020/08/explainable-reproducible-machine-learning-model-development-dalex-neptune.html

  • Working with Spark, Python or SQL on Azure Databricks

    Here we look at some ways to interchangeably work with Python, PySpark and SQL using Azure Databricks, an Apache Spark-based big data analytics service designed for data science and data engineering offered by Microsoft.

    https://www.kdnuggets.com/2020/08/spark-python-sql-azure-databricks.html

  • Getting Started with Feature Selection

    For machine learning, more data is always better. What about more features of data? Not necessarily. This beginners' guide with code examples for selecting the most useful features from your data will jump start you toward developing the most effective and efficient learning models.

    https://www.kdnuggets.com/2020/08/getting-started-feature-selection.html

  • Build Your Own AutoML Using PyCaret 2.0

    In this post we present a step-by-step tutorial on how PyCaret can be used to build an Automated Machine Learning Solution within Power BI, thus allowing data scientists and analysts to add a layer of machine learning to their Dashboards without any additional license or software costs.

    https://www.kdnuggets.com/2020/08/build-automl-pycaret.html

  • GitHub is the Best AutoML You Will Ever Need

    This article uses PyCaret 2.0, an open source, low-code machine learning library in Python to develop a simple AutoML solution and deploy it as a Docker container using GitHub actions.

    https://www.kdnuggets.com/2020/08/github-best-automl-ever-need.html

  • Data Science Internship Interview Questions

    Data science is an attractive field because not only is it lucrative, but you can have opportunities to work on interesting projects, and you’re always learning new things. If you're trying to get started from the ground up, then review this guide to prepare for the interview essentials.

    https://www.kdnuggets.com/2020/08/data-science-internship-interview-questions.html

  • Word Embedding Fairness Evaluation

    With word embeddings being such a crucial component of NLP, the reported social biases resulting from the training corpora could limit their application. The framework introduced here intends to measure the fairness in word embeddings to better understand these potential biases.

    https://www.kdnuggets.com/2020/08/word-embedding-fairness-evaluation.html

  • Fuzzy Joins in Python with d6tjoin

    Combining different data sources is a time suck! d6tjoin is a Python library that lets you join pandas dataframes quickly and efficiently.

    https://www.kdnuggets.com/2020/07/fuzzy-joins-python-d6tjoin.html

  • Building a Content-Based Book Recommendation Engine

    In this blog, we will see how we can build a simple content-based recommender system using Goodreads data.

    https://www.kdnuggets.com/2020/07/building-content-based-book-recommendation-engine.html

  • Labelling Data Using Snorkel

    In this tutorial, we walk through the process of using Snorkel to generate labels for an unlabelled dataset. We will provide you examples of basic Snorkel components by guiding you through a real clinical application of Snorkel.

    https://www.kdnuggets.com/2020/07/labelling-data-using-snorkel.html

  • What I learned from looking at 200 machine learning tools

    While hundreds of machine learning tools are available today, the ML software landscape may still be underdeveloped with more room to mature. This review considers the state of ML tools, existing challenges, and which frameworks are addressing the future of machine learning software.

    https://www.kdnuggets.com/2020/07/200-machine-learning-tools.html

  • 3 Advanced Python Features You Should Know

    As a Data Scientist, you already spend most of your time getting your data ready for prime time. Follow these real-world scenarios to learn how to leverage the advanced Python techniques of list comprehensions, lambda expressions, and the map function to get the job done faster.

    https://www.kdnuggets.com/2020/07/3-advanced-python-features.html

  • Clustering Uber Rideshare Data

    This blog discusses clustering the Uber ridesharing dataset, with a focus on interpretation and understanding the concepts in the real world.

    https://www.kdnuggets.com/2020/07/clustering-rideshare-data-uber.html

  • Pull and Analyze Financial Data Using a Simple Python Package

    We demonstrate a simple Python script/package to help you pull financial data (all the important metrics and ratios that you can think of) and plot them.

    https://www.kdnuggets.com/2020/07/pull-analyze-financial-data-simple-python-package.html

  • Spam Filter in Python: Naive Bayes from Scratch

    In this blog post, learn how to build a spam filter using Python and the multinomial Naive Bayes algorithm, with the goal of classifying messages with greater than 80% accuracy.

    https://www.kdnuggets.com/2020/07/spam-filter-python-naive-bayes-scratch.html

  • Feature Engineering in SQL and Python: A Hybrid Approach

    Set up your workstation, reduce workplace clutter, maintain a clean namespace, and effortlessly keep your dataset up-to-date.

    https://www.kdnuggets.com/2020/07/feature-engineering-sql-python-hybrid-approach.html

  • Getting Started with TensorFlow 2

    Learn about the latest version of TensorFlow with this hands-on walk-through of implementing a classification problem with deep learning, how to plot it, and how to improve its results.

    https://www.kdnuggets.com/2020/07/getting-started-tensorflow2.html

  • Speed up your Numpy and Pandas with NumExpr Package

    We show how to significantly speed up your mathematical calculations in Numpy and Pandas using a small library.

    https://www.kdnuggets.com/2020/07/speed-up-numpy-pandas-numexpr-package.html

  • Software engineering fundamentals for Data Scientists

    As a data scientist writing code for your models, it's quite possible that your work will make its way into a production environment to be used by the masses. But, writing code that is deployed as software is much different than writing code for exploratory data analysis. Learn about the key approaches for making your code production-ready that will save you time and future headaches.

    https://www.kdnuggets.com/2020/06/software-engineering-fundamentals-data-scientists.html

  • How to Prepare Your Data

    This is an overview of structuring, cleaning, and enriching raw data.

    https://www.kdnuggets.com/2020/06/how-prepare-your-data.html

  • Practical Markov Chain Monte Carlo

    This is a slightly more intricate example of MCMC than the many that use a fairly simple model, a single predictor (maybe two), and not much else; it highlights a couple of issues and tricks worth noting for a handwritten implementation.

    https://www.kdnuggets.com/2020/06/practical-markov-chain-monte-carlo.html

  • Build a Branded Web Based GIS Application Using R, Leaflet and Flexdashboard

    By using R, Flexdashboard and Leaflet, we can build a customized and branded web application to showcase location based data interactively across the organization. Instead of crowding the application with many widgets, we use menu tabs and pages to separate the interactive aspects.

    https://www.kdnuggets.com/2020/06/branded-web-based-gis-application-r-leaflet-flexdashboard.html

  • Machine Learning in Dask

    In this piece, we’ll see how we can use Dask to work with large datasets on our local machines.

    https://www.kdnuggets.com/2020/06/machine-learning-dask.html

  • A Classification Project in Machine Learning: a gentle step-by-step guide

    Classification is a core technique in the fields of data science and machine learning that is used to predict the categories to which data should belong. Follow this learning guide that demonstrates how to consider multiple classification models to predict data scraped from the web.

    https://www.kdnuggets.com/2020/06/classification-project-machine-learning-guide.html

  • Simplified Mixed Feature Type Preprocessing in Scikit-Learn with Pipelines

    There is a quick and easy way to perform preprocessing on mixed feature type data in Scikit-Learn, which can be integrated into your machine learning pipelines.

    https://www.kdnuggets.com/2020/06/simplifying-mixed-feature-type-preprocessing-scikit-learn-pipelines.html

  • LinkedIn Open Sources a Small Component to Simplify the TensorFlow-Spark Interoperability

    Spark-TFRecord enables the processing of TensorFlow’s TFRecord structures in Apache Spark.

    https://www.kdnuggets.com/2020/05/linkedin-open-sources-small-component-tensorflow-spark-interoperability.html

  • Coding habits for data scientists

    While the core machine learning algorithms might only take up a few lines of code, it's the rest of your program that can get messy fast. Learn about some techniques for identifying bad coding habits in ML that add to complexity in code as well as start new habits that can help partition complexity.

    https://www.kdnuggets.com/2020/05/coding-habits-data-scientists.html

  • Getting Started with Spectral Clustering

    This post will unravel a practical example to illustrate and motivate the intuition behind each step of the spectral clustering algorithm.

    https://www.kdnuggets.com/2020/05/getting-started-spectral-clustering.html

  • Coronavirus COVID-19 Genome Analysis using Biopython

    In this article, we interpret and analyze the COVID-19 DNA sequence data and try to extract as many insights as possible about the proteins that make it up. Later, we compare COVID-19 DNA with MERS and SARS to understand the relationships among them.

    https://www.kdnuggets.com/2020/04/coronavirus-covid-19-genome-analysis-biopython.html

  • LSTM for time series prediction

    Learn how to develop an LSTM neural network with PyTorch on trading data to predict future prices by mimicking actual values of the time series data.

    https://www.kdnuggets.com/2020/04/lstm-time-series-prediction.html

  • Announcing PyCaret 1.0.0

    An open source low-code machine learning library in Python. PyCaret is an alternative low-code library that can replace hundreds of lines of code with only a few words. This makes experiments exponentially faster and more efficient.

    https://www.kdnuggets.com/2020/04/announcing-pycaret.html

  • The Benefits & Examples of Using Apache Spark with PySpark

    Apache Spark runs fast, offers robust, distributed, fault-tolerant data objects, and integrates beautifully with the world of machine learning and graph analytics. Learn more here.

    https://www.kdnuggets.com/2020/04/benefits-apache-spark-pyspark.html

  • Why and How to Use Dask with Big Data

    The Pandas library for Python is a game-changer for data preparation. But when the data gets big, really big, your computer needs more help to efficiently handle all that data. Learn more about how to use Dask and follow a demo to scale up your Pandas to work with Big Data.

    https://www.kdnuggets.com/2020/04/dask-big-data.html

  • Visualizing Decision Trees with Python (Scikit-learn, Graphviz, Matplotlib)

    Learn about how to visualize decision trees using matplotlib and Graphviz.

    https://www.kdnuggets.com/2020/04/visualizing-decision-trees-python.html

  • Simple Question Answering (QA) Systems That Use Text Similarity Detection in Python

    How exactly are smart algorithms able to engage and communicate with us like humans? The answer lies in Question Answering systems that are built on a foundation of Machine Learning and Natural Language Processing. Let's build one here.

    https://www.kdnuggets.com/2020/04/simple-question-answering-systems-text-similarity-python.html

  • Stop Hurting Your Pandas!

    This post will address the issues that can arise when Pandas slicing is used improperly. If you see the warning that reads "A value is trying to be set on a copy of a slice from a DataFrame", this post is for you.

    https://www.kdnuggets.com/2020/04/stop-hurting-pandas.html

  • Python for data analysis… is it really that simple?!?

    The article addresses a simple data analytics problem, comparing a Python and Pandas solution to an R solution (using plyr, dplyr, and data.table), as well as kdb+ and BigQuery solutions. Performance improvement tricks for these solutions are then covered, as are parallel/cluster computing approaches and their limitations.

    https://www.kdnuggets.com/2020/04/python-data-analysis-really-that-simple.html

  • Introduction to the K-nearest Neighbour Algorithm Using Examples

    Read this concise summary of KNN, a supervised pattern classification algorithm that predicts which class a new input belongs to by calculating its distance to the training examples and choosing the k nearest neighbours.

    https://www.kdnuggets.com/2020/04/introduction-k-nearest-neighbour-algorithm-using-examples.html

  • Evaluating Ray: Distributed Python for Massive Scalability

    If your team has started using Ray and you’re wondering what it is, this post is for you. If you’re wondering whether Ray should be part of your technical strategy for Python-based applications, especially ML and AI, this post is for you.

    https://www.kdnuggets.com/2020/03/domino-ray-distributed-python-massive-scalability.html

  • Python Pandas For Data Discovery in 7 Simple Steps

    Just getting started with Python's Pandas library for data analysis? Or, ready for a quick refresher? These 7 steps will help you become familiar with its core features so you can begin exploring your data in no time.

    https://www.kdnuggets.com/2020/03/python-pandas-data-discovery.html

  • Audio Data Analysis Using Deep Learning with Python (Part 2)

    This is a follow-up to the first article in this series. Once you are comfortable with the concepts explained in that article, you can come back and continue with this one.

    https://www.kdnuggets.com/2020/02/audio-data-analysis-deep-learning-python-part-2.html

  • Using the Fitbit Web API with Python

    Fitbit provides a Web API for accessing data from Fitbit activity trackers. Check out this updated tutorial on accessing this Fitbit data through the API with Python.

    https://www.kdnuggets.com/2020/02/using-fitbit-web-api-python.html

  • Adversarial Validation Overview

    Learn how to implement adversarial validation, which builds a classifier to determine whether your data comes from the training or the testing set. If the classifier can tell them apart, your data has issues, and your adversarial validation model can help you diagnose the problem.

    https://www.kdnuggets.com/2020/02/adversarial-validation-overview.html

  • Exoplanet Hunting Using Machine Learning

    Search for exoplanets — those planets beyond our own solar system — using machine learning, and implement these searches in Python.

    https://www.kdnuggets.com/2020/01/exoplanet-hunting-machine-learning.html

  • The 5 Most Useful Techniques to Handle Imbalanced Datasets

    This post explains the various techniques you can use to handle imbalanced datasets.

    https://www.kdnuggets.com/2020/01/5-most-useful-techniques-handle-imbalanced-datasets.html

  • Random Forest® — A Powerful Ensemble Learning Algorithm

    The article explains the Random Forest algorithm and how to build and optimize a Random Forest classifier.

    https://www.kdnuggets.com/2020/01/random-forest-powerful-ensemble-learning-algorithm.html

  • Explaining Black Box Models: Ensemble and Deep Learning Using LIME and SHAP

    This article explains the decisions made by LightGBM and Keras models when classifying transactions as fraudulent, using two state-of-the-art open-source explainability techniques, LIME and SHAP.

    https://www.kdnuggets.com/2020/01/explaining-black-box-models-ensemble-deep-learning-lime-shap.html

  • Geovisualization with Open Data

    In this post I want to show how to use publicly available (open) data to create geovisualizations in Python. Maps are a great way to communicate and compare information when working with geolocation data. There are many frameworks for plotting maps; here I focus on matplotlib and geopandas (and give a glimpse of mplleaflet).

    https://www.kdnuggets.com/2020/01/open-data-germany-maps-viz.html

  • Survey Segmentation Tutorial

    Learn the basics of verifying segmentation, analyzing the data, and creating segments in this tutorial. When reviewing survey data, you will typically be handed Likert questions (e.g., on a scale of 1 to 5), and by using a few techniques, you can verify the quality of the survey and start grouping respondents into populations.

    https://www.kdnuggets.com/2020/01/survey-segmentation-tutorial.html

  • H2O Framework for Machine Learning

    This article is an overview of H2O, a scalable and fast open-source platform for machine learning. We will apply it to perform classification tasks.

    https://www.kdnuggets.com/2020/01/h2o-framework-machine-learning.html

  • Automated Machine Learning: How do teams work together on an AutoML project?

    In this use case, available to the public on GitHub, we’ll see how a data scientist, project manager, and business lead at a retail grocer can leverage automated machine learning and Azure Machine Learning service to reduce product overstock.

    https://www.kdnuggets.com/2020/01/teams-work-together-automl-project.html

  • Random Forest® vs Neural Networks for Predicting Customer Churn

    Let us see how random forest competes with neural networks in solving a real-world business problem.

    https://www.kdnuggets.com/2019/12/random-forest-vs-neural-networks-predicting-customer-churn.html

  • Market Basket Analysis: A Tutorial

    This article is about Market Basket Analysis and the Apriori algorithm that works behind it.

    https://www.kdnuggets.com/2019/12/market-basket-analysis.html

  • Interpretability part 3: opening the black box with LIME and SHAP

    The third part in a series on leveraging techniques to take a look inside the black box of AI, this guide considers methods that try to explain each prediction instead of establishing a global explanation.

    https://www.kdnuggets.com/2019/12/interpretability-part-3-lime-shap.html

  • Build Pipelines with Pandas Using pdpipe

    We show how to build intuitive and useful pipelines with Pandas DataFrame using a wonderful little library called pdpipe.

    https://www.kdnuggets.com/2019/12/build-pipelines-pandas-pdpipe.html

  • Interpretability: Cracking open the black box, Part 2

    The second part in a series on leveraging techniques to take a look inside the black box of AI, this guide considers post-hoc interpretation that is useful when the model is not transparent.

    https://www.kdnuggets.com/2019/12/interpretability-black-box-part-2.html

  • The Essential Toolbox for Data Cleaning

    Increase your confidence in performing data cleaning with a broader perspective of what datasets typically look like, and follow this toolbox of code snippets to make your data cleaning process faster and more efficient.

    https://www.kdnuggets.com/2019/12/essential-toolbox-data-cleaning.html

  • Explainability: Cracking open the black box, Part 1

    What is Explainability in AI and how can we leverage different techniques to open the black box of AI and peek inside? This practical guide offers a review and critique of the various techniques of interpretability.

    https://www.kdnuggets.com/2019/12/explainability-black-box-part1.html

  • Spark NLP 101: LightPipeline

    A Pipeline is specified as a sequence of stages, and each stage is either a Transformer or an Estimator. These stages are run in order, and the input DataFrame is transformed as it passes through each stage. Now let’s see how this can be done in Spark NLP using Annotators and Transformers.

    https://www.kdnuggets.com/2019/11/spark-nlp-101-lightpipeline.html

  • Content-based Recommender Using Natural Language Processing (NLP)

    A guide to building a content-based movie recommender model based on NLP.

    https://www.kdnuggets.com/2019/11/content-based-recommender-using-natural-language-processing-nlp.html

  • Python, Selenium & Google for Geocoding Automation: Free and Paid

    This tutorial will take you through two options for automating the geocoding process using Python, Selenium, and the Google Geocoding API.

    https://www.kdnuggets.com/2019/11/automate-geocoding-free-paid-python-selenium-google.html

  • Python Tuples and Tuple Methods

    Brush up on your Python basics with this post on creating, using, and manipulating tuples.

    https://www.kdnuggets.com/2019/11/python-tuples-methods.html

  • Python Lists and List Manipulation

    In Python, lists store an ordered collection of items which can be of different types. This post is an overview of lists and their manipulation.

    https://www.kdnuggets.com/2019/11/python-lists-list-manipulation.html

  • Testing Your Machine Learning Pipelines

    Let’s take a look at traditional testing methodologies and how we can apply these to our data/ML pipelines.

    https://www.kdnuggets.com/2019/11/testing-machine-learning-pipelines.html

  • Beginners Guide to the Three Types of Machine Learning

    The following article is an introduction to classification and regression (known as supervised learning) and to unsupervised learning (which, in the context of machine learning applications, often refers to clustering), and it includes a walkthrough in the popular Python library scikit-learn.

    https://www.kdnuggets.com/2019/11/beginners-guide-three-types-machine-learning.html

  • How to Speed up Pandas by 4x with one line of code

    While Pandas is the go-to library for data processing in Python, it isn't really built for speed. Learn more about Modin, a new library developed to distribute Pandas' computation and speed up your data prep.

    https://www.kdnuggets.com/2019/11/speed-up-pandas-4x.html

  • Understanding Boxplots

    A boxplot can tell you about your outliers and what their values are. It can also tell you whether your data is symmetrical, how tightly your data is grouped, and whether and how your data is skewed.

    https://www.kdnuggets.com/2019/11/understanding-boxplots.html

  • Orchestrating Dynamic Reports in Python and R with Rmd Files

    Do you want to extract CSV files with Python and visualize them in R? How does preparing everything in R and drawing conclusions with Python sound? Both are possible if you know the right libraries and techniques. Here, we’ll walk through a use case that uses both languages in one analysis.

    https://www.kdnuggets.com/2019/11/orchestrating-dynamic-reports-python-r-rmd-files.html

  • Data Cleaning and Preprocessing for Beginners

    Careful preprocessing of data for your machine learning project is crucial. This overview describes the process of data cleaning and dealing with noise and missing data.

    https://www.kdnuggets.com/2019/11/data-cleaning-preprocessing-beginners.html

  • 5 Advanced Features of Pandas and How to Use Them

    The pandas library offers core functionality for preparing your data in Python. But many users don't go beyond the basics, so learn about these lesser-known advanced methods that will make handling your data easier and cleaner.

    https://www.kdnuggets.com/2019/10/5-advanced-features-pandas.html

  • How to Write Web Apps Using Simple Python for Data Scientists

    Convert your Data Science Projects into cool apps easily without knowing any web frameworks.

    https://www.kdnuggets.com/2019/10/write-web-apps-using-simple-python-data-scientists.html

  • Top 7 Things I Learned in my Data Science Masters

    Even though I’m still in my studies, here’s a list of the most important things I’ve learned so far.

    https://www.kdnuggets.com/2019/10/top-7-things-learned-data-science-masters.html

  • Using Time Series Encodings to Discover Baseball History’s Most Interesting Seasons

    Take me out to the ballgame! Take me out to the crowd! For the 2,829 seasons that have been played for 101 baseball teams since 1880, which seasons were unlike any others? Using SAX Encoding to recognize patterns in time series data, the most special years in baseball can be found.

    https://www.kdnuggets.com/2019/09/time-series-baseball.html

  • Natural Language in Python using spaCy: An Introduction

    This article provides a brief introduction to working with natural language (sometimes called “text analytics”) in Python using spaCy and related libraries.

    https://www.kdnuggets.com/2019/09/natural-language-python-using-spacy-introduction.html

  • A Single Function to Streamline Image Classification with Keras

    We show, step-by-step, how to construct a single, generalized, utility function to pull images automatically from a directory and train a convolutional neural net model.

    https://www.kdnuggets.com/2019/09/single-function-streamline-image-classification-keras.html

  • Scikit-Learn & More for Synthetic Dataset Generation for Machine Learning

    While mature algorithms and extensive open-source libraries are widely available for machine learning practitioners, sufficient data to apply these techniques remains a core challenge. Discover how to leverage scikit-learn and other tools to generate synthetic data appropriate for optimizing and fine-tuning your models.

    https://www.kdnuggets.com/2019/09/scikit-learn-synthetic-dataset.html

  • Applying Data Science to Cybersecurity Network Attacks & Events

    Check out this detailed tutorial on applying data science to the cybersecurity domain, written by an individual with backgrounds in both fields.

    https://www.kdnuggets.com/2019/09/applying-data-science-cybersecurity-network-attacks-events.html

  • 5 Beginner Friendly Steps to Learn Machine Learning and Data Science with Python

    “I want to learn machine learning and artificial intelligence, where do I start?” Here.

    https://www.kdnuggets.com/2019/09/5-beginner-friendly-steps-learn-machine-learning-data-science-python.html

  • The 5 Sampling Algorithms every Data Scientist needs to know

    Algorithms are at the core of data science, and sampling is a critical technique that can make or break a project. Learn more about the most common sampling techniques, so you can select the best approach when working with your data.

    https://www.kdnuggets.com/2019/09/5-sampling-algorithms.html

  • Explore the world of Bioinformatics with Machine Learning

    The article contains a brief introduction to bioinformatics and shows how a machine learning classification algorithm can be used to classify the type of cancer in each patient based on their gene expression.

    https://www.kdnuggets.com/2019/09/explore-world-bioinformatics-machine-learning.html
