Search results for dataframe

Found 520 documents, 5944 searched:

Data Science Books You Should Start Reading in 2021
Check out this curated list of the best data science books for any level.
https://www.kdnuggets.com/2021/04/data-science-books-start-reading-2021.html
10 Must-Know Statistical Concepts for Data Scientists
Statistics is a building block of data science. If you are working or plan to work in this field, then you will encounter the fundamental concepts reviewed for you here. Certainly, there is much more to learn in statistics, but once you understand these basics, then you can steadily build your way up to advanced topics.
https://www.kdnuggets.com/2021/04/10-statistical-concepts-data-scientists.html
Time Series Forecasting with PyCaret Regression Module
PyCaret is an alternate low-code library that can be used to replace hundreds of lines of code with just a few. See how to use PyCaret's Regression Module for Time Series Forecasting.
https://www.kdnuggets.com/2021/04/time-series-forecasting-pycaret-regression-module.html
E-commerce Data Analysis for Sales Strategy Using Python
Check out this informative and concise case study applying data analysis using Python to a well-defined e-commerce scenario.
https://www.kdnuggets.com/2021/04/e-commerce-data-analysis-sales-strategy-python.html
Awesome Tricks And Best Practices From Kaggle
Easily learn what is only learned by hours of search and exploration.
https://www.kdnuggets.com/2021/04/awesome-tricks-best-practices-kaggle.html
Shapash: Making Machine Learning Models Understandable
Establishing an expectation for trust around AI technologies may soon become one of the most important skills provided by Data Scientists. Significant research investments are underway in this area, and new tools are being developed, such as Shapash, an open-source Python library that helps Data Scientists make machine learning models more transparent and understandable.
https://www.kdnuggets.com/2021/04/shapash-machine-learning-models-understandable.html
Top 10 Python Libraries Data Scientists should know in 2021
So many Python libraries exist that offer powerful and efficient foundations for supporting your data science work and machine learning model development. While the list may seem overwhelming, there are certain libraries you should focus your time on, as they are some of the most commonly used today.
https://www.kdnuggets.com/2021/03/top-10-python-libraries-2021.html
The Best Machine Learning Frameworks & Extensions for Scikit-learn
Learn how to use a selection of packages to extend the functionality of Scikit-learn estimators.
https://www.kdnuggets.com/2021/03/best-machine-learning-frameworks-extensions-scikit-learn.html
Learning from machine learning mistakes
Read this article and discover how to find weak spots of a regression model.
https://www.kdnuggets.com/2021/03/learning-from-machine-learning-mistakes.html
Know your data much faster with the new Sweetviz Python library
One of the latest exploratory data analysis libraries is a new open-source Python library called Sweetviz, for just the purposes of finding out data types, missing information, distribution of values, correlations, etc. Find out more about the library and how to use it here.
https://www.kdnuggets.com/2021/03/know-your-data-much-faster-sweetviz-python-library.html
How to Speed Up Pandas with Modin
The Modin library can scale your pandas workflows with a one-line code change, and it integrates with the Python ecosystem and Ray clusters. This tutorial goes over how to get started with Modin and how it can speed up your pandas workflows.
https://www.kdnuggets.com/2021/03/speed-up-pandas-modin.html
11 Essential Code Blocks for Complete EDA (Exploratory Data Analysis)
This article is a practical guide to exploring any data science project and gaining valuable insights.
https://www.kdnuggets.com/2021/03/11-essential-code-blocks-exploratory-data-analysis.html
Dask and Pandas: No Such Thing as Too Much Data
Do you love pandas, but don't love it when you reach the limits of your memory or compute resources? Dask provides you with the option to use the pandas API with distributed data and computing. Learn how it works, how to use it, and why it’s worth the switch when you need it most.
https://www.kdnuggets.com/2021/03/dask-pandas-data.html
15 common mistakes data scientists make in Python (and how to fix them)
Writing Python code that works for your data science project and performs the task you expect is one thing. Ensuring your code is readable by others (including your future self), reproducible, and efficient are entirely different challenges that can be addressed by minimizing common bad practices in your development.
https://www.kdnuggets.com/2021/03/15-common-mistakes-python.html
Are You Still Using Pandas to Process Big Data in 2021? Here are two better options
When it's time to handle a lot of data -- so much that you are in the realm of Big Data -- what tools can you use to wrangle the data, especially in a notebook environment? Pandas doesn’t handle really Big Data very well, but two other libraries do. So, which one is better and faster?
https://www.kdnuggets.com/2021/03/pandas-big-data-better-options.html
Data Science Learning Roadmap for 2021
Venturing into the world of Data Science is an exciting, interesting, and rewarding path to consider. There is a great deal to master, and this self-learning recommendation plan will guide you toward establishing a solid understanding of all that is foundational to data science as well as a solid portfolio to showcase your developed expertise.
https://www.kdnuggets.com/2021/02/data-science-learning-roadmap-2021.html
Pandas Profiling: One-Line Magical Code for EDA
EDA can be automated using a Python library called Pandas Profiling. Let’s explore Pandas Profiling to do EDA in a very short time and with just a single line of code.
https://www.kdnuggets.com/2021/02/pandas-profiling-one-line-magical-code-eda.html
Powerful Exploratory Data Analysis in just two lines of code
EDA is a fundamental early process for any Data Science investigation. Typical approaches for visualization and exploration are powerful, but can be cumbersome for getting to the heart of your data. Now, you can get to know your data much faster with only a few lines of code... and it might even be fun!
https://www.kdnuggets.com/2021/02/powerful-exploratory-data-analysis-sweetviz.html
7 Most Recommended Skills to Learn to be a Data Scientist
The Data Scientist professional has emerged as a true interdisciplinary role that spans a variety of skills, theoretical and practical. For the core, day-to-day activities, many critical requirements that enable the delivery of real business value reach well outside the realm of machine learning, and should be mastered by those aspiring to the field.
https://www.kdnuggets.com/2021/02/7-most-recommended-skills-data-scientist.html
The Ultimate Scikit-Learn Machine Learning Cheatsheet
With the power and popularity of scikit-learn for machine learning in Python, this library is foundational to any practitioner's toolset. Preview its core methods with this review of predictive modelling, clustering, dimensionality reduction, feature importance, and data transformation.
https://www.kdnuggets.com/2021/01/ultimate-scikit-learn-machine-learning-cheatsheet.html
Can Data Science Be Agile? Implementing Best Agile Practices to Your Data Science Process
Agile is not reserved for software developers only -- that's a myth. While these effective strategies are not commonly used by data scientists today and some aspects of data science make Agile a bit tricky, the methodology offers plenty of benefits to data science projects that can increase the effectiveness of your process and bring more success to your outcomes.
https://www.kdnuggets.com/2021/01/data-science-agile-best-practices.html
Cleaner Data Analysis with Pandas Using Pipes
Check out this practical guide on Pandas pipes.
https://www.kdnuggets.com/2021/01/cleaner-data-analysis-pandas-pipes.html
Model Experiments, Tracking and Registration using MLflow on Databricks
This post covers how StreamSets can help expedite operations at some of the most crucial stages of Machine Learning Lifecycle and MLOps, and demonstrates integration with Databricks and MLflow.
https://www.kdnuggets.com/2021/01/model-experiments-tracking-registration-mlflow-databricks.html
Meet whale! The stupidly simple data discovery tool
Finding data and understanding its meaning represents the traditional "daily grind" of a Data Scientist. With whale, the new lightweight data discovery, documentation, and quality engine for your data warehouse that is under development by Dataframe, your data science team will more efficiently search data and automate its data metrics.
https://www.kdnuggets.com/2020/12/whale-data-discovery-tool.html
How to easily check if your Machine Learning model is fair?
Machine learning models deployed today -- as will many more in the future -- impact people and society directly. With that power and influence resting in the hands of Data Scientists and machine learning engineers, taking the time to evaluate and understand if model results are fair will become the linchpin for the future success of AI/ML solutions. These are critical considerations, and using a recently developed fairness module in the dalex Python package is a unified and accessible way to ensure your models remain fair.
https://www.kdnuggets.com/2020/12/machine-learning-model-fair.html
A Rising Library Beating Pandas in Performance
This article compares the performance of the well-known pandas library with pypolars, a rising DataFrame library written in Rust. See how they compare.
https://www.kdnuggets.com/2020/12/rising-library-beating-pandas-performance.html
10 Python Skills They Don’t Teach in Bootcamp
Ascend to new heights in Data Science and Machine Learning with this thrilling list of coding tips.
https://www.kdnuggets.com/2020/12/10-python-skills-dont-teach-bootcamp.html
Data Compression via Dimensionality Reduction: 3 Main Methods
Lift the curse of dimensionality by mastering the application of three important techniques that will help you reduce the dimensionality of your data, even if it is not linearly separable.
https://www.kdnuggets.com/2020/12/data-compression-dimensionality-reduction.html
R or Python? Why Not Both?
Do you use both R and Python, either in different projects or in the same? Check out prython, an IDE designed to handle your needs.
https://www.kdnuggets.com/2020/12/r-python-both-prython.html
Top November Stories: Top Python Libraries for Data Science, Data Visualization & Machine Learning; The Best Data Science Certification You’ve Never Heard Of
Also: TabPy: Combining Python and Tableau; How to Acquire the Most Wanted Data Science Skills.
https://www.kdnuggets.com/2020/12/top-stories-2020-nov.html
10 Python Skills for Beginners
Python is the fastest growing, most-beloved programming language. Get started with these Data Science tips.
https://www.kdnuggets.com/2020/12/10-python-skills-beginners.html
Simple & Intuitive Ensemble Learning in R
Read about metaEnsembleR, an R package for heterogeneous ensemble meta-learning (classification and regression) that is fully-automated.
https://www.kdnuggets.com/2020/12/simple-intuitive-meta-learning-r.html
How to Incorporate Tabular Data with HuggingFace Transformers
In real-world scenarios, we often encounter data that includes text and tabular features. Leveraging the latest advances for transformers, effectively handling situations with both data structures can increase performance in your models.
https://www.kdnuggets.com/2020/11/tabular-data-huggingface-transformers.html
5 Things You Are Doing Wrong in PyCaret
PyCaret is an alternate low-code library that can be used to replace hundreds of lines of code with just a few words. This makes experiments exponentially fast and efficient. Find out 5 ways to improve your usage of the library.
https://www.kdnuggets.com/2020/11/5-things-doing-wrong-pycaret.html
Mastering TensorFlow Tensors in 5 Easy Steps
Discover how the building blocks of TensorFlow work at the lower level and learn how to make the most of Tensor objects.
https://www.kdnuggets.com/2020/11/mastering-tensorflow-tensors-5-easy-steps.html
How to Build a Football Dataset with Web Scraping
This article covers using Selenium to scrape JavaScript rendered content.
https://www.kdnuggets.com/2020/11/build-football-dataset-web-scraping.html
10 Underrated Python Skills
Tips for feature analysis, hyperparameter tuning, data visualization and more.
https://www.kdnuggets.com/2020/10/10-underrated-python-skills.html
Data Science in the Cloud with Dask
Scaling large data analyses for data science and machine learning is growing in importance. Dask and Coiled are making it easy and fast for folks to do just that. Read on to find out how.
https://www.kdnuggets.com/2020/10/data-science-cloud-dask.html
Feature Ranking with Recursive Feature Elimination in Scikit-Learn
This article covers using scikit-learn to obtain the optimal number of features for your machine learning project.
https://www.kdnuggets.com/2020/10/feature-ranking-recursive-feature-elimination-scikit-learn.html
fastcore: An Underrated Python Library
A unique Python library that extends the Python programming language and provides utilities that enhance productivity.
https://www.kdnuggets.com/2020/10/fastcore-underrated-python-library.html
Getting Started with PyTorch
A practical walkthrough on how to use PyTorch for data analysis and inference.
https://www.kdnuggets.com/2020/10/getting-started-pytorch.html
Uber Open Sources the Third Release of Ludwig, its Code-Free Machine Learning Platform
The new release makes Ludwig one of the most complete open source AutoML stacks in the market.
https://www.kdnuggets.com/2020/10/uber-open-source-ludwig-code-free-machine-learning-platform.html
Your Guide to Linear Regression Models
This article explains linear regression and how to program linear regression models in Python.
https://www.kdnuggets.com/2020/10/guide-linear-regression-models.html
Geographical Plots with Python
When your data includes geographical information, rich map visualizations can offer significant value for you to understand your data and for the end user when interpreting analytical results.
https://www.kdnuggets.com/2020/09/geographical-plots-python.html
Introduction to Time Series Analysis in Python
Data that is updated in real-time requires additional handling and special care to prepare it for machine learning models. The important Python library, Pandas, can be used for most of this work, and this tutorial guides you through this process for analyzing time-series data.
https://www.kdnuggets.com/2020/09/introduction-time-series-analysis-python.html
Statistical and Visual Exploratory Data Analysis with One Line of Code
If EDA is not executed correctly, it can cause us to start modeling with “unclean” data. See how to use Pandas Profiling to perform EDA with a single line of code.
https://www.kdnuggets.com/2020/09/statistical-visual-exploratory-data-analysis-one-line-code.html
Visualization Of COVID-19 New Cases Over Time In Python
Inspired by another concise data visualization, the author of this article has crafted and shared the code for a heatmap which visualizes the COVID-19 pandemic in the United States over time.
https://www.kdnuggets.com/2020/09/visualization-covid-19-new-cases-over-time-python.html
Feature Engineering for Numerical Data
Data feeds machine learning models, and the more the better, right? Well, sometimes numerical data isn't quite right for ingestion, so a variety of methods, detailed in this article, are available to transform raw numbers into something a bit more palatable.
https://www.kdnuggets.com/2020/09/feature-engineering-numerical-data.html
An Introduction to NLP and 5 Tips for Raising Your Game
This article is a collection of things the author would like to have known when they started out in NLP. Perhaps it will be useful for you.
https://www.kdnuggets.com/2020/09/introduction-nlp-5-tips-raising-your-game.html
Microsoft’s DoWhy is a Cool Framework for Causal Inference
Inspired by Judea Pearl’s do-calculus for causal inference, the open source framework provides a programmatic interface for popular causal inference methods.
https://www.kdnuggets.com/2020/08/microsoft-dowhy-framework-causal-inference.html
Explainable and Reproducible Machine Learning Model Development with DALEX and Neptune
With ML models serving real people, misclassified cases (which are a natural consequence of using ML) are affecting people’s lives and sometimes treating them very unfairly. This makes the ability to explain your models’ predictions a requirement rather than just a nice-to-have.
https://www.kdnuggets.com/2020/08/explainable-reproducible-machine-learning-model-development-dalex-neptune.html
Working with Spark, Python or SQL on Azure Databricks
Here we look at some ways to interchangeably work with Python, PySpark and SQL using Azure Databricks, an Apache Spark-based big data analytics service designed for data science and data engineering offered by Microsoft.
https://www.kdnuggets.com/2020/08/spark-python-sql-azure-databricks.html
Getting Started with Feature Selection
For machine learning, more data is always better. What about more features of data? Not necessarily. This beginners' guide with code examples for selecting the most useful features from your data will jump start you toward developing the most effective and efficient learning models.
https://www.kdnuggets.com/2020/08/getting-started-feature-selection.html
Build Your Own AutoML Using PyCaret 2.0
In this post we present a step-by-step tutorial on how PyCaret can be used to build an Automated Machine Learning Solution within Power BI, thus allowing data scientists and analysts to add a layer of machine learning to their Dashboards without any additional license or software costs.
https://www.kdnuggets.com/2020/08/build-automl-pycaret.html
GitHub is the Best AutoML You Will Ever Need
This article uses PyCaret 2.0, an open source, low-code machine learning library in Python to develop a simple AutoML solution and deploy it as a Docker container using GitHub actions.
https://www.kdnuggets.com/2020/08/github-best-automl-ever-need.html
Data Science Internship Interview Questions
Data science is an attractive field because not only is it lucrative, but you can have opportunities to work on interesting projects, and you’re always learning new things. If you're trying to get started from the ground up, then review this guide to prepare for the interview essentials.
https://www.kdnuggets.com/2020/08/data-science-internship-interview-questions.html
Word Embedding Fairness Evaluation
With word embeddings being such a crucial component of NLP, the reported social biases resulting from the training corpora could limit their application. The framework introduced here intends to measure the fairness in word embeddings to better understand these potential biases.
https://www.kdnuggets.com/2020/08/word-embedding-fairness-evaluation.html
Fuzzy Joins in Python with d6tjoin
Combining different data sources is a time suck! d6tjoin is a Python library that lets you join pandas dataframes quickly and efficiently.
https://www.kdnuggets.com/2020/07/fuzzy-joins-python-d6tjoin.html
Building a Content-Based Book Recommendation Engine
In this blog, we will see how we can build a simple content-based recommender system using Goodreads data.
https://www.kdnuggets.com/2020/07/building-content-based-book-recommendation-engine.html
Labelling Data Using Snorkel
In this tutorial, we walk through the process of using Snorkel to generate labels for an unlabelled dataset. We will provide you examples of basic Snorkel components by guiding you through a real clinical application of Snorkel.
https://www.kdnuggets.com/2020/07/labelling-data-using-snorkel.html
What I learned from looking at 200 machine learning tools
While hundreds of machine learning tools are available today, the ML software landscape may still be underdeveloped with more room to mature. This review considers the state of ML tools, existing challenges, and which frameworks are addressing the future of machine learning software.
https://www.kdnuggets.com/2020/07/200-machine-learning-tools.html
3 Advanced Python Features You Should Know
As a Data Scientist, you are already spending most of your time getting your data ready for prime time. Follow these real-world scenarios to learn how to leverage the advanced techniques in Python of list comprehension, Lambda expressions, and the Map function to get the job done faster.
https://www.kdnuggets.com/2020/07/3-advanced-python-features.html
Clustering Uber Rideshare Data
This blog discusses clustering the Uber ridesharing dataset, with a focus on interpretation and understanding the concepts in the real world.
https://www.kdnuggets.com/2020/07/clustering-rideshare-data-uber.html
Pull and Analyze Financial Data Using a Simple Python Package
We demonstrate a simple Python script/package to help you pull financial data (all the important metrics and ratios that you can think of) and plot them.
https://www.kdnuggets.com/2020/07/pull-analyze-financial-data-simple-python-package.html
Spam Filter in Python: Naive Bayes from Scratch
In this blog post, learn how to build a spam filter using Python and the multinomial Naive Bayes algorithm, with a goal of classifying messages with a greater than 80% accuracy.
https://www.kdnuggets.com/2020/07/spam-filter-python-naive-bayes-scratch.html
Feature Engineering in SQL and Python: A Hybrid Approach
Set up your workstation, reduce workplace clutter, maintain a clean namespace, and effortlessly keep your dataset up-to-date.
https://www.kdnuggets.com/2020/07/feature-engineering-sql-python-hybrid-approach.html
Getting Started with TensorFlow 2
Learn about the latest version of TensorFlow with this hands-on walk-through of implementing a classification problem with deep learning, how to plot it, and how to improve its results.
https://www.kdnuggets.com/2020/07/getting-started-tensorflow2.html
Speed up your Numpy and Pandas with NumExpr Package
We show how to significantly speed up your mathematical calculations in Numpy and Pandas using a small library.
https://www.kdnuggets.com/2020/07/speed-up-numpy-pandas-numexpr-package.html
Software engineering fundamentals for Data Scientists
As a data scientist writing code for your models, it's quite possible that your work will make its way into a production environment to be used by the masses. But, writing code that is deployed as software is much different than writing code for exploratory data analysis. Learn about the key approaches for making your code production-ready that will save you time and future headaches.
https://www.kdnuggets.com/2020/06/software-engineering-fundamentals-data-scientists.html
How to Prepare Your Data
This is an overview of structuring, cleaning, and enriching raw data.
https://www.kdnuggets.com/2020/06/how-prepare-your-data.html
Practical Markov Chain Monte Carlo
This is a slightly more intricate example of MCMC, compared to many with a fairly simple model, a single predictor (maybe two), and not much else, which highlights a couple of issues and tricks worth noting for a handwritten implementation.
https://www.kdnuggets.com/2020/06/practical-markov-chain-monte-carlo.html
Build a Branded Web Based GIS Application Using R, Leaflet and Flexdashboard
By using R, Flexdashboard and Leaflet, we can build a customized and branded web application to showcase location based data interactively across the organization. Instead of crowding the application with many widgets, we use menu tabs and pages to separate the interactive aspects.
https://www.kdnuggets.com/2020/06/branded-web-based-gis-application-r-leaflet-flexdashboard.html
Machine Learning in Dask
In this piece, we’ll see how we can use Dask to work with large datasets on our local machines.
https://www.kdnuggets.com/2020/06/machine-learning-dask.html
A Classification Project in Machine Learning: a gentle step-by-step guide
Classification is a core technique in the fields of data science and machine learning that is used to predict the categories to which data should belong. Follow this learning guide that demonstrates how to consider multiple classification models to predict data scraped from the web.
https://www.kdnuggets.com/2020/06/classification-project-machine-learning-guide.html
Simplified Mixed Feature Type Preprocessing in Scikit-Learn with Pipelines
There is a quick and easy way to perform preprocessing on mixed feature type data in Scikit-Learn, which can be integrated into your machine learning pipelines.
https://www.kdnuggets.com/2020/06/simplifying-mixed-feature-type-preprocessing-scikit-learn-pipelines.html
LinkedIn Open Sources a Small Component to Simplify the TensorFlow-Spark Interoperability
Spark-TFRecord enables the processing of TensorFlow’s TFRecord structures in Apache Spark.
https://www.kdnuggets.com/2020/05/linkedin-open-sources-small-component-tensorflow-spark-interoperability.html
Coding habits for data scientists
While the core machine learning algorithms might only take up a few lines of code, it's the rest of your program that can get messy fast. Learn about some techniques for identifying bad coding habits in ML that add to complexity in code as well as start new habits that can help partition complexity.
https://www.kdnuggets.com/2020/05/coding-habits-data-scientists.html
Getting Started with Spectral Clustering
This post will unravel a practical example to illustrate and motivate the intuition behind each step of the spectral clustering algorithm.
https://www.kdnuggets.com/2020/05/getting-started-spectral-clustering.html
Coronavirus COVID-19 Genome Analysis using Biopython
In this article, we will interpret and analyze the COVID-19 DNA sequence data and try to get as many insights as possible regarding the proteins that make it up. Later, we will compare COVID-19 DNA with MERS and SARS to understand the relationship among them.
https://www.kdnuggets.com/2020/04/coronavirus-covid-19-genome-analysis-biopython.html
LSTM for time series prediction
Learn how to develop an LSTM neural network with PyTorch on trading data to predict future prices by mimicking actual values of the time series data.
https://www.kdnuggets.com/2020/04/lstm-time-series-prediction.html
Announcing PyCaret 1.0.0
An open source low-code machine learning library in Python. PyCaret is an alternate low-code library that can be used to replace hundreds of lines of code with just a few words. This makes experiments exponentially fast and efficient.
https://www.kdnuggets.com/2020/04/announcing-pycaret.html
The Benefits & Examples of Using Apache Spark with PySpark
Apache Spark runs fast, offers robust, distributed, fault-tolerant data objects, and integrates beautifully with the world of machine learning and graph analytics. Learn more here.
https://www.kdnuggets.com/2020/04/benefits-apache-spark-pyspark.html
Why and How to Use Dask with Big Data
The Pandas library for Python is a game-changer for data preparation. But, when the data gets big, really big, then your computer needs more help to efficiently handle all that data. Learn more about how to use Dask and follow a demo to scale up your Pandas to work with Big Data.
https://www.kdnuggets.com/2020/04/dask-big-data.html
Visualizing Decision Trees with Python (Scikit-learn, Graphviz, Matplotlib)
Learn about how to visualize decision trees using matplotlib and Graphviz.
https://www.kdnuggets.com/2020/04/visualizing-decision-trees-python.html
Simple Question Answering (QA) Systems That Use Text Similarity Detection in Python
How exactly are smart algorithms able to engage and communicate with us like humans? The answer lies in Question Answering systems that are built on a foundation of Machine Learning and Natural Language Processing. Let's build one here.
https://www.kdnuggets.com/2020/04/simple-question-answering-systems-text-similarity-python.html
Stop Hurting Your Pandas!
This post will address the issues that can arise when Pandas slicing is used improperly. If you see the warning that reads "A value is trying to be set on a copy of a slice from a DataFrame", this post is for you.
https://www.kdnuggets.com/2020/04/stop-hurting-pandas.html
Python for data analysis… is it really that simple?!?
The article addresses a simple data analytics problem, comparing a Python and Pandas solution to an R solution (using plyr, dplyr, and data.table), as well as kdb+ and BigQuery solutions. Performance improvement tricks for these solutions are then covered, as are parallel/cluster computing approaches and their limitations.
https://www.kdnuggets.com/2020/04/python-data-analysis-really-that-simple.html
Introduction to the K-nearest Neighbour Algorithm Using Examples
Read this concise summary of KNN, a supervised and pattern classification learning algorithm which helps us find which class the new input belongs to when k nearest neighbours are chosen and distance is calculated between them.
https://www.kdnuggets.com/2020/04/introduction-k-nearest-neighbour-algorithm-using-examples.html
Evaluating Ray: Distributed Python for Massive Scalability
If your team has started using Ray and you’re wondering what it is, this post is for you. If you’re wondering if Ray should be part of your technical strategy for Python-based applications, especially ML and AI, this post is for you.
https://www.kdnuggets.com/2020/03/domino-ray-distributed-python-massive-scalability.html
Python Pandas For Data Discovery in 7 Simple Steps
Just getting started with Python's Pandas library for data analysis? Or, ready for a quick refresher? These 7 steps will help you become familiar with its core features so you can begin exploring your data in no time.
https://www.kdnuggets.com/2020/03/python-pandas-data-discovery.html
Audio Data Analysis Using Deep Learning with Python (Part 2)
This is a followup to the first article in this series. Once you are comfortable with the concepts explained in that article, you can come back and continue with this.
https://www.kdnuggets.com/2020/02/audio-data-analysis-deep-learning-python-part-2.html
Using the Fitbit Web API with Python
Fitbit provides a Web API for accessing data from Fitbit activity trackers. Check out this updated tutorial on accessing this Fitbit data using the API with Python.
https://www.kdnuggets.com/2020/02/using-fitbit-web-api-python.html
Adversarial Validation Overview
Learn how to implement adversarial validation that builds a classifier to determine if your data is from the training or testing sets. If you can do this, then your data has issues, and your adversarial validation model can help you diagnose the problem.
https://www.kdnuggets.com/2020/02/adversarial-validation-overview.html
Exoplanet Hunting Using Machine Learning
Search for exoplanets — those planets beyond our own solar system — using machine learning, and implement these searches in Python.
https://www.kdnuggets.com/2020/01/exoplanet-hunting-machine-learning.html
The 5 Most Useful Techniques to Handle Imbalanced Datasets
This post is about explaining the various techniques you can use to handle imbalanced datasets.
https://www.kdnuggets.com/2020/01/5-most-useful-techniques-handle-imbalanced-datasets.html
Random Forest® — A Powerful Ensemble Learning Algorithm
The article explains the Random Forest algorithm and how to build and optimize a Random Forest classifier.
https://www.kdnuggets.com/2020/01/random-forest-powerful-ensemble-learning-algorithm.html
Explaining Black Box Models: Ensemble and Deep Learning Using LIME and SHAP
This article demonstrates explainability for the decisions made by LightGBM and Keras models when classifying a transaction as fraudulent, using two state-of-the-art open-source explainability techniques, LIME and SHAP.
https://www.kdnuggets.com/2020/01/explaining-black-box-models-ensemble-deep-learning-lime-shap.html
Geovisualization with Open Data
In this post I want to show how to use publicly available (open) data to create geovisualizations in Python. Maps are a great way to communicate and compare information when working with geolocation data. There are many frameworks for plotting maps; here I focus on matplotlib and geopandas (and give a glimpse of mplleaflet).
https://www.kdnuggets.com/2020/01/open-data-germany-maps-viz.html
Survey Segmentation Tutorial
Learn the basics of verifying segmentation, analyzing the data, and creating segments in this tutorial. When reviewing survey data, you will typically be handed Likert questions (e.g., on a scale of 1 to 5), and by using a few techniques, you can verify the quality of the survey and start grouping respondents into populations.
https://www.kdnuggets.com/2020/01/survey-segmentation-tutorial.html
H2O Framework for Machine Learning
This article is an overview of H2O, a scalable and fast open-source platform for machine learning. We will apply it to perform classification tasks.
https://www.kdnuggets.com/2020/01/h2o-framework-machine-learning.html
Automated Machine Learning: How do teams work together on an AutoML project?
In this use case, available to the public on GitHub, we’ll see how a data scientist, project manager, and business lead at a retail grocer can leverage automated machine learning and Azure Machine Learning service to reduce product overstock.
https://www.kdnuggets.com/2020/01/teams-work-together-automl-project.html
Random Forest® vs Neural Networks for Predicting Customer Churn
Let us see how random forest competes with neural networks for solving a real world business problem.
https://www.kdnuggets.com/2019/12/random-forest-vs-neural-networks-predicting-customer-churn.html
Market Basket Analysis: A Tutorial
This article is about Market Basket Analysis & the Apriori algorithm that works behind it.
https://www.kdnuggets.com/2019/12/market-basket-analysis.html
Interpretability part 3: opening the black box with LIME and SHAP
The third part in a series on leveraging techniques to take a look inside the black box of AI, this guide considers methods that try to explain each prediction instead of establishing a global explanation.
https://www.kdnuggets.com/2019/12/interpretability-part-3-lime-shap.html
Build Pipelines with Pandas Using pdpipe
We show how to build intuitive and useful pipelines with Pandas DataFrame using a wonderful little library called pdpipe.
https://www.kdnuggets.com/2019/12/build-pipelines-pandas-pdpipe.html
Interpretability: Cracking open the black box, Part 2
The second part in a series on leveraging techniques to take a look inside the black box of AI, this guide considers post-hoc interpretation that is useful when the model is not transparent.
https://www.kdnuggets.com/2019/12/interpretability-black-box-part-2.html
The Essential Toolbox for Data Cleaning
Increase your confidence in performing data cleaning with a broader perspective of what datasets typically look like, and follow this toolbox of code snippets to make your data cleaning process faster and more efficient.
https://www.kdnuggets.com/2019/12/essential-toolbox-data-cleaning.html
Explainability: Cracking open the black box, Part 1
What is Explainability in AI and how can we leverage different techniques to open the black box of AI and peek inside? This practical guide offers a review and critique of the various techniques of interpretability.
https://www.kdnuggets.com/2019/12/explainability-black-box-part1.html
Spark NLP 101: LightPipeline
A Pipeline is specified as a sequence of stages, and each stage is either a Transformer or an Estimator. These stages are run in order, and the input DataFrame is transformed as it passes through each stage. Now let’s see how this can be done in Spark NLP using Annotators and Transformers.
https://www.kdnuggets.com/2019/11/spark-nlp-101-lightpipeline.html
Content-based Recommender Using Natural Language Processing (NLP)
A guide to build a content-based movie recommender model based on NLP.
https://www.kdnuggets.com/2019/11/content-based-recommender-using-natural-language-processing-nlp.html
Python, Selenium & Google for Geocoding Automation: Free and Paid
This tutorial will take you through two options that have automated the geocoding process for the user using Python, Selenium and Google Geocoding API.
https://www.kdnuggets.com/2019/11/automate-geocoding-free-paid-python-selenium-google.html
Python Tuples and Tuple Methods
Brush up on your Python basics with this post on creating, using, and manipulating tuples.
https://www.kdnuggets.com/2019/11/python-tuples-methods.html
Python Lists and List Manipulation
In Python, lists store an ordered collection of items which can be of different types. This post is an overview of lists and their manipulation.
https://www.kdnuggets.com/2019/11/python-lists-list-manipulation.html
Testing Your Machine Learning Pipelines
Let’s take a look at traditional testing methodologies and how we can apply these to our data/ML pipelines.
https://www.kdnuggets.com/2019/11/testing-machine-learning-pipelines.html
Beginners Guide to the Three Types of Machine Learning
The following article is an introduction to classification and regression (known as supervised learning) and to unsupervised learning (which in machine learning applications often refers to clustering), and includes a walkthrough in the popular Python library scikit-learn.
https://www.kdnuggets.com/2019/11/beginners-guide-three-types-machine-learning.html
How to Speed up Pandas by 4x with one line of code
While Pandas is the library for data processing in Python, it isn't really built for speed. Learn more about the new library, Modin, developed to distribute Pandas' computation to speed up your data prep.
https://www.kdnuggets.com/2019/11/speed-up-pandas-4x.html
Understanding Boxplots
A boxplot can tell you about your outliers and what their values are. It can also tell you if your data is symmetrical, how tightly your data is grouped, and if and how your data is skewed.
https://www.kdnuggets.com/2019/11/understanding-boxplots.html
Orchestrating Dynamic Reports in Python and R with Rmd Files
Do you want to extract CSV files with Python and visualize them in R? How does preparing everything in R and drawing conclusions with Python sound? Both are possible if you know the right libraries and techniques. Here, we’ll walk through a use case using both languages in one analysis.
https://www.kdnuggets.com/2019/11/orchestrating-dynamic-reports-python-r-rmd-files.html
Data Cleaning and Preprocessing for Beginners
Careful preprocessing of data for your machine learning project is crucial. This overview describes the process of data cleaning and dealing with noise and missing data.
https://www.kdnuggets.com/2019/11/data-cleaning-preprocessing-beginners.html
5 Advanced Features of Pandas and How to Use Them
The pandas library offers core functionality when preparing your data using Python. But, many don't go beyond the basics, so learn about these lesser-known advanced methods that will make handling your data easier and cleaner.
https://www.kdnuggets.com/2019/10/5-advanced-features-pandas.html
How to Write Web Apps Using Simple Python for Data Scientists
Convert your Data Science Projects into cool apps easily without knowing any web frameworks.
https://www.kdnuggets.com/2019/10/write-web-apps-using-simple-python-data-scientists.html
Top 7 Things I Learned in my Data Science Masters
Even though I’m still in my studies, here’s a list of the most important things I’ve learned so far.
https://www.kdnuggets.com/2019/10/top-7-things-learned-data-science-masters.html
Using Time Series Encodings to Discover Baseball History’s Most Interesting Seasons
Take me out to the ballgame! Take me out to the crowd! For the 2,829 seasons that have been played for 101 baseball teams since 1880, which seasons were unlike any others? Using SAX Encoding to recognize patterns in time series data, the most special years in baseball can be found.
https://www.kdnuggets.com/2019/09/time-series-baseball.html
Natural Language in Python using spaCy: An Introduction
This article provides a brief introduction to working with natural language (sometimes called “text analytics”) in Python using spaCy and related libraries.
https://www.kdnuggets.com/2019/09/natural-language-python-using-spacy-introduction.html
A Single Function to Streamline Image Classification with Keras
We show, step-by-step, how to construct a single, generalized, utility function to pull images automatically from a directory and train a convolutional neural net model.
https://www.kdnuggets.com/2019/09/single-function-streamline-image-classification-keras.html
Scikit-Learn & More for Synthetic Dataset Generation for Machine Learning
While mature algorithms and extensive open-source libraries are widely available for machine learning practitioners, sufficient data to apply these techniques remains a core challenge. Discover how to leverage scikit-learn and other tools to generate synthetic data appropriate for optimizing and fine-tuning your models.
https://www.kdnuggets.com/2019/09/scikit-learn-synthetic-dataset.html
Applying Data Science to Cybersecurity Network Attacks & Events
Check out this detailed tutorial on applying data science to the cybersecurity domain, written by an individual with backgrounds in both fields.
https://www.kdnuggets.com/2019/09/applying-data-science-cybersecurity-network-attacks-events.html
5 Beginner Friendly Steps to Learn Machine Learning and Data Science with Python
“I want to learn machine learning and artificial intelligence, where do I start?” Here.
https://www.kdnuggets.com/2019/09/5-beginner-friendly-steps-learn-machine-learning-data-science-python.html
The 5 Sampling Algorithms every Data Scientist need to know
Algorithms are at the core of data science and sampling is a critical technique that can make or break a project. Learn more about the most common sampling techniques used, so you can select the best approach while working with your data.
https://www.kdnuggets.com/2019/09/5-sampling-algorithms.html
Explore the world of Bioinformatics with Machine Learning
The article contains a brief introduction of Bioinformatics and how a machine learning classification algorithm can be used to classify the type of cancer in each patient by their gene expressions.
https://www.kdnuggets.com/2019/09/explore-world-bioinformatics-machine-learning.html


Top Posts