- Python for data analysis… is it really that simple?!? - Apr 2, 2020.
The article addresses a simple data analytics problem, comparing a Python and Pandas solution to an R solution (using plyr, dplyr, and data.table), as well as kdb+ and BigQuery solutions. Performance improvement tricks for these solutions are then covered, as are parallel/cluster computing approaches and their limitations.
- Introduction to the K-nearest Neighbour Algorithm Using Examples - Apr 1, 2020.
Read this concise summary of KNN, a supervised pattern-classification algorithm that determines which class a new input belongs to by calculating the distance between the input and the training points and choosing the k nearest neighbours.
- KDnuggets™ News 20:n13, Apr 1: Effective visualizations for pandemic storytelling; Machine learning for time series forecasting - Apr 1, 2020.
This week, read about the power of effective visualizations for pandemic storytelling; see how (not) to use machine learning for time series forecasting; learn about a deep learning breakthrough: a sub-linear deep learning algorithm that does not need a GPU?; familiarize yourself with how to painlessly analyze your time series; check out what we can learn from the latest coronavirus trends; and... KDnuggets topics?!? Plus much more.
- How To Painlessly Analyze Your Time Series - Mar 26, 2020.
The Matrix Profile is a powerful tool for solving the dual problem of anomaly detection and motif discovery. It is robust, scalable, and largely parameter-free: we’ve seen it work for a wide range of metrics including website user data, order volume and other business-critical applications.
- Evaluating Ray: Distributed Python for Massive Scalability - Mar 25, 2020.
If your team has started using Ray and you’re wondering what it is, this post is for you. If you’re wondering if Ray should be part of your technical strategy for Python-based applications, especially ML and AI, this post is for you.
- Build an Artificial Neural Network From Scratch: Part 2 - Mar 20, 2020.
The second article in this series focuses on building an Artificial Neural Network using the NumPy Python library.
- The 4 Best Jupyter Notebook Environments for Deep Learning - Mar 19, 2020.
Many cloud providers and other third-party services see the value of a Jupyter notebook environment, which is why many companies now offer cloud-hosted notebooks. Let's have a look at four such environments.
- Exploring the Adoption of Python in the Workplace – Free Metis Corporate Training Webinar - Mar 18, 2020.
Metis will break down Python for data science and analytics, explain what is driving adoption in the field, and discuss how industries and companies are reacting to the shift.
- Five Interesting Data Engineering Projects - Mar 17, 2020.
As the role of the data engineer continues to grow in the field of data science, so does the number of tools being developed to support wrangling all that data. Five of these tools that you should pay attention to for your data pipeline work are reviewed here, along with a few bonus tools.
- Python Pandas For Data Discovery in 7 Simple Steps - Mar 10, 2020.
Just getting started with Python's Pandas library for data analysis? Or, ready for a quick refresher? These 7 steps will help you become familiar with its core features so you can begin exploring your data in no time.
- Generate Realistic Human Face using GAN - Mar 10, 2020.
This article contains a brief intro to Generative Adversarial Networks (GANs) and how to build a human face generator.
- Tokenization and Text Data Preparation with TensorFlow & Keras - Mar 6, 2020.
This article will look at tokenizing and further preparing text data for feeding into a neural network using TensorFlow and Keras preprocessing tools.
- TensorFlow 2.0 Tutorial: Optimizing Training Time Performance - Mar 5, 2020.
Tricks to improve TensorFlow training time with tf.data pipeline optimizations, mixed precision training and multi-GPU strategies.
- Recreating Fingerprints using Convolutional Autoencoders - Mar 4, 2020.
The article gets you started working with fingerprints using Deep Learning.
- KDnuggets™ News 20:n09, Mar 4: When Will AutoML replace Data Scientists (if ever) – vote; 20 AI, DS, ML Terms You Need to Know (part 2) - Mar 4, 2020.
- 5 Google Colaboratory Tips - Mar 2, 2020.
Are you looking for some tips for using Google Colab for your projects? This article presents five you may find useful.
- Hands on Hyperparameter Tuning with Keras Tuner - Feb 28, 2020.
Or how hyperparameter tuning with Keras Tuner can boost your object classification network's accuracy by 10%.
- Python and R Courses for Data Science - Feb 26, 2020.
Since Python and R are a must for today's data scientists, continuous learning is paramount. Online courses are arguably the best and most flexible way to upskill throughout one's career.
- Audio Data Analysis Using Deep Learning with Python (Part 2) - Feb 25, 2020.
This is a follow-up to the first article in this series. Once you are comfortable with the concepts explained in that article, you can come back and continue with this one.
- The Forgotten Algorithm - Feb 20, 2020.
This article explores Monte Carlo Simulation with Streamlit.
- Audio Data Analysis Using Deep Learning with Python (Part 1) - Feb 19, 2020.
A brief introduction to audio data processing and genre classification using Neural Networks and Python.
- KDnuggets™ News 20:n07, Feb 19: 20 AI, Data Science, Machine Learning Terms for 2020; Why Did I Reject a Data Scientist Job? - Feb 19, 2020.
This week on KDnuggets: 20 AI, Data Science, Machine Learning Terms You Need to Know in 2020; Why Did I Reject a Data Scientist Job?; Fourier Transformation for a Data Scientist; Math for Programmers; Deep Neural Networks; Practical Hyperparameter Optimization; and much more!
- Using the Fitbit Web API with Python - Feb 18, 2020.
Fitbit provides a Web API for accessing data from Fitbit activity trackers. Check out this updated tutorial on accessing this data through the API with Python.
- Fourier Transformation for a Data Scientist - Feb 14, 2020.
The article contains a brief mathematical introduction to the Fourier transform and its applications in AI.
- Adversarial Validation Overview - Feb 13, 2020.
Learn how to implement adversarial validation, which builds a classifier to determine whether your data comes from the training or the testing set. If the classifier can tell the two apart, then your data has issues, and your adversarial validation model can help you diagnose the problem.
- Practical Hyperparameter Optimization - Feb 13, 2020.
An introduction to fine-tuning Machine and Deep Learning models using techniques such as Random Search, Automated Hyperparameter Tuning, and Artificial Neural Network Tuning.
- Easy Image Dataset Augmentation with TensorFlow - Feb 13, 2020.
What can we do when we don't have a substantial amount of varied training data? This is a quick intro to using data augmentation in TensorFlow to perform in-memory image transformations during model training to help overcome this data impediment.
- Sharing your machine learning models through a common API - Feb 12, 2020.
DEEPaaS API is a software component developed to expose machine learning models through a REST API. In this article we describe how to use it.
- Intent Recognition with BERT using Keras and TensorFlow 2 - Feb 10, 2020.
TL;DR Learn how to fine-tune the BERT model for text classification. Train and evaluate it on a small dataset for detecting seven intents. The results might surprise you!
- Getting up and Running with Python: Installing Anaconda on Windows - Feb 6, 2020.
This tutorial covers how to download and install Anaconda on Windows; how to test your installation; how to fix common installation issues; and what to do after installing Anaconda.
- Create Your Own Computer Vision Sandbox - Feb 5, 2020.
This post covers a wide array of computer vision tasks, from automated data collection to CNN model building.
- Audio File Processing: ECG Audio Using Python - Feb 4, 2020.
In this post, we look into an application of audio file processing for a good cause: analyzing ECG heartbeats, with code written in Python.
- How to Optimize Your Jupyter Notebook - Jan 30, 2020.
This article walks through some simple tricks on improving your Jupyter Notebook experience, and covers useful shortcuts, adding themes, automatically generated table of contents, and more.
- Generating English Pronoun Questions Using Neural Coreference Resolution - Jan 29, 2020.
This post will introduce a practical method for generating English pronoun questions from any story or article. Learn how to take an additional step toward computationally understanding language.
- Exoplanet Hunting Using Machine Learning - Jan 28, 2020.
Search for exoplanets — those planets beyond our own solar system — using machine learning, and implement these searches in Python.
- The 5 Most Useful Techniques to Handle Imbalanced Datasets - Jan 22, 2020.
This post explains the various techniques you can use to handle imbalanced datasets.
- Random Forest® — A Powerful Ensemble Learning Algorithm - Jan 22, 2020.
The article explains the Random Forest algorithm and how to build and optimize a Random Forest classifier.
- Geovisualization with Open Data - Jan 15, 2020.
In this post I want to show how to use publicly available (open) data to create geo visualizations in Python. Maps are a great way to communicate and compare information when working with geolocation data. There are many frameworks for plotting maps; here I focus on matplotlib and geopandas (and give a glimpse of mplleaflet).
- KDnuggets™ News 20:n02, Jan 15: Top 5 Must-have Data Science Skills; Learn Machine Learning with THIS Book - Jan 15, 2020.
This week: learn the 5 must-have data science skills for the new year; find out which book is THE book to get started learning machine learning; pick up some Python tips and tricks; learn SQL, but learn it the hard way; and find an introductory guide to learning common NLP techniques.
- KDnuggets™ News 20:n01, Jan 8: How to “Ultralearn” Data Science; How teams do AutoML? - Jan 8, 2020.
The first issue of 2020 brings you a summary of how to "Ultralearn" Data Science, for those in a hurry; an explanation of how teams work on AutoML projects; why Python is a preferred language for Data Science; and a cartoon on teaching ethics to AI.
- 10 Python Tips and Tricks You Should Learn Today - Jan 8, 2020.
Check out this collection of 10 Python snippets that can be taken as a reference for your daily work.
- H2O Framework for Machine Learning - Jan 6, 2020.
This article is an overview of H2O, a scalable and fast open-source platform for machine learning. We will apply it to perform classification tasks.
- How to Convert a Picture to Numbers - Jan 6, 2020.
Reducing images to numbers makes them amenable to computation. Let's take a look at the why and the how using Python and NumPy.
- Why Python is One of the Most Preferred Languages for Data Science? - Jan 3, 2020.
Why do most data scientists love Python? Learn more about how so many well-developed Python packages can help you accomplish your crucial data science tasks.
- Predict Electricity Consumption Using Time Series Analysis - Jan 2, 2020.
Time series forecasting is a technique for predicting events through a sequence of time. In this post, we take a small forecasting problem and solve it end to end, learning time series forecasting along the way.
- Top KDnuggets tweets, Dec 18-30: A Gentle Introduction to Math Behind Neural Networks - Dec 31, 2019.
A Gentle Introduction to #Math Behind #NeuralNetworks; Learn How to Quickly Create UIs in Python; I wanna be a data scientist, but... how!?; I created my own deepfake in two weeks
- Fighting Overfitting in Deep Learning - Dec 27, 2019.
This post outlines an attack plan for fighting overfitting in neural networks.
- Market Basket Analysis: A Tutorial - Dec 24, 2019.
This article is about Market Basket Analysis & the Apriori algorithm that works behind it.
- KDnuggets™ News 19:n48, Dec 18: Build Pipelines with Pandas Using pdpipe; AI, Analytics, ML, DS, Technology Main Developments, Key Trends; Poll on AutoML - Dec 18, 2019.
Build Pipelines with Pandas Using pdpipe; AI, Analytics, ML, DS, Technology Main Developments, Key Trends; New Poll: Does AutoML work? Ultralearn Data Science; Python Dictionary How-To; Top stories of 2019 and more.
- Pedestrian Detection Using Non Maximum Suppression Algorithm - Dec 17, 2019.
Read this overview of a complete pipeline for detecting pedestrians on the road.
- Let’s Build an Intelligent Chatbot - Dec 17, 2019.
Check out this step-by-step approach to building an intelligent chatbot in Python.
- Build Pipelines with Pandas Using pdpipe - Dec 13, 2019.
We show how to build intuitive and useful pipelines with Pandas DataFrames using a wonderful little library called pdpipe.
- Plotnine: Python Alternative to ggplot2 - Dec 12, 2019.
Python's plotting libraries such as matplotlib and seaborn do allow the user to create elegant graphics as well, but the lack of a standardized syntax for implementing the grammar of graphics, compared to the simple, readable, layered approach of ggplot2 in R, makes it more difficult to implement in Python.
- Python Dictionary Guide: 10 Python Dictionary Methods & Examples - Dec 12, 2019.
Master Python Dictionaries and their essential functions in 15 minutes with this introductory guide.
- Top KDnuggets tweets, Dec 04-10: AI, Analytics, Machine Learning, Data Science, Deep Learning Research Main Developments in 2019 and Key Trends for 2020 - Dec 11, 2019.
AI, Analytics, Machine Learning, Data Science, Deep Learning Research Main Developments and Key Trends; Down with technical debt! Clean #Python for #DataScientists; Calculate Similarity - the most relevant Metrics in a Nutshell.
- Interpretability: Cracking open the black box, Part 2 - Dec 11, 2019.
The second part in a series on leveraging techniques to take a look inside the black box of AI, this guide considers post-hoc interpretation that is useful when the model is not transparent.
- 5 Great New Features in Latest Scikit-learn Release - Dec 10, 2019.
From not sweating missing values, to determining feature importance for any estimator, to support for stacking, and a new plotting API, here are 5 new features of the latest release of Scikit-learn which deserve your attention.
- 10 Free Top Notch Machine Learning Courses - Dec 6, 2019.
Are you interested in studying machine learning over the holidays? This collection of 10 free top notch courses will allow you to do just that, with something for every approach to improving your machine learning skills.
- Lit BERT: NLP Transfer Learning In 3 Steps - Nov 29, 2019.
PyTorch Lightning is a lightweight framework which allows anyone using PyTorch to scale deep learning code easily while making it reproducible. In this tutorial we’ll use Huggingface's implementation of BERT to do a fine-tuning task in Lightning.
- Open Source Projects by Google, Uber and Facebook for Data Science and AI - Nov 28, 2019.
Open source is becoming the standard for sharing and improving technology. Some of the largest organizations in the world, namely Google, Facebook, and Uber, are open sourcing the technologies they use in their own workflows.
- KDnuggets™ News 19:n45, Nov 27: Interpretable vs black box models; Advice for New and Junior Data Scientists - Nov 27, 2019.
This week: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead; Advice for New and Junior Data Scientists; Python Tuples and Tuple Methods; Can Neural Networks Develop Attention? Google Thinks they Can; Three Methods of Data Pre-Processing for Text Classification
- Content-based Recommender Using Natural Language Processing (NLP) - Nov 26, 2019.
A guide to building a content-based movie recommender model based on NLP.
- Automated Machine Learning Project Implementation Complexities - Nov 22, 2019.
To demonstrate the implementation complexity differences along the AutoML highway, let's have a look at how 3 specific software projects approach the implementation of just such an AutoML "solution," namely Keras Tuner, AutoKeras, and automl-gs.
- Python, Selenium & Google for Geocoding Automation: Free and Paid - Nov 21, 2019.
This tutorial will take you through two options for automating the geocoding process using Python, Selenium, and the Google Geocoding API.
- The Notebook Anti-Pattern - Nov 21, 2019.
This article aims to explain why the drive toward using notebooks in production is an anti-pattern, giving some suggestions along the way.
- Python Tuples and Tuple Methods - Nov 21, 2019.
Brush up on your Python basics with this post on creating, using, and manipulating tuples.
- Data Science for Managers: Programming Languages - Nov 19, 2019.
In this article, we are going to talk about popular languages for Data Science and briefly describe each of them.
- GitHub Repo Raider and the Automation of Machine Learning - Nov 18, 2019.
Since X never, ever marks the spot, this article raids the GitHub repos in search of quality automated machine learning resources. Read on for projects and papers to help understand and implement AutoML.
- Python Lists and List Manipulation - Nov 15, 2019.
In Python, lists store an ordered collection of items which can be of different types. This post is an overview of lists and their manipulation.
- How to Visualize Data in Python (and R) - Nov 14, 2019.
Producing accessible data visualizations is a key data science skill. The following guidelines will help you create the best representations of your data using R and Python's Pandas library.
- Testing Your Machine Learning Pipelines - Nov 14, 2019.
Let’s take a look at traditional testing methodologies and how we can apply these to our data/ML pipelines.
- Python Workout / Practices of a Python Pro / Classic Computer Science Problems in Python - Nov 13, 2019.
Whether you’re a beginner or an expert, there are always new ways you can improve your Python coding. Save 40% on this trio of Manning Python books today! Just enter the code nlpropython40 at checkout when you buy from manning.com.
- Beginners Guide to the Three Types of Machine Learning - Nov 13, 2019.
The following article is an introduction to classification and regression (known as supervised learning) and unsupervised learning (which, in the context of machine learning applications, often refers to clustering), and includes a walkthrough in the popular Python library scikit-learn.
- KDnuggets™ News 19:n43, Nov 13: Dynamic Reports in Python and R; Creating NLP Vocabularies; What is Data Science? - Nov 13, 2019.
On KDnuggets this week: Orchestrating Dynamic Reports in Python and R with Rmd Files; How to Create a Vocabulary for NLP Tasks in Python; What is Data Science?; The Complete Data Science LinkedIn Profile Guide; Set Operations Applied to Pandas DataFrames; and much, much more.
- How to Speed up Pandas by 4x with one line of code - Nov 12, 2019.
While Pandas is the library for data processing in Python, it isn't really built for speed. Learn more about Modin, a new library developed to distribute Pandas' computation to speed up your data prep.
- Understanding Boxplots - Nov 8, 2019.
A boxplot can tell you about your outliers and what their values are. It can also tell you whether your data is symmetrical, how tightly your data is grouped, and if and how your data is skewed.
- Orchestrating Dynamic Reports in Python and R with Rmd Files - Nov 8, 2019.
Do you want to extract CSV files with Python and visualize them in R? How about preparing everything in R and drawing conclusions with Python? Both are possible if you know the right libraries and techniques. Here, we’ll walk through a use case that uses both languages in one analysis.
- Data Cleaning and Preprocessing for Beginners - Nov 7, 2019.
Careful preprocessing of data for your machine learning project is crucial. This overview describes the process of data cleaning and dealing with noise and missing data.
- Set Operations Applied to Pandas DataFrames - Nov 7, 2019.
In this tutorial, we show how to apply mathematical set operations (union, intersection, and difference) to Pandas DataFrames with the goal of easing the task of comparing the rows of two datasets.
- How to Create a Vocabulary for NLP Tasks in Python - Nov 7, 2019.
This post will walk through a Python implementation of a vocabulary class for storing processed text data and related metadata in a manner useful for subsequently performing NLP tasks.
- Customer Segmentation Using K Means Clustering - Nov 4, 2019.
Customer Segmentation can be a powerful means to identify unsatisfied customer needs. This technique can be used by companies to outperform the competition by developing uniquely appealing products and services.
- Build an Artificial Neural Network From Scratch: Part 1 - Nov 1, 2019.
This article focuses on building an Artificial Neural Network using the NumPy Python library.
- How to Build Your Own Logistic Regression Model in Python - Oct 31, 2019.
A hands-on guide to Logistic Regression for aspiring data scientists and machine learning engineers.
- KDnuggets™ News 19:n41, Oct 30: Feature Selection: Beyond feature importance?; Time Series Analysis Using KNIME and Spark - Oct 30, 2019.
This week in KDnuggets: Feature Selection: Beyond feature importance?; Time Series Analysis: A Simple Example with KNIME and Spark; 5 Advanced Features of Pandas and How to Use Them; How to Measure Foot Traffic Using Data Analytics; Introduction to Natural Language Processing (NLP); and much, much more!
- How to Extend Scikit-learn and Bring Sanity to Your Machine Learning Workflow - Oct 29, 2019.
In this post, learn how to extend Scikit-learn code to make your experiments easier to maintain and reproduce.
- 5 Advanced Features of Pandas and How to Use Them - Oct 25, 2019.
The pandas library offers core functionality when preparing your data using Python. But many users don't go beyond the basics, so learn about these lesser-known advanced methods that will make handling your data easier and cleaner.
- Convolutional Neural Network for Breast Cancer Classification - Oct 24, 2019.
See how Deep Learning can help in tackling one of the most commonly diagnosed cancers in women.
- How to Write Web Apps Using Simple Python for Data Scientists - Oct 22, 2019.
Convert your Data Science Projects into cool apps easily without knowing any web frameworks.
- Writing Your First Neural Net in Less Than 30 Lines of Code with Keras - Oct 18, 2019.
Read this quick overview of neural networks and learn how to implement your first one in very few lines using Keras.
- How to Easily Deploy Machine Learning Models Using Flask - Oct 17, 2019.
This post aims to get you started with putting your trained machine learning models into production using a Flask API.
- The 5 Classification Evaluation Metrics Every Data Scientist Must Know - Oct 16, 2019.
This post is about various evaluation metrics and how and when to use them.
- Activation maps for deep learning models in a few lines of code - Oct 10, 2019.
We illustrate how to show the activation maps of various layers in a deep CNN model with just a couple of lines of code.
- Top KDnuggets tweets, Oct 02-08: Turn #Python Scripts into Beautiful ML Tools – with Streamlit, an app framework built for #MachineLearning engineers - Oct 9, 2019.
Also: 12 things I wish I'd known before starting as a Data Scientist; 10 Free Top Notch Natural Language Processing Courses; The Last SQL Guide for Data Analysis; The 4 Quadrants of #DataScience Skills and 7 Principles for Creating a Viral DataViz.
- Contributing to PyTorch: By someone who doesn’t know a ton about PyTorch - Oct 9, 2019.
By the end of my week with the team, I managed to proudly cut two PRs on GitHub. I decided to write a blog post to share knowledge, not just to show that YES, you can too.
- The 4 Quadrants of Data Science Skills and 7 Principles for Creating a Viral Data Visualization - Oct 7, 2019.
As a data scientist, your most important skill is creating meaningful visualizations to disseminate knowledge and impact your organization or client. These seven principles will guide you toward developing charts with clarity, as exemplified with data from a recent KDnuggets poll.
- KDnuggets™ News 19:n37, Oct 2: The Future of Analytics & Data Science! Starting NLP with spaCy & Python - Oct 2, 2019.
This week, find out what the future of analytics and data science holds; get an introduction to spaCy for natural language processing; find out how to use time series analysis for baseball; get to know your data; read 6 bits of advice for data scientists; and much, much more!
- What is Hierarchical Clustering? - Sep 27, 2019.
The article contains a brief introduction to various concepts related to the hierarchical clustering algorithm.
- Natural Language in Python using spaCy: An Introduction - Sep 26, 2019.
This article provides a brief introduction to working with natural language (sometimes called “text analytics”) in Python using spaCy and related libraries.
- KDnuggets™ News 19:n36, Sep 25: The Hidden Risk of AI and Big Data; The 5 Sampling Algorithms every Data Scientist needs to know - Sep 25, 2019.
Learn about the unexpected risk of AI applied to Big Data; Study 5 Sampling Algorithms every Data Scientist needs to know; Read how one data scientist copes with his boring days of deploying machine learning; 5 beginner-friendly steps to learn ML with Python; and more.
- A Single Function to Streamline Image Classification with Keras - Sep 23, 2019.
We show, step-by-step, how to construct a single, generalized, utility function to pull images automatically from a directory and train a convolutional neural net model.
- A Gentle Introduction to PyTorch 1.2 - Sep 20, 2019.
This comprehensive tutorial aims to introduce the fundamentals of PyTorch building blocks for training neural networks.
- Applying Data Science to Cybersecurity Network Attacks & Events - Sep 19, 2019.
Check out this detailed tutorial on applying data science to the cybersecurity domain, written by an individual with backgrounds in both fields.
- 5 Beginner Friendly Steps to Learn Machine Learning and Data Science with Python - Sep 19, 2019.
“I want to learn machine learning and artificial intelligence, where do I start?” Here.
- Python 2 End of Life Survey – Are You Prepared? - Sep 18, 2019.
Support for Python 2 will expire on Jan. 1, 2020, after which the Python core language and many third-party packages will no longer be supported or maintained. Take this survey to help determine and share your level of preparation.
- Which Data Science Skills are core and which are hot/emerging ones? - Sep 17, 2019.
We identify two main groups of Data Science skills: A: 13 core, stable skills that most respondents have and B: a group of hot, emerging skills that most do not have (yet) but want to add. See our detailed analysis.
- Explore the world of Bioinformatics with Machine Learning - Sep 17, 2019.
The article contains a brief introduction to Bioinformatics and how a machine learning classification algorithm can be used to classify the type of cancer in each patient by their gene expression.
- 5 Step Guide to Scalable Deep Learning Pipelines with d6tflow - Sep 16, 2019.
How to turn a typical PyTorch script into a scalable d6tflow DAG for faster research & development.
- Ensemble Methods for Machine Learning: AdaBoost - Sep 12, 2019.
It turns out that if we ask a weak algorithm to create a whole bunch of classifiers (all weak by definition) and then combine them all, the result may be a stronger classifier.
- Train sklearn 100x Faster - Sep 11, 2019.
As compute gets cheaper and time to market for machine learning solutions becomes more critical, we’ve explored options for speeding up model training. One of those solutions is to combine elements from Spark and scikit-learn into our own hybrid solution.
- KDnuggets™ News 19:n34, Sep 11: I wasn’t getting hired as a Data Scientist. So I sought data on who is - Sep 11, 2019.
How one person overcame rejections applying to Data Scientist positions by getting actual data on who is getting hired; Advice from Andrew Ng on building ML career and reading research papers; 10 Great Python resources for Data Scientists; Python Libraries for Interpretable ML.
- The 5 Graph Algorithms That Data Scientists Should Know - Sep 10, 2019.
In this post, I am going to be talking about some of the most important graph algorithms you should know and how to implement them using Python.
- OpenStreetMap Data to ML Training Labels for Object Detection - Sep 9, 2019.
I am really interested in creating a tight, clean pipeline for disaster relief applications, where we can use something like crowdsourced building polygons from OSM to train a supervised object detector to discover buildings in an unmapped location.
- 10 Great Python Resources for Aspiring Data Scientists - Sep 9, 2019.
This is a collection of 10 interesting resources in the form of articles and tutorials for the aspiring data scientist new to Python, meant to provide both insight and practical instruction when starting on your journey.
- Build Your First Voice Assistant - Sep 6, 2019.
Hone your practical speech recognition application skills with this overview of building a voice assistant using Python.
- Learn Quantum Computing with Python and Q#, Get Programming with Python, Data Science with Python and Dask - Sep 4, 2019.
Save 40% on Get Programming with Python, Data Science with Python and Dask, and Learn Quantum Computing with Python and Q# with code nlpython40.
- An Easy Introduction to Machine Learning Recommender Systems - Sep 4, 2019.
Recommender systems are an important class of machine learning algorithms that offer "relevant" suggestions to users. They are categorized as either collaborative filtering or content-based systems; check out how these approaches work, along with implementations to follow from example code.
- Python Libraries for Interpretable Machine Learning - Sep 4, 2019.
In the following post, I am going to give a brief guide to four of the most established packages for interpreting and explaining machine learning models.
- An Overview of Topics Extraction in Python with Latent Dirichlet Allocation - Sep 4, 2019.
A recurring subject in NLP is understanding large corpora of text through topic extraction. Whether you analyze users’ online reviews, product descriptions, or text entered in search bars, understanding key topics will always come in handy.
- Automate your Python Scripts with Task Scheduler: Windows Task Scheduler to Scrape Alternative Data - Sep 3, 2019.
In this tutorial, you will learn how to use Windows Task Scheduler to scrape data from the Lazada (eCommerce) website and dump it into an SQLite database.
- Object-oriented programming for data scientists: Build your ML estimator - Aug 30, 2019.
Implement some of the core OOP principles in a machine learning context by building your own Scikit-learn-like estimator, and making it better.
- 4 Tips for Advanced Feature Engineering and Preprocessing - Aug 29, 2019.
Techniques for creating new features, detecting outliers, handling imbalanced data, and imputing missing values.
- Nothing but NumPy: Understanding & Creating Neural Networks with Computational Graphs from Scratch - Aug 23, 2019.
Entirely implemented with NumPy, this extensive tutorial provides a detailed review of neural networks followed by guided code for creating one from scratch with computational graphs.
- Comparing Decision Tree Algorithms: Random Forest® vs. XGBoost - Aug 21, 2019.
Check out this tutorial walking you through a comparison of XGBoost and Random Forest. You'll learn how to create a decision tree, how to do tree bagging, and how to do tree boosting.
- Understanding Decision Trees for Classification in Python - Aug 21, 2019.
This tutorial covers decision trees for classification also known as classification trees, including the anatomy of classification trees, how classification trees make predictions, using scikit-learn to make classification trees, and hyperparameter tuning.
- Automate Stacking In Python: How to Boost Your Performance While Saving Time - Aug 21, 2019.
Utilizing stacking (stacked generalizations) is a very hot topic when it comes to pushing your machine learning algorithm to new heights. For instance, most if not all winning Kaggle submissions nowadays make use of some form of stacking or a variation of it.
- KDnuggets™ News 19:n31, Aug 21: Become a Marketable Data Scientist; Data Science Command Line Basics; Chatbots with Keras - Aug 21, 2019.
This week's news: Become More Marketable as a Data Scientist; Command Line Basics Every Data Scientist Should Know; Chatbots with Keras!; Understanding Cancer using Machine Learning; Statistical Modelling vs Machine Learning; Is Kaggle Learn a "Faster Data Science Education?"; and much more!
- An Overview of Python’s Datatable package - Aug 20, 2019.
Modern machine learning applications need to process a humongous amount of data and generate multiple features. Python’s datatable module was created to address this issue. It is a toolkit for performing big data (up to 100GB) operations on a single-node machine, at the maximum possible speed.
- Deep Learning for NLP: Creating a Chatbot with Keras! - Aug 19, 2019.
Learn how to use Keras to build a Recurrent Neural Network and create a Chatbot! Who doesn’t like a friendly-robotic personal assistant?
- Pytorch Lightning vs PyTorch Ignite vs Fast.ai - Aug 16, 2019.
Here, I will attempt an objective comparison between all three frameworks. This comparison comes from laying out similarities and differences objectively found in tutorials and documentation of all three frameworks.
- Learn how to use PySpark in under 5 minutes (Installation + Tutorial) - Aug 13, 2019.
Apache Spark is one of the hottest and largest open source projects in data processing, with rich high-level APIs for programming languages like Scala, Python, Java, and R. It realizes the potential of bringing together Big Data and machine learning.
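The K-nearest Neighbour summary above (Apr 1, 2020) describes classifying a new input by measuring its distance to known examples and letting the k closest ones decide. As a rough illustration only, here is a minimal pure-Python sketch assuming Euclidean distance and a simple majority vote; the function name `knn_predict` and the toy data are hypothetical and not taken from the linked article, which may make different choices (feature scaling, weighted votes, other distance metrics).

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Hypothetical helper for illustration: assumes Euclidean distance
    and an unweighted vote.
    """
    if k < 1 or k > len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Pair every training point's distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours (sorted() is stable for equal distances).
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Counter.most_common lists equal counts in first-seen order, so a tied
    # vote goes to the label whose nearest representative is closest.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

For example, with `train = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]` and `labels = ["a", "a", "b", "b"]`, `knn_predict(train, labels, (1.2, 1.4), k=3)` returns `"a"`, since two of the three nearest points are labelled `"a"`. Sorting every distance costs O(n log n) per query, which is fine for a sketch; larger datasets usually call for a spatial index such as a k-d tree.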
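The "Understanding Boxplots" entry above (Nov 8, 2019) notes that a boxplot reveals outliers, spread, and skew. As a hedged sketch of where those numbers come from, the function below computes the quartiles, the interquartile range (IQR), and Tukey-style whisker fences at 1.5 × IQR (the same default as matplotlib's `whis=1.5`). The function name `boxplot_summary` is hypothetical, and the quartile method is an assumption: it relies on the standard library's `statistics.quantiles` default, while other tools interpolate quartiles slightly differently.

```python
import statistics


def boxplot_summary(data, whisker=1.5):
    """Return the numbers a Tukey-style boxplot is drawn from.

    Hypothetical helper for illustration. Quartiles come from
    statistics.quantiles with its default 'exclusive' method, so Q1/Q3
    may differ slightly from tools that use another interpolation rule.
    """
    values = sorted(data)
    if len(values) < 2:
        raise ValueError("need at least two data points")
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1  # spread of the middle 50%: how tightly the data is grouped
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    # Whiskers end at the most extreme points still inside the fences;
    # anything beyond them is drawn as an individual outlier.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }
```

For `[1, 2, 3, 4, 5, 6, 7, 8, 100]` this gives Q1 = 2.5, median = 5.0, Q3 = 7.5, and IQR = 5.0, so anything above 7.5 + 1.5 × 5.0 = 15.0 is flagged and `100` appears in `outliers`. A median sitting noticeably closer to Q1 than to Q3 is one rough sign of right skew.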