Three R Libraries Every Data Scientist Should Know (Even if You Use Python) - Dec 20, 2021.
Check out these powerful R libraries built by the world’s biggest tech companies.
Data Science, Data Scientist, Python, R
- Four Different Pipes for R with magrittr - Oct 6, 2021.
The magrittr package supplies the pipe operator (%>%), but it turns out that the package actually contains four pipe operators in total. Let's go into them a bit.
Pipeline, R, Tidyverse

Path to Full Stack Data Science - Sep 27, 2021.
Start your journey toward mastering all aspects of the field of Data Science with this focused list of in-depth self-learning resources. Curated with the beginner in mind, these recommendations will help you learn efficiently, and can also offer existing professionals useful highlights for review or help filling in any gaps in skills.
Career Advice, Data Science, Data Science Education, Data Visualization, Mathematics, Python, R, Roadmap
- ebook: Learn Data Science with R – free download - Sep 7, 2021.
Check out this new book for data science beginners with many practical examples that covers statistics, R, graphing, and machine learning. As a source to learn the full breadth of data science foundations, "Learn Data Science with R" starts at the beginner level and gradually progresses into expert content.
Data Science, Data Science Education, ebook, R
- Introduction to Statistical Learning Second Edition - Aug 13, 2021.
The second edition of the classic "An Introduction to Statistical Learning, with Applications in R" was published very recently, and is now freely-available via PDF on the book's website.
Books, Data Science, Machine Learning, R, Statistical Learning, Statistics
- 5 Tips for Writing Clean R Code - Aug 9, 2021.
This article summarizes the most common mistakes to avoid and outlines best practices to follow in programming in general. Follow these tips to speed up the code review iteration process and be a rockstar developer in your reviewer’s eyes!
Programming, R
The Most In-Demand Skills for Data Scientists in 2021 - Apr 15, 2021.
If you are preparing to make a career as a Data Scientist or are looking for opportunities to skill-up in your current role, this analysis of in-demand skills for 2021, based on over 15,000 Data Scientist job postings, should offer you a good idea as to which programming languages and software tools are increasing and decreasing in importance.
AWS, Data Science Skills, Python, PyTorch, R, scikit-learn, SQL, TensorFlow
- Data Science Curriculum for Professionals - Mar 25, 2021.
If you are looking to expand or transition your current professional career that is buried in spreadsheet analysis into one powered by data science, then you are in for an exciting but complex journey with much to explore and master. To begin your adventure, follow this complete road map to guide you from a gnome in the forest of spreadsheets to an AI wizard known far and wide throughout the kingdom.
Cloud Computing, Data Science Education, Data Visualization, Machine Learning, Python, R, Roadmap, Statistics
- Support Vector Machine for Hand Written Alphabet Recognition in R - Jan 27, 2021.
We attempt to break down a problem of hand written alphabet image recognition into a simple process rather than using heavy packages. This is an attempt to create the data and then build a model using Support Vector Machines for Classification.
Classification, Image Recognition, Machine Learning, R, Support Vector Machines
- Creating Good Meaningful Plots: Some Principles - Jan 12, 2021.
Here are some thought starters to help you create meaningful plots.
Charts, Data Visualization, Python, R
15 Free Data Science, Machine Learning & Statistics eBooks for 2021 - Dec 31, 2020.
We present a curated list of 15 free eBooks compiled in a single location to close out the year.
Automated Machine Learning, Data Science, Deep Learning, Free ebook, Machine Learning, NLP, Python, R, Statistics
- Undersampling Will Change the Base Rates of Your Model’s Predictions - Dec 17, 2020.
In classification problems, the proportion of cases in each class largely determines the base rate of the predictions produced by the model. Therefore if you use sampling techniques that change this proportion, there is a good chance you will want to rescale / calibrate your predictions before using them in the wild.
Classification, Modeling, Predictions, R, Sampling
- KDnuggets™ News 20:n47, Dec 16: A Rising Library Beating Pandas in Performance; R or Python? Why Not Both? - Dec 16, 2020.
Also: 10 Python Skills They Don't Teach in Bootcamp; Data Science Volunteering: Ways to Help; A Journey from Software to Machine Learning Engineer; Data Science and Machine Learning: The Free eBook
Data Science, Free ebook, IDE, Machine Learning, Machine Learning Engineer, Pandas, Python, R
R or Python? Why Not Both? - Dec 9, 2020.
Do you use both R and Python, either in different projects or in the same? Check out prython, an IDE designed to handle your needs.
Data Analysis, Data Science, IDE, Programming, Python, R
- Simple & Intuitive Ensemble Learning in R - Dec 2, 2020.
Read about metaEnsembleR, an R package for heterogeneous ensemble meta-learning (classification and regression) that is fully-automated.
Classification, Ensemble Methods, R, Regression
- Top 6 Data Science Programs for Beginners - Nov 20, 2020.
Udacity has the best industry-leading programs in data science. Here are the top six data science courses for beginners to help you get started.
Beginners, Certificate, Data Engineer, Data Science Education, Data Visualization, Online Education, Python, R, SQL, Udacity
- Behavior Analysis with Machine Learning and R: The free eBook - Oct 22, 2020.
Check out this new free ebook to learn how to leverage the power of machine learning to analyze behavioral patterns from sensor data and electronic records using R.
Behavioral Analytics, Free ebook, Machine Learning, R
- KDnuggets™ News 20:n40, Oct 21: fastcore: An Underrated Python Library; Goodhart’s Law for Data Science: what happens when a measure becomes a target? - Oct 21, 2020.
fastcore: An Underrated Python Library; Goodhart's Law for Data Science and what happens when a measure becomes a target?; Text Mining with R: The Free eBook; Free From MIT: Intro to Computational Thinking and Data Science; How to ace the data science coding challenge
Challenge, Courses, Data Science, Free ebook, Measurement, MIT, Python, R, Text Mining
Text Mining with R: The Free eBook - Oct 15, 2020.
This freely-available book will show you how to perform text analytics in R, using packages from the tidyverse.
Free ebook, R, Text Mining, Tidyverse
- Data Science Tools Illustrated Study Guides - Aug 25, 2020.
These data science tools illustrated guides are broken up into four distinct categories: data retrieval, data manipulation, data visualization, and engineering tips. Both online and PDF versions of these guides are available.
Cheat Sheet, Data Preprocessing, Data Processing, Data Science, Data Science Tools, Data Visualization, Python, R, SQL
- Better Blog Post Analysis with googleAnalyticsR - Jul 24, 2020.
In this post, we'll walk through using googleAnalyticsR for better blog post analysis, so you can do the same analysis for yourself!
Analysis, Blogs, Google Analytics, R
Wrapping Machine Learning Techniques Within AI-JACK Library in R - Jul 17, 2020.
The article shows an approach to solving the problem of selecting the best technique in machine learning. This can be done in R using just one library called AI-JACK, and the article shows how to use this tool.
Automated Machine Learning, AutoML, Machine Learning, Modeling, R
- Understanding Time Series with R - Jul 9, 2020.
Analyzing time series is such a useful resource for essentially any business that data scientists entering the field should bring with them a solid foundation in the technique. Here, we decompose the logical components of a time series using R to better understand how each plays a role in this type of analysis.
Beginners, Business Analytics, Data Analysis, R, Time Series
- An Introduction to Statistical Learning: The Free eBook - Jun 29, 2020.
This week's free eBook is a classic of data science, An Introduction to Statistical Learning, with Applications in R. If interested in picking up elementary statistical learning concepts, and learning how to implement them in R, this book is for you.
Free ebook, R, Robert Tibshirani, Statistical Learning, Trevor Hastie
- Practical Markov Chain Monte Carlo - Jun 26, 2020.
This is a slightly more intricate example of MCMC, compared to many with a fairly simple model, a single predictor (maybe two), and not much else, which highlights a couple of issues and tricks worth noting for a handwritten implementation.
Bayesian, Markov Chains, Monte Carlo, R
- Data Science Tools Popularity, animated - Jun 25, 2020.
Watch the evolution of the top 10 most popular data science tools based on KDnuggets software polls from 2000 to 2019.
About KDnuggets, Data Science Platform, Poll, Python, R
- Build a Branded Web Based GIS Application Using R, Leaflet and Flexdashboard - Jun 24, 2020.
By using R, Flexdashboard and Leaflet, we can build a customized and branded web application to showcase location based data interactively across the organization. Instead of crowding the application with many widgets, we use menu tabs and pages to separate the interactive aspects.
Data Scientist, Data Visualization, Geospatial, GIS, Leaflet, R, Rstudio
- modelStudio and The Grammar of Interactive Explanatory Model Analysis - Jun 19, 2020.
modelStudio is an R package that automates the exploration of ML models and allows for interactive examination. It works in a model-agnostic fashion and is therefore compatible with most ML frameworks.
Analysis, Explainability, Interpretability, Machine Learning, R
- Fighting Disease with Data: Q&A with Epidemiologist Amrish Baidjoe - Jun 11, 2020.
Data science tools are powerful for investigating the current pandemic and other outbreaks, when accurate and actionable data are crucial. Epidemiologist and R Epidemics Consortium leader Amrish Baidjoe shared his insights into using data science to fight disease, from modeling to automation to new technologies.
Data Science, Healthcare, R
Python for data analysis… is it really that simple?!? - Apr 2, 2020.
The article addresses a simple data analytics problem, comparing a Python and Pandas solution to an R solution (using plyr, dplyr, and data.table), as well as kdb+ and BigQuery solutions. Performance improvement tricks for these solutions are then covered, as are parallel/cluster computing approaches and their limitations.
Data Analysis, Pandas, Python, R, SQL
Time Series Classification Synthetic vs Real Financial Time Series - Mar 18, 2020.
This article discusses distinguishing between real financial time series and synthetic time series using XGBoost.
Finance, R, Time Series, XGBoost
- Decision Boundary for a Series of Machine Learning Models - Mar 13, 2020.
I train a series of Machine Learning models using the iris dataset, construct synthetic data from the extreme points within the data, and test a number of Machine Learning models in order to draw the decision boundaries from which the models make predictions in a 2D space. This is useful for illustrative purposes and for understanding how different Machine Learning models make predictions.
Decision Boundaries, Machine Learning, Modeling, R
- KDnuggets™ News 20:n09, Mar 4: When Will AutoML replace Data Scientists (if ever) – vote; 20 AI, DS, ML Terms You Need to Know (part 2) - Mar 4, 2020.
AutoML, Data Science Education, Decision Trees, Key Terms, Probability, Python, R
Python and R Courses for Data Science - Feb 26, 2020.
Since Python and R are a must for today's data scientists, continuous learning is paramount. Online courses are arguably the best and most flexible way to upskill throughout one's career.
Coursera, Data Science, edX, MOOC, Programming, Python, R
- KDnuggets™ News 20:n08, Feb 26: Gartner 2020 Magic Quadrant for Data Science & Machine Learning Platforms; Will AutoML Replace Data Scientists? - Feb 26, 2020.
This week in KDnuggets: The Death of Data Scientists - will AutoML replace them?; Leaders, Changes, and Trends in Gartner 2020 Magic Quadrant for Data Science and Machine Learning Platforms; Hand labeling is the past. The future is #NoLabel AI; The Forgotten Algorithm; Getting Started with R Programming; and much, much more.
Algorithms, AutoML, Data Science, Data Scientist, Gartner, Machine Learning, Magic Quadrant, Mathematics, R
- Getting Started with R Programming - Feb 19, 2020.
An end to end Data Analysis using R, the second most requested programming language in Data Science.
Data Science, Machine Learning, Programming, R
- Introduction to Geographical Time Series Prediction with Crime Data in R, SQL, and Tableau - Feb 14, 2020.
When reviewing geographical data, it can be difficult to prepare the data for an analysis. This article helps by covering importing data into a SQL Server database; cleansing and grouping data into a map grid; adding time data points to the set of grid data and filling in the gaps where no crimes occurred; importing the data into R; and running an XGBoost model to determine where crimes will occur on a specific day.
Crime, Geospatial, R, SQL, Tableau, Time Series
- Basics of Audio File Processing in R - Feb 11, 2020.
This post provides basic information on audio processing using R as the programming language. It also walks through some basics of sound and digital audio.
Audio, Data Processing, R
- Serverless Machine Learning with R on Cloud Run - Feb 4, 2020.
Expedite the deployment of your machine learning models using serverless cloud infrastructure. In this tutorial, we explore creating and deploying a model which scrapes real-time Twitter data and returns an interactive visualization using R.
Cloud, Machine Learning, R, Twitter
- Classify A Rare Event Using 5 Machine Learning Algorithms - Jan 15, 2020.
Which algorithm works best for unbalanced data? Are there any tradeoffs?
Algorithms, Classification, Machine Learning, R, ROC-AUC, Unbalanced
- Beginner’s Guide to K-Nearest Neighbors in R: from Zero to Hero - Jan 3, 2020.
This post presents a pipeline of building a KNN model in R with various measurement metrics.
Beginners, K-nearest neighbors, Metrics, R
Plotnine: Python Alternative to ggplot2 - Dec 12, 2019.
Python's plotting libraries such as matplotlib and seaborn do allow the user to create elegant graphics as well, but the lack of a standardized syntax for implementing the grammar of graphics, compared to the simple, readable, and layered approach of ggplot2 in R, makes it more difficult to implement in Python.
Data Science, Data Visualization, Python, R
Data Science for Managers: Programming Languages - Nov 19, 2019.
In this article, we are going to talk about popular languages for Data Science and briefly describe each of them.
Data Science, Manager, MATLAB, Octave, Programming Languages, Python, R, Scala
- How to Visualize Data in Python (and R) - Nov 14, 2019.
Producing accessible data visualizations is a key data science skill. The following guidelines will help you create the best representations of your data using R and Python's Pandas library.
Data Visualization, Matplotlib, Python, R, SuperDataScience
- KDnuggets™ News 19:n43, Nov 13: Dynamic Reports in Python and R; Creating NLP Vocabularies; What is Data Science? - Nov 13, 2019.
On KDnuggets this week: Orchestrating Dynamic Reports in Python and R with Rmd Files; How to Create a Vocabulary for NLP Tasks in Python; What is Data Science?; The Complete Data Science LinkedIn Profile Guide; Set Operations Applied to Pandas DataFrames; and much, much more.
Data Science, Deep Learning, Facebook, LinkedIn, NLP, Pandas, Python, R, Reinforcement Learning, Report
- Orchestrating Dynamic Reports in Python and R with Rmd Files - Nov 8, 2019.
Do you want to extract CSV files with Python and visualize them in R? How does preparing everything in R and drawing conclusions with Python sound? Both are possible if you know the right libraries and techniques. Here, we’ll walk through a use case using both languages in one analysis.
Python, R, Report
- Customer Segmentation for R Users - Sep 26, 2019.
This article shows you how to separate your customers into distinct groups based on their purchase behavior. For the R enthusiasts out there, I demonstrated what you can do with r/stats, ggradar, ggplot2, animation, and factoextra.
Customer Analytics, R, Segmentation
- Scikit-Learn vs mlr for Machine Learning - Sep 10, 2019.
How does the scikit-learn machine learning library for Python compare to the mlr package for R? Follow along with a machine learning workflow through each approach, and see if you can gain a competitive advantage by knowing both frameworks.
Exxact, Machine Learning, R, scikit-learn
- KDnuggets™ News 19:n33, Sep 4: Data Science Skills Poll; Object-oriented Programming for Data Scientists - Sep 4, 2019.
This week: Object-oriented programming for data scientists; Deep Learning Next Step: Transformers and Attention Mechanism; R Users' Salaries from the 2019 Stackoverflow Survey; Types of Bias in Machine Learning; 4 Tips for Advanced Feature Engineering and Preprocessing; and much more!
Data Science, Data Science Skills, Deep Learning, NLP, Programming, R, Salary
- R Users’ Salaries from the 2019 Stackoverflow Survey - Aug 30, 2019.
Let’s take a look at what R users are saying about their salaries. Note that the following results could be biased because of unrepresentative and in some cases small samples.
R, Salary, StackOverflow, Survey
- Coding Random Forests® in 100 lines of code* - Aug 7, 2019.
There are dozens of machine learning algorithms out there. It is impossible to learn all their mechanics; however, many algorithms sprout from the most established algorithms, e.g. ordinary least squares, gradient boosting, support vector machines, tree-based algorithms and neural networks.
Algorithms, Machine Learning, Multicollinearity, R, random forests algorithm
- KDnuggets™ News 19:n29, Aug 7: What 70% of Data Science Learners Do Wrong; Pytorch Cheat Sheet for Beginners - Aug 7, 2019.
This week on KDnuggets: What 70% of Data Science Learners Do Wrong; Pytorch Cheat Sheet for Beginners and Udacity Deep Learning Nanodegree; How a simple mix of object-oriented programming can sharpen your deep learning prototype; Can we trust AutoML to go on full autopilot?; Ten more random useful things in R you may not know about; 25 Tricks for Pandas; and much more!
Automated Machine Learning, Cheat Sheet, Data Science, Pandas, Programming, PyTorch, R
- Ten more random useful things in R you may not know about - Jul 31, 2019.
I had a feeling that R has developed as a language to such a degree that many of us are using it now in completely different ways. This means that there are likely to be numerous tricks, packages, functions, etc that each of us use, but that others are completely unaware of, and would find useful if they knew about them.
Advice, Analytics, Data Science, R
- Kaggle Kernels Guide for Beginners: A Step by Step Tutorial - Jul 23, 2019.
This is an attempt to hold the hands of a complete beginner and walk them through the world of Kaggle Kernels — for them to get started.
Kaggle, Python, R
- The Evolution of a ggplot - Jul 18, 2019.
A step-by-step tutorial showing how to turn a default ggplot into an appealing and easily understandable data visualization in R.
Data Visualization, ggplot2, R
- How to Make Stunning 3D Plots for Better Storytelling - Jul 17, 2019.
3D Plots built in the right way for the right purpose are always stunning. In this article, we’ll see how to make stunning 3D plots with R using ggplot2 and rayshader.
Data Visualization, ggplot2, R, Storytelling
- Modelplotr v1.0 now on CRAN: Visualize the Business Value of your Predictive Models - Jun 21, 2019.
Explaining the business value of your predictive models to your business colleagues is a challenging task. Using Modelplotr, an R package, you can easily create stunning visualizations that clearly communicate the business value of your models.
Business Value, CRAN, Data Visualization, Lift charts, Predictive Models, R
- Ten random useful things in R that you might not know about - Jun 20, 2019.
Because the R ecosystem is so rich and constantly growing, people can often miss out on knowing about something that can really help them in a task that they have to complete.
Advice, Analytics, Data Science, R
- KDnuggets™ News 19:n23, Jun 19: Useful Stats for Data Scientists; Python, TensorFlow & R Winners in Latest Job Report - Jun 19, 2019.
This week on KDnuggets: 5 Useful Statistics Data Scientists Need to Know; Data Science Jobs Report 2019: Python Way Up, TensorFlow Growing Rapidly, R Use Double SAS; How to Learn Python for Data Science the Right Way; The Machine Learning Puzzle, Explained; Scalable Python Code with Pandas UDFs; and much more!
Data Science, Data Scientist, Machine Learning, Pandas, Python, R, Report, SAS, Scalability, Statistics, TensorFlow

Data Science Jobs Report 2019: Python Way Up, TensorFlow Growing Rapidly, R Use Double SAS - Jun 17, 2019.
Data science jobs continue to grow in 2019, and this report shares the change and spread of jobs by software over recent years.
Data Science, indeed, Jobs, Python, R, SAS, TensorFlow
What you need to know: The Modern Open-Source Data Science/Machine Learning Ecosystem - Jun 10, 2019.
We identify the 6 tools in the modern open-source Data Science ecosystem, examine the Python vs R question, and determine which tools are used the most with Deep Learning and Big Data.
Anaconda, Apache Spark, Big Data Software, Deep Learning, Excel, Keras, Poll, Python, R, RapidMiner, scikit-learn, Software, SQL, Tableau, TensorFlow
- The Whole Data Science World in Your Hands - Jun 5, 2019.
Testing MatrixDS capabilities on different languages and tools: Python, R and Julia. If you work with data you have to check this out.
Data Science, Data Scientist, Julia, Jupyter, MatrixDS, Python, R

Python leads the 11 top Data Science, Machine Learning platforms: Trends and Analysis - May 30, 2019.
Python continues to lead the top Data Science platforms, but R and RapidMiner hold their share; Almost 50% have used Deep Learning tools; SQL is steady; Consolidation continues.
Anaconda, Apache Spark, Deep Learning, Excel, Keras, Poll, Python, R, RapidMiner, scikit-learn, Software, SQL, TensorFlow
- Powerful like your local notebook. Sharable like a Google Doc. - Apr 30, 2019.
Mode is the only analytics platform with native Python and R Notebooks. Get everyone up and running in minutes by delivering Notebook-powered results right in your browser. Now anyone on your team can re-run R- and Python-powered reports themselves—without ever touching code.
Mode Analytics, Python, R, SQL
- KDnuggets™ News 19:n16, Apr 24: Data Visualization in Python with Matplotlib & Seaborn; Getting Into Data Science: The Ultimate Q&A - Apr 24, 2019.
Best Data Visualization Techniques for small and large data; The Rise of Generative Adversarial Networks; Approach pre-trained deep learning models with caution; How Optimization Works; Building a Flask API to Automatically Extract Named Entities Using SpaCy
Data Science, Data Visualization, Generative Adversarial Network, Matplotlib, Optimization, Python, R, Seaborn
- The Mueller Report Word Cloud: A brief tutorial in R - Apr 22, 2019.
Word clouds are simple visual summaries of the most frequently used words in a text, presenting essentially the same information as a histogram but in a form that is somewhat less precise and vastly more eye-catching. Get a quick sense of the themes in the recently released Mueller Report and its 448 pages of legal content.
Donald Trump, Politics, R, Word Cloud
- Because analysis is more than just dashboards - Apr 11, 2019.
Where traditional BI tools often make it easy to build dashboards, Mode makes it easy for you to answer any follow-up questions when you see changes in those dashboards. Choose the level of abstraction you want for a given dataset and quickly get to the story behind the change.
Analysis, Dashboard, Data Visualization, Mode Analytics, Python, R, SQL
R vs Python for Data Visualization - Mar 25, 2019.
This article demonstrates creating similar plots in R and Python using two of the most prominent data visualization packages on the market, namely ggplot2 and Seaborn.
Data Visualization, ggplot2, Matplotlib, Python, Python vs R, R, Seaborn
- Top R Packages for Data Cleaning - Mar 15, 2019.
Data cleaning is one of the most important and time-consuming tasks for data scientists. Here are the top R packages for data cleaning.
Data Cleaning, Data Preparation, Data Science, Machine Learning, R
- Advanced Analytics & Data Gov Training in Chicago: ML, DL, Self-Service, Strategy, and more - Mar 13, 2019.
Learn effective data governance practices and how to successfully implement advanced analytics by attending our industry leading training at TDWI Chicago, April 28 - May 3, and take your projects to the next level.
Advanced Analytics, Chicago, Data Governance, IL, R, Strategy, TDWI, Training

Who is a typical Data Scientist in 2019? - Mar 11, 2019.
We investigate what a typical data scientist looks like and see how this differs from this time last year, looking at skill set, programming languages, industry of employment, country of employment, and more.
Career, Data Science Skills, Data Scientist, Industry, MATLAB, Python, R, SQL
- Don’t do analysis in a vacuum - Feb 22, 2019.
Traditional tools force analysts to play the import-and-export game, so it's difficult to keep data fresh and accessible. Every Mode report or dashboard lives at a unique URL for future sharing, iterating, and building upon. Mode brings your entire team together in one platform.
Analytics, Dashboard, Mode Analytics, Platform, Python, R
Running R and Python in Jupyter - Feb 19, 2019.
The Jupyter Project began in 2014 for interactive and scientific computing. Fast forward 5 years and now Jupyter is one of the most widely adopted Data Science IDEs on the market and gives the user access to Python and R.
IPython, Jupyter, Python, R
Understanding Gradient Boosting Machines - Feb 6, 2019.
Despite its massive popularity, many professionals still use gradient boosting as a black box. As such, the purpose of this article is to lay an intuitive framework for this powerful machine learning technique.
Adaboost, Decision Trees, Gradient Boosting, R
- Using Caret in R to Classify Term Deposit Subscriptions for a Bank - Feb 4, 2019.
This article uses direct marketing campaign data from a Portuguese banking institution to predict if a customer will subscribe for a term deposit. We’ll be working with R’s Caret package to achieve this.
Banking, Classification, R
- Airbnb Rental Listings Dataset Mining - Jan 28, 2019.
An Exploratory Analysis of Airbnb’s Data to understand the rental landscape in New York City.
AirBnB, Data Exploration, Data Visualization, New York City, R, Real Estate
- 2018’s Top 7 R Packages for Data Science and AI - Jan 22, 2019.
This is a list of the best packages that changed our lives this year, compiled from my weekly digests.
AI, Data Science, R
- Deep learning in Satellite imagery - Dec 26, 2018.
This article outlines possible sources of satellite imagery, what its properties are and how this data can be utilised using R.
Deep Learning, Image Recognition, R
- Exploring the Data Jungle Free eBook - Dec 18, 2018.
This free eBook by Brian Godsey will provide you with real-world examples in Python, R, and other languages suitable for data science.
Data Preparation, Data Science, Data Visualization, Free ebook, Manning, Python, R
- Automated Web Scraping in R - Dec 11, 2018.
How to automatically web scrape periodically so you can analyze timely/frequently updated data.
Data Science Dojo, R, Web Scraping
Data Science Projects Employers Want To See: How To Show A Business Impact - Dec 4, 2018.
The best way to create better data science projects that employers want to see is to provide a business impact. This article highlights the process using customer churn prediction in R as a case-study.
Career Advice, Churn, Data Preparation, Data Science, R
Best Machine Learning Languages, Data Visualization Tools, DL Frameworks, and Big Data Tools - Dec 3, 2018.
We cover a variety of topics, from machine learning to deep learning, from data visualization to data tools, with comments and explanations from experts in the relevant fields.
Big Data, Data Visualization, Deep Learning, Jupyter, Machine Learning, Python, R, Tableau
- SQL, Python, and R in One Platform - Nov 27, 2018.
Stop jumping between applications. Get a complete analytical toolkit.
Data Science Platform, Data Visualization, Mode Analytics, Python, R, SQL
SQL, Python, & R in One Platform - Oct 26, 2018.
No more jumping between applications. Mode Studio combines a SQL editor, Python and R notebooks, and a visualization builder in one platform.
Data Visualization, Mode Analytics, Python, R, SQL
Apache Spark Introduction for Beginners - Oct 18, 2018.
An extensive introduction to Apache Spark, including a look at the evolution of the product, use cases, architecture, ecosystem components, core concepts and more.
Apache Spark, Beginners, Hadoop, R
- SQL, Python, & R: All in One Platform - Oct 11, 2018.
Mode Studio connects a SQL editor, Python and R notebooks, and a visualization builder in one platform. Sign up now for access.
Data Visualization, Python, R, SQL
- Evaluating the Business Value of Predictive Models in Python and R - Oct 11, 2018.
In these blogs for R and Python, we explain four valuable evaluation plots to assess the business value of a predictive model. We show how you can easily create these plots and help you to explain your predictive model to non-techies.
Business Value, Data Visualization, Lift charts, Predictive Models, Python, R
- KDnuggets™ News 18:n37, Oct 3: Mathematics of Machine Learning; Effective Transfer Learning for NLP; Path Analysis with R - Oct 3, 2018.
Also: Introducing VisualData: A Search Engine for Computer Vision Datasets; Raspberry Pi IoT Projects for Fun and Profit; Recent Advances for a Better Understanding of Deep Learning; Basic Image Data Analysis Using Python - Part 3; Introduction to Deep Learning
Computer Vision, Deep Learning, Machine Learning, Mathematics, NLP, R, Transfer Learning
- Introducing Path Analysis Using R - Sep 27, 2018.
Path analysis is an extension of multiple regression. It allows for the analysis of more complicated models.
Analysis, Analytics, R
- Optimization 101 for Data Scientists - Aug 8, 2018.
We show how to use optimization strategies to make the best possible decision.
Football, Julia, Optimization, Python, R, Sports
From Data to Viz: how to select the right chart for your data - Aug 1, 2018.
We offer an interactive, decision tree-style tool, which examines the data you have and proposes a set of potentially appropriate visualizations to represent your dataset.
Data, Data Visualization, ggplot2, GitHub, R, Tidyverse
- Remote Data Science: How to Send R and Python Execution to SQL Server from Jupyter Notebooks - Jul 27, 2018.
Did you know that you can execute R and Python code remotely in SQL Server from Jupyter Notebooks or any IDE? Machine Learning Services in SQL Server eliminates the need to move data around.
Jupyter, Machine Learning, Microsoft, Python, R, SQL, SQL Server
Dimensionality Reduction: Does PCA really improve classification outcome? - Jul 13, 2018.
In this post, I am going to verify this statement using a Principal Component Analysis (PCA) to try to improve the classification performance of a neural network over a dataset.
Classification, Dimensionality Reduction, Machine Learning, PCA, R
5 of Our Favorite Free Visualization Tools - Jul 5, 2018.
5 key free data visualization tools that can provide flexible and effective data presentation.
Analytics, D3.js, Data Science, Data Visualization, Free Software, R, Tableau
- [ebook] Apache Spark™ Under the Hood - Jun 27, 2018.
Learn how to install and run Spark yourself; a summary of Spark core architecture and concepts; Spark's powerful language APIs and how you can use them.
Apache Spark, Databricks, ebook, PyTorch, R, scikit-learn, TensorFlow
- KDnuggets™ News 18:n25, Jun 27: 5 Clustering Algorithms Data Scientists Need to Know; Detecting Sarcasm with Deep Convolutional Neural Networks? - Jun 27, 2018.
Also: 30 Free Resources for Machine Learning, Deep Learning, NLP; 7 Simple Data Visualizations You Should Know in R.
Clustering, Data Visualization, Machine Learning, Neural Networks, R
- Stagraph – a general purpose R GUI, for data import, wrangling, and visualization - Jun 25, 2018.
Stagraph is a new simple visual interface for R, which focuses on data import, data wrangling and data visualization.
Data Preparation, Data Visualization, R, Tidyverse
- How to Execute R and Python in SQL Server with Machine Learning Services - Jun 25, 2018.
Machine Learning Services in SQL Server eliminates the need for data movement - you can install and run R/Python packages to build Deep Learning and AI applications on data in SQL Server.
Azure ML, Machine Learning, Microsoft, Python, R, SQL, SQL Server
7 Simple Data Visualizations You Should Know in R - Jun 22, 2018.
This post presents a selection of 7 essential data visualizations, and how to recreate them using a mix of base R functions and a few common packages.
Charts, Data Visualization, Graphs, R
- KDnuggets™ News 18:n23, Jun 13: Did Python declare victory over R?; Master the Netflix Interview; Deep Learning Projects DIY Style - Jun 13, 2018.
Also: Command Line Tricks For Data Scientists; How (dis)similar are my train and test data?; 5 Machine Learning Projects You Should Not Overlook, June 2018; Introduction to Game Theory; Human Interpretable Machine Learning
Data Science, Deep Learning, Interview Questions, Machine Learning, Netflix, Python, R, Training Data
The 6 components of Open-Source Data Science / Machine Learning Ecosystem; Did Python declare victory over R? - Jun 6, 2018.
We find 6 tools form the modern open source Data Science / Machine Learning ecosystem; examine whether Python declared victory over R; and review which tools are most associated with Deep Learning and Big Data.
Anaconda, Apache Spark, Data Science, Keras, Machine Learning, Open Source, Poll, Python, R, RapidMiner, Scala, scikit-learn, TensorFlow
- KDnuggets™ News 18:n22, Jun 6: 10 More Free Must-Read Books for Machine Learning and Data Science; Beginner Guide to Data Science Pipeline - Jun 6, 2018.
Summer. Time to sit back and unwind. Or get your hands on some free machine learning and data science books and learn! Here is a great selection to get started.
Data Science, Free ebook, Keras, Pipeline, R
- Using Linear Regression for Predictive Modeling in R - Jun 1, 2018.
In this post, we’ll use linear regression to build a model that predicts cherry tree volume from metrics that are much easier for folks who study trees to measure.
Linear Regression, Predictive Modeling, R
- Virtual Training Events Without Leaving Your Desk - May 30, 2018.
Check out our lineup of upcoming virtual seminars, online learning courses, and customized training in your office. Space is limited, so reserve your seat early and score the best savings!
Agile, Business Analytics, Data Preparation, Data Science, Online Education, R, Visualization
Top 20 R Libraries for Data Science in 2018 - May 25, 2018.
We have prepared an infographic of the Top 20 R packages for data science, which covers the libraries' main features and GitHub activity, as all of the libraries are open-source.
Data Science, Infographic, R
- Modelling Time Series Processes using GARCH - May 25, 2018.
ARCH models were developed to navigate the turbulent seas of volatile data and analyze it in a time-varying setting.
Modeling, R, Time Series
- How to tackle common data cleaning issues in R - May 24, 2018.
R is a great choice for manipulating, cleaning, and summarizing data, producing probability statistics, and so on. In addition, it's not going away anytime soon; it is platform independent, so what you create will run almost anywhere; and it has awesome help resources.
Book, Data Cleaning, ebook, Packt Publishing, R
Python eats away at R: Top Software for Analytics, Data Science, Machine Learning in 2018: Trends and Analysis - May 22, 2018.
Python continues to eat away at R, RapidMiner gains, SQL is steady, Tensorflow advances pulling along Keras, Hadoop drops, Data Science platforms consolidate, and more.
Anaconda, Data Mining Software, Data Science Platform, Hadoop, Keras, Poll, Python, R, RapidMiner, SQL, TensorFlow, Trends
- Optimization Using R - May 18, 2018.
Optimization is a technique for finding the best possible solution to a given problem from among all the possible solutions. It uses a rigorous mathematical model to find the most efficient solution to the given problem.
Excel, Linear Programming, Optimization, R
- R Fundamentals: Building a Simple Grade Calculator - Mar 19, 2018.
In this tutorial, we'll teach you the basics of R by building a simple grade calculator. While we do not assume any R-specific knowledge, you should be familiar with general programming concepts.
Mathematics, Programming, R
- New Book: Credit risk analytics, The R Companion - Mar 16, 2018.
Credit risk analytics in R will enable you to build credit risk models from start to finish. With access to real credit data on the accompanying website, you will master a wide range of applications.
Analytics, Bart Baesens, Credit Risk, R
- KDnuggets™ News 18:n11, Mar 14: Two sides of getting a job as a Data Scientist; 5 things to know about Machine Learning - Mar 14, 2018.
Also: 18 Inspiring Women In AI, Big Data, Data Science, Machine Learning; Great Data Scientists Don't Just Think Outside the Box; Favorite Data Science / Machine Learning Blog; Text Processing in R.
Data Scientist, Machine Learning, R, Women
- Choropleth Maps in R - Mar 12, 2018.
Choropleth maps provide a simple, easy-to-understand visualization of a measurement across different geographical areas, be it states or countries.
Choropleth, Data Visualization, India, Maps, R
- Text Processing in R - Mar 9, 2018.
There are good reasons to want to use R for text processing, namely that we can do it, and that we can fit it in with the rest of our analyses. Furthermore, there is a lot of very active development going on in the R text analysis community right now.
Data Processing, R, Text Analytics, Text Mining
- TDWI Chicago, May 6-11: Get Your Hands Dirty With Data – KDnuggets Offer - Mar 2, 2018.
Attend the Hands-on Lab series and bring practical skills back from Chicago. Save 30% through March 16 with priority code KD30.
Chicago, Hadoop, IL, Machine Learning, Python, R, TDWI, Training
- Control Structures in R: Using If-Else Statements and Loops - Feb 23, 2018.
Control structures allow you to control the flow of execution of your code. They are extremely useful if you want to run a piece of code multiple times, or if you want to run a piece of code only if a certain condition is met.
Decision Making, Programming Languages, R
- Building a Daily Bitcoin Price Tracker with Coindeskr and Shiny in R - Feb 7, 2018.
This tutorial is to help an R user build his/her own Daily Bitcoin Price Tracker using three packages, Coindeskr, Shiny and Dygraphs.
Bitcoin, Cryptocurrency, GitHub, R
- Data Science vs Addiction: Estimating Opioid Abuse by Location - Jan 26, 2018.
Data science can help find the optimal locations for drug treatment facilities, even in the face of major data challenges.
Alteryx, Healthcare, Location Analytics, Optimization, R
- Deep Learning in H2O using R - Jan 22, 2018.
This article is about implementing Deep Learning (DL) using the H2O package in R. We start with a background on DL, followed by some features of H2O's DL framework, followed by an implementation using R.
Backpropagation, Deep Learning, Gradient Descent, H2O, Machine Learning, R
- Propensity Score Matching in R - Jan 18, 2018.
Propensity scores are an alternative method to estimate the effect of receiving treatment when random assignment of treatments to subjects is not feasible.
Bias, R, Statistics
- KDnuggets™ News 18:n03, Jan 17: Top 10 TED Talks on Data Science, Machine Learning; How Docker Can Help You Become A More Effective Data Scientist - Jan 17, 2018.
Also: A Primer on Web Scraping in R; Elasticsearch for Dummies; Generative Adversarial Networks, an overview.
AI, Data Science, Docker, R, TED
- Topological Data Analysis for Data Professionals: Beyond Ayasdi - Jan 16, 2018.
We review recent developments and tools in topological data analysis, including applications of persistent homology to psychometrics and a recent extension of piecewise regression, called Morse-Smale regression.
Algorithms, Clustering, R, Regression, Topological Data Analysis
- A Primer on Web Scraping in R - Jan 12, 2018.
If you are a data scientist who wants to capture data from such web pages then you wouldn’t want to be the one to open all these pages manually and scrape the web pages one by one. To push away the boundaries limiting data scientists from accessing such data from web pages, there are packages available in R.
Data Cleaning, Data Curation, R, Web Scraping
- 10 Tools to Help You Learn R - Jan 4, 2018.
There are several tools to help you grasp the foundational principles and more. The list below gives you an idea of what’s available and how much it costs.
R, Tools, Training
- Simple Ways Of Working With Medium To Big Data Locally - Dec 27, 2017.
An overview of the installation and implementation of simple techniques for working with large datasets on your machine.
Big Data, iPhone, Python, R, SAS
- How (& Why) Data Scientists and Data Engineers Should Share a Platform - Nov 17, 2017.
Sharing one platform has some obvious benefits for Data Science and Data Engineering teams, but technical, language and process challenges often make this difficult. Learn how one company implemented a single cloud platform for R, Python and other workloads, and some of the unexpected benefits they discovered along the way.
Apache Spark, Cazena, Data Science Platform, Hadoop, Python, R
- Extracting Tweets With R - Nov 14, 2017.
This article will give you a great, brief overview of extracting Tweets using R.
R, Twitter
- Tips for Getting Started with Text Mining in R and Python - Nov 8, 2017.
This article opens up the world of text mining in a simple and intuitive way and provides great tips to get started with text mining.
Python, R, Text Mining
- Process Mining with R: Introduction - Nov 2, 2017.
In the past years, several niche tools have appeared to mine organizational business processes. In this article, we’ll show you that it is possible to get started with “process mining” using well-known data science programming languages as well.
Data Mining, Data Science, Process Mining, R
- KDnuggets™ News 17:n41, Oct 25: Learning git not enough to become data scientist; Peak Data Scientist Demand? Top Machine Learning w. R videos - Oct 25, 2017.
Also: Becoming a data scientist after a science PhD; New Poll: When will demand for Data Scientists/Machine Learning experts peak?; It Only Takes One Line of Code to Run Regression.
Apache Spark, Data Scientist, Machine Learning, R