This page features the most recent and most popular posts on R.
- An Introduction to Statistical Learning: The Free eBook - Jun 29, 2020
This week's free eBook is a classic of data science, An Introduction to Statistical Learning, with Applications in R. If you are interested in picking up elementary statistical learning concepts and learning how to implement them in R, this book is for you.
- Practical Markov Chain Monte Carlo - Jun 26, 2020
This is a slightly more intricate example of MCMC than the many that use a fairly simple model, a single predictor (maybe two), and not much else. It highlights a couple of issues and tricks worth noting for a handwritten implementation.
- Data Science Tools Popularity, animated - Jun 25, 2020
Watch the evolution of the top 10 most popular data science tools based on KDnuggets software polls from 2000 to 2019.
- Build a Branded Web Based GIS Application Using R, Leaflet and Flexdashboard - Jun 24, 2020
By using R, Flexdashboard and Leaflet, we can build a customized, branded web application to showcase location-based data interactively across the organization. Instead of crowding the application with many widgets, we use menu tabs and pages to separate the interactive aspects.
- modelStudio and The Grammar of Interactive Explanatory Model Analysis - Jun 19, 2020
modelStudio is an R package that automates the exploration of ML models and allows for interactive examination. It works in a model-agnostic fashion and is therefore compatible with most ML frameworks.
- Python for data analysis… is it really that simple?!? [Silver Blog]
The article addresses a simple data analytics problem, comparing a Python and Pandas solution to an R solution (using plyr, dplyr, and data.table), as well as kdb+ and BigQuery solutions. Performance improvement tricks for these solutions are then covered, as are parallel/cluster computing approaches and their limitations.
- Time Series Classification Synthetic vs Real Financial Time Series [Silver Blog]
This article discusses distinguishing between real financial time series and synthetic time series using XGBoost.
- Python and R Courses for Data Science [Silver Blog]
Since Python and R are a must for today's data scientists, continuous learning is paramount. Online courses are arguably the best and most flexible way to upskill throughout one's career.
- Plotnine: Python Alternative to ggplot2 [Silver Blog]
Python's plotting libraries, such as matplotlib and seaborn, do allow the user to create elegant graphics, but the lack of a standardized syntax for implementing the grammar of graphics, compared to the simple, readable, layered approach of ggplot2 in R, makes it more difficult to implement in Python.
- Data Science for Managers: Programming Languages [Silver Blog]
In this article, we are going to talk about popular languages for Data Science and briefly describe each of them.
- Data Science Jobs Report 2019: Python Way Up, TensorFlow Growing Rapidly, R Use Double SAS [Gold Blog]
Data science jobs continue to grow in 2019, and this report shows how the number and distribution of jobs by software have changed over recent years.
- What you need to know: The Modern Open-Source Data Science/Machine Learning Ecosystem [Silver Blog]
We identify the 6 tools in the modern open-source Data Science ecosystem, examine the Python vs R question, and determine which tools are used the most with Deep Learning and Big Data.
- Python leads the 11 top Data Science, Machine Learning platforms: Trends and Analysis [Gold Blog]
Python continues to lead the top Data Science platforms, but R and RapidMiner hold their share; almost 50% have used Deep Learning tools; SQL is steady; consolidation continues.
- How to correctly select a sample from a huge dataset in machine learning [Silver Blog]
We explain how choosing a small, representative dataset from a large population can improve model training reliability.
- R vs Python for Data Visualization [Gold Blog]
This article demonstrates creating similar plots in R and Python using two of the most prominent data visualization packages on the market, namely ggplot2 and Seaborn.
- Who is a typical Data Scientist in 2019? [Gold Blog]
We investigate what a typical data scientist looks like and see how this differs from this time last year, looking at skill set, programming languages, industry of employment, country of employment, and more.
- Running R and Python in Jupyter [Silver Blog]
The Jupyter Project began in 2014 for interactive and scientific computing. Fast forward five years, and Jupyter is now one of the most widely adopted Data Science IDEs on the market, giving the user access to both Python and R.
- Understanding Gradient Boosting Machines [Silver Blog]
Despite its massive popularity, many professionals still use this algorithm as a black box. The purpose of this article is therefore to lay out an intuitive framework for this powerful machine learning technique.
- Data Science Projects Employers Want To See: How To Show A Business Impact [Silver Blog]
The best way to create better data science projects that employers want to see is to provide a business impact. This article highlights the process using customer churn prediction in R as a case-study.
- Best Machine Learning Languages, Data Visualization Tools, DL Frameworks, and Big Data Tools [Silver Blog]
We cover a variety of topics, from machine learning to deep learning, from data visualization to data tools, with comments and explanations from experts in the relevant fields.
- SQL, Python, & R in One Platform [Silver Blog]
No more jumping between applications. Mode Studio combines a SQL editor, Python and R notebooks, and a visualization builder in one platform.
- Apache Spark Introduction for Beginners [Silver Blog]
An extensive introduction to Apache Spark, including a look at the evolution of the product, use cases, architecture, ecosystem components, core concepts and more.
- From Data to Viz: how to select the right chart for your data [Silver Blog]
We offer an interactive, decision tree-style tool, which examines the data you have and proposes a set of potentially appropriate visualizations to represent your dataset.
- Dimensionality Reduction: Does PCA really improve classification outcome? [Gold Blog]
In this post, I am going to test whether dimensionality reduction with Principal Component Analysis (PCA) can improve the classification performance of a neural network on a dataset.
- 5 of Our Favorite Free Visualization Tools [Gold Blog]
5 key free data visualization tools that can provide flexible and effective data presentation.
- 7 Simple Data Visualizations You Should Know in R [Silver Blog]
This post presents a selection of 7 essential data visualizations, and how to recreate them using a mix of base R functions and a few common packages.