What you need to know: The Modern Open-Source Data Science/Machine Learning Ecosystem - Jun 10, 2019.
We identify the 6 tools in the modern open-source Data Science ecosystem, examine the Python vs R question, and determine which tools are used the most with Deep Learning and Big Data.
Anaconda, Apache Spark, Big Data Software, Deep Learning, Excel, Keras, Poll, Python, R, RapidMiner, scikit-learn, Software, SQL, Tableau, TensorFlow
- A Step-by-Step Guide to Transitioning your Career to Data Science – Part 2 - Jun 7, 2019.
How do you identify the technical skills a hiring manager is looking for? How do you build a data science project that draws the attention of a hiring manager?
Career Advice, Data Science, Skills, SQL, Tableau
- Why physical storage of your database tables might matter - May 31, 2019.
Follow this investigation into why physical storage of your database tables might matter, from problem identification to possible issue resolutions.
Apache Spark, Databases, Postgres, SQL

Python leads the 11 top Data Science, Machine Learning platforms: Trends and Analysis - May 30, 2019.
Python continues to lead the top Data Science platforms, but R and RapidMiner hold their share; Almost 50% have used Deep Learning tools; SQL is steady; Consolidation continues.
Pages: 1 2
Anaconda, Apache Spark, Deep Learning, Excel, Keras, Poll, Python, R, RapidMiner, scikit-learn, Software, SQL, TensorFlow

7 Steps to Mastering SQL for Data Science — 2019 Edition - May 17, 2019.
Follow these updated 7 steps to go from SQL data science newbie to practitioner in a hurry. We consider only the necessary concepts and skills, and provide quality resources for each.
7 Steps, Data Science, Database, Relational Databases, SQL
- Training a Champion: Building Deep Neural Nets for Big Data Analytics - Apr 4, 2019.
Introducing Sisense Hunch, the new way of handling Big Data sets that uses AQP technology to construct Deep Neural Networks (DNNs) which are trained to learn the relationships between queries and their results in these huge datasets.
Big Data Analytics, Deep Learning, Neural Networks, Sisense, SQL

Who is a typical Data Scientist in 2019? - Mar 11, 2019.
We investigate what a typical data scientist looks like and see how this differs from this time last year, looking at skill set, programming languages, industry of employment, country of employment, and more.
Career, Data Science Skills, Data Scientist, Industry, MATLAB, Python, R, SQL
SQL, Python, & R in One Platform - Oct 26, 2018.
No more jumping between applications. Mode Studio combines a SQL editor, Python and R notebooks, and a visualization builder in one platform.
Data Visualization, Mode Analytics, Python, R, SQL
- SQL Case Study: Helping a Startup CEO Manage His Data - Sep 19, 2018.
In this tutorial, you will learn how to create a table, insert values into it, use and understand some data types, use SELECT statements, UPDATE records, use some aggregate functions, and more.
Pages: 1 2
SQL, Startup
- OLAP queries in SQL: A Refresher - Sep 3, 2018.
Based on the recent book - Principles of Database Management - The Practical Guide to Storing, Managing and Analyzing Big and Small Data - this post examines how OLAP queries can be implemented in SQL.
Bart Baesens, OLAP, SQL
- Remote Data Science: How to Send R and Python Execution to SQL Server from Jupyter Notebooks - Jul 27, 2018.
Did you know that you can execute R and Python code remotely in SQL Server from Jupyter Notebooks or any IDE? Machine Learning Services in SQL Server eliminates the need to move data around.
Jupyter, Machine Learning, Microsoft, Python, R, SQL, SQL Server
SQL Cheat Sheet - Jul 2, 2018.
A good programmer or software developer should have a basic knowledge of SQL queries in order to be able retrieve data from a database. This cheat sheet can help you get started in your learning, or provide a useful resource for those working with SQL.
Cheat Sheet, SQL
- Modern Graph Query Language – GSQL - Jun 29, 2018.
This post introduces the prospect of fulfilling the need for a modern graph query language with GSQL
Graph Analytics, Graph Databases, SQL, TigerGraph
- How to Execute R and Python in SQL Server with Machine Learning Services - Jun 25, 2018.
Machine Learning Services in SQL Server eliminates the need for data movement - you can install and run R/Python packages to build Deep Learning and AI applications on data in SQL Server.
Azure ML, Machine Learning, Microsoft, Python, R, SQL, SQL Server
- Simple Tips for PostgreSQL Query Optimization - Jun 22, 2018.
A single query optimization tip can boost your database performance by 100x. Although we usually advise our customers to use these tips to optimize analytic queries (such as aggregation ones), this post is still very helpful for any other type of query.
Optimization, Postgres, SQL, Statsbot
- Event Processing: Three Important Open Problems - May 28, 2018.
This article summarizes the three most important problems to be solved in event processing. The facts in this article are supported by a recent survey and an analysis conducted on the industry trends.
Big Data, Data Analytics, Insights, Real-time, SQL, Streaming Analytics
Python eats away at R: Top Software for Analytics, Data Science, Machine Learning in 2018: Trends and Analysis - May 22, 2018.
Python continues to eat away at R, RapidMiner gains, SQL is steady, Tensorflow advances pulling along Keras, Hadoop drops, Data Science platforms consolidate, and more.
Pages: 1 2
Anaconda, Data Mining Software, Data Science Platform, Hadoop, Keras, Poll, Python, R, RapidMiner, SQL, TensorFlow, Trends
- YouTube videos on database management, SQL, Datawarehousing, Business Intelligence, OLAP, Big Data, NoSQL databases, data quality, data governance and Analytics – free - May 18, 2018.
Watch over 20 hours of YouTube videos on databases and database design, Physical Data Storage, Transaction Management and Database Access, and Data Warehousing, Data Governance and (Big) Data Analytics - all free.
Analytics, Bart Baesens, Big Data, Business Intelligence, Data Governance, Data Quality, Data Warehousing, Databases, NoSQL, SQL, Youtube
- Presto for Data Scientists – SQL on anything - Apr 19, 2018.
Presto enables data scientists to run interactive SQL across multiple data sources. This open source engine supports querying anything, anywhere, and at large scale.
Big Data, Database, Presto, SQL
- Scalable Select of Random Rows in SQL - Apr 5, 2018.
Performance boosts are achieved by selecting random rows or the sampling technique. Let’s learn how to select random rows in SQL.
Sampling, SQL, Statsbot
- A Beginner’s Guide to Data Engineering – Part II - Mar 15, 2018.
In this post, I share more technical details on how to build good data pipelines and highlight ETL best practices. Primarily, I will use Python, Airflow, and SQL for our discussion.
Pages: 1 2
AirBnB, Data Engineering, Data Science, ETL, Pipeline, Python, SQL
- Calculating Customer Lifetime Value: SQL Example - Feb 15, 2018.
In order to understand how to estimate LTV, it is useful to first think about evaluating a customer’s lifetime value at the end of their relationship with us.
Customer Analytics, Lifetime Value, SQL, Statsbot
- A Guide for Customer Retention Analysis with SQL - Dec 19, 2017.
Customer retention curves are essential to any business looking to understand its clients, and will go a long way towards explaining other things like sales figures or the impact of marketing initiatives. They are an easy way to visualize a key interaction between customers and the business.
Pages: 1 2
Analytics, Customer Analytics, SQL, Statsbot
- PySpark SQL Cheat Sheet: Big Data in Python - Nov 16, 2017.
PySpark is a Spark Python API that exposes the Spark programming model to Python - With it, you can speed up analytic applications. With Spark, you can get started with big data processing, as it has built-in modules for streaming, SQL, machine learning and graph processing.
Pages: 1 2
Apache Spark, Big Data, DataCamp, Python, SQL
30 Essential Data Science, Machine Learning & Deep Learning Cheat Sheets - Sep 22, 2017.
This collection of data science cheat sheets is not a cheat sheet dump, but a curated list of reference materials spanning a number of disciplines and tools.
Pages: 1 2 3
Cheat Sheet, Data Science, Deep Learning, Machine Learning, Neural Networks, Probability, Python, R, SQL, Statistics
42 Steps to Mastering Data Science - Aug 25, 2017.
This post is a collection of 6 separate posts of 7 steps a piece, each for mastering and better understanding a particular data science topic, with topics ranging from data preparation, to machine learning, to SQL databases, to NoSQL and beyond.
Data Preparation, Data Science, Deep Learning, Machine Learning, NoSQL, Python, SQL
- How To Write Better SQL Queries: The Definitive Guide – Part 2 - Aug 24, 2017.
Most forget that SQL isn’t just about writing queries, which is just the first step down the road. Ensuring that queries are performant or that they fit the context that you’re working in is a whole other thing. This SQL tutorial will provide you with a small peek at some steps that you can go through to evaluate your query.
Pages: 1 2
Algorithms, Complexity, Databases, Relational Databases, SQL
- How To Write Better SQL Queries: The Definitive Guide – Part 1 - Aug 23, 2017.
Most forget that SQL isn’t just about writing queries, which is just the first step down the road. Ensuring that queries are performant or that they fit the context that you’re working in is a whole other thing. This SQL tutorial will provide you with a small peek at some steps that you can go through to evaluate your query.
Pages: 1 2
Databases, Relational Databases, SQL
The Rise of GPU Databases - Aug 17, 2017.
The recent but noticeable shift from CPUs to GPUs is mainly due to the unique benefits they bring to sectors like AdTech, finance, telco, retail, or security/IT . We examine where GPU databases shine.
Big Data, Database, GPU, Predictive Analytics, SQL, SQream
- Data Science for Newbies: An Introductory Tutorial Series for Software Engineers - May 31, 2017.
This post summarizes and links to the individual tutorials which make up this introductory look at data science for newbies, mainly focusing on the tools, with a practical bent, written by a software engineer from the perspective of a software engineering approach.
Apache Spark, Data Science, Jupyter, Machine Learning, Pandas, Python, Reddit, Scala, SQL
- How to think like a data scientist to become one - Mar 23, 2017.
The author went from securities analyst to Head of Data Science at Amazon. He describes what he learned in his journey and gives 4 useful rules based on his experience.
Amazon, Data Science Skills, Data Scientist, SQL, Statistics
- The Most Underutilized Function in SQL - Mar 20, 2017.
Find out why md5() is an SQL function that's used surprisingly often, and find out how -- and why -- you can use it yourself.
Data Science, SQL
- Making Python Speak SQL with pandasql - Feb 8, 2017.
Want to wrangle Pandas data like you would SQL using Python? This post serves as an introduction to pandasql, and details how to get it up and running inside of Rodeo.
Pandas, Python, SQL, Yhat
A Funny Look at Big Data and Data Science - Dec 27, 2016.
A less than serious look at Big Data and Data Science. If you can laugh at all cartoons, then your Data Science skills are in good shape.
Big Data, Cartoon, Humor, SQL
- Evaluating HTAP Databases for Machine Learning Applications - Nov 2, 2016.
Businesses are producing a greater number of intelligent applications; which traditional databases are unable to support. A new class of databases, Hybrid Transactional and Analytical Processing (HTAP) databases, offers a variety of capabilities with specific strengths and weaknesses to consider. This article aims to give application developers and data scientists a better understanding of the HTAP database ecosystem so they can make the right choice for their intelligent application.
Pages: 1 2
Big Data, Data Processing, HTAP, Oracle, SAP, Splice Machine, SQL
- Doing Statistics with SQL - Aug 2, 2016.
This post covers how to perform some basic in-database statistical analysis using SQL.
SQL, Statistics
- 7 Steps to Mastering SQL for Data Science - Jun 16, 2016.
Follow these 7 steps to go from SQL data science newbie to seasoned practitioner quickly. No nonsense, just the necessities.
Pages: 1 2
7 Steps, Data Science, Database, Relational Databases, SQL
- R, Python Duel As Top Analytics, Data Science software – KDnuggets 2016 Software Poll Results - Jun 6, 2016.
R remains the leading tool, with 49% share, but Python grows faster and almost catches up to R. RapidMiner remains the most popular general Data Science platform. Big Data tools used by almost 40%, and Deep Learning usage doubles.
Pages: 1 2
Data Mining Software, Data Science Platform, Poll, Python, Python vs R, R, RapidMiner, SQL
- Practical skills that practical data scientists need - May 13, 2016.
The long story short, data scientist needs to be capable of solving business analytics problems. Learn more about the skill-set you need to master to achieve so.
Business Context, Data Scientist, Mathematics, Skills, SQL
- Fastest Growing Programming Languages and Computing Frameworks - Mar 7, 2016.
A new model for ranking programming languages and predicting the growth of user adoption. Includes current language rankings and predictions.
Data Science, Javascript, Programming Languages, SQL, Trends
- Data Science Skills for 2016 - Feb 12, 2016.
As demand for the hottest job is getting hotter in new year, the skill set required for them is getting larger. Here, we are discussing the skills which will be in high demand for data scientist which include data visualization, Apache Spark, R, python and many more.
Apache Spark, CrowdFlower, Data Science, Python, Skills, SQL
- Spark SQL for Real-Time Analytics - Sep 4, 2015.
Apache Spark is the hottest topic in Big Data. This tutorial discusses why Spark SQL is becoming the preferred method for Real Time Analytics and for next frontier, IoT (Internet of Things).
Ajit Jaokar, Apache Spark, Real-time, SQL, Sumit Pal
- 60+ Free Books on Big Data, Data Science, Data Mining, Machine Learning, Python, R, and more - Sep 4, 2015.
Here is a great collection of eBooks written on the topics of Data Science, Business Analytics, Data Mining, Big Data, Machine Learning, Algorithms, Data Science Tools, and Programming Languages for Data Science.
Book, Brendan Martin, Data Mining, Data Science, Free ebook, Machine Learning, Python, R, SQL
- How to become a Data Scientist for Free - Aug 28, 2015.
Here are the most required skills for a data scientist position based on ReSkill’s analyses of thousands of job posts and free resources to learn each skill.
Data Science Education, Data Scientist, Java, Online Education, Python, R, SQL, Statistics
- Emacs for Data Science - Jul 10, 2015.
Data science nowadays demands a polyglot developer and, choosing a correct code editor would definitely be a worthy investment. Here we provide, important features of Emacs and its advantages over other editors.
Data Science Tools, Emacs, R, SQL
- Which Big Data, Data Mining, and Data Science Tools go together? - Jun 11, 2015.
We analyze the associations between the top Big Data, Data Mining, and Data Science tools based on the results of 2015 KDnuggets Software Poll. Download anonymized data and analyze it yourself.
Apache Spark, Data Mining Software, Excel, Hadoop, Knime, Poll, Python, R, RapidMiner, SQL
- R leads RapidMiner, Python catches up, Big Data tools grow, Spark ignites - May 25, 2015.
R is the most popular overall tool among data miners, although Python usage is growing faster. RapidMiner continues to be most popular suite for data mining/data science. Hadoop/Big Data tools usage grew to 29%, propelled by 3x growth in Spark. Other tools with strong growth include H2O (0xdata), Actian, MLlib, and Alteryx.
Actian, Apache Spark, Data Mining Software, H2O, Knime, Poll, Python, R, RapidMiner, SQL
- SQL-like Query Language for Real-time Streaming Analytics - Mar 12, 2015.
We need SQL like query language for Realtime Streaming Analytics to be expressive, short, fast, define core operations that cover 90% of problems, and to be easy to follow and learn.
Real-time, Realtime Analytics, SQL, Stream Mining, Streaming Analytics
- Most Demanded Data Science and Data Mining Skills - Dec 15, 2014.
Our analysis of most demanded data scientist skills shows that Data Science is a team effort focused on business analytics, with top 5 platform skills being SQL, Python, R, SAS, and Hadoop.
Data Science Skills, Data Scientist, Hadoop, New York-NY, Python, R, SAS, Skills, SQL
- SlamData Open Source Analytics Tool for MongoDB - Dec 4, 2014.
SlamData is an open source SQL-based tool designed to make accessing data in MongoDB easy for developers and non-developers alike with the goal of making application intelligence easier.
MongoDB, NoSQL, Open Source, SlamData, SQL
- Four main languages for Analytics, Data Mining, Data Science - Aug 18, 2014.
New KDnuggets Poll shows the growing dominance of four main languages for Analytics, Data Mining, and Data Science: R, SAS, Python, and SQL - used by 91% of data scientists - and decline in popularity of other languages, except for Julia and Scala.
Analytics Languages, Data Mining, Data Science, Julia, Poll, Python, R, SAS, Scala, SQL
- KDnuggets 15th Annual Analytics, Data Mining, Data Science Software Poll: RapidMiner Continues To Lead - Jun 7, 2014.
With over 3,000 data miners taking part in KDnuggets 15th Annual Software Poll, RapidMiner continues to lead. Free software is used much more outside US, and Hadoop usage grows fastest in Asia.
Data Mining Software, Excel, Hadoop, Knime, Poll, Python, R, RapidMiner, SAS, SQL, SQL Server, Weka
- Guide to Data Science Cheat Sheets - May 12, 2014.
Selection of the most useful Data Science cheat sheets, covering SQL, Python (including NumPy, SciPy and Pandas), R (including Regression, Time Series, Data Mining), MATLAB, and more.
Cheat Sheet, Data Science, Python, R, SQL