- Data Observability, Part II: How to Build Your Own Data Quality Monitors Using SQL - Feb 23, 2021.
Using schema and lineage to understand the root cause of your data anomalies.
Tags: Data Engineering, Data Quality, Data Science, Data Science Platform, SQL
- Data Observability: Building Data Quality Monitors Using SQL - Feb 16, 2021.
To trigger an alert when data breaks, data teams can leverage a tried and true tactic from our friends in software engineering: monitoring and observability. In this article, we walk through how you can create your own data quality monitors for freshness and distribution from scratch using SQL.
Tags: Data Engineering, Data Quality, Data Science, Data Science Platform, SQL
- 7 Most Recommended Skills to Learn to be a Data Scientist - Feb 10, 2021.
The Data Scientist professional has emerged as a true interdisciplinary role that spans a variety of skills, theoretical and practical. For the core, day-to-day activities, many critical requirements that enable the delivery of real business value reach well outside the realm of machine learning, and should be mastered by those aspiring to the field.
Tags: Career Advice, Data Science Skills, Data Scientist, Data Visualization, Docker, Pandas, Python, SQL
- How to Deploy a Flask API in Kubernetes and Connect it with Other Micro-services - Feb 9, 2021.
A hands-on tutorial on how to implement your micro-service architecture using the powerful container orchestration tool Kubernetes.
Tags: API, Containers, Flask, Kubernetes, MySQL, Python, SQL
- Data Cleaning and Wrangling in SQL - Jan 14, 2021.
SQL is a foundational skill for data analysts but its application is sometimes limited within the data pipeline. However, SQL can be successfully used for many pre-processing tasks, such as data cleaning and wrangling, as demonstrated here by example.
Tags: Data Cleaning, Data Preparation, SQL
- Advice to aspiring Data Scientists – your most common questions answered - Jan 7, 2021.
Embarking on a new career path can be daunting with many unknowns about how to get started and how to be successful. If you are aspiring to become a Data Scientist, then the answers to these common questions can help set you off on the right foot.
Tags: Advice, Career Advice, Data Scientist, Mathematics, Online Education, SQL
- KDnuggets™ News 21:n01, Jan 6: All machine learning algorithms you should know in 2021; Monte Carlo integration in Python; MuZero – the most important ML system ever created? - Jan 6, 2021.
The first issue in 2021 brings you a great blog about Monte Carlo Integration - in Python; An overview of main Machine Learning algorithms you need to know in 2021; SQL vs NoSQL: 7 Key Takeaways; Generating Beautiful Neural Network Visualizations - how to; MuZero - may be the most important Machine Learning system ever created; and much more!
Tags: Algorithms, Monte Carlo, MuZero, NoSQL, Python, SQL
- SQL vs NoSQL: 7 Key Takeaways - Dec 23, 2020.
People assume that NoSQL is a counterpart to SQL. Instead, it’s a different type of database designed for use-cases where SQL is not ideal. The differences between the two are many, although some are so crucial that they define both databases at their cores.
Tags: Databases, NoSQL, Programming, SQL
- KDnuggets™ News 20:n48, Dec 23: Crack SQL Interviews; MLOps – Why and How; 2021 AI, Data Science, ML Predictions - Dec 23, 2020.
In this last issue of the year, learn how to crack SQL interviews, find the why and how of MLOps, check top online courses in Data Science, and read the predictions for AI, Data Science, and Machine Learning from our panel of experts and a group of innovative companies.
Tags: 2021 Predictions, Courses, MLOps, SQL
- Crack SQL Interviews - Dec 17, 2020.
SQL is an essential programming language for data analysis and processing. So, SQL questions are always part of the interview process for data science-related jobs, including data analysts, data scientists, and data engineers. Become familiar with these common patterns seen in SQL interview questions and follow our tips on how to neatly handle each with SQL queries.
Tags: Interview Questions, SQL
- 6 Things About Data Science that Employers Don’t Want You to Know - Dec 14, 2020.
As is the potential for any "trending hot" career, the reality of a position in the field may not be all that you initially expected. Data Science is no exception, and being still a young field, its evolving definition can offer some surprises that you should know about before accepting that dream offer.
Tags: Business, Career Advice, Communication, Data Science, Data Scientist, SQL
- The Ultimate Guide to Data Engineer Interviews - Dec 7, 2020.
If you are preparing for data engineering interviews, then follow these technical recommendations regarding your resume, programming skills, SQL acumen, and system design problem-solving, as well as the non-technical aspects of your upcoming interview session.
Tags: Career Advice, Data Engineer, Data Engineering, Interview Questions, Programming, SQL
- Top 6 Data Science Programs for Beginners - Nov 20, 2020.
Udacity has the best industry-leading programs in data science. Here are the top six data science courses for beginners to help you get started.
Tags: Beginners, Certificate, Data Engineer, Data Science Education, Data Visualization, Online Education, Python, R, SQL, Udacity
- Top KDnuggets tweets, Nov 11-17: Data Engineering – the Cousin of Data Science, is Troublesome - Nov 18, 2020.
Also 6 Things About #DataScience that Employers Don't Want You to Know; NLP - Zero to Hero with #Python #NLProc; 5 Tricky SQL Queries Solved - Explaining the approach to solving a few complex #SQL queries.
Tags: Career Advice, Data Engineering, Data Science, NLP, SQL, Top tweets
- 5 Tricky SQL Queries Solved - Nov 12, 2020.
Explaining the approach to solving a few complex SQL queries.
Tags: Data Science, SQL, Use Cases
- Modern Data Science Skills: 8 Categories, Core Skills, and Hot Skills - Sep 8, 2020.
We analyze the results of the Data Science Skills poll, including 8 categories of skills, 13 core skills that over 50% of respondents have, the emerging/hot skills that data scientists want to learn, and what is the top skill that Data Scientists want to learn.
Tags: Communication, Data Preparation, Data Science Skills, Data Visualization, Excel, GitHub, Mathematics, Poll, Python, Reinforcement Learning, scikit-learn, SQL, Statistics
- Working with Spark, Python or SQL on Azure Databricks - Aug 27, 2020.
Here we look at some ways to interchangeably work with Python, PySpark and SQL using Azure Databricks, an Apache Spark-based big data analytics service designed for data science and data engineering offered by Microsoft.
Tags: Apache Spark, Databricks, Microsoft Azure, Python, SQL
- Data Science Tools Illustrated Study Guides - Aug 25, 2020.
These data science tools illustrated guides are broken up into four distinct categories: data retrieval, data manipulation, data visualization, and engineering tips. Both online and PDF versions of these guides are available.
Tags: Cheat Sheet, Data Preprocessing, Data Processing, Data Science, Data Science Tools, Data Visualization, Python, R, SQL
- Feature Engineering in SQL and Python: A Hybrid Approach - Jul 2, 2020.
Set up your workstation, reduce workplace clutter, maintain a clean namespace, and effortlessly keep your dataset up-to-date.
Tags: Feature Engineering, Python, SQL
- Top KDnuggets tweets, May 13-19: Linear algebra and optimization and machine learning: A textbook - May 21, 2020.
Also: Everything you need to become a self-taught #MachineLearning Engineer; SQL Cheat Sheet (2020) - a useful cheat sheet that documents some of the more commonly used elements of SQL.
Tags: AutoML, Cheat Sheet, Linear Algebra, Machine Learning Engineer, SQL, Top tweets
- What they do not tell you about machine learning - May 19, 2020.
There's a lot of excitement out there about machine learning jobs. So, it's always good to start off with a healthy dose of reality and proper expectations.
Tags: Advice, Career, Machine Learning, Machine Learning Engineer, SQL
- The Benefits & Examples of Using Apache Spark with PySpark - Apr 21, 2020.
Apache Spark runs fast, offers robust, distributed, fault-tolerant data objects, and integrates beautifully with the world of machine learning and graph analytics. Learn more here.
Tags: Apache Spark, Data Management, Python, SQL
- Python for data analysis… is it really that simple?!? - Apr 2, 2020.
The article addresses a simple data analytics problem, comparing a Python and Pandas solution to an R solution (using plyr, dplyr, and data.table), as well as kdb+ and BigQuery solutions. Performance improvement tricks for these solutions are then covered, as are parallel/cluster computing approaches and their limitations.
Tags: Data Analysis, Pandas, Python, R, SQL
- Introduction to Geographical Time Series Prediction with Crime Data in R, SQL, and Tableau - Feb 14, 2020.
When reviewing geographical data, it can be difficult to prepare the data for an analysis. This article helps by covering importing data into a SQL Server database; cleansing and grouping data into a map grid; adding time data points to the set of grid data and filling in the gaps where no crimes occurred; importing the data into R; and running an XGBoost model to determine where crimes will occur on a specific day.
Tags: Crime, Geospatial, R, SQL, Tableau, Time Series
- KDnuggets™ News 20:n02, Jan 15: Top 5 Must-have Data Science Skills; Learn Machine Learning with THIS Book - Jan 15, 2020.
This week: learn the 5 must-have data science skills for the new year; find out which book is THE book to get started learning machine learning; pick up some Python tips and tricks; learn SQL, but learn it the hard way; and find an introductory guide to learning common NLP techniques.
Tags: Books, Data Science, Data Science Skills, Machine Learning, NLP, Programming, Python, SQL, Tips
- Learning SQL the Hard Way - Jan 8, 2020.
Simply put: This post is about installing SQL, explaining SQL and running SQL.
Tags: Databases, MySQL, Programming, SQL
- 7 Resources to Becoming a Data Engineer - Jan 7, 2020.
An estimated 8,650% growth of the volume of Data to 175 zettabytes from 2010 to 2025 has created an enormous need for Data Engineers to build an organization's big data platform to be fast, efficient and scalable.
Tags: Advice, Big Data, Cloud Computing, Data Engineering, Data Science, MOOC, SQL
- KDnuggets™ News 19:n38, Oct 9: The Last SQL Guide for Data Analysis; 4 Quadrants of Data Science Skills and 7 steps for Viral Data Visualization - Oct 9, 2019.
Read a comprehensive SQL guide for data analysis; Learn how to choose the right clustering algorithm for your data; Find out how to create a viral DataViz using the data from Data Science Skills poll; Enroll in any of 10 Free Top Notch Natural Language Processing Courses; and more.
Tags: Clustering, Data Visualization, Machine Learning Engineer, SQL
- The Last SQL Guide for Data Analysis You’ll Ever Need - Oct 4, 2019.
This is it: the last SQL guide for data analysis you'll ever need! OK, maybe it’s actually the first. But it’ll give you a solid head start.
Tags: Cheat Sheet, Data Analysis, Data Science, SQL
- KDnuggets™ News 19:n32, Aug 28: Handy SQL Features for Data Scientists; Nothing but NumPy: Creating Neural Networks with Computational Graphs - Aug 28, 2019.
Most useful SQL features for Data Scientists; Excellent tutorial on creating neural nets from scratch with NumPy; TensorFlow 2.0 highlights, explained; How to sell your boss on Data Analytics; and more.
Tags: Neural Networks, numpy, SQL, TensorFlow
- Top Handy SQL Features for Data Scientists - Aug 23, 2019.
Whenever we hear "data," the first thing that comes to mind is SQL! SQL comes with easy and quick to learn features to organize and retrieve data, as well as perform actions on it in order to gain useful insights.
Tags: Data Science, Data Scientist, SQL
- Is SQL needed to be a data scientist? - Jul 25, 2019.
As long as there is ‘data’ in data scientist, Structured Query Language (or see-quel as we call it) will remain an important part of it. In this blog, let us explore data science and its relationship with SQL.
Tags: Data Science, Relational Databases, SQL
- Become a Pro at Pandas, Python’s Data Manipulation Library - Jun 13, 2019.
Pandas is one of the most popular Python libraries for cleaning, transforming, manipulating and analyzing data. Learn how to efficiently handle large amounts of data using Pandas.
Tags: Matplotlib, numpy, Pandas, Python, SQL
- What you need to know: The Modern Open-Source Data Science/Machine Learning Ecosystem - Jun 10, 2019.
We identify the 6 tools in the modern open-source Data Science ecosystem, examine the Python vs R question, and determine which tools are used the most with Deep Learning and Big Data.
Tags: Anaconda, Apache Spark, Big Data Software, Deep Learning, Excel, Keras, Poll, Python, R, RapidMiner, scikit-learn, Software, SQL, Tableau, TensorFlow
- A Step-by-Step Guide to Transitioning your Career to Data Science – Part 2 - Jun 7, 2019.
How do you identify the technical skills a hiring manager is looking for? How do you build a data science project that draws the attention of a hiring manager?
Tags: Career Advice, Data Science, Skills, SQL, Tableau
- Why physical storage of your database tables might matter - May 31, 2019.
Follow this investigation into why physical storage of your database tables might matter, from problem identification to possible issue resolutions.
Tags: Apache Spark, Databases, Postgres, SQL

- Python leads the 11 top Data Science, Machine Learning platforms: Trends and Analysis - May 30, 2019.
Python continues to lead the top Data Science platforms, but R and RapidMiner hold their share; Almost 50% have used Deep Learning tools; SQL is steady; Consolidation continues.
Tags: Anaconda, Apache Spark, Deep Learning, Excel, Keras, Poll, Python, R, RapidMiner, scikit-learn, Software, SQL, TensorFlow
- KDnuggets™ News 19:n20, May 22: 7 Steps to Mastering SQL for Data Science; How to build Math Programming Skills - May 22, 2019.
Also An overview of Pycharm for Data Scientists; How to build a Computer Vision model - key approaches and datasets; k-means clustering tutorial; 60+ useful graph visualization libraries; The Data Fabric for Machine Learning.
Tags: Computer Vision, K-means, Mathematics, PyCharm, SQL

- 7 Steps to Mastering SQL for Data Science — 2019 Edition - May 17, 2019.
Follow these updated 7 steps to go from SQL data science newbie to practitioner in a hurry. We consider only the necessary concepts and skills, and provide quality resources for each.
Tags: 7 Steps, Data Science, Database, Relational Databases, SQL
- Powerful like your local notebook. Sharable like a Google Doc. - Apr 30, 2019.
Mode is the only analytics platform with native Python and R Notebooks. Get everyone up and running in minutes by delivering Notebook-powered results right in your browser. Now anyone on your team can re-run R- and Python-powered reports themselves—without ever touching code.
Tags: Mode Analytics, Python, R, SQL
- Because analysis is more than just dashboards - Apr 11, 2019.
Where traditional BI tools often make it easy to build dashboards, Mode makes it easy for you to answer any follow-up questions when you see changes in those dashboards. Choose the level of abstraction you want for a given dataset and quickly get to the story behind the change.
Tags: Analysis, Dashboard, Data Visualization, Mode Analytics, Python, R, SQL
- Training a Champion: Building Deep Neural Nets for Big Data Analytics - Apr 4, 2019.
Introducing Sisense Hunch, the new way of handling Big Data sets that uses AQP technology to construct Deep Neural Networks (DNNs) which are trained to learn the relationships between queries and their results in these huge datasets.
Tags: Big Data Analytics, Deep Learning, Neural Networks, Sisense, SQL

- Who is a typical Data Scientist in 2019? - Mar 11, 2019.
We investigate what a typical data scientist looks like and see how this differs from this time last year, looking at skill set, programming languages, industry of employment, country of employment, and more.
Tags: Career, Data Science Skills, Data Scientist, Industry, MATLAB, Python, R, SQL
- SQL, Python, and R in One Platform - Nov 27, 2018.
Stop jumping between applications. Get a complete analytical toolkit.
Tags: Data Science Platform, Data Visualization, Mode Analytics, Python, R, SQL
- UnitedHealth Group: Clinical Data Statistical Analyst – SQL SAS (Clinician Required) [Telecommute] - Nov 16, 2018.
Leverage your data analytic and project management skills to lead programs that focus on improving HEDIS rates and impacting the quality of care for our members.
Tags: Analyst, Healthcare, SAS, SQL, Telecommute, UnitedHealth Group
- SQL, Python, & R in One Platform - Oct 26, 2018.
No more jumping between applications. Mode Studio combines a SQL editor, Python and R notebooks, and a visualization builder in one platform.
Tags: Data Visualization, Mode Analytics, Python, R, SQL
- SQL, Python, & R: All in One Platform - Oct 11, 2018.
Mode Studio connects a SQL editor, Python and R notebooks, and a visualization builder in one platform. Sign up now for access.
Tags: Data Visualization, Python, R, SQL
- KDnuggets™ News 18:n36, Sep 26: Machine Learning Algorithms From Scratch; Deep Learning Framework Popularity; Data Capture, the Deep Learning Way - Sep 26, 2018.
Also: SQL Case Study: Helping a Startup CEO Manage His Data; Building a Machine Learning Model through Trial and Error; The Whys and Hows of Web Scraping; Unfolding Naive Bayes From Scratch; "Auto-What?" - A Taxonomy of Automated Machine Learning
Tags: Algorithms, Automated Machine Learning, Deep Learning, Machine Learning, Perceptron, SQL, Web Scraping
- SQL Case Study: Helping a Startup CEO Manage His Data - Sep 19, 2018.
In this tutorial, you will learn how to create a table, insert values into it, use and understand some data types, use SELECT statements, UPDATE records, use some aggregate functions, and more.
Tags: SQL, Startup
- KDnuggets™ News 18:n33, Sep 5: Practical Topic Modeling with Python; Classifying AI Technologies; Data Science Project Inspiration - Sep 5, 2018.
Also: An End-to-End Project on Time Series Analysis and Forecasting with Python; Financial Data Analysis - Data Processing 1: Loan Eligibility Prediction; OLAP queries in SQL: A Refresher; Word Vectors in Natural Language Processing: Global Vectors (GloVe)
Tags: AI, Data Science, Finance, OLAP, Python, SQL, Time Series, Topic Modeling, Word Embeddings
- OLAP queries in SQL: A Refresher - Sep 3, 2018.
Based on the recent book - Principles of Database Management - The Practical Guide to Storing, Managing and Analyzing Big and Small Data - this post examines how OLAP queries can be implemented in SQL.
Tags: Bart Baesens, OLAP, SQL
- KDnuggets™ News 18:n29, Aug 1: Building an Awesome Data Science Portfolio; Data Science + DevOps = Taming the Unicorn - Aug 1, 2018.
Also: A Practitioner's Guide to Processing & Understanding Text: Data Retrieval with Web Scraping; Remote Data Science: How to Send R and Python Execution to SQL Server from Jupyter Notebooks; Best Deal in the Galaxy? Win KDnuggets Free Pass to Strata Data Conference NYC
Tags: Data Science, Data Scientist, DevOps, Jupyter, Portfolio, SQL, Unicorn, Web Scraping
- Remote Data Science: How to Send R and Python Execution to SQL Server from Jupyter Notebooks - Jul 27, 2018.
Did you know that you can execute R and Python code remotely in SQL Server from Jupyter Notebooks or any IDE? Machine Learning Services in SQL Server eliminates the need to move data around.
Tags: Jupyter, Machine Learning, Microsoft, Python, R, SQL, SQL Server
- KDnuggets™ News 18:n26, Jul 11: 5 Favorite Free Visualization Tools; SQL Cheat Sheet; Top 20 Python Libraries for Data Science - Jul 11, 2018.
Also Introduction to Apache Spark; fast.ai Machine Learning Course Notes; Cartoon: How is Data Science Different From Religion?
Tags: Cheat Sheet, Data Visualization, Python, SQL
- SQL Cheat Sheet - Jul 2, 2018.
A good programmer or software developer should have a basic knowledge of SQL queries in order to be able to retrieve data from a database. This cheat sheet can help you get started in your learning, or provide a useful resource for those working with SQL.
Tags: Cheat Sheet, SQL
- Modern Graph Query Language – GSQL - Jun 29, 2018.
This post introduces the prospect of fulfilling the need for a modern graph query language with GSQL.
Tags: Graph Analytics, Graph Databases, SQL, TigerGraph
- How to Execute R and Python in SQL Server with Machine Learning Services - Jun 25, 2018.
Machine Learning Services in SQL Server eliminates the need for data movement - you can install and run R/Python packages to build Deep Learning and AI applications on data in SQL Server.
Tags: Azure ML, Machine Learning, Microsoft, Python, R, SQL, SQL Server
- Simple Tips for PostgreSQL Query Optimization - Jun 22, 2018.
A single query optimization tip can boost your database performance by 100x. Although we usually advise our customers to use these tips to optimize analytic queries (such as aggregation ones), this post is still very helpful for any other type of query.
Tags: Optimization, Postgres, SQL, Statsbot
- Event Processing: Three Important Open Problems - May 28, 2018.
This article summarizes the three most important problems to be solved in event processing. The facts in this article are supported by a recent survey and an analysis conducted on the industry trends.
Tags: Big Data, Data Analytics, Insights, Real-time, SQL, Streaming Analytics
- Deep Learning With Apache Spark: Part 2 - May 23, 2018.
In this article I’ll continue the discussion on Deep Learning with Apache Spark. I will focus entirely on the DL pipelines library and how to use it from scratch.
Tags: Apache Spark, Deep Learning, Keras, SQL, TensorFlow
- Python eats away at R: Top Software for Analytics, Data Science, Machine Learning in 2018: Trends and Analysis - May 22, 2018.
Python continues to eat away at R, RapidMiner gains, SQL is steady, TensorFlow advances pulling along Keras, Hadoop drops, Data Science platforms consolidate, and more.
Tags: Anaconda, Data Mining Software, Data Science Platform, Hadoop, Keras, Poll, Python, R, RapidMiner, SQL, TensorFlow, Trends
- YouTube videos on database management, SQL, Datawarehousing, Business Intelligence, OLAP, Big Data, NoSQL databases, data quality, data governance and Analytics – free - May 18, 2018.
Watch over 20 hours of YouTube videos on databases and database design, Physical Data Storage, Transaction Management and Database Access, and Data Warehousing, Data Governance and (Big) Data Analytics - all free.
Tags: Analytics, Bart Baesens, Big Data, Business Intelligence, Data Governance, Data Quality, Data Warehousing, Databases, NoSQL, SQL, Youtube
- To SQL or not To SQL: that is the question! - May 7, 2018.
This article looks at the emergence of the NoSQL movement and compares it to a traditional relational database.
Tags: Databases, NoSQL, Relational Databases, Scalability, SQL
- KDnuggets™ News 18:n17, Apr 25: Python Regular Expressions Cheat Sheet; Deep Learning With Apache Spark; Building a Question Answering Model - Apr 25, 2018.
Also: Derivation of Convolutional Neural Network from Fully Connected Network Step-By-Step; Presto for Data Scientists - SQL on anything; Why Deep Learning is perfect for NLP (Natural Language Processing); Top 16 Open Source Deep Learning Libraries and Platforms
Tags: Apache Spark, Cheat Sheet, Deep Learning, NLP, Python, Question answering, SQL
- Presto for Data Scientists – SQL on anything - Apr 19, 2018.
Presto enables data scientists to run interactive SQL across multiple data sources. This open source engine supports querying anything, anywhere, and at large scale.
Tags: Big Data, Database, Presto, SQL
- Loading Terabytes of Data from Postgres into BigQuery - Apr 9, 2018.
Despite the fact that an ETL task is pretty challenging when it comes to loading Big Data, there’s still the scenario in which you can load terabytes of data from Postgres into BigQuery relatively easily and very efficiently.
Tags: BigQuery, ETL, NoSQL, Postgres, SQL, Statsbot
- Scalable Select of Random Rows in SQL - Apr 5, 2018.
Performance boosts can be achieved by working on a sample of randomly selected rows instead of the full table. Let’s learn how to select random rows in SQL.
Tags: Sampling, SQL, Statsbot
- A Beginner’s Guide to Data Engineering – Part II - Mar 15, 2018.
In this post, I share more technical details on how to build good data pipelines and highlight ETL best practices. Primarily, I will use Python, Airflow, and SQL for our discussion.
Tags: AirBnB, Data Engineering, Data Science, ETL, Pipeline, Python, SQL
- Want a Job in Data? Learn This - Feb 19, 2018.
Why mastering a 50-year-old programming language is the key to getting a data science job.
Tags: Advice, Career, Data Science, SQL
- Calculating Customer Lifetime Value: SQL Example - Feb 15, 2018.
In order to understand how to estimate LTV, it is useful to first think about evaluating a customer’s lifetime value at the end of their relationship with us.
Tags: Customer Analytics, Lifetime Value, SQL, Statsbot
- SQL Window Functions Tutorial for Business Analysis - Dec 27, 2017.
In this SQL window functions tutorial, we will describe how these functions work in general, what is behind their syntax, and show how to answer these questions with pure SQL.
Tags: Analytics, Business Analytics, SQL, Statsbot
- A Guide for Customer Retention Analysis with SQL - Dec 19, 2017.
Customer retention curves are essential to any business looking to understand its clients, and will go a long way towards explaining other things like sales figures or the impact of marketing initiatives. They are an easy way to visualize a key interaction between customers and the business.
Tags: Analytics, Customer Analytics, SQL, Statsbot
- Unlock Machine Learning for the New Speed and Scale of Business - Dec 8, 2017.
Learn how Vertica in-database machine learning supports the entire predictive analytics process with MPP, SQL execution, R, Python, Java and more - get the whitepaper.
Tags: Big Data, Database, Machine Learning, MPP Database, SQL, Vertica, White Paper
- Database Bootcamp Webinar Series, Dec 5, 7, 12, 14 - Dec 1, 2017.
The need to be broadly knowledgeable and rapidly understand the existing database ecosystem is growing. Looker has broken down and simplified the differentiators of the main database technologies into this series of four 45-minute webinar sessions.
Tags: Databases, Looker, MPP Database, SQL
- PySpark SQL Cheat Sheet: Big Data in Python - Nov 16, 2017.
PySpark is a Spark Python API that exposes the Spark programming model to Python. With it, you can speed up analytic applications. With Spark, you can get started with big data processing, as it has built-in modules for streaming, SQL, machine learning and graph processing.
Tags: Apache Spark, Big Data, DataCamp, Python, SQL
- Spark – The Definitive Guide – exclusive preview - Sep 25, 2017.
Get an exclusive preview of "Spark: The Definitive Guide" from Databricks! Learn how Spark runs on a cluster, see examples in SQL, Python and Scala, learn about Structured Streaming and Machine Learning, and more.
Tags: Apache Spark, Databricks, Free ebook, Python, Scala, SQL
- 30 Essential Data Science, Machine Learning & Deep Learning Cheat Sheets - Sep 22, 2017.
This collection of data science cheat sheets is not a cheat sheet dump, but a curated list of reference materials spanning a number of disciplines and tools.
Tags: Cheat Sheet, Data Science, Deep Learning, Machine Learning, Neural Networks, Probability, Python, R, SQL, Statistics
- 42 Steps to Mastering Data Science - Aug 25, 2017.
This post is a collection of 6 separate posts of 7 steps apiece, each for mastering and better understanding a particular data science topic, with topics ranging from data preparation, to machine learning, to SQL databases, to NoSQL and beyond.
Tags: Data Preparation, Data Science, Deep Learning, Machine Learning, NoSQL, Python, SQL
- How To Write Better SQL Queries: The Definitive Guide – Part 2 - Aug 24, 2017.
Most forget that SQL isn’t just about writing queries, which is just the first step down the road. Ensuring that queries are performant or that they fit the context that you’re working in is a whole other thing. This SQL tutorial will provide you with a small peek at some steps that you can go through to evaluate your query.
Tags: Algorithms, Complexity, Databases, Relational Databases, SQL
- How To Write Better SQL Queries: The Definitive Guide – Part 1 - Aug 23, 2017.
Most forget that SQL isn’t just about writing queries, which is just the first step down the road. Ensuring that queries are performant or that they fit the context that you’re working in is a whole other thing. This SQL tutorial will provide you with a small peek at some steps that you can go through to evaluate your query.
Tags: Databases, Relational Databases, SQL
- The Rise of GPU Databases - Aug 17, 2017.
The recent but noticeable shift from CPUs to GPUs is mainly due to the unique benefits they bring to sectors like AdTech, finance, telco, retail, or security/IT. We examine where GPU databases shine.
Tags: Big Data, Database, GPU, Predictive Analytics, SQL, SQream
- Populating a GRAKN.AI Knowledge Graph with the World - Jul 20, 2017.
This updated article describes how to move SQL data into a GRAKN.AI knowledge graph.
Tags: GRAKN.AI, Graph, Knowledge Graph, SQL
- Data Science for Newbies: An Introductory Tutorial Series for Software Engineers - May 31, 2017.
This post summarizes and links to the individual tutorials which make up this introductory look at data science for newbies, mainly focusing on the tools, with a practical bent, written by a software engineer from the perspective of a software engineering approach.
Tags: Apache Spark, Data Science, Jupyter, Machine Learning, Pandas, Python, Reddit, Scala, SQL
- How to think like a data scientist to become one - Mar 23, 2017.
The author went from securities analyst to Head of Data Science at Amazon. He describes what he learned in his journey and gives 4 useful rules based on his experience.
Tags: Amazon, Data Science Skills, Data Scientist, SQL, Statistics
- KDnuggets™ News 17:n11, Mar 22: 50 Companies Leading The AI Revolution; 17 More Must-Know Data Science Q&A, part 3 - Mar 22, 2017.
Also 7 Types of Data Scientist Job Profiles; Email Spam Filtering: An Implementation with Python and Scikit-learn.
Tags: AI, Data Scientist, Interview Questions, SQL, Startups
- The Most Underutilized Function in SQL - Mar 20, 2017.
Find out why md5() is an SQL function that's used surprisingly often, and find out how -- and why -- you can use it yourself.
Tags: Data Science, SQL
- Grunion, Query Optimization Tool for Data Science and Big Data - Mar 14, 2017.
Grunion is a patent-pending query optimization, translation, and federation framework built to help bridge the gap between data science and data engineering teams. Read more to request access.
Tags: Apache Spark, Benchmark, Data Workflow, Datascience.com, NoSQL, SQL
- KDnuggets™ News 17:n06, Feb 15: So What is Big Data? 52 Useful Machine Learning APIs; Data Science finds Perfect Valentines Dates - Feb 15, 2017.
Also Making Python Speak SQL with pandasql; 52 Useful Machine Learning & Prediction APIs, updated; New Poll: Do you support Trump Immigration Ban?
Tags: API, Big Data, Clustering, Data Science Platform, Machine Learning, Python, SQL
- Making Python Speak SQL with pandasql - Feb 8, 2017.
Want to wrangle Pandas data like you would SQL using Python? This post serves as an introduction to pandasql, and details how to get it up and running inside of Rodeo.
Tags: Pandas, Python, SQL, Yhat
- A Funny Look at Big Data and Data Science - Dec 27, 2016.
A less than serious look at Big Data and Data Science. If you can laugh at all cartoons, then your Data Science skills are in good shape.
Tags: Big Data, Cartoon, Humor, SQL
- How to Make Your Database 200x Faster Without Having to Pay More - Nov 22, 2016.
Waiting a long time for a BI query to execute? I know it’s frustrating… It’s a major bottleneck in the day-to-day life of a Data Analyst or BI expert. Let’s learn some easy-to-use solutions, with a good explanation of why to use them, along with other, more advanced technological solutions.
Pages: 1 2 3
Tags: BI, Databases, OLTP, Optimization, Performance, Sampling, SnappyData, SQL
- Evaluating HTAP Databases for Machine Learning Applications - Nov 2, 2016.
Businesses are producing a greater number of intelligent applications, which traditional databases are unable to support. A new class of databases, Hybrid Transactional and Analytical Processing (HTAP) databases, offers a variety of capabilities with specific strengths and weaknesses to consider. This article aims to give application developers and data scientists a better understanding of the HTAP database ecosystem so they can make the right choice for their intelligent application.
Pages: 1 2
Tags: Big Data, Data Processing, HTAP, Oracle, SAP, Splice Machine, SQL
- Top KDnuggets tweets, Sep 28-Oct 4: 7 Steps to Mastering SQL for #DataScience; Biggest Issues in #DataScience - Oct 5, 2016.
7 Steps to Mastering SQL for #DataScience; New Andrew Ng #MachineLearning #Book Under Construction, #Free Draft Chapters; Top #DataScientist Claudia Perlich on Biggest Issues in #DataScience; Awesome Public Datasets on GitHub
Tags: Andrew Ng, Data Science, ebook, SQL, Top tweets
- O’Reilly Live Training–Real-time. Real experts. Real learning. - Sep 26, 2016.
Get intensive, hands-on training from O'Reilly's expert network on critical data topics - from SQL fundamentals to distributed computing; enterprise strategy to data science at scale.
Tags: Apache Spark, Courses, Distributed Systems, Hadoop, O'Reilly, scikit-learn, SQL
- Doing Statistics with SQL - Aug 2, 2016.
This post covers how to perform some basic in-database statistical analysis using SQL.
Tags: SQL, Statistics
- Database Key Terms, Explained - Jul 28, 2016.
Interested in a survey of important database concepts and terminology? This post defines 16 essential database key terms concisely and accurately.
Pages: 1 2
Tags: Databases, Explained, Graph Databases, Key Terms, NoSQL, RDBMS, Relational Databases, SQL
- 5 Big Data Projects You Can No Longer Overlook - Jul 21, 2016.
Check out 5 Big Data projects that you are not likely to have seen before, but which may be useful to you, and perhaps even scratch an itch you didn't know you had.
Tags: Big Data, Cloud Computing, Google, Hadoop, Javascript, Overlook, Presto, Spotify, SQL
- KDnuggets™ News 16:n22, Jun 22: Data Science Blog Contest; Free Machine Learning Ebook; Master SQL for Data Science - Jun 22, 2016.
Data Science Blog Contest; New Free Andrew Ng Machine Learning Book Under Construction; 7 Steps to Mastering SQL for Data Science; A Visual Explanation of the Back Propagation Algorithm; Mining Twitter Data with Python Part 1: Collecting Data
Tags: Backpropagation, Data Science, Free ebook, Neural Networks, SQL
- 7 Steps to Mastering SQL for Data Science - Jun 16, 2016.
Follow these 7 steps to go from SQL data science newbie to seasoned practitioner quickly. No nonsense, just the necessities.
Pages: 1 2
Tags: 7 Steps, Data Science, Database, Relational Databases, SQL
- Morpace: SQL Programmer - Jun 10, 2016.
Seeking an SQL Programmer to design, implement and maintain a relational database and reporting system. Will collaborate with other programmers and cross-functional teams to assist in designing and advancing the system in an agile environment.
Tags: Developer, Farmington Hills, MI, Morpace, SQL
- R, Python Duel As Top Analytics, Data Science software – KDnuggets 2016 Software Poll Results - Jun 6, 2016.
R remains the leading tool, with 49% share, but Python grows faster and almost catches up to R. RapidMiner remains the most popular general Data Science platform. Big Data tools used by almost 40%, and Deep Learning usage doubles.
Pages: 1 2
Tags: Data Mining Software, Data Science Platform, Poll, Python, Python vs R, R, RapidMiner, SQL
- Spark 2.0 Preview Now on Databricks Community Edition: Easier, Faster, Smarter - May 17, 2016.
The preview of Spark 2.0 is here, and it promises to be easier, faster, and smarter.
Tags: Apache Spark, Databricks, SQL
- Practical skills that practical data scientists need - May 13, 2016.
Long story short, a data scientist needs to be capable of solving business analytics problems. Learn more about the skill set you need to master to do so.
Tags: Business Context, Data Scientist, Mathematics, Skills, SQL
- The MBA Data Science Toolkit: 8 resources to go from the spreadsheet to the command line - Apr 18, 2016.
A great guide for the MBA, or any relatively non-technical convert, for getting comfortable with the command line and other technical skills required to excel in data science.
Pages: 1 2
Tags: GitHub, Haskell, Machine Learning, Python, R, SQL
- Fastest Growing Programming Languages and Computing Frameworks - Mar 7, 2016.
A new model for ranking programming languages and predicting the growth of user adoption. Includes current language rankings and predictions.
Tags: Data Science, Javascript, Programming Languages, SQL, Trends
- Webinar: Driving Data Democracy: Hadoop and Redshift, Mar 16 - Mar 4, 2016.
The Hadoop ecosystem has improved markedly over the past few years. MPP databases allow analytics teams to easily query massive structured data sets. Learn how these pipelines work on March 16.
Tags: Amazon Redshift, Hadoop, Looker, MPP Database, SQL
- Data Science Skills for 2016 - Feb 12, 2016.
As demand for the hottest job gets even hotter in the new year, the skill set required for it keeps growing. Here, we discuss the skills that will be in high demand for data scientists, including data visualization, Apache Spark, R, Python and many more.
Tags: Apache Spark, CrowdFlower, Data Science, Python, Skills, SQL
- Will Balkanization of Data Science lead to one Empire or many Republics? - Nov 30, 2015.
We examine the “Technoslavia” of the Big Data and Data Science market and consider whether it is likely to lead to a unified empire or a federation of independent republics.
Tags: Big Data Market, Data Science, Dataiku, SQL
- Top KDnuggets tweets, Oct 27 – Nov 02: A Framework for Distributed Deep Learning Layer Design in Python - Nov 3, 2015.
A Framework for Distributed #DeepLearning Layer Design in Python; SQL vs. NoSQL- What You Need to Know; Great Tutorial: A Neural Network in 11 lines of #Python; Data Scientist - 2nd Best IT and Engineering Job.
Tags: Deep Learning, NoSQL, Python, Salary, SQL
- Spark + SETI: Amping up Spark SQL with Parquets - Oct 21, 2015.
Spark SQL is a great component for data scientists as it simplifies querying large distributed datasets. Learn how to integrate it with Parquets, which we have found to significantly improve the performance of sparse-column queries.
Tags: Apache Spark, IBM, Parquets, Python, SETI, Spark SQL, SQL
- Easier Data Prep and Analysis for Data Scientists, Oct 20 Webinar - Oct 6, 2015.
Rapid Insight will show tools that make the data preparation and analysis process significantly faster, without losing the flexibility of advanced programming or SQL tools.
Tags: Data Preparation, RapidInsight, SQL
- Dataiku Data Science Studio, now also runs on Apache Spark - Sep 29, 2015.
Dataiku Data Science Studio version 2.1 has many useful features for Data Scientists, including integration with Apache Spark.
Pages: 1 2
Tags: Apache Spark, Data Science Platform, Dataiku, R, Spark SQL, SQL
- Spark SQL for Real Time Analytics – Part Two - Sep 22, 2015.
Apache Spark is the hottest topic in Big Data. Part 2 of this series covers basic concepts of Stream Processing for Real Time Analytics and for the next frontier – Internet of Things (IoT).
Pages: 1 2
Tags: Ajit Jaokar, Apache Spark, Real-time, SQL, Stream Processing, Streaming Analytics, Sumit Pal
- Data Science for Internet of Things – practitioner course - Sep 14, 2015.
Created by Data Science and IoT professionals, the course covers infrastructure (Hadoop – Spark), Programming / Modelling (R/Time series) and IoT. The course starts Nov 2015, is delivered online, and will have limited participants.
Tags: Apache Spark, Data Science, IoT, R, Scala, SQL, Sumit Pal
- Upcoming Webcasts on Analytics, Big Data, Data Science – Sep 8 and beyond - Sep 7, 2015.
The Future of Data Science, Ensuring Business Value from Analytics, Apache Ignite, Text Analytics, Best Practices of Data Science, Forecasting With Predictive Analytics, and more.
Tags: Business Value, Forecasting, Hadoop, IIA, In-Memory Computing, SQL, Text Analytics
- Spark SQL for Real-Time Analytics - Sep 4, 2015.
Apache Spark is the hottest topic in Big Data. This tutorial discusses why Spark SQL is becoming the preferred method for Real Time Analytics and for the next frontier, IoT (Internet of Things).
Tags: Ajit Jaokar, Apache Spark, Real-time, SQL, Sumit Pal
- 60+ Free Books on Big Data, Data Science, Data Mining, Machine Learning, Python, R, and more - Sep 4, 2015.
Here is a great collection of eBooks written on the topics of Data Science, Business Analytics, Data Mining, Big Data, Machine Learning, Algorithms, Data Science Tools, and Programming Languages for Data Science.
Tags: Book, Brendan Martin, Data Mining, Data Science, Free ebook, Machine Learning, Python, R, SQL
- How to become a Data Scientist for Free - Aug 28, 2015.
Here are the most required skills for a data scientist position based on ReSkill’s analyses of thousands of job posts and free resources to learn each skill.
Tags: Data Science Education, Data Scientist, Java, Online Education, Python, R, SQL, Statistics
- A Beginner’s Guide to SQL - Aug 27, 2015.
SQL is one of the core skills of a data engineer and data scientist. This mini-tutorial explains the four fundamental SQL functions: Create, Read, Update, and Delete, using a fun example of a movie quotes database.
Pages: 1 2 3
Tags: Data Processing, SQL, Udemy
- Apache Drill Makes Big Data Analysis Easier for Everyone - Aug 18, 2015.
Apache Drill is an open source query engine that provides interactive and secure SQL analytics at the scale of petabytes. It provides data querying and exploration capabilities across varied NoSQL databases and file formats.
Tags: Apache Drill, Kaushik Pal, SQL
- To Code or Not to Code with KNIME - Jul 22, 2015.
Find out how KNIME allows us to integrate analytical languages, such as R and Python, with visual design of SQL code. Also, learn to integrate your Hadoop, visualization and ETL systems with KNIME.
Pages: 1 2
Tags: Hadoop, Javascript, Knime, Michael Berthold, Python, R, SQL
- Emacs for Data Science - Jul 10, 2015.
Data science nowadays demands a polyglot developer, and choosing the right code editor is a worthy investment. Here we describe important features of Emacs and its advantages over other editors.
Tags: Data Science Tools, Emacs, R, SQL
- Which Big Data, Data Mining, and Data Science Tools go together? - Jun 11, 2015.
We analyze the associations between the top Big Data, Data Mining, and Data Science tools based on the results of the 2015 KDnuggets Software Poll. Download anonymized data and analyze it yourself.
Tags: Apache Spark, Data Mining Software, Excel, Hadoop, Knime, Poll, Python, R, RapidMiner, SQL
- R leads RapidMiner, Python catches up, Big Data tools grow, Spark ignites - May 25, 2015.
R is the most popular overall tool among data miners, although Python usage is growing faster. RapidMiner continues to be the most popular suite for data mining/data science. Hadoop/Big Data tools usage grew to 29%, propelled by 3x growth in Spark. Other tools with strong growth include H2O (0xdata), Actian, MLlib, and Alteryx.
Tags: Actian, Apache Spark, Data Mining Software, H2O, Knime, Poll, Python, R, RapidMiner, SQL
- Top KDnuggets tweets, Apr 14-20: Modern Methods for Sentiment Analysis; Basics of SQL, RDBMS – must have skills - Apr 21, 2015.
Great overview: Modern Methods for Sentiment Analysis #word2vec; Basics of SQL and RDBMS - must have skills for data science; The 7 Most Unusual Applications of Big Data; Extensive, but a little confusing site: Understanding Data Visualization.
Tags: About Gregory Piatetsky, Data Visualization, Sentiment Analysis, SQL, word2vec
- KDnuggets™ News 15:n09, Mar 25: Deep Learning from Scratch; 10 steps to Kaggle Success; US CDS DJ Patil Cartoon - Mar 25, 2015.
Deep Learning for Text Understanding from Scratch; New Poll: Computing platform; 10 Steps to Success in Kaggle Data; Cartoon: US Chief Data Scientist Most Difficult Challenge; SQL-like Query Language for Real-time Streaming Analytics.
Tags: Deep Learning, Kaggle, SQL, Streaming Analytics, UK, Yann LeCun
- Interview: Dave McCrory, Basho on Distributed Database Needs of a Future Enterprise - Mar 16, 2015.
We discuss the future of distributed storage for the enterprise, Scale-up vs. Scale-out, software design patterns in the Cloud era, the microservices model and the place for legacy databases in modern enterprise IT.
Tags: Basho, Cloud Computing, Databases, Dave McCrory, Distributed Systems, Integration, Interview, SQL
- SQL-like Query Language for Real-time Streaming Analytics - Mar 12, 2015.
We need an SQL-like query language for real-time streaming analytics that is expressive, short, and fast, defines core operations that cover 90% of problems, and is easy to follow and learn.
Tags: Real-time, Realtime Analytics, SQL, Stream Mining, Streaming Analytics
- Upcoming Webcasts on Analytics, Big Data, Data Science – Mar 10 and beyond - Mar 9, 2015.
Data Wrangling and the Art of Big Data Discovery, Data Mining: Failure to Launch, The State of Hadoop Adoption, Addressing the Challenges of Data Variety, and more.
Tags: Data Visualization, Data Wrangling, Hadoop, Kafka, Security, SQL
- Top KDnuggets tweets, Feb 23-25: Microsoft is building fast, low-power Deep Learning networks; Lucrative tech careers: Data Scientist, Data Engineer - Feb 26, 2015.
5 lucrative tech careers in 2015: Data Scientist ($150K), Data Engineer ($148K); Which SQL on Hadoop? Gartner Poll Still Says "Whatever" But DBMS Providers Gain; 10 Most-Funded #BigData #Startups; DataRPM 8 runs in #Hadoop, uses #MachineLearning to find insights.
Tags: Big Data, Data Engineer, Data Scientist, DataRPM, Hadoop, Salary, SQL, Startups, Trevor Hastie