- A step-by-step guide for creating an authentic data science portfolio project - Oct 7, 2020.
Especially if you are just launching yourself as a Data Scientist, you will want to first demonstrate your skills through interesting data science project ideas that you can implement and share. This step-by-step guide shows you how to go through this process, with an original example that explores Germany’s biggest frequent flyer forum, Vielfliegertreff.
- Feature Engineering for Numerical Data - Sep 11, 2020.
Data feeds machine learning models, and the more the better, right? Well, sometimes numerical data isn't quite right for ingestion, so a variety of methods, detailed in this article, are available to transform raw numbers into something a bit more palatable.
- Modern Data Science Skills: 8 Categories, Core Skills, and Hot Skills - Sep 8, 2020.
We analyze the results of the Data Science Skills poll, including 8 categories of skills, 13 core skills that over 50% of respondents have, the emerging/hot skills that data scientists want to learn, and the top skill that Data Scientists want to learn.
- What Is Data Enrichment And How It Works - Sep 2, 2020.
Learn what data enrichment is, its different types, benefits, and use cases, and how Smartproxy helps you do it.
- Getting Started with Feature Selection - Aug 25, 2020.
For machine learning, more data is always better. What about more features of data? Not necessarily. This beginners' guide with code examples for selecting the most useful features from your data will jump start you toward developing the most effective and efficient learning models.
- These Data Science Skills will be your Superpower - Aug 20, 2020.
Learning data science means learning the hard skills of statistics, programming, and machine learning. To complete your training, a broader set of soft skills will round out your capabilities as an effective and successful professional Data Scientist.
- 5 Different Ways to Load Data in Python - Aug 13, 2020.
Data is the bread and butter of a Data Scientist, so knowing many approaches to loading data for analysis is crucial. Here, five Python techniques to bring in your data are reviewed with code examples for you to follow.
- The Machine Learning Field Guide - Aug 3, 2020.
This straightforward guide offers a structured overview of all machine learning prerequisites needed to start working on your project, including the complete data pipeline from importing and cleaning data to modelling and production.
- First Steps of a Data Science Project - Jul 29, 2020.
Many data science projects are launched with good intentions, but fail to deliver because the correct process is not understood. To achieve good performance and results in this work, the first steps must include clearly defining goals and outcomes, collecting data, and preparing and exploring the data. This is all about solving problems, which requires a systematic process.
- Easy Guide To Data Preprocessing In Python - Jul 24, 2020.
Preprocessing data for machine learning models is a core general skill for any Data Scientist or Machine Learning Engineer. Follow this guide using Pandas and Scikit-learn to improve your techniques and make sure your data leads to the best possible outcome.
- Exploratory Data Analysis on Steroids - Jul 6, 2020.
Exploratory data analysis is a central aspect of Data Science that sometimes gets overlooked. The first step of anything you do should be to know your data: understand it and get familiar with it. This becomes even more important as your data volume grows: imagine trying to parse through thousands or millions of records and make sense of them.
- Data Cleaning: The secret ingredient to the success of any Data Science Project - Jul 1, 2020.
With an uncleaned dataset, no matter what type of algorithm you try, you will never get accurate results. That is why data scientists spend a considerable amount of time on data cleaning.
- How to Prepare Your Data - Jun 30, 2020.
This is an overview of structuring, cleaning, and enriching raw data.
- How to Deal with Missing Values in Your Dataset - Jun 22, 2020.
In this article, we are going to talk about how to identify and treat the missing values in the data step by step.
- 5 Essential Papers on AI Training Data - Jun 4, 2020.
Data pre-processing is not only the largest time sink for most Data Scientists, but it is also the most crucial aspect of the work. Learn more about training data and data processing tasks from 5 leading academic papers.
- Appropriately Handling Missing Values for Statistical Modelling and Prediction - May 22, 2020.
Many statisticians in industry agree that blindly imputing the missing values in your dataset is a dangerous move, and that you should not do it without first understanding why the data is missing.
- Data Transformation: Standardization vs Normalization - Apr 23, 2020.
Improving the accuracy of your models often starts with the first steps of data transformation. This guide explains the difference between the key feature scaling methods of standardization and normalization, and demonstrates when and how to apply each approach.
- A Layman’s Guide to Data Science. Part 2: How to Build a Data Project - Apr 2, 2020.
As Part 2 in a Guide to Data Science, we outline the steps to build your first Data Science project, including how to ask good questions to understand the data first, how to prepare the data, how to develop an MVP, how to iterate to build a good product, and, finally, how to present your project.
- Diffusion Map for Manifold Learning, Theory and Implementation - Mar 25, 2020.
This article aims to introduce one of the manifold learning techniques called Diffusion Map. This technique enables us to understand the underlying geometric structure of high dimensional data as well as to reduce the dimensions, if required, by neatly capturing the non-linear relationships between the original dimensions.
- Python Pandas For Data Discovery in 7 Simple Steps - Mar 10, 2020.
Just getting started with Python's Pandas library for data analysis? Or, ready for a quick refresher? These 7 steps will help you become familiar with its core features so you can begin exploring your data in no time.
- Achieving Accuracy with your Training Dataset - Mar 5, 2020.
How do we make sure our training data is more accurate than the rest? Partners like Supahands eliminate the headache that comes with labeling work by providing end-to-end managed labeling solutions, completed by a fully managed workforce that is trained to work on your model specifics.
- Hand labeling is the past. The future is #NoLabel AI - Feb 19, 2020.
Data labeling is so hot right now… but could this rapidly emerging market face disruption from a small team at Stanford and the Snorkel open source project, which enables highly efficient programmatic labeling that is 10 to 1,000x as efficient as hand labeling?
- An Introductory Guide to NLP for Data Scientists with 7 Common Techniques - Jan 9, 2020.
Data Scientists work with tons of data, and many times that data includes natural language text. This guide reviews 7 common techniques with code examples to introduce you to the essentials of NLP, so you can begin performing analysis and building models from textual data.
- Microsoft Introduces Icebreaker to Address the Famous Ice-Start Challenge in Machine Learning - Dec 16, 2019.
The new technique allows the deployment of machine learning models that operate with minimum training data.
- Build Pipelines with Pandas Using pdpipe - Dec 13, 2019.
We show how to build intuitive and useful pipelines with Pandas DataFrame using a wonderful little library called pdpipe.
- 5 Great New Features in Latest Scikit-learn Release - Dec 10, 2019.
From not sweating missing values, to determining feature importance for any estimator, to support for stacking, and a new plotting API, here are 5 new features of the latest release of Scikit-learn which deserve your attention.
- The Essential Toolbox for Data Cleaning - Dec 5, 2019.
Increase your confidence to perform data cleaning with a broader perspective of what datasets typically look like, and follow this toolbox of code snippets to make your data cleaning process faster and more efficient.
- The Rise of User-Generated Data Labeling - Dec 4, 2019.
Let’s say your project is humongous and needs data labeling to be done continuously, while you’re on the go, sleeping, or eating. I’m sure you’d appreciate User-generated Data Labeling. I’ve got 6 interesting examples to help you understand this. Let’s dive right in!
- Three Methods of Data Pre-Processing for Text Classification - Nov 21, 2019.
This blog shows how text data representations can be used to build a classifier to predict a developer’s deep learning framework of choice based on the code that they wrote, via examples of TensorFlow and PyTorch projects.
- Pro Tips: How to deal with Class Imbalance and Missing Labels - Nov 20, 2019.
Your spectacularly-performing machine learning model could be subject to the common culprits of class imbalance and missing labels. Learn how to handle these challenges with techniques that remain open areas of new research for addressing real-world machine learning problems.
- How to Speed up Pandas by 4x with one line of code - Nov 12, 2019.
While Pandas is the library for data processing in Python, it isn't really built for speed. Learn more about the new library, Modin, developed to distribute Pandas' computation to speed up your data prep.
- Set Operations Applied to Pandas DataFrames - Nov 7, 2019.
In this tutorial, we show how to apply mathematical set operations (union, intersection, and difference) to Pandas DataFrames with the goal of easing the task of comparing the rows of two datasets.
- How to Create a Vocabulary for NLP Tasks in Python - Nov 7, 2019.
This post will walk through a Python implementation of a vocabulary class for storing processed text data and related metadata in a manner useful for subsequently performing NLP tasks.
- How Data Labeling Facilitates AI Models - Oct 31, 2019.
AI-based models are highly dependent on accurate, clean, well-labeled, and prepared data in order to produce the desired output and cognition. These models are fed large datasets covering an array of probabilities and computations to make their functioning as smart and gifted as human intelligence.
- 5 Advanced Features of Pandas and How to Use Them - Oct 25, 2019.
The pandas library offers core functionality when preparing your data using Python. But, many don't go beyond the basics, so learn about these lesser-known advanced methods that will make handling your data easier and cleaner.
- Know Your Data: Part 2 - Oct 8, 2019.
To build an effective learning model, you must understand the quality issues that exist in your data and how to detect and deal with them. In general, data quality issues fall into four major categories.
- Data Preparation for Machine learning 101: Why it’s important and how to do it - Oct 2, 2019.
As data scientists who are the brains behind the AI-based innovations, you need to understand the significance of data preparation to achieve the desired level of cognitive capability for your models. Let’s begin.
- Data Mapping Using Machine Learning - Sep 27, 2019.
Data mapping is a way to organize various bits of data into a manageable and easy-to-understand system.
- KDnuggets™ News 19:n28, Jul 31: Top 13 Skills To Become a Rockstar Data Scientist; Best Podcasts on AI, Analytics, Data Science - Jul 31, 2019.
Learn the essential skills needed to become a Data Science rockstar; Understand CNNs with Python + Tensorflow + Keras tutorial; Discover the best podcasts about AI, Analytics, Data Science; and find out where you can get the best Certificates in the field
- Fantastic Four of Data Science Project Preparation - Jul 26, 2019.
This article takes a closer look at the four fantastic things we should keep in mind when approaching every new data science project.
- KDnuggets™ News 19:n24, Jun 26: Understand Cloud Services; Pandas Tips & Tricks; Master Data Preparation w/ Python - Jun 26, 2019.
Happy summer! This week on KDnuggets: Understanding Cloud Data Services; How to select rows and columns in Pandas using [ ], .loc, iloc, .at and .iat; 7 Steps to Mastering Data Preparation for Machine Learning with Python; Examining the Transformer Architecture: The OpenAI GPT-2 Controversy; Data Literacy: Using the Socratic Method; and much more!
- 7 Steps to Mastering Data Preparation for Machine Learning with Python — 2019 Edition - Jun 24, 2019.
Interested in mastering data preparation with Python? Follow these 7 steps which cover the concepts, the individual tasks, as well as different approaches to tackling the entire process from within the Python ecosystem.
- How to select rows and columns in Pandas using [ ], .loc, iloc, .at and .iat - Jun 19, 2019.
Subset selection is one of the most frequently performed tasks while manipulating data. Pandas provides different ways to efficiently select subsets of data from your DataFrame.
- Crowdsourcing vs. Managed Teams: A Study in Data Labeling Quality - Jun 12, 2019.
You need data labeled for ML. You can do it in-house, crowdsource it, or hire a managed service. If data quality matters, read this.
- 5 Ways to Deal with the Lack of Data in Machine Learning - Jun 10, 2019.
Effective solutions exist when you don't have enough data for your models. While there is no perfect approach, five proven ways will get your model to production.
- End-to-End Machine Learning: Making videos from images - May 23, 2019.
Video is a natural way for us to understand three-dimensional and time-varying information. Read this short post on how to create videos from still images.
- How to fix an Unbalanced Dataset - May 8, 2019.
We explain several alternative ways to handle imbalanced datasets, including different resampling and ensembling methods with code examples.
- Top R Packages for Data Cleaning - Mar 15, 2019.
Data cleaning is one of the most important and time-consuming tasks for data scientists. Here are the top R packages for data cleaning.
- Preparing for the Unexpected - Feb 28, 2019.
In some domains, new values appear all the time, so it's crucial to handle them well. Using deep learning, one can learn a special Out-of-Vocabulary embedding for these new values. But how can you train this embedding to generalize well to any unseen value? We explain one of the methods employed at Taboola.
- Acquiring Labeled Data to Train Your Models at Low Costs - Feb 27, 2019.
We discuss groundbreaking and unique methods to acquire labeled data at low cost, including 3rd-Party Plug-and-Play AI Model, Zero-Shot Learning, and Restructuring the Existing Data Set.
- Automatic Machine Learning is broken - Feb 19, 2019.
We take a look at the arguments against implementing a machine learning solution, and at the occasions when the problem at hand is not an ML problem and can perhaps be solved with optimization, exploratory data analysis, or simple statistics.
- Feature engineering, Explained - Dec 21, 2018.
A brief introduction to feature engineering, covering coordinate transformation, continuous data, categorical features, missing values, normalization, and more.
- Six Steps to Master Machine Learning with Data Preparation - Dec 21, 2018.
By following six critical steps to prepare data for both analytics and machine learning initiatives, teams can accelerate machine learning and data science projects and deliver an immersive business consumer experience that accelerates and automates the data-to-insight pipeline.
- Exploring the Data Jungle Free eBook - Dec 18, 2018.
This free eBook by Brian Godsey will provide you with real-world examples in Python, R, and other languages suitable for data science.
- Common mistakes when carrying out machine learning and data science - Dec 6, 2018.
We examine typical mistakes in the Data Science process, including wrong data visualization, incorrect processing of missing values, wrong transformation of categorical variables, and more. Learn what to avoid!
- How to build a data science project from scratch - Dec 5, 2018.
A demonstration using an analysis of Berlin rental prices, covering how to extract data from the web and clean it, gain deeper insights, engineer features using external APIs, and more.
- Data Science Projects Employers Want To See: How To Show A Business Impact - Dec 4, 2018.
The best way to create better data science projects that employers want to see is to provide a business impact. This article highlights the process using customer churn prediction in R as a case-study.
- Text Preprocessing in Python: Steps, Tools, and Examples - Nov 6, 2018.
We outline the basic steps of text preprocessing, which are needed to convert text from human language into a machine-readable format for further processing. We will also discuss text preprocessing tools.
- Notes on Feature Preprocessing: The What, the Why, and the How - Oct 26, 2018.
This article covers a few important points related to the preprocessing of numeric data, focusing on the scaling of feature values, and the broad question of dealing with outliers.
- Introduction to Active Learning - Oct 23, 2018.
An extensive overview of Active Learning, with an explanation of how it works and how it can assist with data labeling, as well as its performance and potential limitations.
- ebook: Aggregating Data with Apache Spark™ - Sep 12, 2018.
Learn why cluster computing makes Spark the ideal processing engine for complex aggregations, the different types of aggregations that you can do with Spark, and more.
- Self-Service Data Prep Tools vs Enterprise-Level Solutions? 6 Lessons Learned - Aug 30, 2018.
A detailed comparison between self-service data preparation tools and enterprise-level solutions, covering business strategy, accessible tools and solutions and more.
- Text Mining on the Command Line - Jul 13, 2018.
In this tutorial, I use raw bash commands and regex to process a raw, messy JSON file and a raw HTML page. The tutorial helps us understand the text processing mechanism under the hood.
- Data Retrieval and Cleaning: Tracking Migratory Patterns - Jul 3, 2018.
In this post, we walk through investigating, retrieving, and cleaning a real-world data set. We will also describe the cost benefits and necessary tools involved in building your own data sets.
- 5 Data Science Projects That Will Get You Hired in 2018 - Jun 26, 2018.
A portfolio of real-world projects is the best way to break into data science. This article highlights the 5 types of projects that will help land you a job and improve your career.
- Stagraph – a general purpose R GUI, for data import, wrangling, and visualization - Jun 25, 2018.
Stagraph is a new simple visual interface for R, which focuses on data import, data wrangling and data visualization.
- Natural Language Processing Nuggets: Getting Started with NLP - Jun 19, 2018.
Check out this collection of NLP resources for beginners, starting from zero and slowly progressing to the point that readers should have an idea of where to go next.
- ioModel Machine Learning Research Platform – Open Source - Jun 5, 2018.
This article introduces ioModel, an open source research platform that ingests data and automatically generates descriptive statistics on that data.
- Virtual Training Events Without Leaving Your Desk - May 30, 2018.
Check out our lineup of upcoming virtual seminars, online learning courses, and customized training in your office. Space is limited, so reserve your seat early and score the best savings!
- How to Organize Data Labeling for Machine Learning: Approaches and Tools - May 16, 2018.
The main challenge for a data science team is to decide who will be responsible for labeling, estimate how much time it will take, and choose which tools are best to use.
- Data Augmentation: How to use Deep Learning when you have Limited Data - May 9, 2018.
This article is a comprehensive review of Data Augmentation techniques for Deep Learning, specific to images.
- 7 Useful Suggestions from Andrew Ng “Machine Learning Yearning” - May 8, 2018.
Machine Learning Yearning is a book by AI and Deep Learning guru Andrew Ng, focusing on how to make machine learning algorithms work and how to structure machine learning projects. Here we present 7 very useful suggestions from the book.
- Getting Started with spaCy for Natural Language Processing - May 2, 2018.
spaCy is a Python natural language processing library specifically designed with the goal of being a useful library for implementing production-ready systems. It is particularly fast and intuitive, making it a top contender for NLP tasks.
- Actionable Insights with Predictive Analytics for Marketers, May 9 - May 1, 2018.
Learn how your predictions can only be as good as your data, how to fix imperfect data, how to structure your customer data for optimal predictive power, and more.
- The Dirty Little Secret Every Data Scientist Knows (but won’t admit) - Apr 26, 2018.
Most people don’t realize it, but the actual “fancy” machine learning algorithm is like the last mile of the marathon. There is so much that must be done before you get there!
- Minimizing Model Risk with Automated Data Preparation & Machine Learning, Apr 19 - Apr 2, 2018.
Join DataRobot, Apr 19 at 2:00 pm ET/11:00 am PT, for a webinar on how to use Automated Data Preparation & Machine Learning to gain a competitive advantage, while quickly aligning your business operations to regulatory requirements.
- Principles of Guided Analytics - Mar 27, 2018.
KNIME outline their guided analytics system and explain how this can assist data scientists to predict future outcomes.
- Text Data Preprocessing: A Walkthrough in Python - Mar 26, 2018.
This post will serve as a practical walkthrough of a text data preprocessing task using some common Python tools.
- 5 Things to Know About Machine Learning - Mar 7, 2018.
This post will point out 5 things to know about machine learning, 5 things which you may not know, may not have been aware of, or may have once known and now forgotten.
- The Value of Semi-Supervised Machine Learning - Jan 17, 2018.
This post shows you how to label hundreds of thousands of images in an afternoon. You can use the same approach whether you are labeling images or labeling traditional tabular data (e.g., identifying cybersecurity attacks or potential part failures).
- Governance in Data Science - Jan 16, 2018.
Governance roles for data science and analytics teams are becoming more common... One of the key functions of this role is to perform analysis and validation of data sets in order to build confidence in the underlying data sets.
- Webcasts: Finding analytic solutions to real problems - Jan 3, 2018.
The Technically Speaking webcast series provides real-world case studies with key insights on overcoming the challenges in data collection, preparation, and analysis. Find the webcast that fits your current challenge.
- A General Approach to Preprocessing Text Data - Dec 1, 2017.
Recently we had a look at a framework for textual data science tasks in their totality. Now we focus on putting together a generalized approach to attacking text data preprocessing, regardless of the specific textual data science task you have in mind.
- Automated Feature Engineering for Time Series Data - Nov 20, 2017.
We introduce a general framework for developing time series models, generating features and preprocessing the data, and exploring the potential to automate this process in order to apply advanced machine learning algorithms to almost any time series problem.
- Webinar: Data Preparation Essentials for Automated Machine Learning, Nov 29 - Nov 16, 2017.
Jen Underwood will review how to organize data in a machine learning-friendly format that accurately reflects the business process and outcomes.
- Social Media and Machine Learning Transform Self-service Data Prep - Oct 16, 2017.
Social media and machine learning concepts are transforming self-service data prep into a collaborative data marketplace.
- Python Data Preparation Case Files: Group-based Imputation - Sep 25, 2017.
The second part in this series addresses group-based imputation for dealing with missing data values. Check out why imputing with group means can be a more powerful approach than using overall means, and see how to accomplish it in Python.
- A Solution to Missing Data: Imputation Using R - Sep 21, 2017.
Handling missing values is one of the worst nightmares a data analyst dreams of. In many situations, a wise analyst ‘imputes’ the missing values instead of dropping them from the data.
- Python Data Preparation Case Files: Removing Instances & Basic Imputation - Sep 14, 2017.
This is the first of 3 posts to cover imputing missing values in Python using Pandas. The slowest-moving of the series (out of necessity), this first installment lays out the task and data at the risk of boring you. The next 2 posts cover group- and regression-based imputation.
- 42 Steps to Mastering Data Science - Aug 25, 2017.
This post is a collection of 6 separate posts of 7 steps a piece, each for mastering and better understanding a particular data science topic, with topics ranging from data preparation, to machine learning, to SQL databases, to NoSQL and beyond.
- The Ultimate Guide to Basic Data Cleaning - Aug 24, 2017.
Data cleaning can seem intimidating, but it’s not hard if you know the basic steps. That’s why we’re excited to announce our newest ebook, “The Ultimate Guide to Basic Data Cleaning”!
- 37 Reasons why your Neural Network is not working - Aug 22, 2017.
Over the course of many debugging sessions, I’ve compiled my experience along with the best ideas around in this handy list. I hope they will be useful to you.
- Data Version Control in Analytics DevOps Paradigm - Aug 14, 2017.
DevOps and DVC tools can help reduce the time data scientists spend on mundane data preparation and help them achieve their dream of focusing on cool machine learning algorithms and interesting data analysis.
- How to squeeze the most from your training data - Jul 27, 2017.
In many cases, getting enough well-labelled training data is a huge hurdle for developing accurate prediction systems. Here is an innovative approach which uses SVM to get the most from training data.
- Exploratory Data Analysis in Python - Jul 7, 2017.
We view EDA very much like a tree: there is a basic series of steps you perform every time you perform EDA (the main trunk of the tree) but at each step, observations will lead you down other avenues (branches) of exploration by raising questions you want to answer or hypotheses you want to test.
- 7 Ways to Get High-Quality Labeled Training Data at Low Cost - Jun 13, 2017.
Having labeled training data is needed for machine learning, but getting such data is not simple or cheap. We review 7 approaches, including repurposing, harvesting free sources, retraining models on progressively higher-quality data, and more.
- KDnuggets™ News 17:n22, Jun 7: 7 Steps to Mastering Data Preparation with Python; Why Does Deep Learning Not Have a Local Minimum? - Jun 7, 2017.
7 Steps to Mastering Data Preparation with Python; Why Does Deep Learning Not Have a Local Minimum?; 7 Techniques to Handle Imbalanced Data; Which Machine Learning Algorithm Should I Use?; Is Regression Analysis Really Machine Learning?
- 7 Steps to Mastering Data Preparation with Python - Jun 2, 2017.
Follow these 7 steps for mastering data preparation, covering the concepts, the individual tasks, as well as different approaches to tackling the entire process from within the Python ecosystem.
- 7 Techniques to Handle Imbalanced Data - Jun 1, 2017.
This blog post introduces seven techniques that are commonly applied in domains like intrusion detection or real-time bidding, because the datasets are often extremely imbalanced.
- KDnuggets™ News 17:n21, May 31: Python Machine Learning Workflows from Scratch; Machine Learning Crash Course - May 31, 2017.
Machine Learning Workflows in Python from Scratch Part 1: Data Preparation; Machine Learning Crash Course: Part 1; An Introduction to the MXNet Python API; How A Data Scientist Can Improve Productivity; Data science platforms are on the rise and IBM is leading the way
- Data preprocessing for deep learning with nuts-ml - May 30, 2017.
Nuts-ml is a new data pre-processing library in Python for GPU-based deep learning in vision. It provides common pre-processing functions as independent, reusable units. These so-called ‘nuts’ can be freely arranged to build data flows that are efficient and easy to read and modify.
- Machine Learning Workflows in Python from Scratch Part 1: Data Preparation - May 29, 2017.
This post is the first in a series of tutorials for implementing machine learning workflows in Python from scratch, covering the coding of algorithms and related tools from the ground up. The end result will be a handcrafted ML toolkit. This post starts things off with data preparation.
- Data Preparation Strategies for Successful Machine Learning - May 18, 2017.
This upcoming 45-minute webinar explores efficient methods to explore and organize complex data, how to marry multiple datasets for feature engineering, how to select the optimal target, and how to address information leakage.
- Technically Speaking – Analytic solutions to real-world problems - May 3, 2017.
Are you and your data "having issues?" JMP real-world case studies help you solve them with key insights on overcoming the challenges with data collection, preparation, and analysis.
- Pandas Cheat Sheet: Data Science and Data Wrangling in Python - Jan 27, 2017.
The Pandas library can seem very elaborate and it might be hard to find a single point of entry to the material: with other learning materials focusing on different aspects of this library, you can definitely use a reference sheet to help you get the hang of it.
- Data Exploration in Preparation for Modeling - Jan 13, 2017.
The most important traits for a good data analyst or data miner are curiosity, creativity and intuition for how to answer important questions using data. Read this white paper to learn more.
- 6 Steps to Effective Data Preparation for Quality Conclusions - Jan 12, 2017.
Data preparation is usually the most time consuming part of a data analysis project. To get good results, follow the six steps here, starting with Understand the Business Needs, Get to Know the Data, and Wrangle, Munge, and Mash Up.
- Tidying Data in Python - Jan 4, 2017.
This post summarizes some tidying examples Hadley Wickham used in his 2014 paper on Tidy Data in R, but will demonstrate how to do so using the Python pandas library.
- 5 Machine Learning Projects You Can No Longer Overlook, January - Jan 2, 2017.
There are a lot of popular machine learning projects out there, but many more that are not. Which of these are actively developed and worth checking out? Here is an offering of 5 such projects, the most recent in an ongoing series.
- Interviews with Data Scientists: Claudia Perlich - Dec 2, 2016.
In this wide-ranging interview, Roberto Zicari talks to leading Data Scientist Claudia Perlich about what data scientists must know about Machine Learning and evaluation, domain knowledge, data blending, and more.
- Data Exploration in Preparation for Modeling - Nov 16, 2016.
What you don't know can hurt you, especially in predictive modeling. Read great examples of how exploring your data before creating models will help you spot problems before you build incorrect models.
- How to Choose a Data Format - Nov 3, 2016.
In any data analytics project, after the business understanding phase, understanding the data and selecting the right data format and ETL tools are very important tasks. This article explains a useful and practical set of guidelines covering data format selection and the ETL phases of the project lifecycle.
- Data Preparation Tips, Tricks, and Tools: An Interview with the Insiders - Oct 14, 2016.
Data preparation and preprocessing tasks constitute a high percentage of any data-centric operation. To provide some insight, we asked a pair of experts to answer a few questions on the subject.
- Behind the Dream of Data Work as it Could Be - Sep 13, 2016.
This post is an insider's overview of data.world, and their attempt to build the most meaningful, collaborative, and abundant data resource in the world.
- Automating Data Ingestion: 3 Important Parts - Sep 9, 2016.
In the day and age of “Big Data”, data ingestion has to be automated on some level. How best to automate it?
- Choosing Tools for Data ETLs - Aug 9, 2016.
Which tool should I use for my data pipelines? Get some advice from a data scientist who recently went through this pipeline tool selection process.
- Looker: Exploring the Census, Aug 11 Webinar - Aug 3, 2016.
We will dig into 20 years of Census voting data that we have loaded into Google BigQuery and modeled in Looker. You can ask about anything you're interested in, and we will look it up live.
- Data Science for Beginners 2: Is your data ready? - Jul 28, 2016.
This second video and write-up in the Data Science for Beginners series discusses what is required of your data before it can be useful.
- 5 More Machine Learning Projects You Can No Longer Overlook - Jun 28, 2016.
There are a lot of popular machine learning projects out there, but many more that are not. Which of these are actively developed and worth checking out? Here is an offering of 5 such projects.
- Infinite Data Overlap Detection Arrives to Speed Business Insights - Jun 8, 2016.
Infinite Data Overlap Detection (IDOD) is a new, Spark-based technology that empowers non-technical business users to automatically discover data patterns and blend any data type for any set of values from multiple sources – both inside and outside the enterprise.
- Doing Data Science: A Kaggle Walkthrough Part 3 – Cleaning Data - Jun 3, 2016.
This is part three in a fantastic 6-part series covering the process of data science and its application to a Kaggle competition. In this episode, data cleaning and preparation are covered.
- How to Quantize Neural Networks with TensorFlow - May 4, 2016.
The simplest motivation for quantization is to shrink a neural network's representation by storing the min and max for each layer and mapping each float value onto a small integer within that range. Learn more about how to perform quantization for deep neural networks.
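The quantization entry above rests on one idea: keep a layer's min and max, then store each weight as an integer step within that range. A minimal sketch of that linear min/max scheme follows, assuming 8 bits (256 levels) per value; the function names are hypothetical, and this is an illustration of the idea rather than TensorFlow's implementation.

```python
def quantize(values, num_bits=8):
    """Linearly map floats onto integers in [0, 2**num_bits - 1].

    Only the layer's min and the step size need to be kept alongside the
    integers to approximately reconstruct the original floats.
    """
    lo, hi = min(values), max(values)
    levels = 2 ** num_bits - 1  # 255 steps for 8 bits
    scale = (hi - lo) / levels if hi > lo else 1.0  # avoid divide-by-zero
    q = [round((v - lo) / scale) for v in values]
    return q, lo, scale


def dequantize(q, lo, scale):
    """Recover approximate floats from the stored integers and range."""
    return [lo + x * scale for x in q]


# Hypothetical layer weights.
weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
q, lo, scale = quantize(weights)
approx = dequantize(q, lo, scale)
print(q)
print(approx)
```

Each reconstructed value lands within half a step (`scale / 2`) of the original, which is the precision traded away for storing one byte per weight instead of four.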
- How to Remove Duplicates in Large Datasets - Apr 27, 2016.
Dealing with huge datasets can be tricky, especially the data cleaning process. One such processing step is de-duplication; find out how you can handle it using statistical techniques.
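One probabilistic approach to de-duplicating data too large to hold in an exact in-memory set is a Bloom filter; it is not necessarily the technique the linked article uses. The sketch below is a hypothetical stdlib-only version: it derives several bit positions per record from a SHA-256 digest (double hashing) and skips records whose bits are already all set. The bit-array size and hash count are illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: never misses a seen item, rarely flags an unseen one."""

    def __init__(self, num_bits=1 << 20, num_hashes=5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Split one SHA-256 digest into two 64-bit hashes and combine them
        # (double hashing) to get num_hashes bit positions.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def dedupe(records, bloom):
    """Yield each record the first time it is (probably) seen."""
    for rec in records:
        if rec in bloom:
            continue  # probably a duplicate; may rarely be a false positive
        bloom.add(rec)
        yield rec


records = ["a", "b", "a", "c", "b"]
print(list(dedupe(records, BloomFilter())))
```

The design trades exactness for memory: the filter's size is fixed regardless of how many records stream through, but a false positive silently drops a unique record, so a pipeline that cannot tolerate that would need an exact second check on flagged records.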
- Doing Data Science: A Kaggle Walkthrough – Cleaning Data - Mar 23, 2016.
Gain insight into the process of cleaning data for a specific Kaggle competition, including a step by step overview.
- R Learning Path: From beginner to expert in R in 7 steps - Mar 23, 2016.
This learning path is mainly for novice R users who are just getting started, but it also covers some of the latest changes in the language that might appeal to more advanced R users.
- Prove Your Point with Data and a Fast Python Library - Jan 7, 2016.
Harness the power of Python and the command line to prove your point using data and a fast data-processing library.
- 5 Criteria To Determine If Your Data Is Ready For Serious Data Science - Dec 21, 2015.
If your data is large, relevant, accurate, and connected, and you also have a sharp question, you are ready to do some serious data science. If you’re weak on one or two points, don’t worry. But if most of the criteria are not met, you need to do more preparation.
- Webinar: 5 tips to get more out of Data Lakes, Dec 16 - Dec 1, 2015.
Learn valuable tips to help optimize Big Data for agility and speed to insight, improve data accessibility without the limitations of data warehouses, and prevent data sources from becoming data silos.
- Lavastorm – 5 Tips to Get More From Tableau - Oct 20, 2015.
Tableau makes it easy for users to see the data, but data preparation for it is hard. This free ebook highlights how to overcome Tableau challenges with data access, data blending, advanced analytics, transparency and reusability.
- Upcoming Webcasts on Analytics, Big Data, Data Science – Oct 20 and beyond - Oct 19, 2015.
Easier Data Prep and Analysis for Data Scientists, Measure and Enhance Analytics Maturity, Amazon QuickSight, Textual Healing, and more.