Search results for consistency
-
5 Essential Papers on AI Training Data
Data pre-processing is not only the largest time sink for most Data Scientists, but it is also the most crucial aspect of the work. Learn more about training data and data processing tasks from 5 leading academic papers.https://www.kdnuggets.com/2020/06/5-essential-papers-ai-training-data.html
-
Taming Complexity in MLOps
A greatly expanded v2.0 of the open-source Orbyter toolkit helps data science teams continue to streamline machine learning delivery pipelines, with an emphasis on seamless deployment to production.https://www.kdnuggets.com/2020/05/taming-complexity-mlops.html
-
Beginners Learning Path for Machine Learning
So, you are interested in machine learning? Here is your complete learning path to start your career in the field.https://www.kdnuggets.com/2020/05/beginners-learning-path-machine-learning.html
-
Building a Mature Machine Learning Team
After spending a lot of time thinking about the paths that software companies take toward ML maturity, this framework was created to follow as you adopt ML and then mature as an organization. The framework covers every aspect of building a team, including product, process, technical, and organizational readiness, and recognizes the importance of cross-functional expertise and process improvements for bringing AI-driven products to market.https://www.kdnuggets.com/2020/03/mature-machine-learning-team.html
-
How To Build Your Own Feedback Analysis Solution
Automating the analysis of customer feedback will sound like a great idea after reading a couple hundred reviews. Building an NLP solution to provide in-depth analysis of what your customers are thinking is a serious undertaking, and this guide helps you scope out the entire project.https://www.kdnuggets.com/2020/03/build-feedback-analysis-solution.html
-
How Bad Data is Affecting Your Organization’s Operational Efficiency
Despite recognizing the importance of data quality, many companies still fail to implement a data quality framework that could protect them from making costly mistakes. Poor data does not just cause revenue loss – it’s the reason your company could lose employees, customers and reputation!https://www.kdnuggets.com/2020/03/bad-data-affecting-organizations-operational-efficiency.html
-
Recreating Fingerprints using Convolutional Autoencoders
The article gets you started working with fingerprints using Deep Learning.https://www.kdnuggets.com/2020/03/recreating-fingerprints-using-convolutional-autoencoders.html
-
Inside The Machine Learning that Google Used to Build Meena: A Chatbot that Can Chat About Anything
Meena is one of the major milestones in the history of NLU. How did Google build it?https://www.kdnuggets.com/2020/02/inside-machine-learning-google-build-meena-chatbot.html
-
Top 10 AI, Machine Learning Research Articles to know
We’ve seen many predictions for what new advances are expected in the field of AI and machine learning. Here, we review a “data set” based on what researchers were apparently studying at the turn of the decade to take a fresh glimpse into what might come to pass in 2020.https://www.kdnuggets.com/2020/01/top-10-ai-ml-articles-to-know.html
-
Explaining Black Box Models: Ensemble and Deep Learning Using LIME and SHAP
This article will demonstrate explainability on the decisions made by LightGBM and Keras models in classifying a transaction for fraudulence, using two state of the art open source explainability techniques, LIME and SHAP.https://www.kdnuggets.com/2020/01/explaining-black-box-models-ensemble-deep-learning-lime-shap.html
-
What is Data Catalog and Why You Should Care?
Learn why data catalogs could be just the thing you need to meet the challenges of data and metadata management and collaboration.https://www.kdnuggets.com/2019/12/data-catalog.html
-
Reproducibility, Replicability, and Data Science
As cornerstones of scientific processes, reproducibility and replicability ensure results can be verified and trusted. These two concepts are also crucial in data science, and as a data scientist, you must follow the same rigor and standards in your projects.https://www.kdnuggets.com/2019/11/reproducibility-replicability-data-science.html
-
Beginners Guide to the Three Types of Machine Learning
The following article is an introduction to classification and regression — which are known as supervised learning — and unsupervised learning — which in the context of machine learning applications often refers to clustering — and will include a walkthrough in the popular python library scikit-learn.https://www.kdnuggets.com/2019/11/beginners-guide-three-types-machine-learning.html
-
Research Guide for Depth Estimation with Deep Learning
In this guide, we’ll look at papers aimed at solving the problems of depth estimation using deep learning.https://www.kdnuggets.com/2019/11/research-guide-depth-estimation-deep-learning.html
-
Data Cleaning and Preprocessing for Beginners
Careful preprocessing of data for your machine learning project is crucial. This overview describes the process of data cleaning and dealing with noise and missing data.https://www.kdnuggets.com/2019/11/data-cleaning-preprocessing-beginners.html
-
10 Free Must-read Books on AI
Artificial Intelligence continues to fill the media headlines while scientists and engineers rapidly expand its capabilities and applications. With such explosive growth in the field, there is a great deal to learn. Dive into these 10 free books that are must-reads to support your AI study and work.https://www.kdnuggets.com/2019/11/10-free-must-read-books-ai.html
-
Everything a Data Scientist Should Know About Data Management
For full-stack data science mastery, you must understand data management along with all the bells and whistles of machine learning. This high-level overview is a road map for the history and current state of the expansive options for data storage and infrastructure solutions.https://www.kdnuggets.com/2019/10/data-scientist-data-management.html
-
An Overview of Density Estimation
Density estimation is estimating the probability density function of the population from the sample. This post examines and compares a number of approaches to density estimation.https://www.kdnuggets.com/2019/10/overview-density-estimation.html
-
The Last SQL Guide for Data Analysis You’ll Ever Need
This is it: the last SQL guide for data analysis you'll ever need! OK, maybe it’s actually the first. But it’ll give you a solid head start.https://www.kdnuggets.com/2019/10/last-sql-guide-data-analysis-ever-need.html
-
Data Quality Assessment Is Not All Roses. What Challenges Should You Be Aware Of?
Of all data quality characteristics, we consider consistency and accuracy to be the most difficult ones to measure. Here, we describe the challenges that you may encounter and the ways to overcome them.https://www.kdnuggets.com/2019/09/data-quality-assessment-challenges.html
-
6 Tips for Building a Training Data Strategy for Machine Learning
Without a well-defined approach for collecting and structuring training data, launching an AI initiative becomes an uphill battle. These six recommendations will help you craft a successful strategy.https://www.kdnuggets.com/2019/09/6-tips-training-data-strategy-machine-learning.html
-
Object-oriented programming for data scientists: Build your ML estimator
Implement some of the core OOP principles in a machine learning context by building your own Scikit-learn-like estimator, and making it better.https://www.kdnuggets.com/2019/08/object-oriented-programming-data-scientists-estimator.html
-
Learn how to use PySpark in under 5 minutes (Installation + Tutorial)
Apache Spark is one of the hottest and largest open source projects in data processing, a framework with rich high-level APIs for programming languages like Scala, Python, Java and R. It realizes the potential of bringing together both Big Data and machine learning.https://www.kdnuggets.com/2019/08/learn-pyspark-installation-tutorial.html
-
The Death of Big Data and the Emergence of the Multi-Cloud Era
The Era of Big Data is coming to an end as the focus shifts from how we collect data to processing that data in real-time. Big Data is now a business asset supporting the next eras of multi-cloud support, machine learning, and real-time analytics.https://www.kdnuggets.com/2019/07/death-big-data-multi-cloud-era.html
-
A complete guide to K-means clustering algorithm
Clustering - including K-means clustering - is an unsupervised learning technique used for data classification. We provide several examples to help further explain how it works.https://www.kdnuggets.com/2019/05/guide-k-means-clustering-algorithm.html
-
Generative Adversarial Networks – Key Milestones and State of the Art
We provide an overview of Generative Adversarial Networks (GANs), discuss challenges in GANs learning, and examine two promising GANs: the RadialGAN, designed for numbers, and the StyleGAN, which does style transfer for images.https://www.kdnuggets.com/2019/04/future-generative-adversarial-networks.html
-
All you need to know about text preprocessing for NLP and Machine Learning
We present a comprehensive introduction to text preprocessing, covering the different techniques including stemming, lemmatization, noise removal, and normalization, with examples and explanations of when you should use each of them.https://www.kdnuggets.com/2019/04/text-preprocessing-nlp-machine-learning.html
-
Checklist for Debugging Neural Networks
Check out these tangible steps you can take to identify and fix issues with training, generalization, and optimization for machine learning models.https://www.kdnuggets.com/2019/03/checklist-debugging-neural-networks.html
-
Top R Packages for Data Cleaning
Data cleaning is one of the most important and time-consuming tasks for data scientists. Here are the top R packages for data cleaning.https://www.kdnuggets.com/2019/03/top-r-packages-data-cleaning.html
-
Towards Automatic Text Summarization: Extractive Methods
The basic idea looks simple: find the gist, cut off all opinions and detail, and write a couple of perfect sentences. Yet the task inevitably ended up in toil and turmoil. Here is a short overview of traditional approaches that have beaten a path to advanced deep learning techniques.https://www.kdnuggets.com/2019/03/towards-automatic-text-summarization.html
-
3 Reasons Why AutoML Won’t Replace Data Scientists Yet
We dispel the myth that AutoML is replacing Data Scientists' jobs by highlighting three factors in Data Science development that AutoML can’t solve.https://www.kdnuggets.com/2019/03/why-automl-wont-replace-data-scientists.html
-
10 Exciting Ideas of 2018 in NLP
We outline a selection of exciting developments in NLP from the last year, and include useful recent papers and images to help further assist with your learning.https://www.kdnuggets.com/2019/01/10-exciting-ideas-2018-nlp.html
-
The brain as a neural network: this is why we can’t get along
This article sets out to answer the question: what insights can we gain about ourselves by thinking of the brain as a machine learning model?https://www.kdnuggets.com/2018/12/brain-neural-network.html
-
Four Approaches to Explaining AI and Machine Learning
We discuss several explainability techniques being championed today, including LOCO (leave one column out), permutation impact, and LIME (local interpretable model-agnostic explanations).https://www.kdnuggets.com/2018/12/four-approaches-ai-machine-learning.html
-
Machine Reading Comprehension: Learning to Ask & Answer
Investigating the dual ask-answer network, covering the embedding, encoding, attention and output layer, as well as the loss function, with code examples to help you get started.https://www.kdnuggets.com/2018/10/machine-reading-comprehension-learning-ask-answer.html
-
Ethics + Data Science: opinion by DJ Patil, former US Chief Data Scientist
How much has data changed our lives over the past decade? Former US Chief Data Scientist DJ Patil investigates.https://www.kdnuggets.com/2018/09/ethics-data-science.html
-
DynamoDB vs. Cassandra: from “no idea” to “it’s a no-brainer”
DynamoDB vs. Cassandra: have they got anything in common? If yes, what? If no, what are the differences? We answer these questions and examine performance of both databases.https://www.kdnuggets.com/2018/08/dynamodb-vs-cassandra.html
-
Affordable online news archives for academic research
Many researchers need access to multi-year historical repositories of online news articles. We identified three companies that make such access affordable, and spoke with their CEOs.https://www.kdnuggets.com/2018/08/affordable-online-news-archives.html
-
Cookiecutter Data Science: How to Organize Your Data Science Project
A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.https://www.kdnuggets.com/2018/07/cookiecutter-data-science-organize-data-project.html
-
Building A Data Science Product in 10 Days
At startups, you often have the chance to create products from scratch. In this article, the author will share how to quickly build valuable data science products, using his first project at Instacart as an example.https://www.kdnuggets.com/2018/07/building-data-science-product-10-days.html
-
The 5 Clustering Algorithms Data Scientists Need to Know
Today, we’re going to look at 5 popular clustering algorithms that data scientists need to know and their pros and cons!https://www.kdnuggets.com/2018/06/5-clustering-algorithms-data-scientists-need-know.html
-
Packaging and Distributing Your Python Project to PyPI for Installation Using pip
This tutorial will explain the steps required to package your Python projects, distribute them in distribution formats using setuptools, upload them to the Python Package Index (PyPI) repository using twine, and finally install them using Python installers such as pip and conda.https://www.kdnuggets.com/2018/06/packaging-distributing-python-project-pypi-pip.html
-
Complete Guide to Build ConvNet HTTP-Based Application using TensorFlow and Flask RESTful Python API
In this tutorial, a CNN is built, trained, and tested against the CIFAR10 dataset. To make the model remotely accessible, a Flask Web application is created using Python to receive an uploaded image and return its classification label using HTTP.https://www.kdnuggets.com/2018/05/complete-guide-convnet-tensorflow-flask-restful-python-api.html
-
Torus for Docker-First Data Science
To help data science teams adopt Docker and apply DevOps best practices to streamline machine learning delivery pipelines, we open-sourced a toolkit based on the popular cookiecutter project structure.https://www.kdnuggets.com/2018/05/torus-docker-first-data-science.html
-
Data Science vs Machine Learning vs Data Analytics vs Business Analytics
This article gives a broad overview of data science and the various fields within it, including business analytics, data analytics, business intelligence, advanced analytics, machine learning, and AI.https://www.kdnuggets.com/2018/05/data-science-machine-learning-business-analytics.html
-
A Comparative Analysis of Top 6 BI and Data Visualization Tools in 2018
In this article, we will compare the most commonly used platforms and analyze their main features to help you choose one or several platforms that will provide indispensable aid for your work communication.https://www.kdnuggets.com/2018/02/comparative-analysis-top-6-bi-data-visualization-tools-2018.html
-
How To Grow As A Data Scientist
In order for a data scientist to grow, they need to be challenged beyond the technical aspects of their jobs. They need to question their data sources, be concise in their insights, know their business and help guide their leaders.https://www.kdnuggets.com/2018/01/how-grow-data-scientist.html
-
Best Masters in Data Science and Analytics – Asia and Australia Edition
The fourth edition of our comprehensive, unbiased survey on graduate degrees in Data Science and Analytics from around the world.https://www.kdnuggets.com/2017/12/best-masters-data-science-analytics-asia-australia.html
-
An opinionated Data Science Toolbox in R from Hadley Wickham, tidyverse
Get your productivity boosted with Hadley Wickham's powerful R package, tidyverse. It has all you need to start developing your own data science workflows.https://www.kdnuggets.com/2017/10/tidyverse-powerful-r-toolbox.html
-
Top Modules and Features of Business Intelligence Tools
What makes BI tools great? What features are important while selecting a good BI tool? Let’s have a look.https://www.kdnuggets.com/2017/07/top-modules-features-bi-tools.html
-
7 Ways to Get High-Quality Labeled Training Data at Low Cost
Labeled training data is needed for machine learning, but getting such data is not simple or cheap. We review 7 approaches including repurposing, harvesting free sources, retraining models on progressively higher quality data, and more.https://www.kdnuggets.com/2017/06/acquiring-quality-labeled-training-data.html
-
Your Checklist to Get Data Science Implemented in Production
For over a year we surveyed thousands of companies from all types of industries and levels of data science advancement on how they managed to overcome these difficulties, and analyzed the results. Here are the key things to keep in mind when you're working on your design-to-production pipeline.https://www.kdnuggets.com/2017/06/dataiku-checklist-data-science-implemented-production.html
-
What is an Ontology? The simplest definition you’ll find… or your money back*
This post takes the concept of an ontology and presents it in a clear and simple manner, devoid of the complexities that often surround such explanations.https://www.kdnuggets.com/2017/05/ontology-simplest-definition.html
-
Must-Know: What are common data quality issues for Big Data and how to handle them?
Let's have a look at common quality issues facing Big Data in terms of the key characteristics of Big Data – Volume, Velocity, Variety, Veracity, and Value.https://www.kdnuggets.com/2017/05/must-know-common-data-quality-issues-big-data.html
-
HDFS vs. HBase : All you need to know
Hadoop Distributed File System (HDFS) and HBase (Hadoop database) are key components of the Big Data ecosystem. This blog explains the difference between HDFS and HBase with real-life use cases where each is the best fit.https://www.kdnuggets.com/2017/05/hdfs-hbase-need-know.html
-
More Deep Learning “Magic”: Paintings to photos, horses to zebras, and more amazing image-to-image translation
This is an introduction to recent research which presents an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples.https://www.kdnuggets.com/2017/04/unpaired-image-translation-cycle-gan.html
-
17 More Must-Know Data Science Interview Questions and Answers, Part 3
The third and final part of 17 new must-know Data Science interview questions and answers covers A/B testing, data visualization, Twitter influence evaluation, and Big Data quality.
https://www.kdnuggets.com/2017/03/17-data-science-interview-questions-answers-part-3.html
-
Software Engineering vs Machine Learning Concepts
Not all core concepts from software engineering translate into the machine learning universe. Here are some differences I've noticed.https://www.kdnuggets.com/2017/03/software-engineering-vs-machine-learning-concepts.html
-
Causation or Correlation: Explaining Hill Criteria using xkcd
This is an attempt to explain Hill’s criteria using xkcd comics, both because it seemed fun, and also to motivate causal inference instructors to have some variety in which xkcd comic they include in lectures.https://www.kdnuggets.com/2017/02/hill-data-scientist-xkcd-story.html
-
Career Advice for Analytics & Data Science Professionals
In our experience working with many quantitative professionals over the years, the two main areas that contribute to long-term career growth are networking and continuous learning. Here is specific advice on how to do this and tips for Continuous Learning.https://www.kdnuggets.com/2017/02/career-advice-analytics-data-science.html
-
An ode to the analytics grease monkeys
Analytics is not a one-time job. It needs to be automated, deployed and improved for future business analytics requirements. Here an IBM expert discusses the development and deployment of analytics assets and capabilities.https://www.kdnuggets.com/2017/02/analytics-grease-monkeys.html
-
Data Science and Big Data, Explained
This article is meant to give the non-data scientist a solid overview of the many concepts and terms behind data science and big data. While related terms will be mentioned at a very high level, the reader is encouraged to explore the references and other resources for additional detail.https://www.kdnuggets.com/2016/11/big-data-data-science-explained.html
-
Big Data Key Terms, Explained
Just getting started with Big Data, or looking to iron out the wrinkles in your current understanding? Check out these 20 Big Data-related terms and their concise definitions.https://www.kdnuggets.com/2016/08/big-data-key-terms-explained.html
-
Contest 2nd Place: Automated Data Science and Machine Learning in Digital Advertising
This post is an overview of an automated machine learning system in the digital advertising realm. It is an entrant and second-place recipient in the recent KDnuggets blog contest.https://www.kdnuggets.com/2016/08/automated-data-science-digital-advertising.html
-
What Data Scientists Can Learn From Qualitative Research
Learn what data scientists can learn from qualitative researchers when it comes to analysing text, and how this relates to writing quality code.https://www.kdnuggets.com/2016/07/data-scientists-learn-from-qualitative-research.html
-
Thinking About Analytics Readiness
This article touches upon an important but under-discussed topic of analytics readiness, including whether and when organizations should engage in analytics.https://www.kdnuggets.com/2016/06/thinking-domain-readiness.html
-
Advantages and Risks of Self-Service Analytics
Self-service analytics is likely to spread in all the business layers, and with proper care to avoid certain risks, the culture of self-service analytics will help all organizations.https://www.kdnuggets.com/2016/04/advantages-risks-self-service-analytics.html
-
21 Must-Know Data Science Interview Questions and Answers
KDnuggets Editors bring you the answers to 20 Questions to Detect Fake Data Scientists, including what is regularization, Data Scientists we admire, model validation, and more.https://www.kdnuggets.com/2016/02/21-data-science-interview-questions-answers.html
-
Beyond One-Hot: an exploration of categorical variables
Ordinal coding turns categorical variables into numbers by assigning an integer to each category so that machine learning algorithms can use them. Here, we explore different ways of converting a categorical variable and their effects on the dimensionality of data.https://www.kdnuggets.com/2015/12/beyond-one-hot-exploration-categorical-variables.html
-
New Standard Methodology for Analytical Models
Traditional methods for analytical modelling, like CRISP-DM, have several shortcomings. Here we describe these friction points in CRISP-DM and introduce a new approach, the Standard Methodology for Analytics Models, which overcomes them.https://www.kdnuggets.com/2015/08/new-standard-methodology-analytical-models.html
-
Can Deep Learning Help you Find the Perfect Girl? – Part 2
Using Deep Learning to find the perfect match, PhD student Harm de Vries describes the process of data collection and analysis. Finally, the results from the matching algorithm are compared to human assessment for identifying an individual's dating preferences.https://www.kdnuggets.com/2015/07/can-deep-learning-help-find-perfect-girl-2.html
-
Interview: Joseph Babcock, Netflix on Genie, Lipstick, and Other In-house Developed Tools
We discuss the role of analytics in content acquisition, data architecture at Netflix, organizational structure, and open-source tools from Netflix.https://www.kdnuggets.com/2015/06/interview-joseph-babcock-netflix-in-house-developed-tools.html
-
Research Leaders on Data Mining, Data Science, and Big Data key trends, top papers
We asked global research leaders in Data Science and Big Data what are the most interesting research papers/advances of 2014 and what are the key trends they see in 2015. Here are their answers.https://www.kdnuggets.com/2015/01/research-leaders-data-science-big-data-key-trends-top-papers.html
-
KDnuggets™ News 14:n35, Dec 29
Features | Software | Opinions | Interviews | News | Courses | Meetings | Jobs | Academic | Tweets | CFP | Quote.https://www.kdnuggets.com/2014/n35.html
-
Containers: The Enabler of YARN
The evolution of a data-center operating system is discussed along with the underlying challenges and approaches being followed. Containers play a big role in enabling the required abstraction and deliver additional benefits.https://www.kdnuggets.com/2014/07/containers-yarn-enabler.html
-
3 Ways to Test the Accuracy of Your Predictive Models
Three leading analytics experts, Karl Rexer, John Elder, and Dean Abbott, explain 3 different methods for testing the accuracy of predictive models: lift charts, randomization testing, and bootstrap sampling.https://www.kdnuggets.com/2014/02/3-ways-to-test-accuracy-your-predictive-models.html