Search results for spark dataset
-
State of the Machine Learning and AI Industry
Enterprises are struggling to launch machine learning models that encapsulate the optimization of business processes. These are now the essential components of data-driven applications and AI services that can improve legacy rule-based business processes, increase productivity, and deliver results. In the current state of the industry, many companies are turning to off-the-shelf platforms to increase expectations for success in applying machine learning.https://www.kdnuggets.com/2020/04/machine-learning-ai-industry.html
-
Python for data analysis… is it really that simple?!?
The article addresses a simple data analytics problem, comparing a Python and Pandas solution to an R solution (using plyr, dplyr, and data.table), as well as kdb+ and BigQuery solutions. Performance improvement tricks for these solutions are then covered, as are parallel/cluster computing approaches and their limitations.https://www.kdnuggets.com/2020/04/python-data-analysis-really-that-simple.html
-
Top AI Resources – Directory for Remote Learning
Whether you are just learning Data Science, a current professional, or just interested, it's crucial to keep the mind stimulated and stay current. With conferences, schools, and travel largely canceled because of #coronavirus, these remote resources will help you stay engaged.https://www.kdnuggets.com/2020/03/top-ai-resources-remote-learning.html
-
The 4 Best Jupyter Notebook Environments for Deep Learning
Many cloud providers and other third-party services see the value of a Jupyter notebook environment, which is why many companies now offer cloud-hosted notebooks. Let's have a look at 3 such environments.https://www.kdnuggets.com/2020/03/4-best-jupyter-notebook-environments-deep-learning.html
-
Five Interesting Data Engineering Projects
As the role of the data engineer continues to grow in the field of data science, so do the many tools being developed to support wrangling all that data. Five of these tools (along with a few bonus tools) that you should pay attention to for your data pipeline work are reviewed here.https://www.kdnuggets.com/2020/03/data-engineering-projects.html
-
Hand labeling is the past. The future is #NoLabel AI
Data labeling is so hot right now… but could this rapidly emerging market face disruption from a small team at Stanford and the Snorkel open source project, which enables highly efficient programmatic labeling that is 10 to 1,000x as efficient as hand labeling?https://www.kdnuggets.com/2020/02/hand-labeling-past-future-nolabel-ai.html
-
NLP Year in Review — 2019
In this blog post, I want to highlight some of the most important stories related to machine learning and NLP that I came across in 2019.https://www.kdnuggets.com/2020/01/nlp-year-review-2019.html
-
Explaining Black Box Models: Ensemble and Deep Learning Using LIME and SHAP
This article will demonstrate explainability on the decisions made by LightGBM and Keras models in classifying a transaction for fraudulence, using two state of the art open source explainability techniques, LIME and SHAP.https://www.kdnuggets.com/2020/01/explaining-black-box-models-ensemble-deep-learning-lime-shap.html
-
I wanna be a data scientist, but… how?
It’s easy to say "I wanna be a data scientist," but... where do you start? How much time is needed to be desired by companies? Do you need a Master’s degree? Do you need to know every mathematical concept ever derived? The journey might be long, but follow this plan to help you keep moving forward toward your career goal.https://www.kdnuggets.com/2020/01/wanna-be-data-scientist.html
-
Learning SQL the Hard Way
Simply put: This post is about installing SQL, explaining SQL and running SQL.https://www.kdnuggets.com/2020/01/learning-sql-hard-way.html
-
H2O Framework for Machine Learning
This article is an overview of H2O, a scalable and fast open-source platform for machine learning. We will apply it to perform classification tasks.https://www.kdnuggets.com/2020/01/h2o-framework-machine-learning.html
-
Predict Electricity Consumption Using Time Series Analysis
Time series forecasting is a technique for the prediction of events through a sequence of time. In this post, we will take a small forecasting problem and solve it end to end, learning time series forecasting along the way.https://www.kdnuggets.com/2020/01/predict-electricity-consumption-time-series-analysis.html
-
The ravages of concept drift in stream learning applications and how to deal with it
Stream data processing has gained progressive momentum with the arrival of new stream applications and big data scenarios. These streams of data generally evolve over time and may occasionally be affected by a change (concept drift). Handling this change with detection and adaptation mechanisms is crucial in many real-world systems.https://www.kdnuggets.com/2019/12/ravages-concept-drift-stream-learning-applications.html
-
The 4 Hottest Trends in Data Science for 2020
The field of Data Science is growing with new capabilities and reach into every industry. With digital transformations occurring in organizations around the world, 2019 included trends of more companies leveraging more data to make better decisions. Check out these next trends in Data Science expected to take off in 2020.https://www.kdnuggets.com/2019/12/4-hottest-trends-data-science-2020.html
-
AI, Analytics, Machine Learning, Data Science, Deep Learning Research Main Developments in 2019 and Key Trends for 2020
As we say goodbye to one year and look forward to another, KDnuggets has once again solicited opinions from numerous research & technology experts as to the most important developments of 2019 and their 2020 key trend predictions.https://www.kdnuggets.com/2019/12/predictions-ai-machine-learning-data-science-research.html
-
How to Visualize Data in Python (and R)
Producing accessible data visualizations is a key data science skill. The following guidelines will help you create the best representations of your data using R and Python's Pandas library.https://www.kdnuggets.com/2019/11/visualize-data-python-and-r.html
-
The Complete Data Science LinkedIn Profile Guide
With so many Data Scientists showing up on LinkedIn, it's time to make sure your profile is top-notch because your talent is still highly sought after. Recruitment specialists want to find you fast, and this guide will help you create the best profile to feature your expertise.https://www.kdnuggets.com/2019/11/data-science-linkedin-profile-guide.html
-
Data Preparation for Machine learning 101: Why it’s important and how to do it
As data scientists who are the brains behind the AI-based innovations, you need to understand the significance of data preparation to achieve the desired level of cognitive capability for your models. Let’s begin.https://www.kdnuggets.com/2019/10/data-preparation-machine-learning-101.html
-
Natural Language in Python using spaCy: An Introduction
This article provides a brief introduction to working with natural language (sometimes called “text analytics”) in Python using spaCy and related libraries.https://www.kdnuggets.com/2019/09/natural-language-python-using-spacy-introduction.html
-
There is No Free Lunch in Data Science
There is no such thing as a free lunch in life or data science. Here, we'll explore some science philosophy and discuss the No Free Lunch theorems to find out what they mean for the field of data science.https://www.kdnuggets.com/2019/09/no-free-lunch-data-science.html
-
Train sklearn 100x Faster
As compute gets cheaper and time to market for machine learning solutions becomes more critical, we’ve explored options for speeding up model training. One of those solutions is to combine elements from Spark and scikit-learn into our own hybrid solution.https://www.kdnuggets.com/2019/09/train-sklearn-100x-faster.html
-
How to count Big Data: Probabilistic data structures and algorithms
Learn how probabilistic data structures and algorithms can be used for cardinality estimation in Big Data streams.https://www.kdnuggets.com/2019/08/count-big-data-probabilistic-data-structures-algorithms.html
-
Top 13 Skills To Become a Rockstar Data Scientist
Education, coding, SQL, big data platforms, storytelling and more. These are the 13 skills you need to master to become a rockstar data scientist.https://www.kdnuggets.com/2019/07/top-13-skills-become-rockstar-data-scientist.html
-
Secrets to a Successful Data Science Interview
Are you puzzled as to what to prepare for data science interviews? That you are reading this document is a reflection of your seriousness in being a successful data scientist.https://www.kdnuggets.com/2019/07/secrets-data-science-interview.html
-
XLNet Outperforms BERT on Several NLP Tasks
XLNet is a new pretraining method for NLP that achieves state-of-the-art results on several NLP tasks.https://www.kdnuggets.com/2019/07/xlnet-outperforms-bert-several-nlp-tasks.html
-
Scalable Python Code with Pandas UDFs: A Data Science Application
There is still a gap between the corpus of libraries that developers want to apply in a scalable runtime and the set of libraries that support distributed execution. This post discusses how to bridge this gap using the functionality provided by Pandas UDFs in Spark 2.3+.https://www.kdnuggets.com/2019/06/scalable-python-code-pandas-udfs.html
-
Overview of Different Approaches to Deploying Machine Learning Models in Production
Learn the different methods for putting machine learning models into production, and how to determine which method is best for which use case.https://www.kdnuggets.com/2019/06/approaches-deploying-machine-learning-production.html
-
The Data Fabric for Machine Learning – Part 1
How the new advances in semantics and the data fabric can help us be better at Machine Learning.https://www.kdnuggets.com/2019/05/data-fabric-machine-learning-part-1.html
-
Building Recommender systems with Azure Machine Learning service
Microsoft has provided a GitHub repository with Python best practice examples to facilitate the building and evaluation of recommendation systems using Azure Machine Learning services.https://www.kdnuggets.com/2019/05/recommender-systems-azure-machine-learning.html
-
The 3 Biggest Mistakes on Learning Data Science
Data science, or whatever you want to call it, is not just knowing some programming languages, math, and statistics and having “domain knowledge”, and here I show you why.https://www.kdnuggets.com/2019/05/biggest-mistakes-learning-data-science.html
-
XGBoost Algorithm: Long May She Reign
In recent years, the XGBoost algorithm has gained enormous popularity in the academic as well as the business world. We outline some of the reasons behind this incredible success.https://www.kdnuggets.com/2019/05/xgboost-algorithm.html
-
An introduction to explainable AI, and why we need it
We introduce explainable AI, why it is needed, and present the Reversed Time Attention Model, Local Interpretable Model-Agnostic Explanation and Layer-wise Relevance Propagation.https://www.kdnuggets.com/2019/04/introduction-explainable-ai.html
-
How to Choose the Right Chart Type
This article presents an infographic for choosing which chart type is most useful in a given scenario. The infographic and chart types are then explored for greater clarity.https://www.kdnuggets.com/2019/03/how-choose-right-chart-type.html
-
Comparing MobileNet Models in TensorFlow
MobileNets are a family of mobile-first computer vision models for TensorFlow, designed to effectively maximize accuracy while being mindful of the restricted resources for an on-device or embedded application.https://www.kdnuggets.com/2019/03/comparing-mobilenet-models-tensorflow.html
-
Your AI skills are worth less than you think
We are in the middle of an AI boom. That doesn’t mean that making your AI startup succeed is easy. I think there are some important pitfalls ahead of anyone trying to build their business around AI.https://www.kdnuggets.com/2019/01/your-ai-skills-worth-less-than-you-think.html
-
Building AI to Build AI: The Project That Won the NeurIPS AutoML Challenge
This is an overview of designing a computer program capable of developing predictive models, without any manual intervention, that are trained and evaluated in a lifelong machine learning setting in the NeurIPS 2018 AutoML3 Challenge.https://www.kdnuggets.com/2019/01/building-ai-to-build-ai-neurips-automl-challenge.html
-
Why You Shouldn’t be a Data Science Generalist
But it’s hard to avoid becoming a generalist if you don’t know which common problem classes you could specialize in in the first place. That’s why I put together a list of the five problem classes that are often lumped together under the “data science” heading.https://www.kdnuggets.com/2018/12/why-shouldnt-data-science-generalist.html
-
Best Machine Learning Languages, Data Visualization Tools, DL Frameworks, and Big Data Tools
We cover a variety of topics, from machine learning to deep learning, from data visualization to data tools, with comments and explanations from experts in the relevant fields.https://www.kdnuggets.com/2018/12/machine-learning-data-visualization-deep-learning-tools.html
-
Intro to Data Science for Managers
This mindmap contains a condensed introduction to the key data science concepts and techniques that have revolutionized the business landscape and have become essential for making beneficial data-driven decisions.https://www.kdnuggets.com/2018/11/intro-data-science-managers.html
-
Top 13 Python Deep Learning Libraries
Part 2 of a new series investigating the top Python Libraries across Machine Learning, AI, Deep Learning and Data Science.https://www.kdnuggets.com/2018/11/top-python-deep-learning-libraries.html
-
The Most in Demand Skills for Data Scientists
Data scientists are expected to know a lot — machine learning, computer science, statistics, mathematics, data visualization, communication, and deep learning. How should data scientists who want to be in demand by employers spend their learning budget?https://www.kdnuggets.com/2018/11/most-demand-skills-data-scientists.html
-
Things you should know when traveling via the Big Data Engineering hype-train
Maybe you want to join the Big Data world? Or maybe you are already there and want to validate your knowledge? Or maybe you just want to know what Big Data Engineers do and what skills they use? If so, you may find the following article quite useful.https://www.kdnuggets.com/2018/10/big-data-engineering-hype-train.html
-
Hadoop for Beginners
An introduction to Hadoop, a framework that enables you to store and process large data sets in parallel and distributed fashion.https://www.kdnuggets.com/2018/09/hadoop-beginners.html
-
Multi-Class Text Classification with Scikit-Learn
The vast majority of text classification articles and tutorials on the internet cover binary text classification, such as email spam filtering and sentiment analysis. Real-world problems are much more complicated than that.https://www.kdnuggets.com/2018/08/multi-class-text-classification-scikit-learn.html
-
Auto-Keras, or How You can Create a Deep Learning Model in 4 Lines of Code
Auto-Keras is an open source software library for automated machine learning. Auto-Keras provides functions to automatically search for architecture and hyperparameters of deep learning models.https://www.kdnuggets.com/2018/08/auto-keras-create-deep-learning-model-4-lines-code.html
-
Data Scientist Interviews Demystified
We look at typical questions in a data science interview, examine the rationale for such questions, and hope to demystify the interview process for recent graduates and aspiring data scientists.https://www.kdnuggets.com/2018/08/data-scientist-interviews-demystified.html
-
DIY Deep Learning Projects
Inspired by the great work of Akshay Bahadur in this article you will see some projects applying Computer Vision and Deep Learning, with implementations and details so you can reproduce them on your computer.https://www.kdnuggets.com/2018/06/diy-deep-learning-projects.html
-
The 6 components of Open-Source Data Science/ Machine Learning Ecosystem; Did Python declare victory over R?
We find 6 tools form the modern open source Data Science / Machine Learning ecosystem; examine whether Python declared victory over R; and review which tools are most associated with Deep Learning and Big Data.https://www.kdnuggets.com/2018/06/ecosystem-data-science-python-victory.html
-
A Beginner’s Guide to the Data Science Pipeline
On one end was a pipe with an entrance and at the other end an exit. The pipe was also labeled with five distinct letters: "O.S.E.M.N."https://www.kdnuggets.com/2018/05/beginners-guide-data-science-pipeline.html
-
9 Must-have skills you need to become a Data Scientist, updated
Check out this collection of 9 (plus some additional freebies) must-have skills for becoming a data scientist.https://www.kdnuggets.com/2018/05/simplilearn-9-must-have-skills-data-scientist.html
-
Data Engineer vs Data Scientist: the evolution of aggressive species
This article looks at how the two "species" - data scientists and data engineers - harmonise and coexist.https://www.kdnuggets.com/2018/05/dsti-data-engineer-vs-data-scientist.html
-
Top 7 Data Science Use Cases in Finance
We have prepared a list of data science use cases that have the highest impact on the finance sector. They cover very diverse business aspects from data management to trading strategies, but the common thing for them is the huge prospects to enhance financial solutions.https://www.kdnuggets.com/2018/05/top-7-data-science-use-cases-finance.html
-
Detecting Breast Cancer with Deep Learning
Breast cancer is the most common invasive cancer in women, and the second main cause of cancer death in women, after lung cancer. In this article I will use Deep Learning Studio to build a WideResNet-based neural network that categorizes slide images into two classes: one that contains breast cancer and one that doesn’t.https://www.kdnuggets.com/2018/05/detecting-breast-cancer-deep-learning.html
-
Data Science Interview Guide
Traditionally, Data Science would focus on mathematics, computer science and domain expertise. While I will briefly cover some computer science fundamentals, the bulk of this blog will cover the mathematical basics one might need to brush up on (or even take an entire course on).https://www.kdnuggets.com/2018/04/data-science-interview-guide.html
-
How To Choose The Right Chart Type For Your Data
The power of charts to assist in accurate interpretation is massive and that's why it is vital to select the correct type when you are trying to visualize data.https://www.kdnuggets.com/2018/04/right-chart-your-data.html
-
A Day in the Life of a Data Scientist: Part 4
Interested in what a data scientist does on a typical day of work? Each data science role may be different, but these contributors have insight to help those interested in figuring out what a day in the life of a data scientist actually looks like.https://www.kdnuggets.com/2018/04/day-life-data-scientist-part-4.html
-
A “Weird” Introduction to Deep Learning
There are amazing introductions, courses and blog posts on Deep Learning. But this is a different kind of introduction.https://www.kdnuggets.com/2018/03/weird-introduction-deep-learning.html
-
5 Things You Need to Know about Sentiment Analysis and Classification
We take a look at the important things you need to know about sentiment analysis, including social media, classification, evaluation metrics and how to visualise the results.https://www.kdnuggets.com/2018/03/5-things-sentiment-analysis-classification.html
-
Ranking Popular Distributed Computing Packages for Data Science
We examined 140 frameworks and distributed programming packages and came up with a list of top 20 distributed computing packages useful for Data Science, based on a combination of GitHub, Stack Overflow, and Google results.https://www.kdnuggets.com/2018/03/top-distributed-computing-packages-data-science.html
-
5 Things You Need to Know about Big Data
We take a look at five things you need to know about Big Data.https://www.kdnuggets.com/2018/03/5-things-big-data.html
-
A Guide to Hiring Data Scientists
This article provides a short overview of emerging data scientist types and their unique skillsets, as well as a guide for HR professionals and analytics managers who are looking to hire their first data scientists or build a data science team. Included are an overview of skills for each type and specific questions that can be asked to assess candidates.https://www.kdnuggets.com/2018/02/guide-hiring-data-scientists.html
-
Resurgence of AI During 1983-2010
We discuss supervised learning, unsupervised learning and reinforcement learning, neural networks, and 6 reasons that helped AI Research and Development to move ahead.https://www.kdnuggets.com/2018/02/resurgence-ai-1983-2010.html
-
Top 15 Scala Libraries for Data Science in 2018
For your convenience, we have prepared a comprehensive overview of the most important libraries used to perform machine learning and Data Science tasks in Scala.https://www.kdnuggets.com/2018/02/top-15-scala-libraries-data-science-2018.html
-
My Journey into Deep Learning
In this post I’ll share how I’ve been studying Deep Learning and using it to solve data science problems. It’s an informal post but with interesting content (I hope).https://www.kdnuggets.com/2018/01/journey-into-deep-learning.html
-
Supercharging Visualization with Apache Arrow
Interactive visualization of large datasets on the web has traditionally been impractical. Apache Arrow provides a new way to exchange and visualize data at unprecedented speed and scale.https://www.kdnuggets.com/2018/01/supercharging-visualization-apache-arrow.html
-
Simple Ways Of Working With Medium To Big Data Locally
An overview of the installation and implementation of simple techniques for working with large datasets on your machine.https://www.kdnuggets.com/2017/12/simple-medium-big-data-locally.html
-
Deep Learning Made Easy with Deep Cognition
So normally we do Deep Learning programming and learn new APIs, some harder than others and some really easy and expressive, like Keras. But how about a visual API to create and deploy Deep Learning solutions with the click of a button? This is the promise of Deep Cognition.https://www.kdnuggets.com/2017/12/deep-learning-made-easy-deep-cognition.html
-
Data Science, Machine Learning: Main Developments in 2017 and Key Trends in 2018
The leading experts in the field on the main Data Science, Machine Learning, Predictive Analytics developments in 2017 and key trends in 2018.https://www.kdnuggets.com/2017/12/data-science-machine-learning-main-developments-trends.html
-
Big Data: Main Developments in 2017 and Key Trends in 2018
As we bid farewell to one year and look to ring in another, KDnuggets has solicited opinions from numerous Big Data experts as to the most important developments of 2017 and their 2018 key trend predictions.https://www.kdnuggets.com/2017/12/big-data-main-developments-2017-key-trends-2018.html
-
Graph Analytics Using Big Data
An overview and a small tutorial showing how to analyze a dataset using Apache Spark, graphframes, and Java.https://www.kdnuggets.com/2017/12/graph-analytics-using-big-data.html
-
The 10 Statistical Techniques Data Scientists Need to Master
The author presents 10 statistical techniques which a data scientist needs to master. Build up your toolbox of data science tools by having a look at this great overview post.https://www.kdnuggets.com/2017/11/10-statistical-techniques-data-scientists-need-master.html
-
A Day in the Life of a Data Scientist
Are you interested in what a data scientist does on a typical day of work? Each data science role may be different, but these five individuals provide insight to help those interested in figuring out what a day in the life of a data scientist actually looks like.https://www.kdnuggets.com/2017/11/day-life-data-scientist.html
-
XGBoost: A Concise Technical Overview
Interested in learning the concepts behind XGBoost, rather than just using it as a black box? Or, are you looking for a concise introduction to XGBoost? Then, this article is for you. Includes a Python implementation and links to other basic Python and R codes as well.https://www.kdnuggets.com/2017/10/xgboost-concise-technical-overview.html
-
How LinkedIn Makes Personalized Recommendations via Photon-ML Machine Learning tool
In this article we focus on the personalization aspect of model building and explain the modeling principle as well as how to implement Photon-ML so that it can scale to hundreds of millions of users.https://www.kdnuggets.com/2017/10/linkedin-personalized-recommendations-photon-ml.html
-
Introducing R-Brain: A New Data Science Platform
R-Brain is a next-generation platform for data science built on top of JupyterLab with Docker, which supports not only R but also Python and SQL, and has integrated IntelliSense, debugging, packaging, and publishing capabilities.https://www.kdnuggets.com/2017/10/r-brain-new-data-science-platform.html
-
XGBoost, a Top Machine Learning Method on Kaggle, Explained
Looking to boost your machine learning competition score? Here’s a brief summary and introduction to a powerful and popular tool among Kagglers, XGBoost.https://www.kdnuggets.com/2017/10/xgboost-top-machine-learning-method-kaggle-explained.html
-
The new Enigma Public – the platform connecting people to data
Public data has tremendous potential, and different people can use it to solve a variety of problems. Enigma relaunches Enigma Public, the platform connecting people to data.https://www.kdnuggets.com/2017/09/new-enigma-public-platform.html
-
How To Write Better SQL Queries: The Definitive Guide – Part 1
Most forget that SQL isn’t just about writing queries, which is just the first step down the road. Ensuring that queries are performant or that they fit the context that you’re working in is a whole other thing. This SQL tutorial will provide you with a small peek at some steps that you can go through to evaluate your query.https://www.kdnuggets.com/2017/08/write-better-sql-queries-definitive-guide-part-1.html
-
Recommendation System Algorithms: An Overview
This post presents an overview of the main existing recommendation system algorithms, so that data scientists can choose the best one according to a business’s limitations and requirements.https://www.kdnuggets.com/2017/08/recommendation-system-algorithms-overview.html
-
Lessons Learned From Benchmarking Fast Machine Learning Algorithms
Boosted decision trees are responsible for more than half of the winning solutions in machine learning challenges hosted at Kaggle, and require minimal tuning. We evaluate two popular tree boosting software packages: XGBoost and LightGBM and draw 4 important lessons.https://www.kdnuggets.com/2017/08/lessons-benchmarking-fast-machine-learning-algorithms.html
-
First Steps of Learning Deep Learning: Image Classification in Keras
Whether you want to start learning deep learning for your career, to have a nice adventure (e.g. with detecting huggable objects) or to get insight into machines before they take over, this post is for you!https://www.kdnuggets.com/2017/08/first-steps-learning-deep-learning-image-classification-keras.html
-
Going deeper with recurrent networks: Sequence to Bag of Words Model
Deep learning makes it possible to convert unstructured text to computable formats, incorporating semantic knowledge to train machine learning models. These digital data troves help us understand people on a new level.https://www.kdnuggets.com/2017/08/deeper-recurrent-networks-sequence-bag-words-model.html
-
How Feature Engineering Can Help You Do Well in a Kaggle Competition – Part I
As I scrolled through the leaderboard page, I found my name in the 19th position, which was in the top 2% of nearly 1,000 competitors. Not bad for the first Kaggle competition I had decided to put a real effort into!https://www.kdnuggets.com/2017/06/feature-engineering-help-kaggle-competition-1.html
-
Your Checklist to Get Data Science Implemented in Production
For over a year we surveyed thousands of companies from all types of industries and data science advancement on how they managed to overcome these difficulties and analyzed the results. Here are the key things to keep in mind when you're working on your design-to-production pipeline.https://www.kdnuggets.com/2017/06/dataiku-checklist-data-science-implemented-production.html
-
5 Machine Learning Projects You Can No Longer Overlook, May
In this month's installment of Machine Learning Projects You Can No Longer Overlook, we find some data preparation and exploration tools, a (the?) reinforcement learning "framework," a new automated machine learning library, and yet another distributed deep learning library.https://www.kdnuggets.com/2017/05/five-machine-learning-projects-cant-overlook-may.html
-
42 Essential Quotes by Data Science Thought Leaders
42 illuminating quotes you need to read if you’re a data scientist or considering a career in the field – insights from industry experts tackling the tough questions that every data scientist faces.https://www.kdnuggets.com/2017/05/42-essential-quotes-data-science-thought-leaders.html
-
Dask and Pandas and XGBoost: Playing nicely between distributed systems
This blog post gives a quick example using Dask.dataframe to do distributed Pandas data wrangling, then using a new dask-xgboost package to set up an XGBoost cluster inside the Dask cluster and perform the handoff.https://www.kdnuggets.com/2017/04/dask-pandas-xgboost-playing-nicely-distributed-systems.html
-
What Top Firms Ask: 100+ Data Science Interview Questions
Check this out: A topic-wise collection of 100+ data science interview questions from top companies.https://www.kdnuggets.com/2017/03/top-firms-100-data-science-interview-questions.html
-
Machine Learning-driven Firewall
Cyber Security is always a hot topic in the IT industry, and machine learning is making security systems stronger. Here, a particular use case of machine learning in cyber security is explained in detail.https://www.kdnuggets.com/2017/02/machine-learning-driven-firewall.html
-
Apache Arrow and Apache Parquet: Why We Needed Different Projects for Columnar Data, On Disk and In-Memory
Apache Parquet and Apache Arrow both focus on improving the performance and efficiency of data analytics. These two projects optimize performance for on-disk and in-memory processing, respectively.https://www.kdnuggets.com/2017/02/apache-arrow-parquet-columnar-data.html
-
Why the Data Scientist and Data Engineer Need to Understand Virtualization in the Cloud
This article covers the value of understanding the virtualization constructs for the data scientist and data engineer as they deploy their analysis onto all kinds of cloud platforms. Virtualization is a key enabling layer of software for these data workers to be aware of and to achieve optimal results from.https://www.kdnuggets.com/2017/01/data-scientist-engineer-understand-virtualization-cloud.html
-
How To Stay Competitive In Machine Learning Business
To stay competitive in the machine learning business, you have to be superior to your rivals, not the best possible, says one of the leading machine learning experts. Simple rules are defined here to make that happen. Let’s see how.https://www.kdnuggets.com/2017/01/stay-competitive-machine-learning-business.html
-
Continuous improvement for IoT through AI / Continuous learning
In reality, especially for IoT, an analytics model does not keep giving results with the same accuracy forever once it is built. Data patterns change over time, which makes it essential to learn from new data and improve or recalibrate the models to get correct results. This article explains this phenomenon of continuous improvement in analytics for IoT.https://www.kdnuggets.com/2016/11/continuous-improvement-iot-ai-learning.html
-
Data Avengers… Assemble!
The Avengers are perfectly capable of defending the Earth from our worst enemies. But are they up to the task of taking care of our data? Read this terribly punny "opinion" piece to find out.https://www.kdnuggets.com/2016/11/data-avengers-assemble.html
-
Introduction to Trainspotting: Computer Vision, Caltrain, and Predictive Analytics
We previously analyzed delays using Caltrain’s real-time API to improve arrival predictions, and we have modeled the sounds of passing trains to tell them apart. In this post we’ll start looking at the nuts and bolts of making our Caltrain work possible.https://www.kdnuggets.com/2016/11/introduction-trainspotting.html
-
MLDB: The Machine Learning Database
MLDB is an open-source database designed for machine learning. Send it commands over a RESTful API to store data, explore it using SQL, then train machine learning models and expose them as APIs.https://www.kdnuggets.com/2016/10/mldb-machine-learning-database.html
-
Introducing Dask for Parallel Programming: An Interview with Project Lead Developer
Introducing Dask, a flexible parallel computing library for analytics. Learn more about this project built with interactive data science in mind in an interview with its lead developer.https://www.kdnuggets.com/2016/09/introducing-dask-parallel-programming.html
-
The top 5 Big Data courses to help you break into the industry
Here is an updated and in-depth review of the top 5 providers of Big Data and Data Science courses: Simplilearn, Cloudera, Big Data University, Hortonworks, and Coursera.https://www.kdnuggets.com/2016/08/simplilearn-5-big-data-courses.html
-
Big Data Key Terms, Explained
Just getting started with Big Data, or looking to iron out the wrinkles in your current understanding? Check out these 20 Big Data-related terms and their concise definitions.https://www.kdnuggets.com/2016/08/big-data-key-terms-explained.html
-
Dataiku DSS 3.1 – Now with 5 ML Backends & Scala!
Introducing Dataiku DSS 3.1, with new visual machine learning engines that allow users to create incredibly powerful predictive applications within a code-free interface.https://www.kdnuggets.com/2016/08/dataiku-dss-31-machine-learning-backends-scala.html
-
Statistical Data Analysis in Python
This tutorial will introduce the use of Python for statistical data analysis, using data stored as Pandas DataFrame objects, taking the form of a set of IPython notebooks.https://www.kdnuggets.com/2016/07/statistical-data-analysis-python.html
-
The Big Data Ecosystem is Too Damn Big
The Big Data ecosystem is just too damn big! It's complex, redundant, and confusing. There are too many layers in the technology stack, too many standards, and too many engines. Vendors? Too many. What is the user to do?https://www.kdnuggets.com/2016/06/big-data-ecosystem-too-damn-big.html
-
Top 10 IPython Notebook Tutorials for Data Science and Machine Learning
A list of 10 useful Github repositories made up of IPython (Jupyter) notebooks, focused on teaching data science and machine learning. Python is the clear target here, but general principles are transferable.https://www.kdnuggets.com/2016/04/top-10-ipython-nb-tutorials.html
-
Top 15 Frameworks for Machine Learning Experts
Whether you are a researcher, a start-up, or a big organization that wants to use machine learning, you will need the right tools to make it happen. Here is a list of the most popular frameworks for machine learning.https://www.kdnuggets.com/2016/04/top-15-frameworks-machine-learning-experts.html
-
Deep Learning for Internet of Things Using H2O
H2O is a feature-rich open source machine learning platform known for its R and Spark integration and its ease of use. This is an overview of using H2O deep learning for data science with the Internet of Things.https://www.kdnuggets.com/2016/04/deep-learning-iot-h2o.html
-
Top 10 Data Science Resources on Github
The top 10 data science projects on Github are chiefly composed of a number of tutorials and educational resources for learning and doing data science. Have a look at the resources others are using and learning from.https://www.kdnuggets.com/2016/03/top-10-data-science-github.html
-
New KDnuggets Tutorials Page: Learn R, Python, Data Visualization, Data Science, and more
Introducing new KDnuggets Tutorials page with useful resources for learning about Business Analytics, Big Data, Data Science, Data Mining, R, Python, Data Visualization, Spark, Deep Learning and more.https://www.kdnuggets.com/2016/03/new-tutorials-section-r-python-data-visualization-data-science.html
-
Top February stories: 21 Must-Know Data Science Interview Q&A; Gartner 2016 MQ for Advanced Analytics: gainers and losers
21 Must-Know Data Science Interview Questions and Answers; Top 10 TED Talks for the Data Scientists; Gartner 2016 Magic Quadrant for Advanced Analytics Platforms: gainers and losers.https://www.kdnuggets.com/2016/03/top-news-2016-feb.html
-
21 Must-Know Data Science Interview Questions and Answers
KDnuggets Editors bring you the answers to 20 Questions to Detect Fake Data Scientists, including what is regularization, Data Scientists we admire, model validation, and more.https://www.kdnuggets.com/2016/02/21-data-science-interview-questions-answers.html
-
Top 10 Machine Learning Projects on Github
The top 10 machine learning projects on Github include a number of libraries, frameworks, and education resources. Have a look at the tools others are using, and the resources they are learning from.https://www.kdnuggets.com/2015/12/top-10-machine-learning-github.html
-
Deep Learning for Visual Question Answering
Here we discuss the Visual Question Answering problem and present neural network-based approaches to it.https://www.kdnuggets.com/2015/11/deep-learning-visual-question-answering.html
-
5 Best Machine Learning APIs for Data Science
Machine Learning APIs make it easy for developers to develop predictive applications. Here we review 5 important Machine Learning APIs: IBM Watson, Microsoft Azure Machine Learning, Google Prediction API, Amazon Machine Learning API, and BigML.https://www.kdnuggets.com/2015/11/machine-learning-apis-data-science.html
-
Open Source Enabled Interactive Analytics: An Overview
Explaining the aspects of creating an interactive data-driven dashboard using open source technologies, i.e. MongoDB, D3.js, DC.js, and Node.js.https://www.kdnuggets.com/2015/06/open-source-interactive-analytics-overview.html
-
Popular Deep Learning Tools – a review
Deep Learning is the hottest trend now in AI and Machine Learning. We review the popular software for Deep Learning, including Caffe, Cuda-convnet, Deeplearning4j, Pylearn2, Theano, and Torch.https://www.kdnuggets.com/2015/06/popular-deep-learning-tools.html
-
Which Big Data, Data Mining, and Data Science Tools go together?
We analyze the associations between the top Big Data, Data Mining, and Data Science tools based on the results of the 2015 KDnuggets Software Poll. Download anonymized data and analyze it yourself.https://www.kdnuggets.com/2015/06/data-mining-data-science-tools-associations.html
-
Talking Machine – 3 Deep Learning Gurus Talk about History and Future of Machine Learning, part 1
A recent interview from the Talking Machine podcast with three deep learning experts. They talked about the neural network winter and its renewal.https://www.kdnuggets.com/2015/03/talking-machine-deep-learning-gurus-p1.html
-
Interview: Arno Candel, H2O.ai on the Basics of Deep Learning to Get You Started
We discuss how Deep Learning is different from the other methods of Machine Learning, unique characteristics and benefits of Deep Learning, and the key components of H2O architecture.https://www.kdnuggets.com/2015/01/interview-arno-candel-0xdata-deep-learning.html
-
IE Masters in Analytics and Big Data – first hand report
First-hand report on the Master in Business Analytics and Big Data program at IE (Madrid, Spain): why, what, how, days, and challenges.https://www.kdnuggets.com/2015/01/ie-data-science-education-first-hand-report.html
-
KDnuggets™ News 14:n32, Dec 3
Features | Software | Opinions | News | Webcasts | Courses | Meetings | Jobs | Academic | Publications | Tweets | CFP | Quote https://www.kdnuggets.com/2014/n32.html
-
KDnuggets™ News 14:n31, Nov 25
Features | Opinions | Interviews | Reports | News | Webcasts | Jobs | Academic | Publications | Tweets | CFP | Quote Features Update: https://www.kdnuggets.com/2014/n31.html
-
KDnuggets™ News 14:n30, Nov 19
Features | Software | Opinions | Interviews | Reports | News | Webcasts | Courses | Meetings | Jobs | Academic | Publications | Tweets https://www.kdnuggets.com/2014/n30.html
-
KDnuggets™ News 14:n29, Nov 5
Features | Software | Opinions | News | Webcasts | Courses | Meetings | Jobs | Publications | Tweets | CFP | Quote Features Big https://www.kdnuggets.com/2014/n29.html
-
KDnuggets™ News 14:n28, Oct 29
Features | Software | Opinions | Reports | News | Webcasts | Courses | Meetings | Jobs | Academic | Publications | Tweets | CFP https://www.kdnuggets.com/2014/n28.html
-
KDnuggets™ News 14:n26, Oct 8
Features | Software | Opinions | Interviews | News | Webcasts | Courses | Meetings | Jobs | Academic | Publications | Tweets | CFP https://www.kdnuggets.com/2014/n26.html
-
KDnuggets™ News 14:n22, Aug 20
Features | News | Opinions | Interviews | Webcasts | Courses | Meetings | Jobs | Publications | Tweets | Quote Features Four main languages https://www.kdnuggets.com/2014/n22.html
-
OpenML: Share, Discover and Do Machine Learning
OpenML is designed to share, organize and reuse data, code and experiments, so that scientists can make discoveries more efficiently. It is an interesting idea to build a network of machine learning.https://www.kdnuggets.com/2014/08/openml-share-discover-do-machine-learning.html
-
KDnuggets™ News 14:n19, Jul 30
Features | Software | News | Opinions | Interviews | Reports | Webcasts | Courses | Meetings | Jobs | Academic | Publications | Tweets https://www.kdnuggets.com/2014/n19.html
-
KDnuggets™ News 14:n18, Jul 16
Features (6) | Software (2) | Opinions (7) | News (4) | Webcasts (3) | Courses (1) | Meetings (4) | Jobs (9) | Tweets https://www.kdnuggets.com/2014/n18.html
-
KDnuggets™ News 14:n14, Jun 10
Features (8) | Software (3) | Opinions (14) | News (6) | Webcasts (3) | Courses (1) | Meetings and Reports (9) | Jobs (6) https://www.kdnuggets.com/2014/n14.html
-
KDnuggets™ News 14:n10, Apr 30
Features (9) | Opinions (5) | Software (3) | News (6) | Webcasts (1) | Courses (3) | Meetings (4) | Jobs (10) | Academic https://www.kdnuggets.com/2014/n10.html