2018 Aug
All (100) | Courses, Education (3) | Meetings (13) | News, Features (11) | Opinions, Interviews (15) | Top Stories, Tweets (9) | Tutorials, Overviews (43) | Webcasts & Webinars (6)
- Three Ways Big Data and Machine Learning Reinvent Online Video Experience - Aug 31, 2018.
With traditional TV viewing on the decline, we discuss several ways Big Data and Machine Learning can assist with online video, including redefining user recommendations, improving video buffering and leveraging MAM orchestration.
- AI Knowledge Map: How To Classify AI Technologies - Aug 31, 2018.
What follows is then an effort to draw an architecture to access knowledge on AI and follow emergent dynamics, a gateway of pre-existing knowledge on the topic that will allow you to scout around for additional information and eventually create new knowledge on AI.
- Register for Chief Analytics Officer, Fall in Boston, Oct 8-11 – join data and analytics leaders - Aug 30, 2018.
Join data and analytics leaders at CAO Fall in Boston, Oct 8-11, the platform to guide you through transformation and help you innovate within your business. KDnuggets readers save $100 on your pass using discount code KDNUGGETS100.
- Self-Service Data Prep Tools vs Enterprise-Level Solutions? 6 Lessons Learned - Aug 30, 2018.
A detailed comparison between self-service data preparation tools and enterprise-level solutions, covering business strategy, accessible tools and solutions and more.
- Optimus v2: Agile Data Science Workflows Made Easy - Aug 30, 2018.
Looking for a library to skyrocket your productivity as a Data Scientist? Check this out!
- Topic Modeling with LSA, PLSA, LDA & lda2Vec - Aug 30, 2018.
This article is a comprehensive overview of Topic Modeling and its associated techniques.
- Top KDnuggets tweets, Aug 22-28: AI Knowledge Map: How To Classify AI Technologies; 100 Days of #MachineLearning Coding with #Python - Aug 29, 2018.
Also 25 fun questions for a machine learning interview; Data Visualization Cheat Sheet
- Skip the Interview! 9 Benefits of Career Fairs - Aug 29, 2018.
Career fairs are a great way to get your feet wet if you’re just starting your data science career, or to be exposed to newer trends and emerging organizations if you’re already established. What other ways are career fairs beneficial?
- Word Vectors in Natural Language Processing: Global Vectors (GloVe) - Aug 29, 2018.
A well-known model that learns vectors for words from their co-occurrence information is Global Vectors (GloVe). While word2vec is a predictive model (a feed-forward neural network that learns vectors to improve predictive ability), GloVe is a count-based model.
- Deploying scikit-learn Models at Scale - Aug 29, 2018.
Find out how to serve your scikit-learn model in an auto-scaling, serverless environment! Today, we’ll take a trained scikit-learn model and deploy it on Cloud ML Engine.
- Learn from the experts at Google Brain, UC Berkeley, Adobe Research & FAIR - Aug 28, 2018.
The World's Biggest Deep Learning Summit is returning to San Francisco in January 2019. Use code SUMMER for an additional 25% off the Super Early Bird Ticket rate by September 7.
- Top Considerations for Selecting a Real-time Streaming Analytics Platform - Aug 28, 2018.
Information on how to download this whitepaper, which provides a view into how streaming data analytics differs from traditional analytics and thus has unique data processing needs that translate into absolute must-haves for the streaming analytics platform.
- Linear Regression In Real Life - Aug 28, 2018.
A helpful guide to Linear Regression, using an example of a friends' road trip to Las Vegas to highlight how it can be used in a real-life situation.
- How to Make Your Machine Learning Models Robust to Outliers - Aug 28, 2018.
In this blog, we’ll try to understand the different interpretations of this “distant” notion. We will also look into the outlier detection and treatment techniques while seeing their impact on different types of machine learning models.
- Are Vectorized Random Number Generators Actually Useful? - Aug 28, 2018.
I reported that you can multiply the speed of common (fast) random number generators such as PCG and xorshift128+ by a factor of three or four by vectorizing them using SIMD instructions. Is this actually useful in practice?
- Nvidia: AI Training for Self-Driving Vehicles [On-demand Webinar] - Aug 27, 2018.
We discuss the key considerations in selecting the optimal AI infrastructure required to train deep neural networks for safe self-driving systems, including data requirements and computing performance needed, and how to use NVIDIA DGX-1 for training autonomous vehicles.
- The Deadly Dozen & Data Science for Managers – Outstanding Workshops at Predictive Analytics World for Government - Aug 27, 2018.
Predictive Analytics World for Government, Sep 18-19, Washington DC, is a practically focused, vendor-neutral conference that highlights case studies and emerging trends of how government agencies are currently using data analytics to solve real-world problems.
- Top Stories, Aug 20-26: Data Visualization Cheat Sheet; Comparison of the Most Useful Text Processing APIs - Aug 27, 2018.
Also: Why Automated Feature Engineering Will Change the Way You Do Machine Learning; Interpreting a data set, beginning to end; Auto-Keras, or How You can Create a Deep Learning Model in 4 Lines of Code; Emotion and Sentiment Analysis: A Practitioner’s Guide to NLP
- What Data Scientists Want? - Aug 27, 2018.
We examine what's important for data scientists in their careers, including challenging work, networking with peers, foreseeing their career path and creating a good work-life balance.
- Multi-Class Text Classification with Scikit-Learn - Aug 27, 2018.
The vast majority of text classification articles and tutorials on the internet cover binary text classification, such as email spam filtering and sentiment analysis. Real-world problems are much more complicated than that.
- Analyze, engineer, design: Do it all with Dash - Aug 24, 2018.
Open-source Dash lets you wrap a GUI around that analytical code, without leaving the familiarity of Python. Explore your data with rich, interactive drop-down menus, sliders, and other components, all in the web browser.
- Data Visualization Cheat Sheet - Aug 24, 2018.
Core principles for successful data visualization, including tips on how to reduce clutter, preattentive processing and how to integrate text within the graph.
- Emotion and Sentiment Analysis: A Practitioner’s Guide to NLP, by Dipanjan Sarkar - Aug 24, 2018.
Sentiment analysis is widely used, especially as a part of social media analysis for any domain, be it a business, a recent movie, or a product launch, to understand its reception by the people and what they think of it based on their opinions or, you guessed it, sentiment!
- The 2018 Data Scientist Report is Here - Aug 23, 2018.
Learn about the data and tools that data scientists are working with in 2018, ethical issues around AI, algorithmic bias, job satisfaction, and more.
- DynamoDB vs. Cassandra: from “no idea” to “it’s a no-brainer” - Aug 23, 2018.
DynamoDB vs. Cassandra: have they got anything in common? If yes, what? If no, what are the differences? We answer these questions and examine performance of both databases.
- Comparison of the Most Useful Text Processing APIs - Aug 23, 2018.
There is a need to compare different APIs to understand key pros and cons they have and when it is better to use one API instead of the other. Let us proceed with the comparison.
- Top KDnuggets tweets, Aug 15-21: How to Set Up a Free Data Science Environment on Google Cloud - Aug 22, 2018.
Also: Unveiling Mathematics Behind XGBoost; Causation in a Nutshell; Introduction to Fraud Detection Systems.
- Stanford online Data Science, Data Mining courses and certificates - Aug 22, 2018.
With Stanford online graduate courses and certificates, you can earn a higher education credential while still maintaining your career. Apply now!
- 9 Things You Should Know About TensorFlow - Aug 22, 2018.
A summary of the key points from the Google Cloud Next in San Francisco, "What’s New with TensorFlow?", including neural networks, TensorFlow Lite, data pipelines and more.
- Leveraging Agent-based Models (ABM) and Digital Twins to Prevent Injuries - Aug 22, 2018.
Both athletes and machines deal with intertwined complex systems (where the interactions of one complex system can have a ripple effect on others) that can have a significant impact on their operational effectiveness.
- The future of Big Data, Machine Learning and Data Visualization in Europe - Aug 21, 2018.
Learn more about the hottest trends that are shaping the future and beyond at Big Data Summits in London and Barcelona. Deep dive into the topics that will shake up your industry and encourage innovation at your company. Enjoy £250 off all two-day events with code KD250.
- Docker Cheat Sheet - Aug 21, 2018.
This comprehensive cheat sheet will assist Docker users, experienced and new, in getting containers up-and-running quickly. We list commands that will allow users to install, build, ship and run Docker containers.
- UX Design Guide for Data Scientists and AI Products - Aug 21, 2018.
Realizing that there is a legitimate knowledge gap between UX Designers and Data Scientists, I have decided to attempt addressing the needs from the Data Scientist’s perspective.
- Basic Statistics in Python: Probability - Aug 21, 2018.
At the most basic level, probability seeks to answer the question, "What is the chance of an event happening?" To calculate the chance of an event happening, we also need to consider all the other events that can occur.
- Top Stories, Aug 13-19: Data Scientist guide for getting started with Docker; Auto-Keras, or How You can Create a Deep Learning Model in 4 Lines of Code - Aug 20, 2018.
Also: Unveiling Mathematics Behind XGBoost; Project Hydrogen, new initiative based on Apache Spark to support AI and Data Science.
- One-Click Machine Learning Deployments with Anaconda Enterprise - Aug 20, 2018.
With Anaconda Enterprise, your organization can develop, govern, and automate machine learning pipelines, while scaling with ease.
- Interpreting a data set, beginning to end - Aug 20, 2018.
Detailed knowledge of your data is key to understanding it! We review several important methods to understand the data, including summary statistics with visualization, embedding methods like PCA and t-SNE, and Topological Data Analysis.
- Why Automated Feature Engineering Will Change the Way You Do Machine Learning - Aug 20, 2018.
Automated feature engineering will save you time, build better predictive models, create meaningful features, and prevent data leakage.
- Cartoon: Machine Learning takes a vacation - Aug 18, 2018.
August is a popular time for vacation, and even hard-working AI may want to take a few epochs off from its training. KDnuggets Cartoon looks at how this might go.
- AnalyticsX Hackathon: Seize the data! - Aug 17, 2018.
Join us for the AnalyticsX Hackathon, Sep 16-17 in San Diego, where the data is hot and the discoveries are cool. Reserve your seat now!
- Introduction to Fraud Detection Systems - Aug 17, 2018.
Using the Python gradient boosting library LightGBM, this article introduces fraud detection systems, with code samples included to help you get started.
- Auto-Keras, or How You can Create a Deep Learning Model in 4 Lines of Code - Aug 17, 2018.
Auto-Keras is an open source software library for automated machine learning. Auto-Keras provides functions to automatically search for architecture and hyperparameters of deep learning models.
- Named Entity Recognition: A Practitioner’s Guide to NLP - Aug 17, 2018.
Named entity recognition (NER), also known as entity chunking/extraction, is a popular technique used in information extraction to identify and segment the named entities and classify or categorize them under various predefined classes.
- Data Science at Northwestern - Aug 16, 2018.
Northwestern’s MASTER OF SCIENCE IN DATA SCIENCE is a fully online, part-time program that helps students build essential analysis and leadership skills for today's data-driven world. Apply now!
- John Elder at Predictive Analytics World London – Save with the Early Bird price until 24 August - Aug 16, 2018.
The deadline to save up to £300 with Early Bird Prices for Predictive Analytics World in London October 17-18 is fast approaching! Book now to save your spot.
- Machine Learning with TensorFlow - Aug 16, 2018.
In this on-demand webinar, you’ll get a general introduction to working with Tensorflow and its surrounding ecosystem, general problem classes, where you can get big acceleration, and why you should be running on a CPU.
- Project Hydrogen, new initiative based on Apache Spark to support AI and Data Science - Aug 16, 2018.
An introduction to Project Hydrogen: how it can assist machine learning and AI frameworks on Apache Spark and what distinguishes it from other open source projects.
- Reinforcement Learning: The Business Use Case, Part 2 - Aug 16, 2018.
In this post, I will explore the implementation of reinforcement learning in trading. The Financial industry has been exploring the applications of Artificial Intelligence and Machine Learning for their use-cases, but the monetary risk has prompted reluctance.
- A Crash Course in MXNet Tensor Basics & Simple Automatic Differentiation - Aug 16, 2018.
This is an overview of some basic functionality of the MXNet ndarray package for creating tensor-like objects, and using the autograd package for performing automatic differentiation.
- Top KDnuggets tweets, Aug 1-14: Basic Statistics in Python; Essential Command Line Tools for Data Scientists - Aug 15, 2018.
Basic Statistics in Python: Descriptive Statistics; Top 12 Essential Command Line Tools for Data Scientists; WTF is a Tensor?!?; How GOAT Taught a Machine to Love Sneakers.
- Open Data Science West Schedule Live, Europe Keynotes, and India Selling Out - Aug 15, 2018.
Check schedule for ODSC West (Oct 31 - Nov 3), fantastic keynotes for ODSC Europe (Sep 19-22), and get last remaining tix for ODSC India, Aug 30 - Sep 3.
- An Introduction to t-SNE with Python Example - Aug 15, 2018.
In this post we’ll give an introduction to the exploratory and visualization t-SNE algorithm. t-SNE is a powerful dimension reduction and visualization technique used on high dimensional data.
- AI-driven Insurance – Insights from AXA and Generali - Aug 15, 2018.
How do you move AI from proof of concept to core business today to demonstrate ROI? Read this whitepaper to find out.
- AutoKeras: The Killer of Google’s AutoML - Aug 15, 2018.
Auto-Keras is an open source "competitor" to Google’s AutoML, a new cloud software suite of Machine Learning tools. It’s based on Google’s state-of-the-art research in Neural Architecture Search (NAS).
- How to Set Up a Free Data Science Environment on Google Cloud - Aug 15, 2018.
In this post, we'll walk through how to set up a data science environment on Google Cloud Platform (GCP). Because of the economy of scale that cloud hosting companies provide, individuals or teams can affordably access powerful computers.
- Better Analytics for the Product Experience – Aug 21 webinar - Aug 14, 2018.
Learn a process for discovering the data and analytics needs of your users using user stories, use cases and mapping to data sources; Strategies for balancing priorities and managing expectations, and more.
- ebook: Using Deep Learning to Solve Real-World Problems - Aug 14, 2018.
Read this eBook to learn how deep learning enables image classification, sentiment analysis, and other advanced analysis techniques, and get a starter workflow for building and training deep learning models.
- Data Scientist guide for getting started with Docker - Aug 14, 2018.
Docker is an increasingly popular way to create and deploy applications through virtualization, but can it be useful for data scientists? This guide should help you quickly get started.
- Solve epileptic seizure prediction! Participate at epilepsyecosystem.org - Aug 14, 2018.
Around twenty million people worldwide suffer from drug-resistant epilepsy and the unpredictability of seizures is one of the major factors affecting the quality of life of people with epilepsy.
- Unveiling Mathematics Behind XGBoost - Aug 14, 2018.
Follow me till the end, and I assure you that you will at least get a sense of what is happening underneath the revolutionary machine learning model.
- The Future of Data Affects the Whole Team – TDWI Orlando, Nov 11-16 - Aug 13, 2018.
Eliminate Weak Links When You Bring Your Team to Orlando! Super Early Bird Deadline: September 14 - Save up to $915 with code KD20
- Setting up your AI Dev Environment in 5 Minutes - Aug 13, 2018.
Whether you're a novice data science enthusiast setting up TensorFlow for the first time, or a seasoned AI engineer working with terabytes of data, getting your libraries, packages, and frameworks installed is always a struggle. Learn how datmo, an open source python package, helps you get started in minutes.
- Unsupervised Learning Demystified - Aug 13, 2018.
Unsupervised learning is a pattern-finding technique for mining inspiration from your data. Let's demystify!
- Top Stories, Aug 6-12: Eight iconic examples of data visualisation; GitHub Python Data Science Spotlight - Aug 13, 2018.
Also: Only Numpy: Implementing GANs and Adam Optimizer using Numpy; Understanding Language Syntax and Structure; Eight iconic examples of data visualisation; 5 Data Science Projects That Will Get You Hired in 2018; Seven Practical Ideas For Beginner Data Scientists
- The AI Conference in San Francisco, Sep 4-7: Win KDnuggets Free Pass - Aug 11, 2018.
Win KDnuggets pass to AI Conference in San Francisco, where you'll join the leading minds in AI: Kai-Fu Lee, Meredith Whittaker, Peter Norvig, Dawn Song, David Patterson, Huma Abidi, Matt Wood, and more. Enter by Aug 18.
- Affordable online news archives for academic research - Aug 10, 2018.
Many researchers need access to multi-year historical repositories of online news articles. We identified three companies that make such access affordable, and spoke with their CEOs.
- Understanding Language Syntax and Structure: A Practitioner’s Guide to NLP - Aug 10, 2018.
Knowledge about the structure and syntax of language is helpful in many areas like text processing, annotation, and parsing for further operations such as text classification or summarization.
- The Essential Guide to Training Data for Machine Learning - Aug 9, 2018.
Download Figure Eight's new ebook, The Essential Guide to Training Data, and you'll learn about the advantages of using more data, the differences between having lots of big data and having labeled data, and some great open datasets to bootstrap your model.
- Top 10 roles in AI and data science - Aug 9, 2018.
When you think of the perfect data science team, are you imagining 10 copies of the same professor of computer science and statistics, hands delicately stained with whiteboard marker? We hope not!
- Building Reliable Machine Learning Models with Cross-validation - Aug 9, 2018.
Cross-validation is frequently used to train, measure and finally select a machine learning model for a given dataset because it helps assess how the results of a model will generalize to an independent data set in practice.
- Reinforcement Learning: The Business Use Case, Part 1 - Aug 9, 2018.
At base, RL is a complex algorithm for mapping observed entities and measures into some set of actions, while optimizing for a long-term or short-term reward.
- AI and ML Day in Australia with Alteryx, Tableau, Amazon, Snowflake, Commonwealth Bank, and IAPA - Aug 8, 2018.
Key information regarding The Alteryx Analytics Revolution Summit roadshow in Australia, including dates, guest speakers, livestream information and how you can register for the roadshow closest to you.
- Production ML for Data Scientists: What You Can Do and How to Make It Easy, August 22 Webinar - Aug 8, 2018.
Learn about MLOps, machine learning operationalization that breaks down the silos between data science and IT, streamlines deployment and orchestration, and adds advanced functionality.
- Find A Data Science Job Through Vettery - Aug 8, 2018.
Vettery specializes in tech roles and is completely free for job seekers. Interested? Submit your profile!
- Top July Stories: Cartoon: Data Scientist was the sexiest job of the 21st century until …; Does PCA really improve classification outcome? Causation in a nutshell - Aug 8, 2018.
Also: 5 of Our Favorite Free Visualization Tools; Comparison of Top 6 Python NLP Libraries; Causation in a nutshell.
- Optimization 101 for Data Scientists - Aug 8, 2018.
We show how to use optimization strategies to make the best possible decision.
- GitHub Python Data Science Spotlight: AutoML, NLP, Visualization, ML Workflows - Aug 8, 2018.
This post includes a wide spectrum of data science projects, all of which are open source and are present on GitHub repositories.
- Data Mining Book – Chapter Download - Aug 7, 2018.
Download this very useful book chapter, and learn how to create derived variables, which allow the statistical and Data Science modeling to incorporate human insights.
- Big Data Innovation & Data Visualization Summits, Boston, September 11-12 - Aug 7, 2018.
Cover all things within the realm of Big Data Innovation and Data Visualization as you advance your learning, knowledge and understanding on areas including: Use code KD200 to save.
- How GOAT Taught a Machine to Love Sneakers - Aug 7, 2018.
Embeddings are a fantastic tool to create reusable value with inherent properties similar to how humans interpret objects. GOAT uses deep learning to generate these for their entire sneaker catalogue.
- Seven Practical Ideas For Beginner Data Scientists - Aug 7, 2018.
As someone who has been there, I’d like to outline a few practical ideas to help junior data scientists get started at a small software company. The steps were drawn from my personal journey and that of others before me.
- Programming Best Practices For Data Science - Aug 7, 2018.
In this post, I'll go over the two mindsets most people switch between when doing programming work specifically for data science: the prototype mindset and the production mindset.
- [Webinar] Life as a Data Scientist and How to Become One - Aug 6, 2018.
Springboard is hosting a special webinar to give you an inside look at what it means to be a data scientist. Learn from a practicing data scientist on Wed Aug 8, 12 PM PDT.
- The agenda for Predictive Analytics World London has just been released - Aug 6, 2018.
Companies like The Washington Post, Alibaba.com, ING and many more will be at Predictive Analytics World London, 17-18 Oct. Check out the newly released schedule now!
- Autoregressive Models in TensorFlow - Aug 6, 2018.
This article investigates autoregressive models in TensorFlow, including autoregressive time series and predictions with the actual observations.
- Top Stories, Jul 30 – Aug 5: Eight iconic examples of data visualisation; Descriptive Statistics in Python - Aug 6, 2018.
Also: Eight iconic examples of data visualisation; Selecting the Best Machine Learning Algorithm for Your Regression Problem; Intuitive Ensemble Learning Guide with Gradient Boosting; Data Scientist Interviews Demystified
- Only Numpy: Implementing GANs and Adam Optimizer using Numpy - Aug 6, 2018.
This post is an implementation of GANs and the Adam optimizer using only Python and Numpy, with minimal focus on the underlying maths involved.
- The AI Conference in San Francisco – Exclusive KDnuggets Offer - Aug 3, 2018.
The AI Conference returns to San Francisco, Sept 4–7. Get a sweeping understanding of the rapidly advancing AI landscape. Save an extra 20% on most passes with code KDN20.
- Eight iconic examples of data visualisation - Aug 3, 2018.
A collection of exemplary data visualizations, including Napoleon's invasion of Russia and the iconic London Underground map.
- K-Means in Real Life: Clustering Workout Sessions - Aug 3, 2018.
By using the within-cluster sum of squares as cost function, data points in the same cluster will be similar to each other, whereas data points in different clusters will have a lower level of similarity.
- Text Wrangling & Pre-processing: A Practitioner’s Guide to NLP - Aug 3, 2018.
I will highlight some of the most important steps that are used heavily in Natural Language Processing (NLP) pipelines and that I frequently use in my NLP projects.
- Future of Mining Americas 2018, 29-30 Oct 2018, Denver - Aug 2, 2018.
This event connects C-suite, Heads and Managers of Mine Operations and Mining Equipment, Technology and Services providers to debate and define the future mining landscape on a strategic level. Special KDnuggets discount.
- Webinar: Realizing the benefits of Automated Machine Learning, is your organization next? - Aug 2, 2018.
In this live webinar (Aug 8, 1PM EST), discover research findings, best practices for AI adoption, use cases on the growth of machine learning, and how automated machine learning technologies make AI more accessible to organizations of all sizes.
- Upcoming Meetings in AI, Analytics, Big Data, Data Science, Deep Learning, Machine Learning: August and Beyond - Aug 2, 2018.
Coming soon: TDWI Anaheim, JupyterCon NYC, VLDB Rio, ODSC India, KDD 2018 London, AI Conference San Francisco, Big Data Innovation Boston, Strata Data NYC, and many more.
- Data Scientist Interviews Demystified - Aug 2, 2018.
We look at typical questions in a data science interview, examine the rationale for such questions, and hope to demystify the interview process for recent graduates and aspiring data scientists.
- WTF is TF-IDF? - Aug 2, 2018.
Relevant words are not necessarily the most frequent words since stopwords like “the”, “of” or “a” tend to occur very often in many documents.
- Top KDnuggets tweets, Jul 25-31: Causation in a Nutshell; Python Regular Expressions Cheat Sheet - Aug 1, 2018.
Also: Comparison of Top 6 Python NLP Libraries; Math for Machine Learning: Open Doors to Data Science and Artificial Intelligence; Building A Data Science Product in 10 Days; Data Scientist was the sexiest job of the 21st century until...; Automated Machine Learning vs Automated Data Science
- From Data to Viz: how to select the right chart for your data - Aug 1, 2018.
We offer an interactive, decision tree-style tool, which examines the data you have and proposes a set of potentially appropriate visualizations to represent your dataset.
- Basic Statistics in Python: Descriptive Statistics - Aug 1, 2018.
This article covers defining statistics, descriptive statistics, measures of central tendency, and measures of spread. This article assumes no prior knowledge of statistics, but does require at least a general knowledge of Python.
- Selecting the Best Machine Learning Algorithm for Your Regression Problem - Aug 1, 2018.
This post should then serve as a great aid in selecting the best ML algorithm for your regression problem!