- An Introductory Guide to NLP for Data Scientists with 7 Common Techniques - Jan 9, 2020.
Data Scientists work with tons of data, and many times that data includes natural language text. This guide reviews 7 common techniques with code examples to introduce you to the essentials of NLP, so you can begin performing analysis and building models from textual data.
- Text Encoding: A Review - Nov 22, 2019.
We will focus here exactly on that part of the analysis that transforms words into numbers and texts into number vectors: text encoding.
- KDnuggets™ News 19:n39, Oct 16: Key Ideas in Document Embedding; The problem with metrics is a big problem for AI - Oct 16, 2019.
This week on KDnuggets: Beyond Word Embedding: Key Ideas in Document Embedding; The problem with metrics is a big problem for AI; Activation maps for deep learning models in a few lines of code; There is No Such Thing as a Free Lunch; 8 Paths to Getting a Machine Learning Job Interview; and much, much more.
- Beyond Word Embedding: Key Ideas in Document Embedding - Oct 11, 2019.
This literature review on document embedding techniques thoroughly covers the many ways practitioners develop rich vector representations of text -- from single sentences to entire books.
- Deconstructing BERT, Part 2: Visualizing the Inner Workings of Attention - Mar 6, 2019.
In this post, the author shows how BERT can mimic a Bag-of-Words model. The visualization tool from Part 1 is extended to probe deeper into the mind of BERT, to expose the neurons that give BERT its shape-shifting superpowers.
- Deconstructing BERT: Distilling 6 Patterns from 100 Million Parameters - Feb 27, 2019.
Google’s BERT algorithm has emerged as a sort of “one model to rule them all.” BERT builds on two key ideas that have been responsible for many of the recent advances in NLP: (1) the transformer architecture and (2) unsupervised pre-training.
- Word Embeddings in NLP and its Applications - Feb 20, 2019.
Word embeddings such as Word2Vec are a key AI method that bridges the human understanding of language to that of a machine and are essential to solving many NLP problems. Here we discuss applications of Word2Vec to survey responses, comment analysis, recommendation engines, and more.
- Are BERT Features InterBERTible? - Feb 19, 2019.
This is a short analysis of the interpretability of BERT contextual word representations. Does BERT learn a semantic vector representation like Word2Vec?
- ELMo: Contextual Language Embedding - Jan 31, 2019.
Create a semantic search engine using deep contextualised language representations from ELMo, and learn why context is everything in NLP.
- Building an image search service from scratch - Jan 30, 2019.
By the end of this post, you should be able to build a quick semantic search model from scratch, no matter the size of your dataset.
- Word Embeddings & Self-Supervised Learning, Explained - Jan 16, 2019.
There are many algorithms to learn word embeddings. Here, we consider only one of them: word2vec, and only one version of word2vec called skip-gram, which works well in practice.
- The 6 Most Useful Machine Learning Projects of 2018 - Jan 15, 2019.
Let’s take a look at the top 6 most practically useful ML projects over the past year. These projects have published code and datasets that allow individual developers and smaller teams to learn and immediately create value.
- How to solve 90% of NLP problems: a step-by-step guide - Jan 14, 2019.
Read this insightful, step-by-step article on how to use machine learning to understand and leverage text.
- Data Representation for Natural Language Processing Tasks - Nov 2, 2018.
In NLP we must find a way to represent our data (a series of texts) to our systems (e.g. a text classifier). As Yoav Goldberg asks, "How can we encode such categorical data in a way which is amenable for use by a statistical classifier?" Enter the word vector.
- More Effective Transfer Learning for NLP - Oct 1, 2018.
Until recently, the natural language processing community was lacking its ImageNet equivalent — a standardized dataset and training objective to use for training base models.
- Deep Learning for NLP: An Overview of Recent Trends - Sep 5, 2018.
A new paper discusses some of the recent trends in deep learning based natural language processing (NLP) systems and applications. The focus is on the review and comparison of models and methods that have achieved state-of-the-art (SOTA) results on various NLP tasks and some of the current best practices for applying deep learning in NLP.
- KDnuggets™ News 18:n33, Sep 5: Practical Topic Modeling with Python; Classifying AI Technologies; Data Science Project Inspiration - Sep 5, 2018.
Also: An End-to-End Project on Time Series Analysis and Forecasting with Python; Financial Data Analysis - Data Processing 1: Loan Eligibility Prediction; OLAP queries in SQL: A Refresher; Word Vectors in Natural Language Processing: Global Vectors (GloVe)
- How GOAT Taught a Machine to Love Sneakers - Aug 7, 2018.
Embeddings are a fantastic tool to create reusable value with inherent properties similar to how humans interpret objects. GOAT uses deep learning to generate these for their entire sneaker catalogue.
- Efficient Graph-based Word Sense Induction - Jul 18, 2018.
This paper describes a set of algorithms for Natural Language Processing (NLP) that match or exceed the state of the art on several evaluation tasks, while also being much more computationally efficient.
- Text Classification & Embeddings Visualization Using LSTMs, CNNs, and Pre-trained Word Vectors - Jul 5, 2018.
In this tutorial, I classify Yelp round-10 review datasets. After processing the review comments, I train three models in three different ways and obtain three word embeddings.
- On the contribution of neural networks and word embeddings in Natural Language Processing - May 31, 2018.
In this post I will try to explain, in a very simplified way, how to apply neural networks and integrate word embeddings in text-based applications, and some of the main implicit benefits of using neural networks and word embeddings in NLP.
- Robust Word2Vec Models with Gensim & Applying Word2Vec Features for Machine Learning Tasks - Apr 17, 2018.
The gensim framework, created by Radim Řehůřek, consists of a robust, efficient and scalable implementation of the Word2Vec model.
- Implementing Deep Learning Methods and Feature Engineering for Text Data: The Skip-gram Model - Apr 10, 2018.
Just like we discussed in the CBOW model, we need to model this Skip-gram architecture now as a deep learning classification model such that we take in the target word as our input and try to predict the context words.
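
For readers who want a concrete picture of the skip-gram setup described in the last entry, here is a minimal sketch of how (target word, context word) training pairs can be generated from a tokenized sentence. It is not taken from the linked article: the function name `skipgram_pairs`, the `window` parameter, and the sample sentence are all assumptions made for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for a skip-gram style model.

    Hypothetical helper for illustration: each word is treated as the
    target (model input), and every word within `window` positions of it
    is treated as a context word to be predicted.
    """
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


# Made-up example sentence, whitespace-tokenized.
sentence = "the quick brown fox jumps".split()
pairs = skipgram_pairs(sentence, window=1)
for target, context in pairs:
    print(target, "->", context)
```

In a full model, each pair would become one training example, with the target word as input and the context word as the label to predict. Practical implementations usually add steps this sketch omits, such as negative sampling and subsampling of frequent words.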