- Training BPE, WordPiece, and Unigram Tokenizers from Scratch using Hugging Face - Oct 21, 2021.
Comparing the tokens generated by SOTA tokenization algorithms using Hugging Face's tokenizers package.
- The Evolution of Tokenization – Byte Pair Encoding in NLP - Oct 7, 2021.
Though we have SOTA algorithms for tokenization, it's always good practice to understand how they evolved and how we got here. Read this introduction to Byte Pair Encoding.
- Tokenization and Text Data Preparation with TensorFlow & Keras - Mar 6, 2020.
This article will look at tokenizing and further preparing text data for feeding into a neural network using TensorFlow and Keras preprocessing tools.
- An Introductory Guide to NLP for Data Scientists with 7 Common Techniques - Jan 9, 2020.
Data Scientists work with tons of data, and many times that data includes natural language text. This guide reviews 7 common techniques with code examples to introduce you to the essentials of NLP, so you can begin performing analysis and building models from textual data.
- Your Guide to Natural Language Processing (NLP) - May 23, 2019.
This extensive post covers NLP use cases, basic examples, Tokenization, Stop Words Removal, Stemming, Lemmatization, Topic Modeling, the future of NLP, and more.
- Text Preprocessing in Python: Steps, Tools, and Examples - Nov 6, 2018.
We outline the basic steps of text preprocessing, which are needed to convert text from human language into a machine-readable format for further processing. We also discuss text preprocessing tools.
- How Machines Understand Our Language: An Introduction to Natural Language Processing - Oct 31, 2018.
The applications of NLP are endless. This is how a machine classifies whether an email is spam or not, whether a review is positive or negative, and how a search engine recognizes what type of person you are based on the content of your query to customize the response accordingly.
- A General Approach to Preprocessing Text Data - Dec 1, 2017.
Recently we had a look at a framework for textual data science tasks in their totality. Now we focus on putting together a generalized approach to attacking text data preprocessing, regardless of the specific textual data science task you have in mind.
- Introduction to Natural Language Processing, Part 1: Lexical Units - Feb 16, 2017.
This series explores core concepts of natural language processing, starting with an introduction to the field and explaining how to identify lexical units as a part of data preprocessing.