- KDnuggets™ News 21:n18, May 12: Data Preparation in SQL, with Cheat Sheet!; Rebuilding 7 Python Projects - May 12, 2021.
Data Preparation in SQL, with Cheat Sheet!; Rebuilding My 7 Python Projects; Applying Python’s Explode Function to Pandas DataFrames; Essential Linear Algebra for Data Science and Machine Learning; Similarity Metrics in NLP
- Similarity Metrics in NLP - May 10, 2021.
This post covers the use of euclidean distance, dot product, and cosine similarity as NLP similarity metrics.
- What is Neural Search? - May 6, 2021.
And how to get started with it with no prior experience in Machine Learning.
- KDnuggets™ News 21:n17, May 5: Charticulator: Microsoft Research open-source game-changing Data Visualization platform; Data Science to Predict and Prevent Real World Problems - May 5, 2021.
Charticulator: Microsoft Research game-changing Data Visualization platform; How Data Science is used to predict and prevent real world problems; Hilarious Data Science Humor; Neural Networks for Natural Language Processing Now; and more.
- How To Generate Meaningful Sentences Using a T5 Transformer - May 3, 2021.
Read this article to see how to develop a text generation API using the T5 transformer.
- Learn Neural Networks for Natural Language Processing Now - Apr 30, 2021.
Still haven't come across enough quality contemporary natural language processing resources? Here is yet another freely-accessible offering from a top-notch university that might help quench your thirst for learning materials.
- Introducing The NLP Index - Apr 29, 2021.
The NLP Index is a brand new resource for NLP code discovery, combining and indexing more than 3,000 paper and code pairs at launch. If you are interested in NLP research and locating the code and papers needed to understand and implement the latest research, you should check it out.
- KDnuggets™ News 21:n16, Apr 28: Data Science Books You Should Start Reading in 2021; Top 10 Must-Know Machine Learning Algorithms for Data Scientists - Apr 28, 2021.
Data science is not about data – applying Dijkstra principle to data science; Data Science Books You Should Start Reading in 2021; How to ace A/B Testing Data Science Interviews; Top 10 Must-Know Machine Learning Algorithms for Data Scientists – Part 1; Production-Ready Machine Learning NLP API with FastAPI and spaCy
- Production-Ready Machine Learning NLP API with FastAPI and spaCy - Apr 21, 2021.
Learn how to implement an API based on FastAPI and spaCy for Named Entity Recognition (NER), and see why the author used FastAPI to quickly build a fast and robust machine learning API.
- How to Apply Transformers to Any Length of Text - Apr 12, 2021.
Read on to find how to restore the power of NLP for long sequences.
- Automated Text Classification with EvalML - Apr 6, 2021.
Learn how EvalML leverages Woodwork, Featuretools and the nlp-primitives library to process text data and create a machine learning model that can detect spam text messages.
- 3 More Free Top Notch Natural Language Processing Courses - Mar 31, 2021.
Are you looking to continue your learning of natural language processing? This small collection of 3 free top notch courses will allow you to do just that.
- Multilingual CLIP with Huggingface + PyTorch Lightning - Mar 26, 2021.
An overview of training OpenAI's CLIP on Google Colab.
- Applying Natural Language Processing in Healthcare - Mar 23, 2021.
New advances in natural language processing (NLP) based on deep learning and transfer learning have made a whole set of software use cases in healthcare viable. The Healthcare NLP Summit is a free online conference on April 6th and 7th, bringing together 30+ technical sessions from across the community that works to apply these advances in the real world.
- How to Begin Your NLP Journey - Mar 17, 2021.
In this blog post, learn how to process text using Python.
- Natural Language Processing Pipelines, Explained - Mar 16, 2021.
This article presents a beginner's view of NLP, as well as an explanation of how a typical NLP pipeline might look.
- A Beginner’s Guide to the CLIP Model - Mar 11, 2021.
CLIP is a bridge between computer vision and natural language processing. I'm here to break CLIP down for you in an accessible and fun read! In this post, I'll cover what CLIP is, how CLIP works, and why CLIP is cool.
- Reducing the High Cost of Training NLP Models With SRU++ - Mar 4, 2021.
The increasing computation time and costs of training natural language processing (NLP) models highlight the importance of inventing computationally efficient models that retain top modeling power with reduced or accelerated computation. A single experiment training a top-performing language model on the 'Billion Word' benchmark would take 384 GPU days and as much as $36,000 using AWS on-demand instances.
- Speech to Text with Wav2Vec 2.0 - Mar 2, 2021.
Facebook recently introduced and open-sourced their new framework for self-supervised learning of representations from raw audio data called Wav2Vec 2.0. Learn more about it and how to use it here.
- Using NLP to improve your Resume - Feb 23, 2021.
This article discusses performing keyword matching and text analysis on job descriptions.
- GPT-2 vs GPT-3: The OpenAI Showdown - Feb 17, 2021.
Thanks to the diversity of the dataset used in the training process, we can obtain adequate text generation across a variety of domains. GPT-2 has 10x the parameters of its predecessor GPT and was trained on 10x the data.
- Hugging Face Transformers Package – What Is It and How To Use It - Feb 16, 2021.
The rapid development of Transformers has brought a new wave of powerful tools to natural language processing. These models are large and very expensive to train, so pre-trained versions are shared and leveraged by researchers and practitioners. Hugging Face offers a wide variety of pre-trained transformers as open-source libraries, and you can incorporate these with only one line of code.
- 6 NLP Techniques Every Data Scientist Should Know - Feb 12, 2021.
Natural language processing has already begun to transform the way humans interact with computers, and its advances are moving rapidly. The field is built on core methods that must first be understood, with which you can then launch your data science projects to a new level of sophistication and value.
- Getting Started with 5 Essential Natural Language Processing Libraries - Feb 3, 2021.
This article is an overview of how to get started with 5 popular Python NLP libraries, from those for linguistic data visualization, to data preprocessing, to multi-task functionality, to state of the art language modeling, and beyond.
- Vision Transformers: Natural Language Processing (NLP) Increases Efficiency and Model Generality - Feb 2, 2021.
Why do we hear so little about transformer models applied to computer vision tasks? What about attention in computer vision networks?
- Six Times Bigger than GPT-3: Inside Google’s TRILLION Parameter Switch Transformer Model - Jan 25, 2021.
Google’s Switch Transformer model could be the next breakthrough in this area of deep learning.
- OpenAI Releases Two Transformer Models that Magically Link Language and Computer Vision - Jan 11, 2021.
OpenAI has released two new transformer architectures that combine image and language tasks in a fun and almost magical way. Read more about them here.
- 15 Free Data Science, Machine Learning & Statistics eBooks for 2021 - Dec 31, 2020.
We present a curated list of 15 free eBooks compiled in a single location to close out the year.
- How to Clean Text Data at the Command Line - Dec 16, 2020.
A basic tutorial about cleaning data using command-line tools: tr, grep, sort, uniq, awk, sed, and csvlook.
- How to Incorporate Tabular Data with HuggingFace Transformers - Nov 25, 2020.
In real-world scenarios, we often encounter data that includes text and tabular features. Leveraging the latest advances for transformers, effectively handling situations with both data structures can increase performance in your models.
- Top KDnuggets tweets, Nov 11-17: Data Engineering – the Cousin of Data Science, is Troublesome - Nov 18, 2020.
Also 6 Things About #DataScience that Employers Don't Want You to Know; NLP - Zero to Hero with #Python #NLProc; 5 Tricky SQL Queries Solved - Explaining the approach to solving a few complex #SQL queries.
- Top Python Libraries for Deep Learning, Natural Language Processing & Computer Vision - Nov 16, 2020.
This article compiles the 30 top Python libraries for deep learning, natural language processing & computer vision, as best determined by KDnuggets staff.
- How to Acquire the Most Wanted Data Science Skills - Nov 13, 2020.
We recently surveyed KDnuggets readers to determine the "most wanted" data science skills. Since they seem to be those most in demand from practitioners, here is a collection of resources for getting started with this learning.
- Multi-domain summarization by PlexPage - Nov 10, 2020.
The PlexPage by Algoritmi Vision is an Abstractive Multi-domain Search Summarization application built using the unique and innovative structure of the Natural Language Generation (NLG) technique. Learn more here, and try it out for yourself.
- Topic Modeling with BERT - Nov 3, 2020.
Leveraging BERT and TF-IDF to create easily interpretable topics.
- Which flavor of BERT should you use for your QA task? - Oct 22, 2020.
Check out this guide to choosing and benchmarking BERT models for question answering.
- Roadmap to Natural Language Processing (NLP) - Oct 19, 2020.
Check out this introduction to some of the most common techniques and models used in Natural Language Processing (NLP).
- Optimizing the Levenshtein Distance for Measuring Text Similarity - Oct 16, 2020.
For speeding up the calculation of the Levenshtein distance, this tutorial works on calculating using a vector rather than a matrix, which saves a lot of time. We’ll be coding in Java for this implementation.
- Understanding Transformers, the Data Science Way - Oct 1, 2020.
Read this accessible and conversational article about understanding transformers, the data science way (by asking a lot of questions, that is).
- An Introduction to NLP and 5 Tips for Raising Your Game - Sep 11, 2020.
This article is a collection of things the author would like to have known when they started out in NLP. Perhaps it will be useful for you.
- Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Semantics and Pragmatics - Aug 31, 2020.
Algorithms for text analytics must model how language works to incorporate meaning in language—and so do the people deploying these algorithms. Bender & Lascarides 2019 is an accessible overview of what the field of linguistics can teach NLP about how meaning is encoded in human languages.
- KDnuggets™ News 20:n33, Aug 26: If I had to start learning Data Science again, how would I do it? Must-read NLP and Deep Learning articles for Data Scientists - Aug 26, 2020.
If I had to start learning Data Science again, how would I do it? Must-read NLP and Deep Learning articles for Data Scientists; These Data Science Skills will be your Superpower; Accelerated Natural Language Processing: A Free Amazon Machine Learning University Course.
- A Deep Dive Into the Transformer Architecture – The Development of Transformer Models - Aug 24, 2020.
Even though transformers for NLP were introduced only a few years ago, they have delivered major impacts to a variety of fields from reinforcement learning to chemistry. Now is the time to better understand the inner workings of transformer architectures to give you the intuition you need to effectively work with these powerful tools.
- The NLP Model Forge: Generate Model Code On Demand - Aug 24, 2020.
You've seen their Big Bad NLP Database and The Super Duper NLP Repo. Now Quantum Stat is back with its most ambitious NLP product yet: The NLP Model Forge.
- Must-read NLP and Deep Learning articles for Data Scientists - Aug 21, 2020.
NLP and deep learning continue to advance, nearly on a daily basis. Check out these recent must-read guides, feature articles, and other resources to keep you on top of the latest advancements and ahead of the curve.
- Accelerated Natural Language Processing: A Free Course From Amazon - Aug 19, 2020.
Amazon's Machine Learning University is making its online courses available to the public, starting with this Accelerated Natural Language Processing offering.
- KDnuggets™ News 20:n32, Aug 19: The List of Top 10 Data Science Lists; Data Science MOOCs with Substance - Aug 19, 2020.
The List of Top 10 Lists in Data Science; Going Beyond Superficial: Data Science MOOCs with Substance; Introduction to Statistics for Data Science; Content-Based Recommendation System using Word Embeddings; How Natural Language Processing Is Changing Data Analytics
- Are Computer Vision Models Vulnerable to Weight Poisoning Attacks? - Aug 17, 2020.
A recent paper has explored the possibility of influencing the predictions of a freshly trained Natural Language Processing (NLP) model by tweaking the weights re-used in its training. This result is especially interesting if it also proves to transfer to the context of Computer Vision (CV), since the usage of pre-trained weights is widespread there.
- Content-Based Recommendation System using Word Embeddings - Aug 14, 2020.
This article explores how average Word2Vec and TF-IDF Word2Vec can be used to build a recommendation engine.
- How Natural Language Processing Is Changing Data Analytics - Aug 12, 2020.
As it becomes more prevalent, NLP will enable humans to interact with computers in ways not possible before. This new type of collaboration will allow improvements in a wide variety of human endeavors, including business, philanthropy, health, and communication.
- Exploring GPT-3: A New Breakthrough in Language Generation - Aug 10, 2020.
GPT-3 is the largest natural language processing (NLP) transformer released to date, eclipsing the previous record, Microsoft Research’s Turing-NLG at 17B parameters, by about 10 times. This has resulted in an explosion of demos: some good, some bad, all interesting.
- 5 Big Trends in Data Analytics - Jul 30, 2020.
Data analytics is the process by which data is deconstructed and examined for useful patterns and trends. Here we explore five trends making data analytics even more useful.
- 5 Fantastic Natural Language Processing Books - Jul 28, 2020.
This curated collection of 5 natural language processing books attempts to cover a number of different aspects of the field, balancing the practical and the theoretical. Check out these 5 fantastic selections now in order to improve your NLP skills.
- Labelling Data Using Snorkel - Jul 24, 2020.
In this tutorial, we walk through the process of using Snorkel to generate labels for an unlabelled dataset. We will provide you with examples of basic Snorkel components by guiding you through a real clinical application of Snorkel.
- Free From Stanford: Ethical and Social Issues in Natural Language Processing - Jul 17, 2020.
Perhaps it's time to take a look at this relatively new offering from Stanford, Ethical and Social Issues in Natural Language Processing (CS384), an advanced seminar course covering ethical and social issues in NLP.
- PyTorch LSTM: Text Generation Tutorial - Jul 13, 2020.
The key elements of an LSTM are its ability to work with sequences and its gating mechanism.
- Innovating versus Doing: NLP and CORD19 - Jun 30, 2020.
How I learned to trust the process and find value in the road most traveled.
- The Unreasonable Progress of Deep Neural Networks in Natural Language Processing (NLP) - Jun 29, 2020.
Natural language processing has made incredible advances through advanced techniques in deep learning. Learn about these powerful models, and find how close (or far away) these approaches are to human-level understanding.
- Bias in AI: A Primer - Jun 23, 2020.
Those interested in studying AI bias, but who lack a starting point, would do well to check out this introductory set of slides and the accompanying talk on the subject from Google researcher Margaret Mitchell.
- What is emotion AI and why should you care? - Jun 19, 2020.
What is emotion AI, why is it relevant, and what do you need to know about it?
- KDnuggets™ News 20:n24, Jun 17: Easy Speech-to-Text with Python; Data Distributions Overview; Java for Data Scientists - Jun 17, 2020.
Also: Deploy a Machine Learning Pipeline to the Cloud Using a Docker Container; Five Cognitive Biases In Data Science (And how to avoid them); Understanding Machine Learning: The Free eBook; Simplified Mixed Feature Type Preprocessing in Scikit-Learn with Pipelines; A Complete guide to Google Colab for Deep Learning
- Easy Speech-to-Text with Python - Jun 10, 2020.
In this blog, I am demonstrating how to convert speech to text using Python. This can be done with the help of the “Speech Recognition” API and “PyAudio” library.
- KDnuggets™ News 20:n23, Jun 10: Largest Dataset you analyzed? If you start statistics all over again, where would you start? GPT-3 - Jun 10, 2020.
#BlackLivesMatter. In this issue: If you had to start statistics all over again, where would you start? New Poll: What was the largest dataset you analyzed? Another Great NLP Course from Stanford; Naive Bayes: Everything you need to know; GPT-3 - a giant leap for Deep Learning and NLP?
- GPT-3, a giant step for Deep Learning and NLP? - Jun 9, 2020.
Recently, OpenAI announced a new successor to their language model, GPT-3, that is now the largest model trained so far with 175 billion parameters. Training a language model this large has its merits and limitations, so this article covers some of its most interesting and important aspects.
- 5 Essential Papers on Sentiment Analysis - Jun 9, 2020.
To highlight some of the work being done in the field, here are five essential papers on sentiment analysis and sentiment classification.
- Natural Language Processing with Python: The Free eBook - Jun 8, 2020.
This free eBook is an introduction to natural language processing, and to NLTK, one of the most prevalent Python NLP libraries.
- From Languages to Information: Another Great NLP Course from Stanford - Jun 3, 2020.
Check out another example of a Stanford NLP course and its freely available courseware.
- Four Ways to Apply NLP in Financial Services - Jun 2, 2020.
Natural language processing (NLP) is increasingly used to review unstructured content or spot trends in markets. How is Refinitiv Labs applying NLP in financial services to meet challenges around investment decision-making and risk management?
- KDnuggets™ News 20:n21, May 27: The Best NLP with Deep Learning Course is Free; Your First Machine Learning Web App - May 27, 2020.
Also: Python For Everybody: The Free eBook; Complex logic at breakneck speed: Try Julia for data science; An easy guide to choose the right Machine Learning algorithm; Dataset Splitting Best Practices in Python; Appropriately Handling Missing Values for Statistical Modelling and Prediction
- The Best NLP with Deep Learning Course is Free - May 22, 2020.
Stanford's Natural Language Processing with Deep Learning is one of the most respected courses on the topic that you will find anywhere, and the course materials are freely available online.
- Spotting Controversy with NLP - May 21, 2020.
In this article, I’ll introduce you to a hot topic in financial services and describe how a leading data provider is using data science and NLP to streamline how they find insights in unstructured data.
- Google Unveils TAPAS, a BERT-Based Neural Network for Querying Tables Using Natural Language - May 19, 2020.
The new neural network extends BERT to interact with tabular datasets.
- Easy Text-to-Speech with Python - May 18, 2020.
Python comes with a lot of handy and easily accessible libraries and we’re going to look at how we can deliver text-to-speech with Python in this article.
- Facebook Open Sources Blender, the Largest-Ever Open Domain Chatbot - May 15, 2020.
The new conversational agent exhibits human-like behavior in conversations about almost any topic.
- Text Mining in Python: Steps and Examples - May 12, 2020.
The majority of data exists in textual form, which is a highly unstructured format. To produce meaningful insights from text data, we need to follow a method called Text Analysis.
- Chatbots in a Nutshell - May 7, 2020.
Marketing scientist Kevin Gray asks Dr. Anna Farzindar of the University of Southern California about chatbots and the ways they are used.
- KDnuggets™ News 20:n18, May 6: Five Cool Python Libraries for Data Science; NLP Recipes: Best Practices - May 6, 2020.
5 cool Python libraries for Data Science; NLP Recipes: Best Practices and Examples; Deep Learning: The Free eBook; Demystifying the AI Infrastructure Stack; and more.
- Natural Language Processing Recipes: Best Practices and Examples - May 1, 2020.
Here is an overview of another great natural language processing resource, this time from Microsoft, which demonstrates best practices and implementation guidelines for a variety of tasks and scenarios.
- Five Cool Python Libraries for Data Science - Apr 30, 2020.
Check out these 5 cool Python libraries that the author has come across during an NLP project, and which have made their life easier.
- KDnuggets™ News 20:n17, Apr 29: The Super Duper NLP Repo; Free Machine Learning & Data Science Books & Courses for Quarantine - Apr 29, 2020.
Also: Should Data Scientists Model COVID19 and other Biological Events; Learning during a crisis (Data Science 90-day learning challenge); Data Transformation: Standardization vs Normalization; DBSCAN Clustering Algorithm in Machine Learning; Find Your Perfect Fit: A Quick Guide for Job Roles in the Data World
- The Super Duper NLP Repo: 100 Ready-to-Run Colab Notebooks - Apr 24, 2020.
Check out this repository of more than 100 freely-accessible NLP notebooks, curated from around the internet, and ready to launch in Colab with a single click.
- Top KDnuggets tweets, Apr 08-14: Mathematics for #MachineLearning: The Free eBook – KDnuggets - Apr 15, 2020.
Also Exploratory Data Analysis for Natural Language Processing: A Complete Guide to Python Tools; A professor with 20 years of experience to all high school seniors (and their parents). If you were planning to enroll in college next fall - don't.
- Simple Question Answering (QA) Systems That Use Text Similarity Detection in Python - Apr 7, 2020.
How exactly are smart algorithms able to engage and communicate with us like humans? The answer lies in Question Answering systems that are built on a foundation of Machine Learning and Natural Language Processing. Let's build one here.
- Why you should NOT use MS MARCO to evaluate semantic search - Apr 2, 2020.
If we want to investigate the power and limitations of semantic vectors (pre-trained or not), we should ideally prioritize datasets that are less biased towards term-matching signals. This piece shows that the MS MARCO dataset is more biased towards those signals than we expected and that the same issues are likely present in many other datasets due to similar data collection designs.
- A Comprehensive Data Repository for Fake Health News Detection - Mar 19, 2020.
We introduce FakeHealth, a new data repository for fake health news detection. Following a preliminary analysis to demonstrate its features, we consider additional potential directions for better identifying fake news.
- Salesforce Open Sources a Framework for Open Domain Question Answering Using Wikipedia - Mar 16, 2020.
The framework uses a multi-hop QA method to answer complex questions by reasoning through Wikipedia’s datasets.
- How To Build Your Own Feedback Analysis Solution - Mar 12, 2020.
Automating the analysis of customer feedback will sound like a great idea after reading a couple hundred reviews. Building an NLP solution to provide in-depth analysis of what your customers are thinking is a serious undertaking, and this guide helps you scope out the entire project.
- Tokenization and Text Data Preparation with TensorFlow & Keras - Mar 6, 2020.
This article will look at tokenizing and further preparing text data for feeding into a neural network using TensorFlow and Keras preprocessing tools.
- The Big Bad NLP Database: Access Nearly 300 Datasets - Feb 28, 2020.
Check out this database of nearly 300 freely-accessible NLP datasets, curated from around the internet.
- Microsoft Open Sources ZeRO and DeepSpeed: The Technologies Behind the Biggest Language Model in History - Feb 24, 2020.
The two efforts enable the training of deep learning models at massive scale.
- Illustrating the Reformer - Feb 12, 2020.
In this post, we will try to dive into the Reformer model and try to understand it with some visual guides.
- Intent Recognition with BERT using Keras and TensorFlow 2 - Feb 10, 2020.
TL;DR Learn how to fine-tune the BERT model for text classification. Train and evaluate it on a small dataset for detecting seven intents. The results might surprise you!
- Microsoft Open Sources Jericho to Train Reinforcement Learning Using Linguistic Games - Feb 3, 2020.
The new framework provides an OpenAI-like environment for language-based games.
- Top 10 AI, Machine Learning Research Articles to know - Jan 30, 2020.
We’ve seen many predictions for what new advances are expected in the field of AI and machine learning. Here, we review a “data set” based on what researchers were apparently studying at the turn of the decade to take a fresh glimpse into what might come to pass in 2020.
- Generating English Pronoun Questions Using Neural Coreference Resolution - Jan 29, 2020.
This post will introduce a practical method for generating English pronoun questions from any story or article. Learn how to take an additional step toward computationally understanding language.
- A bird’s-eye view of modern AI from NeurIPS 2019 - Jan 28, 2020.
With the explosion of the field of AI/ML impacting so many applications and industries, there is great value coming out of recent progress. This review highlights many research areas covered at the NeurIPS 2019 conference recently held in Vancouver, Canada, and features many important areas of progress we expect to see in the coming year.
- Uber Has Been Quietly Assembling One of the Most Impressive Open Source Deep Learning Stacks in the Market - Jan 27, 2020.
Many of the technologies used by Uber teams have been open sourced and received accolades from the machine learning community. Let’s look at some of my favorites.
- NLP Year in Review — 2019 - Jan 23, 2020.
In this blog post, I want to highlight some of the most important stories related to machine learning and NLP that I came across in 2019.
- The Future of Machine Learning - Jan 17, 2020.
This summary overviews the keynote at TensorFlow World by Jeff Dean, Head of AI at Google, that considered the advancements of computer vision and language models and predicted the direction machine learning model building should follow for the future.
- Top 10 Technology Trends for 2020 - Jan 16, 2020.
With integrations of multiple emerging technologies just in the past year, AI development continues at a fast pace. Following the blueprint of science and technology advancements in 2019, we predict 10 trends we expect to see in 2020 and beyond.
- KDnuggets™ News 20:n02, Jan 15: Top 5 Must-have Data Science Skills; Learn Machine Learning with THIS Book - Jan 15, 2020.
This week: learn the 5 must-have data science skills for the new year; find out which book is THE book to get started learning machine learning; pick up some Python tips and tricks; learn SQL, but learn it the hard way; and find an introductory guide to learning common NLP techniques.
- An Introductory Guide to NLP for Data Scientists with 7 Common Techniques - Jan 9, 2020.
Data Scientists work with tons of data, and many times that data includes natural language text. This guide reviews 7 common techniques with code examples to introduce you to the essentials of NLP, so you can begin performing analysis and building models from textual data.
- Top 5 must-have Data Science skills for 2020 - Jan 8, 2020.
The standard job description for a Data Scientist has long highlighted skills in R, Python, SQL, and Machine Learning. With the field evolving, these core competencies are no longer enough to stay competitive in the job market.
- Automatic Text Summarization in a Nutshell - Dec 18, 2019.
Marketing scientist Kevin Gray asks Dr. Anna Farzindar of the University of Southern California about Automatic Text Summarization and the various ways it is used.
- Let’s Build an Intelligent Chatbot - Dec 17, 2019.
Check out this step by step approach to building an intelligent chatbot in Python.
- Xavier Amatriain’s Machine Learning and Artificial Intelligence 2019 Year-end Roundup - Dec 16, 2019.
It is an annual tradition for Xavier Amatriain to write a year-end retrospective of advances in AI/ML, and this year is no different. Gain an understanding of the important developments of the past year, as well as insights into what to expect in 2020.
- What just happened in the world of AI? - Dec 12, 2019.
The speed at which AI made advancements and news during 2019 makes it imperative now to step back and place these events into order and perspective. It's important to separate the interest that any one advancement initially attracts, from its actual gravity and its consequential influence on the field. This review unfolds the parallel threads of these AI stories over this year and isolates their significance.
- Deploying a pretrained GPT-2 model on AWS - Dec 12, 2019.
This post attempts to summarize my recent detour into NLP, describing how I exposed a Huggingface pre-trained Language Model (LM) on an AWS-based web application.
- The 4 Hottest Trends in Data Science for 2020 - Dec 9, 2019.
The field of Data Science is growing with new capabilities and reach into every industry. With digital transformations occurring in organizations around the world, 2019 included trends of more companies leveraging more data to make better decisions. Check out these next trends in Data Science expected to take off in 2020.
- Webinar: Natural Language Processing for Digital Transformation of Unstructured Text - Dec 6, 2019.
Learn how pharma and healthcare organizations are using the power of Natural Language Processing (NLP) to transform unstructured text into actionable structured data.
- 10 Free Top Notch Machine Learning Courses - Dec 6, 2019.
Are you interested in studying machine learning over the holidays? This collection of 10 free top notch courses will allow you to do just that, with something for every approach to improving your machine learning skills.
- KDnuggets™ News 19:n46, Dec 4: The Future of Data Science Careers; Which Data Visualization Should I Use? - Dec 4, 2019.
This week: The Future of Careers in Data Science & Analysis; Task-based effectiveness of basic visualizations; Open Source Projects by Google, Uber and Facebook for Data Science and AI; Getting Started with Automated Text Summarization; A Non-Technical Reading List for Data Science; and much more!
- Markov Chains: How to Train Text Generation to Write Like George R. R. Martin - Nov 29, 2019.
Read this article on training Markov chains to generate George R. R. Martin style text.
- Lit BERT: NLP Transfer Learning In 3 Steps - Nov 29, 2019.
PyTorch Lightning is a lightweight framework which allows anyone using PyTorch to scale deep learning code easily while making it reproducible. In this tutorial we’ll use Huggingface's implementation of BERT to do a finetuning task in Lightning.
- Getting Started with Automated Text Summarization - Nov 28, 2019.
This article will walk through an extractive text summarization process, using a simple word frequency approach, implemented in Python.
- Spark NLP 101: LightPipeline - Nov 27, 2019.
A Pipeline is specified as a sequence of stages, and each stage is either a Transformer or an Estimator. These stages are run in order, and the input DataFrame is transformed as it passes through each stage. Now let’s see how this can be done in Spark NLP using Annotators and Transformers.
- KDnuggets™ News 19:n45, Nov 27: Interpretable vs black box models; Advice for New and Junior Data Scientists - Nov 27, 2019.
This week: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead; Advice for New and Junior Data Scientists; Python Tuples and Tuple Methods; Can Neural Networks Develop Attention? Google Thinks they Can; Three Methods of Data Pre-Processing for Text Classification; and more.
- Content-based Recommender Using Natural Language Processing (NLP) - Nov 26, 2019.
A guide to build a content-based movie recommender model based on NLP.
- Text Encoding: A Review - Nov 22, 2019.
We will focus here exactly on that part of the analysis that transforms words into numbers and texts into number vectors: text encoding.
- Topics Extraction and Classification of Online Chats - Nov 14, 2019.
This article covers how to automatically identify the topics within a corpus of textual data by using unsupervised topic modelling, and then apply a supervised classification algorithm to assign topic labels to each textual document by using the result of the previous step as target labels.
- KDnuggets™ News 19:n43, Nov 13: Dynamic Reports in Python and R; Creating NLP Vocabularies; What is Data Science? - Nov 13, 2019.
On KDnuggets this week: Orchestrating Dynamic Reports in Python and R with Rmd Files; How to Create a Vocabulary for NLP Tasks in Python; What is Data Science?; The Complete Data Science LinkedIn Profile Guide; Set Operations Applied to Pandas DataFrames; and much, much more.
- Understanding NLP and Topic Modeling Part 1 - Nov 12, 2019.
In this post, we seek to understand why topic modeling is important and how it helps us as data scientists.
- How to Create a Vocabulary for NLP Tasks in Python - Nov 7, 2019.
This post will walk through a Python implementation of a vocabulary class for storing processed text data and related metadata in a manner useful for subsequently performing NLP tasks.
- Research Guide for Transformers - Oct 30, 2019.
The problem with RNNs and CNNs is that they aren’t able to keep up with context and content when sentences are too long. This limitation has been solved by paying attention to the word that is currently being operated on. This guide will focus on how this problem can be addressed by Transformers with the help of deep learning.
- KDnuggets™ News 19:n41, Oct 30: Feature Selection: Beyond feature importance?; Time Series Analysis Using KNIME and Spark - Oct 30, 2019.
This week in KDnuggets: Feature Selection: Beyond feature importance?; Time Series Analysis: A Simple Example with KNIME and Spark; 5 Advanced Features of Pandas and How to Use Them; How to Measure Foot Traffic Using Data Analytics; Introduction to Natural Language Processing (NLP); and much, much more!
- Harnessing Semiotics and Discourse Communities to Understand User Intent - Oct 25, 2019.
Semiotics helps us understand the importance of context in determining the meaning of a term, and discourse communities provide us with the background context (mental model) by which to interpret its meaning correctly.
- Introduction to Natural Language Processing (NLP) - Oct 25, 2019.
Have you ever wondered how your personal assistant (e.g., Siri) is built? Do you want to build your own? Perfect! Let’s talk about Natural Language Processing.
- KDnuggets™ News 19:n39, Oct 16: Key Ideas in Document Embedding; The problem with metrics is a big problem for AI - Oct 16, 2019.
This week on KDnuggets: Beyond Word Embedding: Key Ideas in Document Embedding; The problem with metrics is a big problem for AI; Activation maps for deep learning models in a few lines of code; There is No Such Thing as a Free Lunch; 8 Paths to Getting a Machine Learning Job Interview; and much, much more.
- Beyond Word Embedding: Key Ideas in Document Embedding - Oct 11, 2019.
This literature review on document embedding techniques thoroughly covers the many ways practitioners develop rich vector representations of text -- from single sentences to entire books.
- Lemma, Lemma, Red Pyjama: Or, doing words with AI - Oct 10, 2019.
If we want a machine learning model to be able to generalize these forms together, we need to map them to a shared representation. But when are two different words the same for our purposes? It depends.