- Scaling human oversight of AI systems for difficult tasks – OpenAI approach - Oct 11, 2021.
The foundational idea of Artificial Intelligence is that it should demonstrate human-level intelligence. So, unless a model can perform as a human might do, its intended purpose is missed. Here, recent OpenAI research into full-length book summarization focuses on generating results that make sense to humans with state-of-the-art results that leverage scalable AI-enhanced human-in-the-loop feedback.
- The Evolution of Tokenization – Byte Pair Encoding in NLP - Oct 7, 2021.
Though we have SOTA algorithms for tokenization, it's always good practice to understand the evolution trail and learn how we got here. Read this introduction to Byte Pair Encoding.
- Building a Structured Financial Newsfeed Using Python, SpaCy and Streamlit - Sep 28, 2021.
Getting started with NLP by building a Named Entity Recognition (NER) application.
- GitHub Copilot and the Rise of AI Language Models in Programming Automation - Sep 22, 2021.
Read on to learn more about what makes Copilot different from previous autocomplete tools (including TabNine), and why this particular tool has been generating so much controversy.
- 15 Must-Know Python String Methods - Sep 21, 2021.
It is not always about numbers.
- Text Preprocessing Methods for Deep Learning - Sep 10, 2021.
While the preprocessing pipeline we are focusing on in this post is mainly centered around Deep Learning, most of it is also applicable to conventional machine learning models.
- Five Key Facts About Wu Dao 2.0: The Largest Transformer Model Ever Built - Sep 6, 2021.
The record-setting model combines some clever research and engineering methods.
- Behind OpenAI Codex: 5 Fascinating Challenges About Building Codex You Didn’t Know About - Sep 3, 2021.
Some ML engineering and modeling challenges encountered during the construction of Codex.
- Best Resources to Learn Natural Language Processing in 2021 - Sep 2, 2021.
In this article, the author has listed all the best resources to learn natural language processing, including online courses, tutorials, books, and YouTube videos.
- NLP Insights for the Penguin Café Orchestra - Aug 31, 2021.
We give an example of how to use Expert.ai and Python to investigate favorite music albums.
- Multilabel Document Categorization, step by step example - Aug 31, 2021.
This detailed guide explores an unsupervised and supervised learning two-stage approach with LDA and BERT to develop a domain-specific document categorizer on unlabeled documents.
- Introducing Packed BERT for 2x Training Speed-up in Natural Language Processing - Aug 30, 2021.
Check out this new BERT packing algorithm for more efficient training.
- 3 Data Acquisition, Annotation, and Augmentation Tools - Aug 27, 2021.
Check out these 3 projects found around GitHub that can help with your data acquisition, annotation, and augmentation tasks.
- Jurassic-1 Language Models and AI21 Studio - Aug 23, 2021.
AI21 Labs’ new developer platform offers instant access to our 178B-parameter language model, to help you build sophisticated text-based AI applications at scale.
- Linear Algebra for Natural Language Processing - Aug 17, 2021.
Learn about representing word semantics in vector space.
- How to Train a BERT Model From Scratch - Aug 13, 2021.
Meet BERT’s Italian cousin, FiliBERTo.
- KDnuggets™ News 21:n29, Aug 4: GitHub Copilot Open Source Alternatives; 3 Reasons Why You Should Use Linear Regression Models Instead of Neural Networks - Aug 4, 2021.
GitHub Copilot Open Source Alternatives; 3 Reasons Why You Should Use Linear Regression Models Instead of Neural Networks; A Brief Introduction to the Concept of Data; MLOps Best Practices; GPU-Powered Data Science (NOT Deep Learning) with RAPIDS
- GitHub Copilot Open Source Alternatives - Jul 29, 2021.
GitHub's Copilot code generation tool is currently only available via approved request. Here are 4 Copilot alternatives that you can use in your programming today.
- An AI-Based Framework Solution to Address Email Management Challenges - Jul 28, 2021.
Expert.ai’s Edge NL API is an on-premise API that can perform NLU tasks with no required training or extra work, offering advanced, out-of-the-box capabilities that address common use cases and can be easily customized to your specific needs.
- KDnuggets™ News 21:n28, Jul 28: Design patterns in machine learning; The Best NLP Course is Free - Jul 28, 2021.
What are the design patterns for Machine Learning and why should you know them? For more advanced readers, how to use Kafka Connect to create an open source data pipeline for processing real-time data; The state-of-the-art NLP course is freely available; Python Data Structures Compared; Update your Machine Learning skills this summer.
- Facebook Open Sources a Chatbot That Can Discuss Any Topic - Jul 27, 2021.
The new version expands the capabilities of its predecessor, building a much more natural conversational experience.
- The Best SOTA NLP Course is Free! - Jul 21, 2021.
Hugging Face has recently released a course on using its libraries and ecosystem for practical NLP, and it appears to be very comprehensive. Have a look for yourself.
- Understanding BERT with Hugging Face - Jul 20, 2021.
We don’t really understand something before we implement it ourselves. So in this post, we will implement a Question Answering Neural Network using BERT and a Hugging Face Library.
- SQL, Syllogisms, and Explanations - Jul 14, 2021.
Check out the Executable English Platform, for self-explaining applications written in English that you can run in your browser.
- GitHub Copilot: Your AI pair programmer – what is all the fuss about? - Jul 5, 2021.
GitHub just released Copilot, a code completion tool on steroids dubbed your "AI pair programmer." Read more about it, and see what all the fuss is about.
- Semantic Search: Measuring Meaning From Jaccard to Bert - Jul 2, 2021.
In this article, we’ll cover a few of the most interesting and powerful of these techniques, focusing specifically on semantic search. We’ll learn how they work, what they’re good at, and how we can implement them ourselves.
- KDnuggets™ News 21:n24, Jun 30: What will the demand for Data Scientists be in 10 years?; Add A New Dimension To Your Photos Using Python - Jun 30, 2021.
What will the demand for Data Scientists be in 10 years? Will Data Scientists be extinct?; Add A New Dimension To Your Photos Using Python; Data Scientists are from Mars and Software Developers are from Venus; How to Train a Joint Entities and Relation Extraction Classifier using BERT Transformer with spaCy 3; In-Warehouse Machine Learning and the Modern Data Science Stack
- How to Train a Joint Entities and Relation Extraction Classifier using BERT Transformer with spaCy 3 - Jun 28, 2021.
A step-by-step guide on how to train a relation extraction classifier using Transformer and spaCy 3.
- Applied Language Technology: A No-Nonsense Approach - Jun 25, 2021.
Here is a free entry-level applied natural language processing course that can fit into any beginner's roadmap to understanding NLP. Check it out.
- Fine-Tuning Transformer Model for Invoice Recognition - Jun 23, 2021.
The author presents a step-by-step guide from annotation to training.
- KDnuggets™ News 21:n23, Jun 23: Pandas vs SQL: When Data Scientists Should Use Each Tool; How to Land a Data Analytics Job in 6 Months - Jun 23, 2021.
Pandas vs SQL: When Data Scientists Should Use Each Tool; How to Land a Data Analytics Job in 6 Months; A Graph-based Text Similarity Method with Named Entity Information in NLP; The Best Way to Learn Practical NLP?; An introduction to Explainable AI (XAI) and Explainable Boosting Machines (EBM)
- The Word “WORD” Has 13 Meanings - Jun 22, 2021.
Thoughts around Knowledge Graphs, the semantic nature of language, and the two main types of word ambiguity.
- Overview of AutoNLP from Hugging Face with Example Project - Jun 21, 2021.
AutoNLP is a beta project from Hugging Face that builds on the company’s work with its Transformer project. With AutoNLP you can get a working model with just a few simple terminal commands.
- The Best Way to Learn Practical NLP? - Jun 16, 2021.
Hugging Face has just released a course on using its libraries and ecosystem for practical NLP, and it appears to be very comprehensive. Have a look for yourself.
- A Graph-based Text Similarity Method with Named Entity Information in NLP - Jun 16, 2021.
In this article, the author summarizes the 2017 paper "A Graph-based Text Similarity Measure That Employs Named Entity Information" as per their understanding. Better understand the concepts by reading along.
- Building a Knowledge Graph for Job Search Using BERT - Jun 14, 2021.
A guide on how to create knowledge graphs using NER and Relation Extraction.
- The Essential Guide to Transformers, the Key to Modern SOTA AI - Jun 10, 2021.
You likely know Transformers from their recent spate of success stories in natural language processing, computer vision, and other areas of artificial intelligence, but are you familiar with all of the X-formers? More importantly, do you know the differences, and why you might use one over another?
- How to speed up a Deep Learning Language model by almost 50X at half the cost - Jun 9, 2021.
In this blog post, we show how to accelerate fine-tuning the ALBERT language model while also reducing costs by using Determined’s built-in support for distributed training with AWS spot instances.
- How to Fine-Tune BERT Transformer with spaCy 3 - Jun 7, 2021.
A step-by-step guide on how to create a knowledge graph using NER and Relation Extraction.
- How to Create and Deploy a Simple Sentiment Analysis App via API - Jun 1, 2021.
In this article we will create a simple sentiment analysis app using the HuggingFace Transformers library, and deploy it using FastAPI.
- 4 Tips for Dataset Curation for NLP Projects - May 28, 2021.
You have heard it before, and you will hear it again. It's all about the data. Curating the right data is more important than just curating any data. When dealing with text data, many hard-earned lessons have been learned by others over the years, and here are four data curation tips that you should be sure to follow during your next NLP project.
- Great New Resource for Natural Language Processing Research and Applications - May 27, 2021.
The NLP Index is a brand new resource for NLP code discovery, combining and indexing more than 3,000 paper and code pairs at launch. If you are interested in NLP research and locating the code and papers needed to understand and implement the latest research, you should check it out.
- Topic Modeling with Streamlit - May 26, 2021.
What does it take to create and deploy a topic modeling web application quickly? Read this post to see how the author uses Python NLP packages for topic modeling, Streamlit for the web application framework, and Streamlit Sharing for deployment.
- Machine Translation in a Nutshell - May 17, 2021.
Marketing scientist Kevin Gray asks Dr. Anna Farzindar of the University of Southern California for a snapshot of machine translation. Dr. Farzindar also provided the original art for this article.
- KDnuggets™ News 21:n18, May 12: Data Preparation in SQL, with Cheat Sheet!; Rebuilding 7 Python Projects - May 12, 2021.
Data Preparation in SQL, with Cheat Sheet!; Rebuilding My 7 Python Projects; Applying Python’s Explode Function to Pandas DataFrames; Essential Linear Algebra for Data Science and Machine Learning; Similarity Metrics in NLP
- Similarity Metrics in NLP - May 10, 2021.
This post covers the use of euclidean distance, dot product, and cosine similarity as NLP similarity metrics.
- What is Neural Search? - May 6, 2021.
And how to get started with it with no prior experience in Machine Learning.
- KDnuggets™ News 21:n17, May 5: Charticulator: Microsoft Research open-source game-changing Data Visualization platform; Data Science to Predict and Prevent Real World Problems - May 5, 2021.
Charticulator: Microsoft Research game-changing Data Visualization platform; How Data Science is used to predict and prevent real world problems; Hilarious Data Science Humor; Neural Networks for Natural Language Processing Now; and more.
- How To Generate Meaningful Sentences Using a T5 Transformer - May 3, 2021.
Read this article to see how to develop a text generation API using the T5 transformer.
- Learn Neural Networks for Natural Language Processing Now - Apr 30, 2021.
Still haven't come across enough quality contemporary natural language processing resources? Here is yet another freely-accessible offering from a top-notch university that might help quench your thirst for learning materials.
- Introducing The NLP Index - Apr 29, 2021.
The NLP Index is a brand new resource for NLP code discovery, combining and indexing more than 3,000 paper and code pairs at launch. If you are interested in NLP research and locating the code and papers needed to understand and implement the latest research, you should check it out.
- KDnuggets™ News 21:n16, Apr 28: Data Science Books You Should Start Reading in 2021; Top 10 Must-Know Machine Learning Algorithms for Data Scientists - Apr 28, 2021.
Data science is not about data – applying Dijkstra principle to data science; Data Science Books You Should Start Reading in 2021; How to ace A/B Testing Data Science Interviews; Top 10 Must-Know Machine Learning Algorithms for Data Scientists – Part 1; Production-Ready Machine Learning NLP API with FastAPI and spaCy
- Production-Ready Machine Learning NLP API with FastAPI and spaCy - Apr 21, 2021.
Learn how to implement an API based on FastAPI and spaCy for Named Entity Recognition (NER), and see why the author used FastAPI to quickly build a fast and robust machine learning API.
- How to Apply Transformers to Any Length of Text - Apr 12, 2021.
Read on to find how to restore the power of NLP for long sequences.
- Automated Text Classification with EvalML - Apr 6, 2021.
Learn how EvalML leverages Woodwork, Featuretools and the nlp-primitives library to process text data and create a machine learning model that can detect spam text messages.
- 3 More Free Top Notch Natural Language Processing Courses - Mar 31, 2021.
Are you looking to continue your learning of natural language processing? This small collection of 3 free top notch courses will allow you to do just that.
- Multilingual CLIP with Huggingface + PyTorch Lightning - Mar 26, 2021.
An overview of training OpenAI's CLIP on Google Colab.
- Applying Natural Language Processing in Healthcare - Mar 23, 2021.
New advances in natural language processing (NLP) based on deep learning and transfer learning have made a whole set of software use cases in healthcare viable. The Healthcare NLP Summit is a free online conference on April 6th and 7th, bringing together 30+ technical sessions from across the community that works to apply these advances in the real world.
- How to Begin Your NLP Journey - Mar 17, 2021.
In this blog post, learn how to process text using Python.
- Natural Language Processing Pipelines, Explained - Mar 16, 2021.
This article presents a beginner's view of NLP, as well as an explanation of how a typical NLP pipeline might look.
- A Beginner’s Guide to the CLIP Model - Mar 11, 2021.
CLIP is a bridge between computer vision and natural language processing. I'm here to break CLIP down for you in an accessible and fun read! In this post, I'll cover what CLIP is, how CLIP works, and why CLIP is cool.
- Reducing the High Cost of Training NLP Models With SRU++ - Mar 4, 2021.
The increasing computation time and costs of training natural language processing (NLP) models highlight the importance of inventing computationally efficient models that retain top modeling power with reduced or accelerated computation. A single experiment training a top-performing language model on the 'Billion Word' benchmark would take 384 GPU days and as much as $36,000 using AWS on-demand instances.
- Speech to Text with Wav2Vec 2.0 - Mar 2, 2021.
Facebook recently introduced and open-sourced their new framework for self-supervised learning of representations from raw audio data called Wav2Vec 2.0. Learn more about it and how to use it here.
- Using NLP to improve your Resume - Feb 23, 2021.
This article discusses performing keyword matching and text analysis on job descriptions.
- GPT-2 vs GPT-3: The OpenAI Showdown - Feb 17, 2021.
Thanks to the diversity of the dataset used in the training process, we can obtain adequate text generation for text from a variety of domains. GPT-2 has 10x the parameters and was trained on 10x the data of its predecessor, GPT.
- Hugging Face Transformers Package – What Is It and How To Use It - Feb 16, 2021.
The rapid development of Transformers has brought a new wave of powerful tools to natural language processing. These models are large and very expensive to train, so pre-trained versions are shared and leveraged by researchers and practitioners. Hugging Face offers a wide variety of pre-trained transformers as open-source libraries, and you can incorporate these with only one line of code.
- 6 NLP Techniques Every Data Scientist Should Know - Feb 12, 2021.
Natural language processing has already begun to transform the way humans interact with computers, and its advances are moving rapidly. The field is built on core methods that must first be understood, with which you can then launch your data science projects to a new level of sophistication and value.
- Getting Started with 5 Essential Natural Language Processing Libraries - Feb 3, 2021.
This article is an overview of how to get started with 5 popular Python NLP libraries, from those for linguistic data visualization, to data preprocessing, to multi-task functionality, to state of the art language modeling, and beyond.
- Vision Transformers: Natural Language Processing (NLP) Increases Efficiency and Model Generality - Feb 2, 2021.
Why do we hear so little about transformer models applied to computer vision tasks? What about attention in computer vision networks?
- Six Times Bigger than GPT-3: Inside Google’s TRILLION Parameter Switch Transformer Model - Jan 25, 2021.
Google’s Switch Transformer model could be the next breakthrough in this area of deep learning.
- OpenAI Releases Two Transformer Models that Magically Link Language and Computer Vision - Jan 11, 2021.
OpenAI has released two new transformer architectures that combine image and language tasks in a fun and almost magical way. Read more about them here.
- 15 Free Data Science, Machine Learning & Statistics eBooks for 2021 - Dec 31, 2020.
We present a curated list of 15 free eBooks compiled in a single location to close out the year.
- How to Clean Text Data at the Command Line - Dec 16, 2020.
A basic tutorial about cleaning data using command-line tools: tr, grep, sort, uniq, awk, sed, and csvlook.
- How to Incorporate Tabular Data with HuggingFace Transformers - Nov 25, 2020.
In real-world scenarios, we often encounter data that includes text and tabular features. Leveraging the latest advances for transformers, effectively handling situations with both data structures can increase performance in your models.
- Top KDnuggets tweets, Nov 11-17: Data Engineering – the Cousin of Data Science, is Troublesome - Nov 18, 2020.
Also 6 Things About #DataScience that Employers Don't Want You to Know; NLP - Zero to Hero with #Python #NLProc; 5 Tricky SQL Queries Solved - Explaining the approach to solving a few complex #SQL queries.
- Top Python Libraries for Deep Learning, Natural Language Processing & Computer Vision - Nov 16, 2020.
This article compiles the 30 top Python libraries for deep learning, natural language processing & computer vision, as best determined by KDnuggets staff.
- How to Acquire the Most Wanted Data Science Skills - Nov 13, 2020.
We recently surveyed KDnuggets readers to determine the "most wanted" data science skills. Since they seem to be those most in demand from practitioners, here is a collection of resources for getting started with this learning.
- Multi-domain summarization by PlexPage - Nov 10, 2020.
The PlexPage by Algoritmi Vision is an Abstractive Multi-domain Search Summarization application built using the unique and innovative structure of the Natural Language Generation (NLG) technique. Learn more here, and try it out for yourself.
- Topic Modeling with BERT - Nov 3, 2020.
Leveraging BERT and TF-IDF to create easily interpretable topics.
- Which flavor of BERT should you use for your QA task? - Oct 22, 2020.
Check out this guide to choosing and benchmarking BERT models for question answering.
- Roadmap to Natural Language Processing (NLP) - Oct 19, 2020.
Check out this introduction to some of the most common techniques and models used in Natural Language Processing (NLP).
- Optimizing the Levenshtein Distance for Measuring Text Similarity - Oct 16, 2020.
For speeding up the calculation of the Levenshtein distance, this tutorial works on calculating using a vector rather than a matrix, which saves a lot of time. We’ll be coding in Java for this implementation.
- Understanding Transformers, the Data Science Way - Oct 1, 2020.
Read this accessible and conversational article about understanding transformers, the data science way: by asking a lot of questions, that is.
- An Introduction to NLP and 5 Tips for Raising Your Game - Sep 11, 2020.
This article is a collection of things the author would like to have known when they started out in NLP. Perhaps it will be useful for you.
- Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Semantics and Pragmatics - Aug 31, 2020.
Algorithms for text analytics must model how language works to incorporate meaning in language—and so do the people deploying these algorithms. Bender & Lascarides 2019 is an accessible overview of what the field of linguistics can teach NLP about how meaning is encoded in human languages.
- KDnuggets™ News 20:n33, Aug 26: If I had to start learning Data Science again, how would I do it? Must-read NLP and Deep Learning articles for Data Scientists - Aug 26, 2020.
If I had to start learning Data Science again, how would I do it? Must-read NLP and Deep Learning articles for Data Scientists; These Data Science Skills will be your Superpower; Accelerated Natural Language Processing: A Free Amazon Machine Learning University Course.
- A Deep Dive Into the Transformer Architecture – The Development of Transformer Models - Aug 24, 2020.
Even though transformers for NLP were introduced only a few years ago, they have delivered major impacts to a variety of fields from reinforcement learning to chemistry. Now is the time to better understand the inner workings of transformer architectures to give you the intuition you need to effectively work with these powerful tools.
- The NLP Model Forge: Generate Model Code On Demand - Aug 24, 2020.
You've seen their Big Bad NLP Database and The Super Duper NLP Repo. Now Quantum Stat is back with its most ambitious NLP product yet: The NLP Model Forge.
- Must-read NLP and Deep Learning articles for Data Scientists - Aug 21, 2020.
NLP and deep learning continue to advance, nearly on a daily basis. Check out these recent must-read guides, feature articles, and other resources to keep you on top of the latest advancements and ahead of the curve.
- Accelerated Natural Language Processing: A Free Course From Amazon - Aug 19, 2020.
Amazon's Machine Learning University is making its online courses available to the public, starting with this Accelerated Natural Language Processing offering.
- KDnuggets™ News 20:n32, Aug 19: The List of Top 10 Data Science Lists; Data Science MOOCs with Substance - Aug 19, 2020.
The List of Top 10 Lists in Data Science; Going Beyond Superficial: Data Science MOOCs with Substance; Introduction to Statistics for Data Science; Content-Based Recommendation System using Word Embeddings; How Natural Language Processing Is Changing Data Analytics
- Are Computer Vision Models Vulnerable to Weight Poisoning Attacks? - Aug 17, 2020.
A recent paper has explored the possibility of influencing the predictions of a freshly trained Natural Language Processing (NLP) model by tweaking the weights re-used in its training. This result is especially interesting if it also proves to transfer to the context of Computer Vision (CV), since the usage of pre-trained weights is widespread there.
- Content-Based Recommendation System using Word Embeddings - Aug 14, 2020.
This article explores how average Word2Vec and TF-IDF Word2Vec can be used to build a recommendation engine.
- How Natural Language Processing Is Changing Data Analytics - Aug 12, 2020.
As it becomes more prevalent, NLP will enable humans to interact with computers in ways not possible before. This new type of collaboration will allow improvements in a wide variety of human endeavors, including business, philanthropy, health, and communication.
- Exploring GPT-3: A New Breakthrough in Language Generation - Aug 10, 2020.
GPT-3 is the largest natural language processing (NLP) transformer released to date, eclipsing the previous record, Microsoft Research’s Turing-NLG at 17B parameters, by about 10 times. This has resulted in an explosion of demos: some good, some bad, all interesting.
- 5 Big Trends in Data Analytics - Jul 30, 2020.
Data analytics is the process by which data is deconstructed and examined for useful patterns and trends. Here we explore five trends making data analytics even more useful.
- 5 Fantastic Natural Language Processing Books - Jul 28, 2020.
This curated collection of 5 natural language processing books attempts to cover a number of different aspects of the field, balancing the practical and the theoretical. Check out these 5 fantastic selections now in order to improve your NLP skills.
- Labelling Data Using Snorkel - Jul 24, 2020.
In this tutorial, we walk through the process of using Snorkel to generate labels for an unlabelled dataset. We will provide you examples of basic Snorkel components by guiding you through a real clinical application of Snorkel.
- Free From Stanford: Ethical and Social Issues in Natural Language Processing - Jul 17, 2020.
Perhaps it's time to take a look at this relatively new offering from Stanford, Ethical and Social Issues in Natural Language Processing (CS384), an advanced seminar course covering ethical and social issues in NLP.
- PyTorch LSTM: Text Generation Tutorial - Jul 13, 2020.
The key elements of an LSTM are its ability to work with sequences and its gating mechanism.
- Innovating versus Doing: NLP and CORD19 - Jun 30, 2020.
How I learned to trust the process and find value in the road most traveled.
- The Unreasonable Progress of Deep Neural Networks in Natural Language Processing (NLP) - Jun 29, 2020.
Natural language processing has made incredible advances through advanced techniques in deep learning. Learn about these powerful models, and find how close (or far away) these approaches are to human-level understanding.
- Bias in AI: A Primer - Jun 23, 2020.
Those interested in studying AI bias, but who lack a starting point, would do well to check out this introductory set of slides and the accompanying talk on the subject from Google researcher Margaret Mitchell.
- What is emotion AI and why should you care? - Jun 19, 2020.
What is emotion AI, why is it relevant, and what do you need to know about it?
- KDnuggets™ News 20:n24, Jun 17: Easy Speech-to-Text with Python; Data Distributions Overview; Java for Data Scientists - Jun 17, 2020.
Also: Deploy a Machine Learning Pipeline to the Cloud Using a Docker Container; Five Cognitive Biases In Data Science (And how to avoid them); Understanding Machine Learning: The Free eBook; Simplified Mixed Feature Type Preprocessing in Scikit-Learn with Pipelines; A Complete guide to Google Colab for Deep Learning
- Easy Speech-to-Text with Python - Jun 10, 2020.
In this blog, I am demonstrating how to convert speech to text using Python. This can be done with the help of the “Speech Recognition” API and “PyAudio” library.
- KDnuggets™ News 20:n23, Jun 10: Largest Dataset you analyzed? If you start statistics all over again, where would you start? GPT-3 - Jun 10, 2020.
#BlackLivesMatter. In this issue: If you had to start statistics all over again, where would you start? New Poll: What was the largest dataset you analyzed? Another Great NLP Course from Stanford; Naive Bayes: Everything you need to know; GPT-3 - a giant leap for Deep Learning and NLP?
- GPT-3, a giant step for Deep Learning and NLP? - Jun 9, 2020.
Recently, OpenAI announced a new successor to their language model, GPT-3, that is now the largest model trained so far with 175 billion parameters. Training a language model this large has its merits and limitations, so this article covers some of its most interesting and important aspects.
- 5 Essential Papers on Sentiment Analysis - Jun 9, 2020.
To highlight some of the work being done in the field, here are five essential papers on sentiment analysis and sentiment classification.
- Natural Language Processing with Python: The Free eBook - Jun 8, 2020.
This free eBook is an introduction to natural language processing, and to NLTK, one of the most prevalent Python NLP libraries.
- From Languages to Information: Another Great NLP Course from Stanford - Jun 3, 2020.
Check out another example of a Stanford NLP course and its freely available courseware.
- Four Ways to Apply NLP in Financial Services - Jun 2, 2020.
Natural language processing (NLP) is increasingly used to review unstructured content or spot trends in markets. How is Refinitiv Labs applying NLP in financial services to meet challenges around investment decision-making and risk management?
- KDnuggets™ News 20:n21, May 27: The Best NLP with Deep Learning Course is Free; Your First Machine Learning Web App - May 27, 2020.
Also: Python For Everybody: The Free eBook; Complex logic at breakneck speed: Try Julia for data science; An easy guide to choose the right Machine Learning algorithm; Dataset Splitting Best Practices in Python; Appropriately Handling Missing Values for Statistical Modelling and Prediction
- The Best NLP with Deep Learning Course is Free - May 22, 2020.
Stanford's Natural Language Processing with Deep Learning is one of the most respected courses on the topic that you will find anywhere, and the course materials are freely available online.
- Spotting Controversy with NLP - May 21, 2020.
In this article, I’ll introduce you to a hot topic in financial services and describe how a leading data provider is using data science and NLP to streamline how they find insights in unstructured data.
- Google Unveils TAPAS, a BERT-Based Neural Network for Querying Tables Using Natural Language - May 19, 2020.
The new neural network extends BERT to interact with tabular datasets.
- Easy Text-to-Speech with Python - May 18, 2020.
Python comes with a lot of handy and easily accessible libraries and we’re going to look at how we can deliver text-to-speech with Python in this article.
- Facebook Open Sources Blender, the Largest-Ever Open Domain Chatbot - May 15, 2020.
The new conversational agent exhibits human-like behavior in conversations about almost any topic.
- Text Mining in Python: Steps and Examples - May 12, 2020.
The majority of data exists in textual form, which is a highly unstructured format. To produce meaningful insights from text data, we need to follow a method called Text Analysis.
- Chatbots in a Nutshell - May 7, 2020.
Marketing scientist Kevin Gray asks Dr. Anna Farzindar of the University of Southern California about chatbots and the ways they are used.
- KDnuggets™ News 20:n18, May 6: Five Cool Python Libraries for Data Science; NLP Recipes: Best Practices - May 6, 2020.
5 cool Python libraries for Data Science; NLP Recipes: Best Practices and Examples; Deep Learning: The Free eBook; Demystifying the AI Infrastructure Stack; and more.
- Natural Language Processing Recipes: Best Practices and Examples - May 1, 2020.
Here is an overview of another great natural language processing resource, this time from Microsoft, which demonstrates best practices and implementation guidelines for a variety of tasks and scenarios.
- Five Cool Python Libraries for Data Science - Apr 30, 2020.
Check out these 5 cool Python libraries that the author has come across during an NLP project, and which have made their life easier.
- KDnuggets™ News 20:n17, Apr 29: The Super Duper NLP Repo; Free Machine Learning & Data Science Books & Courses for Quarantine - Apr 29, 2020.
Also: Should Data Scientists Model COVID19 and other Biological Events; Learning during a crisis (Data Science 90-day learning challenge); Data Transformation: Standardization vs Normalization; DBSCAN Clustering Algorithm in Machine Learning; Find Your Perfect Fit: A Quick Guide for Job Roles in the Data World
- The Super Duper NLP Repo: 100 Ready-to-Run Colab Notebooks - Apr 24, 2020.
Check out this repository of more than 100 freely-accessible NLP notebooks, curated from around the internet, and ready to launch in Colab with a single click.
- Top KDnuggets tweets, Apr 08-14: Mathematics for #MachineLearning: The Free eBook – KDnuggets - Apr 15, 2020.
Also Exploratory Data Analysis for Natural Language Processing: A Complete Guide to Python Tools; A professor with 20 years of experience to all high school seniors (and their parents). If you were planning to enroll in college next fall - don't.
- Simple Question Answering (QA) Systems That Use Text Similarity Detection in Python - Apr 7, 2020.
How exactly are smart algorithms able to engage and communicate with us like humans? The answer lies in Question Answering systems that are built on a foundation of Machine Learning and Natural Language Processing. Let's build one here.
- Why you should NOT use MS MARCO to evaluate semantic search - Apr 2, 2020.
If we want to investigate the power and limitations of semantic vectors (pre-trained or not), we should ideally prioritize datasets that are less biased towards term-matching signals. This piece shows that the MS MARCO dataset is more biased towards those signals than we expected and that the same issues are likely present in many other datasets due to similar data collection designs.
- A Comprehensive Data Repository for Fake Health News Detection - Mar 19, 2020.
We introduce FakeHealth, a new data repository for fake health news detection. Following a preliminary analysis to demonstrate its features, we consider additional potential directions for better identifying fake news.
- Salesforce Open Sources a Framework for Open Domain Question Answering Using Wikipedia - Mar 16, 2020.
The framework uses a multi-hop QA method to answer complex questions by reasoning through Wikipedia’s datasets.