- The Chatbot Transformation: From Failure to the Future - Dec 21, 2021.
The all-knowing chatbots we once thought to be the future have been replaced by specialized bots, and the results are outstanding.
- Analyzing Scientific Articles with fine-tuned SciBERT NER Model and Neo4j - Dec 9, 2021.
In this article, we will be analyzing a dataset of scientific abstracts using the Neo4j Graph database and a fine-tuned SciBERT model.
- Meta-Learning for Keyphrase Extraction - Dec 3, 2021.
This article explores Meta-Learning for Keyphrase Extraction, which delves into the how and why of Keyphrase Extraction (KPE) - extracting phrases/groups of words from a document to best capture and represent its content. The article outlines what needs to be done to build a keyphrase extractor that performs well not only on in-domain data, but also in a zero-shot scenario where keyphrases need to be extracted from data that have a different distribution (either a different domain or a different type of documents).
- Sentiment Analysis with KNIME - Nov 29, 2021.
Check out this tutorial on how to approach sentiment classification with supervised machine learning algorithms.
- Build a Serverless News Data Pipeline using ML on AWS Cloud - Nov 18, 2021.
This is a guide on how to build a serverless data pipeline on AWS with a Machine Learning model deployed as a SageMaker endpoint.
- Where NLP is heading - Nov 18, 2021.
Natural language processing research and applications are moving forward rapidly. Several trends have emerged from this progress, and they point to a future of more exciting possibilities and interesting opportunities in the field.
- KDnuggets™ News 21:n44, Nov 17: Don’t Waste Time Building Your Data Science Network; 19 Data Science Project Ideas for Beginners - Nov 17, 2021.
Don’t Waste Time Building Your Data Science Network; 19 Data Science Project Ideas for Beginners; How I Redesigned over 100 ETL into ELT Data Pipelines; Anecdotes from 11 Role Models in Machine Learning; The Ultimate Guide To Different Word Embedding Techniques In NLP
- How to fast-track machine translation projects - Nov 16, 2021.
Data is the lifeblood of any successful machine learning model, and machine translation models are no exception. Without relevant and properly labelled data, even the most sophisticated model will be unable to achieve reliable results.
- Dream Come True: Building websites by thinking about them - Nov 11, 2021.
From the mind to the computer, make websites using your imagination!
- OpenAI’s Approach to Solve Math Word Problems - Nov 9, 2021.
OpenAI's latest research aims to solve math word problems. Let's dive a bit deeper into the ideas behind this new research.
- POS Tagging, Explained - Nov 8, 2021.
Learn about the strengths of part-of-speech tagging, and about how a strong POS tagger can contribute to natural language understanding.
- 7 Top Open Source Datasets to Train Natural Language Processing (NLP) & Text Models - Nov 8, 2021.
With a lot of excitement and research around NLP, there are growing opportunities to apply these technologies to real-world scenarios. It's not trivial to become familiar with NLP, and these open-source datasets can help you increase your skills.
- NLP for Business in the Time of BERTera: Seven Misplaced Passions - Nov 4, 2021.
This article is a brief summary of our observations on some common client misperceptions with respect to recent developments in NLP, especially the use of large-scale models and datasets.
- Salary Breakdown of the Top Data Science Jobs - Nov 2, 2021.
Machine Learning vs NLP vs Data Engineer vs Data Scientist, and what it means to be in each role.
- Simple Text Scraping, Parsing, and Processing with this Python Library - Oct 29, 2021.
Scraping, parsing, and processing text data from the web can be difficult. But it can also be easy, using Newspaper3k.
- Deploying Serverless spaCy Transformer Model with AWS Lambda - Oct 22, 2021.
A step-by-step guide on how to deploy a serverless NER transformer model.
- Training BPE, WordPiece, and Unigram Tokenizers from Scratch using Hugging Face - Oct 21, 2021.
Comparing the tokens generated by SOTA tokenization algorithms using Hugging Face's tokenizers package.
- 11 Most Practical Data Science Skills for 2022 - Oct 19, 2021.
While the field of data science continues to evolve with exciting new progress in analytical approaches and machine learning, there remains a core set of skills that are foundational for all general practitioners and specialists, especially those who want to be employable with full-stack capabilities.
- Scaling human oversight of AI systems for difficult tasks – OpenAI approach - Oct 11, 2021.
The foundational idea of Artificial Intelligence is that it should demonstrate human-level intelligence. So, unless a model can perform as a human might do, its intended purpose is missed. Here, recent OpenAI research into full-length book summarization focuses on generating results that make sense to humans with state-of-the-art results that leverage scalable AI-enhanced human-in-the-loop feedback.
- The Evolution of Tokenization – Byte Pair Encoding in NLP - Oct 7, 2021.
Though we have SOTA algorithms for tokenization, it's always good practice to understand the evolution trail and learn how we got here. Read this introduction to Byte Pair Encoding.
- Building a Structured Financial Newsfeed Using Python, SpaCy and Streamlit - Sep 28, 2021.
Getting started with NLP by building a Named Entity Recognition (NER) application.
- GitHub Copilot and the Rise of AI Language Models in Programming Automation - Sep 22, 2021.
Read on to learn more about what makes Copilot different from previous autocomplete tools (including TabNine), and why this particular tool has been generating so much controversy.
- 15 Must-Know Python String Methods - Sep 21, 2021.
It is not always about numbers.
- Text Preprocessing Methods for Deep Learning - Sep 10, 2021.
While the preprocessing pipeline we are focusing on in this post is mainly centered around Deep Learning, most of it will also be applicable to conventional machine learning models too.
- Five Key Facts About Wu Dao 2.0: The Largest Transformer Model Ever Built - Sep 6, 2021.
The record-setting model combines some clever research and engineering methods.
- Behind OpenAI Codex: 5 Fascinating Challenges About Building Codex You Didn’t Know About - Sep 3, 2021.
Some ML engineering and modeling challenges encountered during the construction of Codex.
- Best Resources to Learn Natural Language Processing in 2021 - Sep 2, 2021.
In this article, the author has listed all the best resources to learn natural language processing, including Online Courses, Tutorials, Books, and YouTube Videos.
- NLP Insights for the Penguin Café Orchestra - Aug 31, 2021.
We give an example of how to use Expert.ai and Python to investigate favorite music albums.
- Multilabel Document Categorization, step by step example - Aug 31, 2021.
This detailed guide explores an unsupervised and supervised learning two-stage approach with LDA and BERT to develop a domain-specific document categorizer on unlabeled documents.
- Introducing Packed BERT for 2x Training Speed-up in Natural Language Processing - Aug 30, 2021.
Check out this new BERT packing algorithm for more efficient training.
- 3 Data Acquisition, Annotation, and Augmentation Tools - Aug 27, 2021.
Check out these 3 projects found around GitHub that can help with your data acquisition, annotation, and augmentation tasks.
- Jurassic-1 Language Models and AI21 Studio - Aug 23, 2021.
AI21 Labs’ new developer platform offers instant access to our 178B-parameter language model, to help you build sophisticated text-based AI applications at scale.
- Linear Algebra for Natural Language Processing - Aug 17, 2021.
Learn about representing word semantics in vector space.
- How to Train a BERT Model From Scratch - Aug 13, 2021.
Meet BERT’s Italian cousin, FiliBERTo.
- KDnuggets™ News 21:n29, Aug 4: GitHub Copilot Open Source Alternatives; 3 Reasons Why You Should Use Linear Regression Models Instead of Neural Networks - Aug 4, 2021.
GitHub Copilot Open Source Alternatives; 3 Reasons Why You Should Use Linear Regression Models Instead of Neural Networks; A Brief Introduction to the Concept of Data; MLOps Best Practices; GPU-Powered Data Science (NOT Deep Learning) with RAPIDS
- An AI-Based Framework Solution to Address Email Management Challenges - Jul 28, 2021.
Expert.ai’s Edge NL API is an on-premise API that can perform NLU tasks with no required training or extra work, offering advanced, out-of-the-box capabilities that address common use cases and can be easily customized to your specific needs.
- KDnuggets™ News 21:n28, Jul 28: Design patterns in machine learning; The Best NLP Course is Free - Jul 28, 2021.
What are the Design patterns for Machine Learning and why should you know them? For more advanced readers, how to use Kafka Connect to create an open source data pipeline for processing real-time data; The state-of-the-art NLP course is freely available; Python Data Structures Compared; Update your Machine Learning skills this summer.
- Facebook Open Sources a Chatbot That Can Discuss Any Topic - Jul 27, 2021.
The new version expands the capabilities of its predecessor, building a much more natural conversational experience.
- The Best SOTA NLP Course is Free! - Jul 21, 2021.
Hugging Face has recently released a course on using its libraries and ecosystem for practical NLP, and it appears to be very comprehensive. Have a look for yourself.
- Understanding BERT with Hugging Face - Jul 20, 2021.
We don’t really understand something before we implement it ourselves. So in this post, we will implement a Question Answering Neural Network using BERT and a Hugging Face Library.
- SQL, Syllogisms, and Explanations - Jul 14, 2021.
Check out the Executable English Platform, for self-explaining applications written in English that you can run in your browser.
- GitHub Copilot: Your AI pair programmer – what is all the fuss about? - Jul 5, 2021.
GitHub just released Copilot, a code completion tool on steroids dubbed your "AI pair programmer." Read more about it, and see what all the fuss is about.
- Semantic Search: Measuring Meaning From Jaccard to Bert - Jul 2, 2021.
In this article, we’ll cover a few of the most interesting — and powerful — of these techniques — focusing specifically on semantic search. We’ll learn how they work, what they’re good at, and how we can implement them ourselves.
- KDnuggets™ News 21:n24, Jun 30: What will the demand for Data Scientists be in 10 years?; Add A New Dimension To Your Photos Using Python - Jun 30, 2021.
What will the demand for Data Scientists be in 10 years? Will Data Scientists be extinct?; Add A New Dimension To Your Photos Using Python; Data Scientists are from Mars and Software Developers are from Venus; How to Train a Joint Entities and Relation Extraction Classifier using BERT Transformer with spaCy 3; In-Warehouse Machine Learning and the Modern Data Science Stack
- How to Train a Joint Entities and Relation Extraction Classifier using BERT Transformer with spaCy 3 - Jun 28, 2021.
A step-by-step guide on how to train a relation extraction classifier using Transformer and spaCy 3.
- Applied Language Technology: A No-Nonsense Approach - Jun 25, 2021.
Here is a free entry-level applied natural language processing course that can fit into any beginner's roadmap to understanding NLP. Check it out.
- Fine-Tuning Transformer Model for Invoice Recognition - Jun 23, 2021.
The author presents a step-by-step guide from annotation to training.
- KDnuggets™ News 21:n23, Jun 23: Pandas vs SQL: When Data Scientists Should Use Each Tool; How to Land a Data Analytics Job in 6 Months - Jun 23, 2021.
Pandas vs SQL: When Data Scientists Should Use Each Tool; How to Land a Data Analytics Job in 6 Months; A Graph-based Text Similarity Method with Named Entity Information in NLP; The Best Way to Learn Practical NLP?; An introduction to Explainable AI (XAI) and Explainable Boosting Machines (EBM)
- The Word “WORD” Has 13 Meanings - Jun 22, 2021.
Thoughts around Knowledge Graphs, the semantic nature of language, and the two main types of word ambiguity.
- Overview of AutoNLP from Hugging Face with Example Project - Jun 21, 2021.
AutoNLP is a beta project from Hugging Face that builds on the company’s work with its Transformer project. With AutoNLP you can get a working model with just a few simple terminal commands.
- The Best Way to Learn Practical NLP? - Jun 16, 2021.
Hugging Face has just released a course on using its libraries and ecosystem for practical NLP, and it appears to be very comprehensive. Have a look for yourself.
- A Graph-based Text Similarity Method with Named Entity Information in NLP - Jun 16, 2021.
In this article, the author summarizes the 2017 paper "A Graph-based Text Similarity Measure That Employs Named Entity Information" as per their understanding. Better understand the concepts by reading along.
- Building a Knowledge Graph for Job Search Using BERT - Jun 14, 2021.
A guide on how to create knowledge graphs using NER and Relation Extraction.
- The Essential Guide to Transformers, the Key to Modern SOTA AI - Jun 10, 2021.
You likely know Transformers from their recent spate of success stories in natural language processing, computer vision, and other areas of artificial intelligence, but are you familiar with all of the X-formers? More importantly, do you know the differences, and why you might use one over another?
- How to speed up a Deep Learning Language model by almost 50X at half the cost - Jun 9, 2021.
In this blog post, we show how to accelerate fine-tuning the ALBERT language model while also reducing costs by using Determined’s built-in support for distributed training with AWS spot instances.
- How to Fine-Tune BERT Transformer with spaCy 3 - Jun 7, 2021.
A step-by-step guide on how to create a knowledge graph using NER and Relation Extraction.
- How to Create and Deploy a Simple Sentiment Analysis App via API - Jun 1, 2021.
In this article we will create a simple sentiment analysis app using the HuggingFace Transformers library, and deploy it using FastAPI.
- 4 Tips for Dataset Curation for NLP Projects - May 28, 2021.
You have heard it before, and you will hear it again. It's all about the data. Curating the right data is also more important than just curating any data. When dealing with text data, many hard-earned lessons have been learned by others over the years, and here are four data curation tips that you should be sure to follow during your next NLP project.
- Great New Resource for Natural Language Processing Research and Applications - May 27, 2021.
The NLP Index is a brand new resource for NLP code discovery, combining and indexing more than 3,000 paper and code pairs at launch. If you are interested in NLP research and locating the code and papers needed to understand and implement the latest research, you should check it out.
- Topic Modeling with Streamlit - May 26, 2021.
What does it take to create and deploy a topic modeling web application quickly? Read this post to see how the author uses Python NLP packages for topic modeling, Streamlit for the web application framework, and Streamlit Sharing for deployment.
- Machine Translation in a Nutshell - May 17, 2021.
Marketing scientist Kevin Gray asks Dr. Anna Farzindar of the University of Southern California for a snapshot of machine translation. Dr. Farzindar also provided the original art for this article.
- KDnuggets™ News 21:n18, May 12: Data Preparation in SQL, with Cheat Sheet!; Rebuilding 7 Python Projects - May 12, 2021.
Data Preparation in SQL, with Cheat Sheet!; Rebuilding My 7 Python Projects; Applying Python’s Explode Function to Pandas DataFrames; Essential Linear Algebra for Data Science and Machine Learning; Similarity Metrics in NLP
- Similarity Metrics in NLP - May 10, 2021.
This post covers the use of euclidean distance, dot product, and cosine similarity as NLP similarity metrics.
- What is Neural Search? - May 6, 2021.
And how to get started with it with no prior experience in Machine Learning.
- KDnuggets™ News 21:n17, May 5: Charticulator: Microsoft Research open-source game-changing Data Visualization platform; Data Science to Predict and Prevent Real World Problems - May 5, 2021.
Charticulator: Microsoft Research game-changing Data Visualization platform; How Data Science is used to predict and prevent real world problems; Hilarious Data Science Humor; Neural Networks for Natural Language Processing Now; and more.
- How To Generate Meaningful Sentences Using a T5 Transformer - May 3, 2021.
Read this article to see how to develop a text generation API using the T5 transformer.
- Learn Neural Networks for Natural Language Processing Now - Apr 30, 2021.
Still haven't come across enough quality contemporary natural language processing resources? Here is yet another freely-accessible offering from a top-notch university that might help quench your thirst for learning materials.
- Introducing The NLP Index - Apr 29, 2021.
The NLP Index is a brand new resource for NLP code discovery, combining and indexing more than 3,000 paper and code pairs at launch. If you are interested in NLP research and locating the code and papers needed to understand and implement the latest research, you should check it out.
- KDnuggets™ News 21:n16, Apr 28: Data Science Books You Should Start Reading in 2021; Top 10 Must-Know Machine Learning Algorithms for Data Scientists - Apr 28, 2021.
Data science is not about data – applying Dijkstra principle to data science; Data Science Books You Should Start Reading in 2021; How to ace A/B Testing Data Science Interviews; Top 10 Must-Know Machine Learning Algorithms for Data Scientists – Part 1; Production-Ready Machine Learning NLP API with FastAPI and spaCy
- Production-Ready Machine Learning NLP API with FastAPI and spaCy - Apr 21, 2021.
Learn how to implement an API based on FastAPI and spaCy for Named Entity Recognition (NER), and see why the author used FastAPI to quickly build a fast and robust machine learning API.
- How to Apply Transformers to Any Length of Text - Apr 12, 2021.
Read on to find how to restore the power of NLP for long sequences.
- Automated Text Classification with EvalML - Apr 6, 2021.
Learn how EvalML leverages Woodwork, Featuretools and the nlp-primitives library to process text data and create a machine learning model that can detect spam text messages.
- 3 More Free Top Notch Natural Language Processing Courses - Mar 31, 2021.
Are you looking to continue your learning of natural language processing? This small collection of 3 free top notch courses will allow you to do just that.
- Multilingual CLIP with Huggingface + PyTorch Lightning - Mar 26, 2021.
An overview of training OpenAI's CLIP on Google Colab.
- Applying Natural Language Processing in Healthcare - Mar 23, 2021.
New advances in natural language processing (NLP) based on deep learning and transfer learning have made a whole set of software use cases in healthcare viable. The Healthcare NLP Summit is a free online conference on April 6th and 7th, bringing together 30+ technical sessions from across the community that works to apply these advances in the real world.
- How to Begin Your NLP Journey - Mar 17, 2021.
In this blog post, learn how to process text using Python.
- Natural Language Processing Pipelines, Explained - Mar 16, 2021.
This article presents a beginner's view of NLP, as well as an explanation of how a typical NLP pipeline might look.
- A Beginner’s Guide to the CLIP Model - Mar 11, 2021.
CLIP is a bridge between computer vision and natural language processing. I'm here to break CLIP down for you in an accessible and fun read! In this post, I'll cover what CLIP is, how CLIP works, and why CLIP is cool.
- Reducing the High Cost of Training NLP Models With SRU++ - Mar 4, 2021.
The increasing computation time and costs of training natural language processing (NLP) models highlight the importance of inventing computationally efficient models that retain top modeling power with reduced or accelerated computation. A single experiment training a top-performing language model on the 'Billion Word' benchmark would take 384 GPU days and as much as $36,000 using AWS on-demand instances.
- Speech to Text with Wav2Vec 2.0 - Mar 2, 2021.
Facebook recently introduced and open-sourced their new framework for self-supervised learning of representations from raw audio data called Wav2Vec 2.0. Learn more about it and how to use it here.
- Using NLP to improve your Resume - Feb 23, 2021.
This article discusses performing keyword matching and text analysis on job descriptions.
- GPT-2 vs GPT-3: The OpenAI Showdown - Feb 17, 2021.
Thanks to the diversity of the dataset used in the training process, we can obtain adequate text generation for text from a variety of domains. GPT-2 has 10x the parameters and was trained on 10x the data of its predecessor, GPT.
- Hugging Face Transformers Package – What Is It and How To Use It - Feb 16, 2021.
The rapid development of Transformers has brought a new wave of powerful tools to natural language processing. These models are large and very expensive to train, so pre-trained versions are shared and leveraged by researchers and practitioners. Hugging Face offers a wide variety of pre-trained transformers as open-source libraries, and you can incorporate these with only one line of code.
- 6 NLP Techniques Every Data Scientist Should Know - Feb 12, 2021.
Natural language processing has already begun to transform the way humans interact with computers, and its advances are moving rapidly. The field is built on core methods that must first be understood, with which you can then launch your data science projects to a new level of sophistication and value.
- Getting Started with 5 Essential Natural Language Processing Libraries - Feb 3, 2021.
This article is an overview of how to get started with 5 popular Python NLP libraries, from those for linguistic data visualization, to data preprocessing, to multi-task functionality, to state of the art language modeling, and beyond.
- Vision Transformers: Natural Language Processing (NLP) Increases Efficiency and Model Generality - Feb 2, 2021.
Why do we hear so little about transformer models applied to computer vision tasks? What about attention in computer vision networks?
- Six Times Bigger than GPT-3: Inside Google’s TRILLION Parameter Switch Transformer Model - Jan 25, 2021.
Google’s Switch Transformer model could be the next breakthrough in this area of deep learning.
- OpenAI Releases Two Transformer Models that Magically Link Language and Computer Vision - Jan 11, 2021.
OpenAI has released two new transformer architectures that combine image and language tasks in a fun and almost magical way. Read more about them here.
- 15 Free Data Science, Machine Learning & Statistics eBooks for 2021 - Dec 31, 2020.
We present a curated list of 15 free eBooks compiled in a single location to close out the year.
- How to Clean Text Data at the Command Line - Dec 16, 2020.
A basic tutorial about cleaning data using command-line tools: tr, grep, sort, uniq, awk, sed, and csvlook.
- How to Incorporate Tabular Data with HuggingFace Transformers - Nov 25, 2020.
In real-world scenarios, we often encounter data that includes text and tabular features. Leveraging the latest advances for transformers, effectively handling situations with both data structures can increase performance in your models.
- Top KDnuggets tweets, Nov 11-17: Data Engineering – the Cousin of Data Science, is Troublesome - Nov 18, 2020.
Also 6 Things About #DataScience that Employers Don't Want You to Know; NLP - Zero to Hero with #Python #NLProc; 5 Tricky SQL Queries Solved - Explaining the approach to solving a few complex #SQL queries.
- Top Python Libraries for Deep Learning, Natural Language Processing & Computer Vision - Nov 16, 2020.
This article compiles the 30 top Python libraries for deep learning, natural language processing & computer vision, as best determined by KDnuggets staff.
- How to Acquire the Most Wanted Data Science Skills - Nov 13, 2020.
We recently surveyed KDnuggets readers to determine the "most wanted" data science skills. Since they seem to be those most in demand from practitioners, here is a collection of resources for getting started with this learning.
- Multi-domain summarization by PlexPage - Nov 10, 2020.
The PlexPage by Algoritmi Vision is an Abstractive Multi-domain Search Summarization application built using the unique and innovative structure of the Natural Language Generation (NLG) technique. Learn more here, and try it out for yourself.
- Topic Modeling with BERT - Nov 3, 2020.
Leveraging BERT and TF-IDF to create easily interpretable topics.
- Which flavor of BERT should you use for your QA task? - Oct 22, 2020.
Check out this guide to choosing and benchmarking BERT models for question answering.
- Roadmap to Natural Language Processing (NLP) - Oct 19, 2020.
Check out this introduction to some of the most common techniques and models used in Natural Language Processing (NLP).
- Optimizing the Levenshtein Distance for Measuring Text Similarity - Oct 16, 2020.
For speeding up the calculation of the Levenshtein distance, this tutorial works on calculating using a vector rather than a matrix, which saves a lot of time. We’ll be coding in Java for this implementation.
- Understanding Transformers, the Data Science Way - Oct 1, 2020.
Read this accessible and conversational article about understanding transformers the data science way: by asking a lot of questions, that is.
- An Introduction to NLP and 5 Tips for Raising Your Game - Sep 11, 2020.
This article is a collection of things the author would like to have known when they started out in NLP. Perhaps it will be useful for you.
- Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Semantics and Pragmatics - Aug 31, 2020.
Algorithms for text analytics must model how language works to incorporate meaning in language—and so do the people deploying these algorithms. Bender & Lascarides 2019 is an accessible overview of what the field of linguistics can teach NLP about how meaning is encoded in human languages.
- KDnuggets™ News 20:n33, Aug 26: If I had to start learning Data Science again, how would I do it? Must-read NLP and Deep Learning articles for Data Scientists - Aug 26, 2020.
If I had to start learning Data Science again, how would I do it? Must-read NLP and Deep Learning articles for Data Scientists; These Data Science Skills will be your Superpower; Accelerated Natural Language Processing: A Free Amazon Machine Learning University Course.
- A Deep Dive Into the Transformer Architecture – The Development of Transformer Models - Aug 24, 2020.
Even though transformers for NLP were introduced only a few years ago, they have delivered major impacts to a variety of fields from reinforcement learning to chemistry. Now is the time to better understand the inner workings of transformer architectures to give you the intuition you need to effectively work with these powerful tools.
- The NLP Model Forge: Generate Model Code On Demand - Aug 24, 2020.
You've seen their Big Bad NLP Database and The Super Duper NLP Repo. Now Quantum Stat is back with its most ambitious NLP product yet: The NLP Model Forge.
- Must-read NLP and Deep Learning articles for Data Scientists - Aug 21, 2020.
NLP and deep learning continue to advance, nearly on a daily basis. Check out these recent must-read guides, feature articles, and other resources to keep you on top of the latest advancements and ahead of the curve.
- Accelerated Natural Language Processing: A Free Course From Amazon - Aug 19, 2020.
Amazon's Machine Learning University is making its online courses available to the public, starting with this Accelerated Natural Language Processing offering.
- KDnuggets™ News 20:n32, Aug 19: The List of Top 10 Data Science Lists; Data Science MOOCs with Substance - Aug 19, 2020.
The List of Top 10 Lists in Data Science; Going Beyond Superficial: Data Science MOOCs with Substance; Introduction to Statistics for Data Science; Content-Based Recommendation System using Word Embeddings; How Natural Language Processing Is Changing Data Analytics
- Are Computer Vision Models Vulnerable to Weight Poisoning Attacks? - Aug 17, 2020.
A recent paper has explored the possibility of influencing the predictions of a freshly trained Natural Language Processing (NLP) model by tweaking the weights re-used in its training. This result is especially interesting if it also proves to transfer to the context of Computer Vision (CV), since the usage of pre-trained weights is widespread there.
- Content-Based Recommendation System using Word Embeddings - Aug 14, 2020.
This article explores how average Word2Vec and TF-IDF Word2Vec can be used to build a recommendation engine.
- How Natural Language Processing Is Changing Data Analytics - Aug 12, 2020.
As it becomes more prevalent, NLP will enable humans to interact with computers in ways not possible before. This new type of collaboration will allow improvements in a wide variety of human endeavors, including business, philanthropy, health, and communication.
- Exploring GPT-3: A New Breakthrough in Language Generation - Aug 10, 2020.
GPT-3 is the largest natural language processing (NLP) transformer released to date, eclipsing the previous record, Microsoft Research’s Turing-NLG at 17B parameters, by about 10 times. This has resulted in an explosion of demos: some good, some bad, all interesting.
- 5 Big Trends in Data Analytics - Jul 30, 2020.
Data analytics is the process by which data is deconstructed and examined for useful patterns and trends. Here we explore five trends making data analytics even more useful.
- 5 Fantastic Natural Language Processing Books - Jul 28, 2020.
This curated collection of 5 natural language processing books attempts to cover a number of different aspects of the field, balancing the practical and the theoretical. Check out these 5 fantastic selections now in order to improve your NLP skills.
- Labelling Data Using Snorkel - Jul 24, 2020.
In this tutorial, we walk through the process of using Snorkel to generate labels for an unlabelled dataset. We will provide you examples of basic Snorkel components by guiding you through a real clinical application of Snorkel.
- Free From Stanford: Ethical and Social Issues in Natural Language Processing - Jul 17, 2020.
Perhaps it's time to take a look at this relatively new offering from Stanford, Ethical and Social Issues in Natural Language Processing (CS384), an advanced seminar course covering ethical and social issues in NLP.
- PyTorch LSTM: Text Generation Tutorial - Jul 13, 2020.
The key elements of LSTM are its ability to work with sequences and its gating mechanism.
- Innovating versus Doing: NLP and CORD19 - Jun 30, 2020.
How I learned to trust the process and find value in the road most traveled.
- The Unreasonable Progress of Deep Neural Networks in Natural Language Processing (NLP) - Jun 29, 2020.
Natural language processing has made incredible advances through advanced techniques in deep learning. Learn about these powerful models, and find how close (or far away) these approaches are to human-level understanding.
- Bias in AI: A Primer - Jun 23, 2020.
Those interested in studying AI bias, but who lack a starting point, would do well to check out this introductory set of slides and the accompanying talk on the subject from Google researcher Margaret Mitchell.
- What is emotion AI and why should you care? - Jun 19, 2020.
What is emotion AI, why is it relevant, and what do you need to know about it?
- KDnuggets™ News 20:n24, Jun 17: Easy Speech-to-Text with Python; Data Distributions Overview; Java for Data Scientists - Jun 17, 2020.
Also: Deploy a Machine Learning Pipeline to the Cloud Using a Docker Container; Five Cognitive Biases In Data Science (And how to avoid them); Understanding Machine Learning: The Free eBook; Simplified Mixed Feature Type Preprocessing in Scikit-Learn with Pipelines; A Complete guide to Google Colab for Deep Learning
- Easy Speech-to-Text with Python - Jun 10, 2020.
In this blog, I am demonstrating how to convert speech to text using Python. This can be done with the help of the “Speech Recognition” API and “PyAudio” library.
- KDnuggets™ News 20:n23, Jun 10: Largest Dataset you analyzed? If you start statistics all over again, where would you start? GPT-3 - Jun 10, 2020.
#BlackLivesMatter. In this issue: If you had to start statistics all over again, where would you start? New Poll: What was the largest dataset you analyzed? Another Great NLP Course from Stanford; Naive Bayes: Everything you need to know; GPT-3 - a giant leap for Deep Learning and NLP?
- GPT-3, a giant step for Deep Learning and NLP? - Jun 9, 2020.
Recently, OpenAI announced a new successor to their language model, GPT-3, that is now the largest model trained so far with 175 billion parameters. Training a language model this large has its merits and limitations, so this article covers some of its most interesting and important aspects.
- 5 Essential Papers on Sentiment Analysis - Jun 9, 2020.
To highlight some of the work being done in the field, here are five essential papers on sentiment analysis and sentiment classification.
- Natural Language Processing with Python: The Free eBook - Jun 8, 2020.
This free eBook is an introduction to natural language processing, and to NLTK, one of the most prevalent Python NLP libraries.
- From Languages to Information: Another Great NLP Course from Stanford - Jun 3, 2020.
Check out another example of a Stanford NLP course and its freely available courseware.
- Four Ways to Apply NLP in Financial Services - Jun 2, 2020.
Natural language processing (NLP) is increasingly used to review unstructured content or spot trends in markets. How is Refinitiv Labs applying NLP in financial services to meet challenges around investment decision-making and risk management?
- KDnuggets™ News 20:n21, May 27: The Best NLP with Deep Learning Course is Free; Your First Machine Learning Web App - May 27, 2020.
Also: Python For Everybody: The Free eBook; Complex logic at breakneck speed: Try Julia for data science; An easy guide to choose the right Machine Learning algorithm; Dataset Splitting Best Practices in Python; Appropriately Handling Missing Values for Statistical Modelling and Prediction