Must-read NLP and Deep Learning articles for Data Scientists
NLP and deep learning continue to advance almost daily. Check out these recent must-read guides, feature articles, and other resources to keep you on top of the latest advancements and ahead of the curve.
As always, the fields of deep learning and natural language processing are as busy as ever. Despite many industries being hindered by the quarantine restrictions in many countries, the machine learning industry continues to move forward. It seems almost every week, new models are being released, and new startups are showing off AI-powered technologies that will help build a better world. In this article, we will briefly go over some of the biggest recent news in NLP and deep learning, as well as some must-read guides, feature articles, tools, resources, and datasets you may want to check out.
NLP & Deep Learning News
From Nikunj Aggarwal, the Machine Learning Lead at Citizen, this article gives us a great example of how deep learning is being used to create life-changing (or life-saving) technologies. Citizen is an emergency and safety alert app that warns people in real time of incidents and crimes that have taken place in their area.
Image from Citizen.
The company used a speech-to-text engine and a convolutional neural network to analyze first responder radio frequencies. In doing so, the company was able to scale their app to multiple cities in the United States. This technology could mark a huge change in the police and first responder infrastructure in years to come.
The release of GPT-3 by OpenAI was likely the biggest news in the field of NLP this year. However, what many people may have missed is the release of OpenAI’s API. The purpose of the API is to give people access to future models developed by the company, including GPT-3. This is big news, as it marks a shift from the company’s usual practice of open-sourcing its models (as it did with GPT-2). In the article, the company explains why it decided to release a commercial product, why it moved away from open source this time around, and how it will control potential misuse of its API.
In a letter to Congress, the CEO of IBM publicly stated that the company would be halting development and service offerings of general-purpose facial recognition technology.
This was a huge step for the company and a big message to the data science community as a whole. IBM’s move to prioritize ethics and safety might have encouraged other large tech companies (including Microsoft) to do the same.
With the creation of larger and possibly more complicated deep learning models, it becomes increasingly difficult to explain their intended use cases and other information to users downstream. To help solve this problem, researchers at Google have developed the “Model Card Toolkit” to help make model transparency reports easier to create.
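To make the idea of a model transparency report concrete, here is a minimal sketch of a model card as a plain Python data structure serialized to JSON. This is not the Model Card Toolkit's API. The `ModelCard` class, its field names, and the example values are all hypothetical, chosen only to illustrate the kind of information such a report might carry.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class ModelCard:
    """Hypothetical, simplified model card (not the Model Card Toolkit schema)."""
    name: str
    version: str
    intended_use: str
    training_data: str = ""
    limitations: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        # Serialize every field so the report can be shared with downstream users.
        return json.dumps(asdict(self), indent=2)


# Example values are invented for illustration only.
card = ModelCard(
    name="toy-sentiment-classifier",
    version="0.1",
    intended_use="Classifying English product reviews as positive or negative.",
    training_data="A small, hypothetical set of labeled product reviews.",
    limitations=[
        "Not evaluated on languages other than English.",
        "Not intended for medical, legal, or hiring decisions.",
    ],
)

print(card.to_json())
```

Even a small structure like this forces the author to write down intended use and known limitations, which is the information downstream users most often lack.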
Bonus Machine Learning News
Do you need a Ph.D. to work in data science? Well, Google’s new certification program may change the game. On July 14th, 2020, Google announced their new professional certification programs in the fields of UX design, project management, and data analysis.
Whether a Google Certificate in data analysis will be enough to land you a job on a data science team is yet to be determined. However, a certification from one of the largest tech companies in the world may end up being worth more than a 4-year degree.
- The Court of Justice invalidates Decision 2016/1250 on the adequacy of the protection provided by the EU-US Data Protection Shield
In July of 2020, the Court of Justice of the European Union made a major decision that may greatly affect data transfer between Europe and the United States. Essentially, the court invalidated “Decision 2016/1250”, which had ushered in a data transfer agreement titled the “EU-U.S. Privacy Shield”.
Instead, those whose data is transferred to a country outside the EU must be afforded “a level of protection essentially equivalent to that guaranteed within the EU by the GDPR.” This means that if a company like TikTok wants to transfer data from users in the EU to be processed on servers in the United States, authorities have the responsibility to prohibit this data transfer if they deem that data privacy and security measures in the United States don’t comply with GDPR standards.
If you want to learn more about this, TechCrunch’s article is a much easier read than the actual legal document.
Deep Learning Guides & Feature Articles
From Sergios Karagiannakos, the founder of AI Summer, this article serves as a meaty guide to deep learning. It introduces many topics, from the different kinds of neural networks to deep learning baselines in NLP and computer vision.
As mentioned previously, OpenAI’s launch of GPT-3 was likely the biggest news in NLP so far this year.
For those of you who don’t know, GPT-3 is a text-generating neural network with 175 billion parameters, more than 100 times as many as the previous model, GPT-2 (1.5 billion parameters). This guide serves as a great overview of the model, with key takeaways and explanations about the model and data used to train it.
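To put those two parameter counts side by side, here is a quick back-of-the-envelope calculation. The parameter counts come from the text above. The memory estimate assumes 2 bytes per parameter (16-bit weights), which is an assumption made for illustration and not a figure from the guide; the helper names are likewise hypothetical.

```python
# Parameter counts quoted in the text above.
GPT2_PARAMS = 1_500_000_000
GPT3_PARAMS = 175_000_000_000


def size_ratio(larger: int, smaller: int) -> float:
    """How many times more parameters the larger model has."""
    return larger / smaller


def weight_memory_gb(params: int, bytes_per_param: int = 2) -> float:
    """Rough size of the weights alone, in gigabytes (10^9 bytes).

    Assumption: 2 bytes per parameter (16-bit weights). Activations,
    optimizer state, and other overhead are ignored.
    """
    return params * bytes_per_param / 1e9


ratio = size_ratio(GPT3_PARAMS, GPT2_PARAMS)
print(f"GPT-3 has about {ratio:.0f}x the parameters of GPT-2")
print(f"GPT-2 weights: ~{weight_memory_gb(GPT2_PARAMS):.0f} GB")
print(f"GPT-3 weights: ~{weight_memory_gb(GPT3_PARAMS):.0f} GB")
```

Under that 16-bit assumption, the weights alone grow from roughly 3 GB to roughly 350 GB, which helps explain why a model of this size is more practical to offer through a hosted API than as a download.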
From Daily Nous, this is an interesting thought piece where nine philosophers take a deep dive into OpenAI’s GPT-3. These thought leaders explore the possible ethical and moral issues, as well as the lingering questions brought forth by the technology.
From Rahul Agarwal, a data scientist at WalmartLabs, this guide is a step-by-step tutorial on creating a multiclass image classification model. Furthermore, Agarwal explains what transfer learning is and how to use it to improve your own image classification models.
This feature piece from OneZero talks about the long history of autonomous vehicles, from the first manual auto-pilot maneuvers on ships to the self-driving cars we see from the likes of Tesla and Google today.
Additional Tools & Resources
Written by KDnuggets Editor Matthew Mayo, this useful guide introduces five books on NLP from his personal library. Unlike other book lists you may find online, Matthew has personally read all of these books and vouches for their quality. Please note that these books are not free, so they require a bit of investment on your part.
With the rampant spread of misinformation on social media, I was very concerned when I saw this spread reach my own inner circles. As it has become easier and easier to create deepfakes and generate fake articles using AI, I wanted to help combat the malicious use of these technologies. This article introduces a few simple methods and browser plugins that may help you detect both deepfakes and AI-generated text.
This listicle is a simple curation of the largest datasets in the TensorFlow library that may prove useful in improving your deep learning models. It introduces the largest audio, video, image, and text datasets on the platform and some of their intended use cases.
From Toptal, this handy tool can help you determine the average hourly rate for data scientists based on your location, programming languages, and skills. You can use this calculator to compare the average salary for your position in your own country and other countries to help you evaluate your career and plan your next steps.
We hope that these NLP and deep learning articles and guides helped you catch up with some of the big things happening in machine learning this year. For more reading, please take a look at the top stories below.
- 5 Big Trends in Data Analytics
- The Unreasonable Progress of Deep Neural Networks in Natural Language Processing (NLP)
- Trends in Machine Learning in 2020