Top Python Libraries for Deep Learning, Natural Language Processing & Computer Vision

This article compiles the 30 top Python libraries for deep learning, natural language processing & computer vision, as best determined by KDnuggets staff.

In a previous post, we looked at the top Python libraries for data science, data visualization, and machine learning. This time, we look at the top libraries for deep learning, natural language processing, and computer vision. These categories really don't need any further clarification.

This separation and classification is arbitrary, in some instances more than others, but we have done our best to group tools together by intended use case, hoping this is most useful for readers.

Clearly, not all NLP and CV work these days is performed using deep learning techniques, but as the trends move toward such techniques for state-of-the-art results, we stand by this otherwise arbitrary categorization logic.

Our list is made up of libraries that our team decided by consensus were representative of common and well-used Python libraries. Also, to be included, a library must have a GitHub repository. The categories are in no particular order, and neither are the libraries included within each. We contemplated imposing an arbitrary ordering by stars or some other metric, but decided against it so as not to explicitly assign any perceived value or importance to the libraries within. Their listing here, then, is purely random. Library descriptions are taken directly from the GitHub repositories, in some form or another.

Thanks again to Ahmed Anis for contributing to the collection of this data, and to the rest of the KDnuggets staff for their input, insights, and suggestions.

Note that the visualization below, by Gregory Piatetsky, represents each library by type and plots it by number of stars and contributors. The symbol size reflects the number of commits the library has on GitHub, on a logarithmic scale.


Figure 1: Top Python Libraries for Deep Learning, Natural Language Processing & Computer Vision
Plotted by number of stars and number of contributors; relative size by log number of commits
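
To make the log-scaled symbol size concrete, here is a minimal sketch that uses a few of the figures listed in this article. The `marker_size` function and its `scale` factor are hypothetical; the exact sizing rule used for the figure is not stated, so treat this only as an illustration of why a logarithm keeps the largest projects from dwarfing the rest.

```python
import math

# (stars, commits, contributors), copied from the listings in this article
libraries = {
    "TensorFlow": (149000, 97741, 2754),
    "PyTorch": (43200, 30696, 1619),
    "spaCy": (17400, 11628, 482),
    "Mahotas": (644, 1273, 25),
}


def marker_size(commits, scale=10.0):
    """Hypothetical sizing rule: size grows with log10 of the commit count."""
    return scale * math.log10(commits)


sizes = {name: marker_size(commits) for name, (_, commits, _) in libraries.items()}

for name, size in sizes.items():
    print(f"{name}: {size:.1f}")
```

With this rule, TensorFlow has roughly 77 times as many commits as Mahotas, yet its symbol is well under twice as large.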


So, without further ado, here are the 30 top Python libraries for deep learning, natural language processing & computer vision, as best determined by KDnuggets staff.


Deep Learning


1. TensorFlow
Stars: 149000, Commits: 97741, Contributors: 2754

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.

2. Keras
Stars: 50000, Commits: 5349, Contributors: 864

Keras is a deep learning API written in Python, running on top of the machine learning platform TensorFlow.
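
As a rough idea of what that API looks like, here is a minimal sketch, assuming TensorFlow 2.x is installed and Keras is imported through it. The layer sizes, the random data, and the single training epoch are arbitrary choices for illustration.

```python
import numpy as np
from tensorflow import keras

# Toy regression data: the target is the sum of four random features
x = np.random.rand(32, 4).astype("float32")
y = x.sum(axis=1, keepdims=True)

# A small fully connected network defined with the Sequential API
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# One quick pass over the data, then predictions for the same inputs
model.fit(x, y, epochs=1, verbose=0)
pred = model.predict(x, verbose=0)
print(pred.shape)
```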

3. PyTorch
Stars: 43200, Commits: 30696, Contributors: 1619

Tensors and Dynamic neural networks in Python with strong GPU acceleration
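
A minimal sketch of those two ideas, tensors and a dynamically built graph, assuming PyTorch is installed as `torch`. The values are made up, and the GPU line simply falls back to the CPU when CUDA is unavailable.

```python
import torch

# A tensor that records operations for automatic differentiation
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# The computation graph is built on the fly as ordinary Python executes
y = (x ** 2).sum()
y.backward()

# dy/dx = 2x
grad = x.grad
print(grad)

# Move data to a GPU if one is available, otherwise stay on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
x_dev = x.detach().to(device)
```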

4. fastai
Stars: 19800, Commits: 1450, Contributors: 607

fastai simplifies training fast and accurate neural nets using modern best practices

5. PyTorch Lightning
Stars: 9600, Commits: 3594, Contributors: 317

The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.

6. JAX
Stars: 10000, Commits: 5708, Contributors: 221

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
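
The three transformations named above can be sketched as follows, assuming JAX is installed. The function `f` is a toy example chosen for illustration.

```python
import jax
import jax.numpy as jnp


def f(x):
    """Toy function: sum of squares."""
    return jnp.sum(x ** 2)


x = jnp.array([1.0, 2.0, 3.0])

# differentiate: gradient of f with respect to x, which is 2x
g = jax.grad(f)(x)

# vectorize: apply f to each row of a batch without writing a loop
batched = jax.vmap(f)(jnp.ones((4, 3)))

# JIT: compile f with XLA for the available backend (CPU, GPU, or TPU)
val = jax.jit(f)(x)

print(g, batched, val)
```

Because each transformation returns an ordinary function, they can be nested, for example `jax.jit(jax.grad(f))`.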

7. MXNet
Stars: 19100, Commits: 11387, Contributors: 839

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

8. Ignite
Stars: 3100, Commits: 747, Contributors: 112

High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.


Natural Language Processing

9. fastText
Stars: 21700, Commits: 379, Contributors: 47

fastText is a library for efficient learning of word representations and sentence classification.

10. spaCy
Stars: 17400, Commits: 11628, Contributors: 482

Industrial-strength Natural Language Processing (NLP) with Python and Cython
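
A minimal sketch, assuming spaCy is installed. It uses `spacy.blank("en")`, which creates a tokenizer-only English pipeline, so no pretrained model needs to be downloaded; tagging, parsing, and entity recognition would require a trained pipeline instead. The sentence is made up for the example.

```python
import spacy

# Blank English pipeline: rule-based tokenization only, no trained components
nlp = spacy.blank("en")

doc = nlp("KDnuggets lists 30 Python libraries.")
tokens = [token.text for token in doc]
print(tokens)
```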

11. gensim
Stars: 11200, Commits: 4024, Contributors: 361

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community.

12. NLTK
Stars: 9300, Commits: 13990, Contributors: 319

NLTK -- the Natural Language Toolkit -- is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing.
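
A minimal sketch using a few NLTK modules that work without downloading any of the bundled data sets, assuming NLTK is installed. The sentence is made up; tokenizers such as `word_tokenize` would need an extra data download and are deliberately avoided here.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize
from nltk.util import ngrams

text = "NLTK is supporting research and development."

# Regex-based tokenizer: splits on whitespace and punctuation
tokens = wordpunct_tokenize(text)

# Reduce each token to its stem with the Porter algorithm
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

# Adjacent token pairs
bigrams = list(ngrams(tokens, 2))

print(tokens)
print(stems)
print(bigrams[:2])
```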

13. Datasets (Huggingface)
Stars: 4300, Commits: 568, Contributors: 64

Fast, efficient, open-access datasets and evaluation metrics for Natural Language Processing and more in PyTorch, TensorFlow, NumPy and Pandas

14. Tokenizers (Huggingface)
Stars: 3800, Commits: 1252, Contributors: 30

Fast State-of-the-Art Tokenizers optimized for Research and Production

15. Transformers (Huggingface)
Stars: 3500, Commits: 5480, Contributors: 585

Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.

16. Stanza
Stars: 4800, Commits: 1514, Contributors: 19

Official Stanford NLP Python Library for Many Human Languages

17. TextBlob
Stars: 7300, Commits: 542, Contributors: 24

Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

18. PyTorch-NLP
Stars: 1800, Commits: 442, Contributors: 15

Basic Utilities for PyTorch Natural Language Processing (NLP)

19. Textacy
Stars: 1500, Commits: 1324, Contributors: 23

A Python library for performing a variety of natural language processing (NLP) tasks, built on the high-performance spaCy library.

20. Finetune
Stars: 626, Commits: 1405, Contributors: 13

Finetune is a library that allows users to leverage state-of-the-art pretrained NLP models for a wide variety of downstream tasks.

21. TextHero
Stars: 1900, Commits: 266, Contributors: 17

Text preprocessing, representation and visualization from zero to hero.

22. Spark NLP
Stars: 1700, Commits: 4363, Contributors: 50

Spark NLP is a Natural Language Processing library built on top of Apache Spark ML.

23. GluonNLP
Stars: 2200, Commits: 712, Contributors: 72

GluonNLP is a toolkit that enables easy text preprocessing, datasets loading and neural models building to help you speed up your Natural Language Processing (NLP) research.


Computer Vision

24. Pillow
Stars: 7800, Commits: 10799, Contributors: 303

Pillow is the friendly PIL fork. PIL is the Python Imaging Library.
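
A minimal sketch of basic image handling, assuming Pillow is installed (it is imported as `PIL`). The image is generated in memory, so no file is needed; the sizes and color are arbitrary.

```python
from PIL import Image

# Create a solid red 64x48 RGB image in memory
img = Image.new("RGB", (64, 48), color=(255, 0, 0))

# Convert to 8-bit grayscale and shrink to half size
gray = img.convert("L")
thumb = gray.resize((32, 24))

print(img.size, gray.mode, thumb.size)
```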

25. OpenCV
Stars: 49600, Commits: 29453, Contributors: 1234

Open Source Computer Vision Library

26. scikit-image
Stars: 4000, Commits: 12352, Contributors: 403

Image processing in Python
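
A minimal sketch, assuming scikit-image is installed (imported as `skimage`). Images are plain NumPy arrays; the synthetic square below is made up for the example.

```python
import numpy as np
from skimage import filters

# Synthetic grayscale image: a bright square on a dark background
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0

# The Sobel filter responds at the square's border, not in flat regions
edges = filters.sobel(img)

print(edges.shape, edges.max())
```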

27. Mahotas
Stars: 644, Commits: 1273, Contributors: 25

Mahotas is a library of fast computer vision algorithms (all implemented in C++ for speed) operating over numpy arrays.

28. SimpleCV
Stars: 2400, Commits: 2625, Contributors: 69

SimpleCV is a framework for Open Source Machine Vision, using OpenCV and the Python programming language.

29. GluonCV
Stars: 4300, Commits: 774, Contributors: 101

GluonCV provides implementations of the state-of-the-art (SOTA) deep learning models in computer vision.

30. Torchvision
Stars: 7500, Commits: 1286, Contributors: 334

The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision.