Open Source Projects by Google, Uber and Facebook for Data Science and AI
Open source is becoming the standard for sharing and improving technology. Some of the largest organizations in the world, namely Google, Facebook and Uber, are releasing the technologies they use in their own workflows to the public. This lets anyone use the same tools that the biggest companies in the world rely on. Probably the best-known open source projects are PyTorch and TensorFlow, which together are the de facto standard for Deep Learning.
- PyTorch, from Facebook, is arguably the most popular Deep Learning library in the Data Science community. It has a rich ecosystem of tools that data scientists can use for a variety of tasks, including BoTorch for Bayesian optimization, AllenNLP for designing and using deep learning models for Natural Language Processing, fastai for easily building and evaluating neural networks, and skorch, a high-level interface that provides full scikit-learn compatibility.
- Prophet is Facebook's open source time series forecasting library, with APIs for both Python and R. It is built to perform well on time series with strong seasonality and can account for holiday effects. It is also robust to missing data and outliers. Missing data is a big problem in Time Series because the observations are sequential, and the common practice of imputing missing values with the mean or median is usually not the best option.
- CausalML is Uber's open source package for uplift modeling and causal inference with machine learning methods. It lets the user estimate the Conditional Average Treatment Effect (CATE) or Individual Treatment Effect (ITE) from experimental or observational data.
- Ludwig is probably Uber's best-known open source project. It lets users train and test deep learning models without writing any code; they only need to specify a YAML configuration. It is built on top of TensorFlow, and a Python API is also available for users who prefer one.
- Pyro, maintained by Uber AI Labs, is a library for Deep Probabilistic Programming built on top of PyTorch. It was designed around four principles: universal, scalable, minimal and flexible. NumPyro, a beta version of Pyro with a NumPy backend (using JAX for just-in-time compilation), is being developed for faster processing.
- Kepler.gl is Uber's open source geospatial analysis tool for large-scale data sets. It was built to help data scientists make an impact with location data through an interactive, data-driven approach. It is built on top of Mapbox GL and deck.gl.
- TensorFlow, Google's deep learning framework, needs no introduction. It is tied with PyTorch as the de facto deep learning framework in the Data Science community. TensorFlow has spawned many extensions that make the library easier to use, from visualization (TensorBoard) to serving models in production (TensorFlow Serving).
- CausalImpact is Google's R library for estimating the causal effect of a designed intervention on a time series. It uses a Bayesian structural time-series model to predict the counterfactual, that is, how the series would have evolved had the intervention never occurred, and compares that prediction with what was actually observed. This is useful when a randomized experiment is not available or feasible.
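The Prophet entry notes that filling gaps with the mean is usually a poor choice for time series. The sketch below shows why on a steadily rising series. It does not use Prophet at all; `impute_mean` and `impute_linear` are hypothetical helpers written only for this comparison, using the Python standard library.

```python
# Illustrative comparison of two gap-filling strategies on a trending series.
# These are hypothetical helpers, not part of Prophet's API.
from statistics import mean


def impute_mean(series):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in series]


def impute_linear(series):
    """Replace each None by interpolating between its nearest observed neighbours.

    Assumes the first and last points are observed.
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        # Constant increment per step between two observed points
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


# A steadily rising series with two missing points near the end (made-up data)
sales = [10, 12, 14, 16, 18, 20, None, None, 26, 28]

print(impute_mean(sales))
print(impute_linear(sales))
```

Mean imputation fills both gaps with 18, which is lower than the preceding value of 20 and breaks the trend, while linear interpolation fills them with 22 and 24 and keeps the series monotonic.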
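To make the CausalML entry's "Conditional Average Treatment Effect" concrete, here is a minimal sketch of the two-model ("T-learner") idea used in uplift modeling: fit one outcome model on treated units and one on control units, then take the difference of their predictions. This is not CausalML's API. The "model" is assumed to be a simple per-segment average, and the segments and outcomes below are invented for illustration.

```python
# Minimal T-learner sketch: CATE(segment) = E[Y | treated, segment] - E[Y | control, segment].
# Not CausalML's API; the per-segment mean stands in for a real ML model.
from collections import defaultdict
from statistics import mean


def fit_segment_means(rows):
    """Return {segment: mean outcome} for a list of (segment, outcome) pairs."""
    groups = defaultdict(list)
    for segment, outcome in rows:
        groups[segment].append(outcome)
    return {seg: mean(vals) for seg, vals in groups.items()}


def estimate_cate(data):
    """data: list of (segment, treated, outcome). Returns {segment: estimated CATE}."""
    treated = fit_segment_means([(s, y) for s, t, y in data if t])
    control = fit_segment_means([(s, y) for s, t, y in data if not t])
    # Only segments observed in both groups get an estimate
    return {seg: treated[seg] - control[seg] for seg in treated if seg in control}


# Hypothetical experiment: outcome is 1 if the user converted, else 0
data = [
    ("new", True, 1), ("new", True, 1), ("new", False, 0), ("new", False, 1),
    ("returning", True, 1), ("returning", True, 0),
    ("returning", False, 1), ("returning", False, 0),
]
print(estimate_cate(data))
```

On this toy data the treatment lifts conversion by 0.5 for new users and has no effect for returning users, which is exactly the kind of heterogeneity uplift modeling is meant to surface. Real libraries replace the segment averages with flexible regressors over many features.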
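The counterfactual logic behind the CausalImpact entry can be sketched in a few lines: learn how the response relates to an unaffected control series before the intervention, predict what the response would have been afterwards, and treat the gap between observed and predicted values as the effect. CausalImpact itself is an R package built on Bayesian structural time-series models; this Python sketch substitutes a plain least-squares fit on a single control series, produces no uncertainty intervals, and uses invented numbers.

```python
# Simplified counterfactual estimate in the spirit of CausalImpact.
# Assumption: ordinary least squares on one control series stands in for
# the Bayesian structural time-series model; all data are made up.


def fit_line(x, y):
    """Least-squares fit of y = a + b * x; returns (a, b). Requires x to vary."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def estimate_impact(control, response, intervention_index):
    """Fit on the pre-period, predict the post-period, return pointwise effects."""
    a, b = fit_line(control[:intervention_index], response[:intervention_index])
    predicted = [a + b * c for c in control[intervention_index:]]
    actual = response[intervention_index:]
    return [obs - pred for obs, pred in zip(actual, predicted)]


control = [100, 102, 101, 105, 107, 108, 110, 111]
response = [50, 51, 50.5, 52.5, 53.5, 59, 60, 60.5]  # intervention at index 5

effects = estimate_impact(control, response, intervention_index=5)
print(effects)
print(sum(effects))  # cumulative effect over the post-period
```

In the pre-period the response tracks half of the control series, so the predicted post-period values are 54, 55 and 55.5; the observed values sit 5 units above each, giving a cumulative effect of 15.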