KDnuggets Home » News » 2019 » Nov » Opinions » Open Source Projects by Google, Uber and Facebook for Data Science and AI ( 19:n46 )

Open Source Projects by Google, Uber and Facebook for Data Science and AI






Open source is becoming the standard for sharing and improving technology. Some of the largest organizations in the world, namely Google, Facebook and Uber, are open sourcing the technologies they use in their own workflows. This lets anyone use the same tools that run inside some of the biggest companies in the world. Probably the most well-known of these projects are PyTorch and TensorFlow, which together are the de facto standards for deep learning.

 

Open Source Projects by Facebook

 

  1. PyTorch
    • PyTorch is arguably the most famous deep learning library in the data science community. It has a rich ecosystem that data scientists can use for a variety of tasks. Some of the tools available are BoTorch for Bayesian optimization, AllenNLP for designing and using deep learning models for natural language processing, fastai for easily building and evaluating neural nets, and skorch for a high-level interface that provides full scikit-learn compatibility.
  2. Prophet
    • Prophet is an open source time series forecasting library with APIs for both Python and R. It is built to perform well on time series with strong seasonality and can account for holiday effects. It also handles missing data and outliers. Missing data is a big problem in time series because the observations are supposed to be sequential, and the common practice of imputing missing values with the mean or median is usually not the best option for time series.
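To make the point about imputation concrete, here is a minimal pure-Python sketch comparing mean imputation with linear interpolation on a trending series. It is only an illustration of why the mean can be a poor fill value for sequential data; it is not how Prophet treats missing values, and the helper names `impute_mean` and `impute_linear` are hypothetical.

```python
from statistics import mean


def impute_mean(series):
    """Replace every None with the mean of the observed values."""
    observed = [x for x in series if x is not None]
    fill = mean(observed)
    return [fill if x is None else x for x in series]


def impute_linear(series):
    """Fill interior gaps by linear interpolation between neighbours.

    Leading and trailing gaps are filled with the nearest observed value.
    """
    known = [i for i, x in enumerate(series) if x is not None]
    if not known:
        raise ValueError("series has no observed values")
    result = list(series)
    # Edge gaps: carry the nearest observed value outwards.
    for i in range(known[0]):
        result[i] = series[known[0]]
    for i in range(known[-1] + 1, len(series)):
        result[i] = series[known[-1]]
    # Interior gaps: straight line between the two surrounding observations.
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = series[left] + step * (i - left)
    return result


# A steadily rising series with two missing observations.
trend = [10.0, 12.0, 14.0, None, None, 20.0]
mean_filled = impute_mean(trend)
linear_filled = impute_linear(trend)
print(mean_filled)    # the gap is filled with a flat value that ignores the trend
print(linear_filled)  # the gap follows the local trend
```

On this toy series the mean fill flattens the upward trend inside the gap, while interpolation respects the ordering of the observations. Real series with seasonality would need something more careful than either.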

 

Open Source Projects by Uber

 

  1. CausalML
    • CausalML is Uber's open source library for uplift modelling and causal inference using machine learning. It lets the user estimate the Conditional Average Treatment Effect (CATE) or Individual Treatment Effect (ITE) from experimental or observational data.
  2. Ludwig
  3. Pyro
    • Pyro is maintained by Uber AI Labs and is built on top of PyTorch for deep probabilistic programming. It is designed around four principles: universal, scalable, minimal and flexible. NumPyro, a version of Pyro with a NumPy backend, is available in beta and is aimed at faster processing.
  4. kepler.gl
    • Kepler.gl is Uber's open source geospatial analysis toolbox for large data sets. It was built to help data scientists make an impact with location data using an interactive, data-driven approach. It is built on top of Mapbox GL and deck.gl.
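As a rough illustration of what a Conditional Average Treatment Effect is (the quantity mentioned under CausalML above), the sketch below computes a difference in mean outcomes between treated and control users within each segment. It assumes data from a randomized experiment, uses made-up rows, and does not use CausalML's API; `cate_by_segment` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def cate_by_segment(rows):
    """Estimate a per-segment treatment effect from experimental data.

    rows: iterable of (segment, treated, outcome) tuples.
    Returns {segment: mean treated outcome - mean control outcome}.
    Segments missing either group are skipped.
    """
    outcomes = defaultdict(lambda: {True: [], False: []})
    for segment, treated, outcome in rows:
        outcomes[segment][bool(treated)].append(outcome)

    effects = {}
    for segment, groups in outcomes.items():
        if groups[True] and groups[False]:
            effects[segment] = mean(groups[True]) - mean(groups[False])
    return effects


# Made-up experiment: outcome is 1.0 if the user converted, else 0.0.
rows = [
    ("new", True, 1.0), ("new", True, 1.0),
    ("new", False, 0.0), ("new", False, 1.0),
    ("loyal", True, 1.0), ("loyal", True, 0.0),
    ("loyal", False, 1.0), ("loyal", False, 0.0),
]
effects = cate_by_segment(rows)
print(effects)
```

In this toy data the treatment appears to help new users but not loyal ones, which is the kind of heterogeneity uplift modelling looks for. With observational data a plain difference in means is not enough, because treated and control groups may differ in other ways.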

 

Open Source Projects by Google

 

  1. Google Cloud Datalab
    • Cloud Datalab is an interactive visual exploration tool with an IPython backend, so it offers a familiar environment for anyone who uses Jupyter regularly. It enables analysis of your data on BigQuery, Cloud Machine Learning Engine, Compute Engine, and Cloud Storage using Python, SQL, and JavaScript (for BigQuery user-defined functions). Note, however, that you pay for any cloud resources you use, such as VMs and Cloud Storage.
  2. TensorFlow
    • This one needs no introduction. TensorFlow is tied with PyTorch as the de facto deep learning framework in the data science community. It has sparked many extensions that make the library easier to use, from visualizations to production APIs.
  3. CausalImpact
    • The CausalImpact library is an R library for estimating the causal effect of a designed intervention on a time series. It uses a Bayesian structural time series model to estimate how the series would have evolved if the intervention had not occurred, and compares that counterfactual with what was actually observed. This is useful when a randomized experiment is not available or feasible.
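The counterfactual idea behind CausalImpact can be sketched in a few lines of Python. The version below is a deliberately simplified stand-in: it swaps the Bayesian structural time series model for an ordinary least squares fit of the target series on a single control series over the pre-intervention period, so it gives point estimates only and no uncertainty intervals. The function names and the data are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("control series is constant in the pre-period")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def estimate_effect(control, target, intervention_index):
    """Pointwise effect estimates after the intervention.

    Fits target ~ control on the pre-period, predicts the post-period
    counterfactual from the control series, and returns
    observed - counterfactual for each post-period point.
    """
    intercept, slope = fit_line(
        control[:intervention_index], target[:intervention_index]
    )
    counterfactual = [
        intercept + slope * x for x in control[intervention_index:]
    ]
    observed = target[intervention_index:]
    return [o - c for o, c in zip(observed, counterfactual)]


# Made-up data: target tracks 2 * control + 1 until the intervention at
# index 4, after which it is lifted by 5 units.
control = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
target = [3.0, 5.0, 7.0, 9.0, 16.0, 18.0]
effects = estimate_effect(control, target, intervention_index=4)
print(effects)
```

The key assumption, shared with the real library, is that the control series is not itself affected by the intervention. If it is, the counterfactual is biased.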
