Data Science Toolkit API

The Data Science Toolkit bundles many data sets and open-source tools behind a REST/JSON API, with Python and JavaScript interfaces. The API includes components that help parse places, text, and people.



Programmable Web, Ajay Ohri, February 11th, 2013

The Data Science Toolkit is a collection of the best open data sets and open-source tools for data science, wrapped in an easy-to-use REST/JSON API with command-line, Python, and JavaScript interfaces.

The Data Science Toolkit is essentially a specialized Linux distribution with a lot of useful data software pre-installed, available as a self-contained Vagrant VM or EC2 AMI.

The API includes the following subcomponents:

  • Text to Places ...
  • IP Address to Coordinates ...
  • Street Address to Coordinates ...
  • Coordinates to Politics ...
  • File to Text: If you pass in an image, this API will run an optical character recognition algorithm to extract any words or sentences it can from the picture. ...
  • Text to Sentences ...
  • HTML to Text ...
  • HTML to Story ...
  • Text to People: Extracts any sequences of words that look like people's names, and tries to guess their gender from any first names found ...
  • Text to Times ...
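
As a rough illustration of how one of the text components listed above might be called over REST/JSON, here is a minimal Python sketch using only the standard library. Several things in it are assumptions rather than details from the article: the base URL (a hypothetical locally running instance), the endpoint paths (derived from the component names), and the convention of sending raw text as the POST body. The shape of the JSON response is not described here, so the sketch simply decodes whatever the server returns.

```python
import json
import urllib.request

# Assumption: address of a DSTK server you run yourself (e.g. the VM).
# This value is hypothetical; replace it with your own instance.
BASE_URL = "http://localhost:8080"

# Assumption: endpoint paths derived from the component names in the list.
ENDPOINTS = {
    "Text to Places": "text2places",
    "Text to Sentences": "text2sentences",
    "Text to People": "text2people",
    "Text to Times": "text2times",
}


def build_request(component, text, base_url=BASE_URL):
    """Build (but do not send) a POST request carrying `text` as the body."""
    path = ENDPOINTS[component]
    url = f"{base_url.rstrip('/')}/{path}"
    body = text.encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def call(component, text, base_url=BASE_URL, timeout=10):
    """Send the request and decode the JSON response, whatever its shape."""
    request = build_request(component, text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds the request, so this runs without a server.
    req = build_request("Text to People", "Ada Lovelace wrote notes.")
    print(req.get_method(), req.full_url)
```

Separating `build_request` from `call` keeps the URL construction inspectable without network access; `call("Text to People", some_text)` would perform the actual round trip against a reachable server.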
