Import.io: easy, visual download and import of web data
The web was designed for documents, not for data, and Import.io wants to remedy this. I spoke to Import.io's founder about what they do and how Import.io lets you download web data in an easy, visual way.
By Gregory Piatetsky, Jun 9, 2013.
KDnuggets wrote about Import.io in Jan 2013:
Import.io addresses the elephant in the technology industry's room - everyone scrapes online data. Using powerful point and click data extraction tooling, a task which took days coding brittle scrapers is now reduced to a few minutes.
I spoke with Andrew Fogg, @andrewfogg, Founder and Chief Data Officer of Import.io, to learn more about their progress.
There are many tools for web mining or web scraping - see the KDnuggets directory of Web Content Mining, Screen Scraping.
What is different about Import.io is its very intuitive, visual method for downloading web data, whether it is on a single web page or multiple pages.
Say you want to download data about chairs from Ikea. You would use the Import.io tool, which behaves like a web browser, to visit the Ikea site and search for chairs.
Then you would highlight on the page several examples of the fields you want to extract, such as image, price, etc. Import.io uses its algorithms to identify what to extract and automatically creates a spreadsheet with the values of those fields.
Import.io can also handle data spread over multiple pages by recognizing links to the next or previous page.
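To make the idea of following page links concrete, here is a minimal Python sketch that looks for a link marked rel="next" in a page's HTML. It is not Import.io's actual method, which the article does not describe; the class NextLinkFinder, the function find_next_page, and the example.com URLs are all hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Remembers the href of the first <a> or <link> tag marked rel="next"."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        # Only look at link-like tags, and stop once a match has been found.
        if tag not in ("a", "link") or self.next_href is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "next" in rel_values and attr_map.get("href"):
            self.next_href = attr_map["href"]


def find_next_page(html, base_url):
    """Return the absolute URL of the next page, or None if no such link exists."""
    finder = NextLinkFinder()
    finder.feed(html)
    if finder.next_href is None:
        return None
    return urljoin(base_url, finder.next_href)


# Hypothetical listing page (assumption: the site marks its pager with rel="next").
sample_html = '<ul><li>Chair A</li></ul><a rel="next" href="/chairs?page=2">Next</a>'
next_url = find_next_page(sample_html, "https://example.com/chairs")
```

A crawler built on this would call find_next_page repeatedly until it returns None. Many real sites do not use rel="next", which is part of why hand-written scrapers tend to be brittle.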
You can then copy your data into your favourite spreadsheet software or use APIs to access it in an application, for example in JSON format.
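As a rough illustration of working with extracted data in JSON form, the Python sketch below turns a JSON array of records into CSV text that a spreadsheet can open. The payload shape and the field names (name, price, image) are assumptions made for this example and do not reflect Import.io's actual API output.

```python
import csv
import io
import json

# Hypothetical JSON payload; the field names are illustrative only.
sample_json = """
[
  {"name": "Chair A", "price": "49.99", "image": "https://example.com/a.jpg"},
  {"name": "Chair B", "price": "79.00", "image": "https://example.com/b.jpg"}
]
"""


def json_records_to_csv(json_text, fieldnames):
    """Convert a JSON array of flat objects into CSV text with a header row."""
    records = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        extrasaction="ignore",  # skip keys that are not in fieldnames
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)  # missing keys are written as empty cells
    return buffer.getvalue()


csv_text = json_records_to_csv(sample_json, ["name", "price", "image"])
```

The resulting csv_text can be saved to a file and opened in any spreadsheet program.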
I also asked Andy about terms of use. Some websites, such as eBay, don't allow mining or scraping of their data. Andy replied that Import.io is transparent about what they do - their user agent identifies itself as import.io and obeys robots.txt, so it is up to the user to use it properly and up to the website to allow it. However, most of the requests they get from website owners are to send them more traffic.
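For readers unfamiliar with robots.txt, the following Python sketch shows how a well-behaved crawler might check whether it may fetch a URL, using the standard library's urllib.robotparser. The robots.txt content, the helper is_allowed, and the exact user agent string "import.io" are assumptions for illustration; the article only says that the user agent identifies itself as import.io.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one named crawler entirely,
# and blocks /private/ for everyone else.
sample_robots_txt = """\
User-agent: *
Disallow: /private/

User-agent: import.io
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The named crawler is refused everywhere on this hypothetical site.
blocked = is_allowed(sample_robots_txt, "import.io", "https://example.com/chairs")

# Another crawler falls back to the "*" rules and may fetch public pages.
permitted = is_allowed(sample_robots_txt, "OtherBot", "https://example.com/chairs")
```

Because the crawler announces a distinct user agent, a site owner who objects can exclude it with a two-line rule like the one above.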
Import.io is currently free and focuses on developing technology and attracting users.
Eventually, they plan to introduce a paid version, but there will always be a free version.