

Import.io easy visual download and import web data


The Web was designed for documents, not for data, and Import.io wants to remedy this. I spoke to Import.io's founder about what they do and how Import.io lets you download web data in an easy, visual way.



By Gregory Piatetsky, Jun 9, 2013.

KDnuggets wrote about Import.io in Jan 2013

Import.io addresses the elephant in the technology industry's room - everyone scrapes online data. Using powerful point and click data extraction tooling, a task which took days coding brittle scrapers is now reduced to a few minutes.

I spoke with Andrew Fogg, @andrewfogg, Founder and Chief Data Officer of Import.io, to learn more about their progress.

There are many tools for web-mining or web-scraping - see KDnuggets directory of Web Content Mining, Screen Scraping.

What is different about Import.io is its very intuitive, visual method for downloading web data, whether it is on a single web page or multiple pages.

Say you want to download data about chairs from Ikea. You would use the import.io tool, which behaves like a web browser, to visit the Ikea site and search for chairs.

Ikea chairs - web page

Then you would highlight on the page several examples of the fields you want to extract, such as image, price, etc. Import.io uses its algorithms to identify what to extract and automatically creates a spreadsheet with the values of those fields.

Ikea chairs - with import.io

Import.io can also handle data spread over multiple pages by recognizing links to the next or previous page.

You can then copy your data into your favourite spreadsheet software or use APIs to access it in an application, for example in JSON format.
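To make the JSON-to-spreadsheet path concrete, here is a minimal sketch of turning extracted rows into CSV text. The payload shape and the field names (name, price, image) are assumptions made up for illustration; they are not Import.io's actual API response format.

```python
import csv
import io
import json

# Hypothetical payload: a JSON array with one object per extracted row.
# The real Import.io response format may differ; this shape is assumed.
PAYLOAD = """
[
  {"name": "Chair A", "price": "19.99", "image": "https://example.com/a.jpg"},
  {"name": "Chair B", "price": "49.00", "image": "https://example.com/b.jpg"}
]
"""


def json_rows_to_csv(payload: str) -> str:
    """Convert a JSON array of flat objects into CSV text."""
    rows = json.loads(payload)

    # Collect column names in first-seen order, so rows with
    # missing fields still line up under the right headers.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing fields are written as empty cells
    return buffer.getvalue()


csv_text = json_rows_to_csv(PAYLOAD)
print(csv_text)
```

The resulting text can be saved to a .csv file and opened in any spreadsheet program.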

I also asked Andy about terms of use. Some websites, such as eBay, don't allow mining or scraping of their data. Andy replied that Import.io is transparent about what they do: their user agent identifies itself as import.io and obeys robots.txt, so it is up to the user to use it properly and up to the website to allow it. However, most of the requests they get from website owners are to send them more traffic.
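As an illustration of what obeying robots.txt means in practice, the sketch below uses Python's standard urllib.robotparser module to check whether a given user agent may fetch a URL. The robots.txt content, the example.com URLs, and the exact user-agent token "import.io" are assumptions for the example; this is not Import.io's own code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, inlined so the sketch runs without network access.
# A real crawler would download it from https://<site>/robots.txt.
ROBOTS_TXT = """\
User-agent: import.io
Disallow: /private/

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a reusable rule checker."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# The assumed "import.io" agent is allowed everywhere except /private/.
print(may_fetch(parser, "import.io", "https://example.com/chairs"))
print(may_fetch(parser, "import.io", "https://example.com/private/x"))
# Any other agent falls under the "*" rule, which disallows everything.
print(may_fetch(parser, "otherbot", "https://example.com/chairs"))
```

A website owner who does not want to be scraped can add a Disallow rule for the crawler's user agent, and a well-behaved crawler skips those URLs.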

Import.io is currently free and focuses on developing technology and attracting users.

Eventually, they plan to introduce a paid version, but there will always be a free version.

