Web Content Mining, Screen Scraping

commercial
  • AMI Enterprise Intelligence searches, collects, stores and analyses data from the web.
  • Automation Anywhere, intelligent automation software to automate business & IT processes, including web data extraction and screen scraping.
  • Bixolabs, an elastic web mining platform built with Bixo, Cascading & Hadoop for Amazon's cloud (EC2).
  • Crawlera, a smart IP rotator that works around bot countermeasures, letting you crawl more complex sites like Google.
  • Darcy Ripper, a powerful pure Java multi-platform web crawler with great workload and speed capabilities, and a separate easy-to-use GUI for downloading web resources. Free download.
  • Diggernaut lets you turn website content into datasets, no programming skills required.
  • Ficstar, customized web extraction, automated data management, and business intelligence.
  • FMiner, a visual web scraping software with a diagram designer.
  • Helium Scraper, a powerful Web Page Scraper / Web Data Extractor that can be set up to extract from the web virtually anything you can point your mouse at.
  • Import.io, an easy and visual way to download and import web data. Free version.
  • iWebScraping, Web Scraping, Data Extraction, Data Mining Services. Scrape data from YellowPages, Directory, Amazon, eBay, Business Listing, Google Maps.
  • Metafy Anthracite Web Mining Software, visually construct spiders and scrapers without scripts (requires MacOS X 10.4 or newer).
  • Mozenda, More-Zenful-Data, web content mining.
  • MyDataProvider builds web scraping services for ecommerce & business.
  • PDFonline (BCL) Data Extraction Software, extract data from your documents.
  • ProxyCrawl reduces time spent developing scrapers and crawlers. Its Crawling API protects web scrapers against site bans, IP leaks, browser crashes, CAPTCHAs, and proxy failures. The first 1000 requests are free.
  • Scrapy Cloud allows Scrapy/Portia users to crawl ~3 billion pages/month and offers a free plan.
  • Screen Scraper allows users to scrape structured and unstructured data from websites and format it (free download).
  • Simple Scraper: Web scraping made simple — extract data from any website in seconds and download instantly, scrape in the cloud, or create an API.
  • TheWebMiner, for extracting structured data and custom web scraping services in the cloud.
  • Visual Web Ripper, a powerful visual tool used for automated web scraping, web harvesting and content extraction from the web.
  • Web Data Extraction Services provides robust, cutting-edge solutions and services for data extraction from websites.
  • WebGet.io, a visual web scraping service that is easy to use, with free and low-cost options; supports logging in to secure sites, clicking, looping, change monitoring, image scraping, and more.
  • Webhose.io, easily get instant access to large scale structured data from online Discussions, News, Blogs and more.
  • WebQL, for creating turnkey web extraction applications, such as price collector, patent information aggregator, etc.
  • XML Miner, a system and class library for mining data and text expressed in XML, extracting knowledge and re-using that knowledge in products and applications in the form of fuzzy logic expert system rules.

free and open source

  • Bixo, an open source web mining toolkit that runs as a series of Cascading pipes on top of Hadoop.
  • DEiXTo, a powerful tool for creating "extraction rules" (wrappers) that describe what pieces of data to scrape from a web page; consists of GUI and a stand-alone extraction rule executor.
  • Frontera, a crawl frontier manager that dispatches crawling to multiple spiders in parallel.
  • GNU Wget, command line tool for retrieving files using HTTP, HTTPS and FTP.
  • Iepy, an open-source Information Extraction tool: get data from your documents or content (iepy on GitHub).
  • Octoparse, a tool to easily extract any unstructured web data into structured data, and save to Excel, HTML, Text, or directly into a database.
  • Pattern, a web mining module for Python; bundles tools for data retrieval (Google + Twitter + Wikipedia API, web spider), text analysis (rule-based shallow parser, WordNet interface, tf-idf, ...) and data visualization (graph networks).
  • Portia, the Open Source Visual Web Scraper.
  • Python Web Scraping overview and examples
  • ScraperWiki, a collaborative platform for web-scraping and screen-scraping code and views.
  • Scrapy, a fast high-level screen scraping and web crawling framework in Python.
  • Trapit, system for personalizing content based on keywords, URLs and reading habits.
  • Website Downloader, a completely free way to download a copy of any website and get the contents as a zip.
  • WebSundew, a powerful web scraping and web data extraction tool that extracts data from web pages with high productivity and speed.
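
To give a rough idea of what the scraping tools listed above automate, here is a minimal sketch that pulls link targets and link text out of an HTML fragment. It uses only Python's standard-library html.parser and does not reflect how any listed product works internally. The names LinkExtractor, extract_links, and SAMPLE, as well as the example.com URLs, are made up for illustration.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...>.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, link text) tuples found in the HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical input: a tiny listing page, not taken from any real site.
SAMPLE = """
<ul>
  <li><a href="https://example.com/tool-a">Tool A</a>, a hypothetical crawler.</li>
  <li><a href="https://example.com/tool-b">Tool <b>B</b></a>, a hypothetical scraper.</li>
</ul>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE):
        print(href, text)
```

A real scraper would also need to fetch pages, respect robots.txt and rate limits, and cope with malformed markup, which is where frameworks and services such as those listed here come in.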