KDnuggets Home » Software » Web Content Mining, Screen Scraping

Web Content Mining, Screen Scraping

          

commercial
  • Automation Anywhere, intelligent automation software to automate business & IT processes, including web data extraction and screen scraping.
  • Bixolabs, an elastic web mining platform built with Bixo, Cascading & Hadoop for Amazon's cloud (EC2).
  • Darcy Ripper, a powerful pure Java multi-platform web crawler with great workload and speed capabilities, with a separate easy-to-use GUI for downloading web resources. Free download.
  • Extractiv, transforms unstructured web content into highly-structured semantic data.
  • Ficstar, customized web extraction, automated data management, and business intelligence.
  • FMiner, a visual web scraping software with a diagram designer.
  • Helium Scraper, a powerful Web Page Scraper / Web Data Extractor that can be set up to extract from the web virtually anything you can point your mouse at.
  • Import.io, an easy and visual way to download and import web data. Free version.
  • iWebScraping, Web Scraping, Data Extraction, Data Mining Services. Scrape data from YellowPages, Directory, Amazon, eBay, Business Listing, Google Maps.
  • Metafy Anthracite Web Mining Software, visually construct spiders and scrapers without scripts (requires Mac OS X 10.4 or newer).
  • Mozenda, More-Zenful-Data, web content mining.
  • PDFonline (BCL) Data Extraction Software, extract data from your documents.
  • Screen Scraper, allows users to scrape structured and unstructured data from websites and format it (free download).
  • TheWebMiner, for extracting structured data and custom web scraping services in the cloud.
  • Visual Web Ripper, a powerful visual tool used for automated web scraping, web harvesting and content extraction from the web.
  • Web Data Extraction Services provides robust, cutting-edge solutions and services for data extraction from websites.
  • WebQL, for creating turnkey web extraction applications, such as price collector, patent information aggregator, etc.
  • XML Miner, a system and class library for mining data and text expressed in XML, extracting knowledge and re-using that knowledge in products and applications in the form of fuzzy logic expert system rules.


free and open source

  • Bixo, an open source web mining toolkit that runs as a series of Cascading pipes on top of Hadoop.
  • DEiXTo, a powerful tool for creating "extraction rules" (wrappers) that describe what pieces of data to scrape from a web page; consists of GUI and a stand-alone extraction rule executor.
  • GNU Wget, command line tool for retrieving files using HTTP, HTTPS and FTP.
  • Pattern, a web mining module for Python; bundles tools for data retrieval (Google + Twitter + Wikipedia API, web spider), text analysis (rule-based shallow parser, WordNet interface, tf-idf, ...) and data visualization (graph networks).
  • ScraperWiki, a collaborative platform for web-scraping and screen-scraping code and views.
  • Scrapy, a fast high-level screen scraping and web crawling framework in Python.
  • Trapit, system for personalizing content based on keywords, URLs and reading habits.
  • Web Mining Services, provides free, customized web extracts to meet your needs.
  • WebSundew, a powerful web scraping and web data extraction tool that extracts data from web pages with high productivity and speed.
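
As a rough illustration of what the scraping tools listed above automate, here is a minimal sketch that pulls link targets and their visible text out of an HTML fragment using only Python's standard library. The class name `LinkExtractor`, the helper `extract_links`, and the sample HTML are hypothetical and are not taken from any listed product; real tools add crawling, scheduling, error handling, and much more robust parsing.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> element (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, link text) tuples found in the given HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample page, shaped like a small software directory.
SAMPLE_HTML = """
<ul>
  <li><a href="https://example.com/a">Tool A</a>, a crawler.</li>
  <li><a href="https://example.com/b">Tool <b>B</b></a>, a scraper.</li>
</ul>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, text)
```

The sketch works on a string that is already in memory; fetching pages, respecting robots.txt, and handling malformed markup are left to dedicated tools such as those in this list.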
