A powerful tool for creating "extraction rules" (wrappers) that describe which pieces of data to scrape from a web page; it consists of a GUI and a stand-alone extraction rule executor.
DEiXTo is a free, DOM-based web data extraction tool.
It consists of two standalone components:
a) GUI DEiXTo, an MS Windows™ application implementing a
graphical user interface that is used to manage extraction
rules (build, test, fine-tune, save and modify), and
b) DEiXTo Executor, a stand-alone extraction rule executor
(command line utility) that automatically applies extraction
rules to large numbers of target HTML pages and
produces structured output in a variety of formats.
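DEiXTo's own wrapper format and executor internals are not described here, so the following is only a generic sketch of the idea behind a DOM-based extraction rule: a rule names the element that marks one record and the child elements whose text becomes fields, and an executor walks the parsed HTML, collects matching records, and serializes them. The `RULE` dictionary, the `RuleExtractor` class, and the sample page are hypothetical illustrations built on Python's standard `html.parser`, `csv` and `json` modules; they do not reflect how DEiXTo actually represents or executes its wrappers.

```python
import csv
import io
import json
from html.parser import HTMLParser

# Hypothetical rule format for illustration only (NOT DEiXTo's wrapper format):
# every <li class="item"> is one record; text inside the listed child tags
# is stored under the given field names.
RULE = {
    "record_tag": "li",
    "record_class": "item",
    "fields": {"b": "title", "i": "price"},
}


class RuleExtractor(HTMLParser):
    """Walks the HTML event stream and collects records matching a rule."""

    def __init__(self, rule):
        super().__init__()
        self.rule = rule
        self.records = []
        self._current = None  # record being filled, or None when outside one
        self._depth = 0       # nesting depth of record_tag inside the record
        self._field = None    # field name currently capturing text, if any

    def handle_starttag(self, tag, attrs):
        if self._current is None:
            # Outside a record: start one only on a matching tag and class.
            classes = (dict(attrs).get("class") or "").split()
            if tag == self.rule["record_tag"] and self.rule["record_class"] in classes:
                self._current = {}
                self._depth = 1
            return
        if tag == self.rule["record_tag"]:
            self._depth += 1
        if tag in self.rule["fields"]:
            self._field = self.rule["fields"][tag]
            self._current.setdefault(self._field, "")

    def handle_endtag(self, tag):
        if self._current is None:
            return
        if tag in self.rule["fields"]:
            self._field = None
        if tag == self.rule["record_tag"]:
            self._depth -= 1
            if self._depth == 0:  # closing tag of the record element itself
                self.records.append(self._current)
                self._current = None

    def handle_data(self, data):
        # Keep only text that falls inside a field of the current record.
        if self._current is not None and self._field is not None:
            self._current[self._field] += data.strip()


def extract(html, rule):
    parser = RuleExtractor(rule)
    parser.feed(html)
    parser.close()
    return parser.records


def to_json(records):
    return json.dumps(records)


def to_csv(records, columns):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Hypothetical sample page: the "ad" item is skipped because its class differs.
PAGE = """
<ul>
  <li class="item"><b>Widget A</b> <i>9.99</i></li>
  <li class="ad">Sponsored</li>
  <li class="item"><b>Widget B</b> <i>4.50</i></li>
</ul>
"""

if __name__ == "__main__":
    rows = extract(PAGE, RULE)
    print(to_json(rows))
    print(to_csv(rows, list(RULE["fields"].values())))
```

Separating the rule (data) from the executor (code) mirrors the split described above between building rules in a GUI and applying them in bulk with a separate command line tool; the same records can then be written out as JSON, CSV or any other format.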
DEiXTo can handle a wide range of web sites with
high precision and recall, since it provides the user
with a rich set of features for building
well-engineered extraction rules. Wrappers built with
GUI DEiXTo can be scheduled to run automatically, providing
periodic access to resources of interest and
saving users time, energy and repetitive effort.
DEiXTo is provided free of charge. You can find more
details at deixto.com.