Bing Liu answers:
Although many Web content mining problems have the same framework
of extraction and integration, the current techniques for dealing with them
are very different. One does not deal with structured data in the same way as unstructured text.
I am not aware of any common programming framework
for Web content mining, or even for each specific task. Our research work was done mainly in C and C++. However, for structured data
extraction, there are tools on the market that either help you extract
data or make it easy for you to write rules to extract data.
For opinion mining, there are helpful natural language processing packages, e.g., part-of-speech taggers and parsers.
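
To illustrate what "writing rules to extract data" can look like, here is a minimal sketch in Python using only the standard library. It is not one of the tools or the C/C++ code mentioned in the answer. The rule itself (every table row is one record, every td cell is one field), the names TableRowExtractor and extract_records, and the sample page are all hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Hypothetical extraction rule: each <tr> is a record, each <td> a field."""

    def __init__(self):
        super().__init__()
        self.records = []   # completed records, one list of fields per row
        self._row = None    # fields of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a <td> cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # Rows without <td> cells (e.g. header rows) are skipped.
            if self._row:
                self.records.append(self._row)
            self._row = None


def extract_records(html):
    """Apply the rule above to an HTML string and return the records."""
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    return parser.records


# Made-up sample page, for illustration only.
SAMPLE_PAGE = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Camera A</td><td>$199</td></tr>
  <tr><td>Camera B</td><td>$249</td></tr>
</table>
"""

records = extract_records(SAMPLE_PAGE)
print(records)
```

Real pages are rarely this regular, which is one reason extraction techniques for structured data differ so much from those for unstructured text.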