KDnuggets : News : 2004 : n08 : item15

Publications

From: Balaji Ravindran
Date: 25 Apr 2004
Subject: Text Mining Application to Marketing

This article presents a brief review of text-mining techniques and ideas for business applications, aimed at marketing professionals, who form one segment of the text-mining user community.

Balaji Ravindran is currently working at OTX-Research in Hollywood.

Introduction

Market intelligence professionals in various industries have been adopting technologies for analyzing products, investing in business, identifying customers' needs, positioning products, and other related activities. Analytic tools based on text-mining technology are becoming increasingly popular for finding nuggets in unstructured databases and for communicating strategies to business managers through visualization. This article presents a brief review of text-mining techniques and ideas for business applications, aimed at marketing professionals, who form one segment of the text-mining user community. My ideas are based on my research in the "Management of IT" department at the "University of New Orleans". (1)

Text Mining Application to Marketing

Technology has had a positive impact on traditional marketing over the past few decades. Database technologies transformed how information about customers, partners, demographics, and preferences is stored for making marketing decisions. In the 1990s, the world saw an economic boom driven by improvements and innovation in various IT-related fields. The number of web pages grew rapidly during the dot-com era. Search engines were founded to crawl web pages and pull useful information out of the heaps. Marketing professionals used search engines and databases as part of competitive analyses. Data mining technology helped extract useful information and find nuggets in various databases. Data warehouses turned out to be successful for numerical information, but failed when it came to textual information. The 21st century has taken us beyond the limited amount of information on the web. In one way this is good, because more information provides greater awareness and better knowledge. In practice it is less helpful, because too much information leads to redundancy. Marketing knowledge is available on the web in the form of industry white papers, academic publications relating to markets, trade journals, market news articles, reviews, and even public opinion, which reflects customer requirements. Text mining technology could help marketing professionals find the nuggets in this information.

Although the term "Text Mining" sounds similar to data mining, it is very different from, and more sophisticated than, the latter. (2) Data mining is used for extraction, analysis and summarization of numerical and structured data, whereas text mining is used to handle large volumes of unstructured text data. The number of commercial text mining tools is increasing exponentially every year. Although text mining is used to handle text, it is a misconception that it is merely an advanced search engine methodology. Search engines are used to retrieve information from heaps of web pages that already exist. The information retrieval system is based on pre-defined categorical sets from organized directories. For example, say an IT-marketing professional wants to find out the latest improvements in enterprise systems technologies. He could log on to a web search engine and type keywords like XML, Web services, EAI, etc. The engine would retrieve documents, ranked by relevance to his requirements, from a set of documents that already exist. This would give him knowledge about existing ideas or trends. What if the IT-marketing person wants to have an idea about future trends in the enterprise systems market? Text mining comes into the picture in such cases. Text mining helps define relationships between different keywords through techniques like concept clustering, indexing, association, feature extraction, information visualization, and summarization. This innovative technology describes concepts and patterns across various databases to help marketing professionals identify hidden information that could be leveraged into business opportunities. Also, commercial text mining tools are equipped with visually interactive, user-friendly interfaces that depict patterns and relationships between keywords.

Implementation Procedure

Step 1 - Construct a digital library: Formal and informal information resources can be collected from the Internet for the selected field (such as Enterprise IT). A digital library consists of a very large number of documents that are stored in distributed information repositories. These repositories are indexed by topic. In the case of database information, there will be huge volumes of customer transactional data that are unstructured. This lack of structure is caused by irregular table structures across the various databases, and it can be resolved by clustering and association techniques.

Clustering groups similar documents according to their dominant features. (3) Weights are allocated to each document with respect to the defined topic. This ranks every document in the set, or every customer in the transactional data, by relative importance. Documents can be clustered by hierarchical clustering, binary clustering, or self-organizing maps.
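As a rough illustration of the hierarchical option, the sketch below groups a toy corpus by repeatedly merging the two most similar clusters. The document ids, the sample texts, the bag-of-words vectors, the cosine similarity measure, and the average-linkage rule are all assumptions made for this example; the article does not prescribe any of them, and commercial tools may work quite differently.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the ids and texts are invented for illustration.
docs = {
    "wp1": "xml web services integration for enterprise systems",
    "wp2": "enterprise integration with xml and web services",
    "wp3": "customer survey shows movie audience preferences",
    "wp4": "movie audience survey and customer preferences",
}

def vectorize(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_similarity(c1, c2, vectors):
    """Average-linkage similarity between two clusters of document ids."""
    pairs = [(i, j) for i in c1 for j in c2]
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

def agglomerate(vectors, n_clusters):
    """Merge the two most similar clusters until n_clusters remain."""
    clusters = [[doc_id] for doc_id in vectors]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = cluster_similarity(clusters[x], clusters[y], vectors)
                if best is None or sim > best[0]:
                    best = (sim, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

vectors = {doc_id: vectorize(text) for doc_id, text in docs.items()}
clusters = agglomerate(vectors, n_clusters=2)
print(clusters)
```

On this toy corpus the two enterprise-IT documents end up in one cluster and the two audience-survey documents in the other. A real digital library would need better term weighting and a faster algorithm than this quadratic search.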

Step 2 - Glean information: The files, which come in various formats, have to be concatenated. These files can be pre-processed linguistically by lemmatizing words and by distinguishing synonyms and polysemous words. Next, the resulting data file can be pruned and filtered to remove irrelevant data. A correlation coefficient that reflects the association between pairs of data is then calculated. The results are stored in an association matrix. Text mining tools like "Vantage Point" provide correlation maps represented by spheres of various sizes. (4)
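The pruning and correlation part of this step might be sketched as follows. The sample documents and the stop-word list are hypothetical, and the choice of the Pearson coefficient over 0/1 term-presence vectors is an assumption: the article does not say which correlation coefficient is used, and this is not a description of how Vantage Point computes its maps.

```python
import math
from itertools import combinations

# Hypothetical documents, assumed to be already concatenated and lemmatized.
docs = [
    "xml web service integration",
    "xml web service security",
    "eai integration middleware",
    "eai middleware adapter",
    "xml eai integration",
]
STOP_WORDS = {"the", "and", "of", "for"}  # illustrative pruning list

def term_vectors(documents):
    """Map each kept term to a 0/1 vector marking the documents that contain it."""
    token_sets = [set(d.split()) - STOP_WORDS for d in documents]
    vocabulary = sorted(set().union(*token_sets))
    return {t: [1 if t in s else 0 for s in token_sets] for t in vocabulary}

def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y) if dev_x and dev_y else 0.0

def association_matrix(vectors):
    """Symmetric term-by-term matrix of correlation coefficients."""
    matrix = {t: {t: 1.0} for t in vectors}
    for a, b in combinations(vectors, 2):
        r = pearson(vectors[a], vectors[b])
        matrix[a][b] = matrix[b][a] = r
    return matrix

matrix = association_matrix(term_vectors(docs))
print(round(matrix["web"]["service"], 3), round(matrix["xml"]["middleware"], 3))
```

Terms that always appear together (here "web" and "service") score close to 1, while terms that never share a document (here "xml" and "middleware") score close to -1. Values like these are what a correlation map would then draw.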

Step 3 - Interpret text-mined information through pattern detection: Factor maps and cross-correlation maps can be generated. Each node may represent several descriptors that are combined based on how frequently they occur together. Proximity and links between nodes indicate higher correlation among terms. Hidden information can be identified by iterating several times to improve the visualization of associations.
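One simple way to picture how descriptors are combined into nodes is to link every pair of terms whose correlation passes a threshold and then treat each connected group as one node. The sketch below does this on invented correlation values. The threshold, the term names, and the connected-components grouping are assumptions for illustration only; factor maps in real tools are typically built with statistical factor analysis rather than this shortcut.

```python
# Hypothetical correlation values between descriptor pairs; illustrative only.
correlations = {
    ("xml", "web services"): 0.82,
    ("xml", "soap"): 0.74,
    ("eai", "middleware"): 0.67,
    ("xml", "eai"): 0.21,
    ("soap", "middleware"): 0.10,
}

def strong_links(pairs, threshold):
    """Links whose correlation meets the threshold, strongest first."""
    kept = [(a, b, r) for (a, b), r in pairs.items() if r >= threshold]
    return sorted(kept, key=lambda link: -link[2])

def group_descriptors(pairs, threshold):
    """Group terms into nodes: connected components over the strong links."""
    neighbours = {}
    for (a, b), r in pairs.items():
        neighbours.setdefault(a, set())
        neighbours.setdefault(b, set())
        if r >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, groups = set(), []
    for term in sorted(neighbours):
        if term in seen:
            continue
        stack, group = [term], []
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            group.append(current)
            stack.extend(neighbours[current] - seen)
        groups.append(sorted(group))
    return groups

links = strong_links(correlations, threshold=0.5)
nodes = group_descriptors(correlations, threshold=0.5)
print(links)
print(nodes)
```

Re-running the grouping with different thresholds is one way to iterate: a lower threshold merges more descriptors into each node, while a higher one keeps only the tightest associations.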

References:


(1) "Managing Information System Integration Technologies -- A Study of Text mined Industry White Papers", University of New Orleans, March 2003.
(2) "Text Mining, Not Data Mining", DCI's IT News Report, March 1999.
(3) Dan Sullivan, "The Need for Text Mining in Business Intelligence", DM Review, December 2000.
(4) Vantage Point by SearchTech, www.thevantagepoint.com


Copyright © 2004 KDnuggets.