David Menninger, Information Management Blogs, September 9, 2011
... Splunk focuses on analyzing large volumes of machine-generated data from underlying applications and systems, including application and system logs, network traffic, sensor data, click streams and other loosely structured information sources. Many of these "big data" sources are the same sources analyzed with Hadoop, according to our recently published benchmark research. However, Splunk takes a different approach, performing simple analyses on this data in real time rather than the batch-based advanced analytics we see as the most common use for Hadoop.
... Splunk focuses on a specific segment of the big-data market: machine-generated data. This type of data originates constantly, in large quantities, from many sources throughout an organization. Its other common characteristic is that it is generally less structured than data in typical relational databases. Often the information is captured as logs: text files containing records of varying lengths and structures. To use this loosely structured information effectively in real time, two challenges must be overcome: loading the data quickly, and navigating and analyzing the information easily once it is loaded.
Splunk tackles the first challenge by loading the information in its raw form. No preprocessing is necessary, so no delay is introduced and no data is "lost." Retaining all the raw data has business value as well: if you later decide you want to investigate some piece of information that previously you didn't think was important, it will still be available for analysis.
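The "load raw now, decide what matters later" idea can be sketched in a few lines. This is a minimal illustration of the general schema-on-read pattern, not Splunk's actual implementation; the log lines, field names, and functions here are hypothetical.

```python
import re

# Hypothetical sketch of schema-on-read: events are stored exactly as
# received, with no preprocessing, so nothing is delayed or discarded.
RAW_EVENTS = []

def ingest(line: str) -> None:
    """Store the raw line untouched; no parsing happens at load time."""
    RAW_EVENTS.append(line)

def search(pattern: str) -> list:
    """Extract fields at query time using a regex with named groups.

    Because the raw data was kept, a new pattern can pull out fields
    nobody thought to capture when the data was loaded.
    """
    rx = re.compile(pattern)
    return [m.groupdict() for m in map(rx.search, RAW_EVENTS) if m]

# Loosely structured lines of differing lengths and shapes load equally well.
ingest("2011-09-09 12:00:01 status=200 path=/index.html")
ingest("sensor-7 temp=21.4C ok")
ingest("2011-09-09 12:00:02 status=404 path=/missing")

# Structure is imposed only when a question is asked.
hits = search(r"status=(?P<status>\d+) path=(?P<path>\S+)")
```

Here `hits` recovers `status` and `path` fields from the two web-log lines while the sensor line is simply skipped, showing how one store can serve records of any structure.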