KDnuggets : News : 2006 : n15 : item8

Features


Subject: Interesting Blog Entries

Here are some interesting recent entries from blogs relevant to data mining:

Greg Linden writes about A chance to play with big data (Aug 4, 2006):

A couple fun new data sets are being made available by the search giants.

First, in a humorously titled post, "All Our N-gram are Belong to You", folks at Google Research announced that they "processed 1,011,582,453,213 words of running text and are publishing the counts for all 1,146,580,664 five-word sequences that appear at least 40 times." Very cool.
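To make the counting in that announcement concrete, here is a minimal sketch of n-gram counting with a frequency cutoff. It is not Google's pipeline; the function name `ngram_counts` and the toy input are assumptions for illustration, and a corpus of that size would need a distributed approach rather than an in-memory counter.

```python
from collections import Counter

def ngram_counts(tokens, n=5, min_count=40):
    """Count every run of n consecutive tokens; keep those seen at least min_count times."""
    counts = Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    return {gram: c for gram, c in counts.items() if c >= min_count}

# Toy input with a much smaller n and cutoff than the 5 and 40 quoted above.
tokens = "to be or not to be or not to be".split()
frequent = ngram_counts(tokens, n=2, min_count=3)
print(frequent)
```

With the toy input, only the bigram ("to", "be") reaches the cutoff of 3.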

... Second, the new AOL Research site has posted a list of APIs and data collections from AOL.

Of most interest to me is the data set of "500k User Queries Sampled Over 3 Months" that apparently includes {UserID, Query, QueryTime, ClickedRank, DestinationDomainUrl} for each of 20M queries. Drool, drool!


Marcos Campos writes about Finding the Most Typical Record in a Group (July 30, 2006)
I recently came across the following question: How can I find the most typical record in a group or cluster of records? For example, suppose we have a set of customer records, what is the customer that best typifies the group or cluster? The answer to this question can be used for characterizing groups of records of all types. For example, it can be used for characterizing multimedia collections (e.g., text documents or images). ...
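One common way to answer this question is to pick the medoid, the record whose total distance to all other records in the group is smallest. The sketch below is not necessarily the approach taken in the blog entry; the function name `most_typical`, the Euclidean distance, and the sample records are assumptions for illustration.

```python
import math

def most_typical(records):
    """Return the medoid: the record with the smallest summed Euclidean distance to the rest."""
    def total_distance(record):
        return sum(math.dist(record, other) for other in records)
    return min(records, key=total_distance)

# Hypothetical customer records with two numeric features each.
customers = [(1, 1), (2, 1), (2, 2), (3, 2), (9, 9)]
print(most_typical(customers))
```

In practice the features would need to be put on comparable scales first, since otherwise the attribute with the largest range dominates the distance. The pairwise comparison is quadratic in the group size, which is fine for small clusters.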


Matthew Hurst: Data Mining blog writes about

The Geography of Violence (August 04, 2006)

DataSphere is a pet project which crawls news RSS feeds, geolocates them and presents them with some simple mapping tools. It also has some basic pattern matching which allows for the visualization of articles on the map that match certain terms. The images below show some visualizations in which the distribution of articles about certain locations is displayed. The height of each pin and the size of the head indicates the number of posts. The colour of the pin head indicates the ratio of messages matching a certain set of terms to the total number of posts. The terms that are being matched are:

"bomb", "bombs", "bombing", "missile", "missiles", "explosion", "explosions", "explode", "explodes", "exploding", "rocket", "rockets", "kill", "killing", "killed"
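The pin encoding described above (post count for size, matching ratio for colour) can be sketched as follows. How DataSphere actually matches terms is not stated, so the case-insensitive whole-word matching, the function name `match_ratio_by_location`, and the placeholder articles are all assumptions.

```python
import re
from collections import defaultdict

TERMS = {
    "bomb", "bombs", "bombing", "missile", "missiles",
    "explosion", "explosions", "explode", "explodes", "exploding",
    "rocket", "rockets", "kill", "killing", "killed",
}

def match_ratio_by_location(articles):
    """Map each location to (number of posts, share of posts containing any term)."""
    totals = defaultdict(int)
    matches = defaultdict(int)
    for location, text in articles:
        words = set(re.findall(r"[a-z]+", text.lower()))
        totals[location] += 1
        if words & TERMS:  # at least one whole-word match
            matches[location] += 1
    return {loc: (totals[loc], matches[loc] / totals[loc]) for loc in totals}

# Placeholder (location, text) pairs, not real news items.
articles = [
    ("City A", "Explosion reported near the harbour"),
    ("City A", "Markets reopen after holiday"),
    ("City B", "Local team wins cup"),
]
ratios = match_ratio_by_location(articles)
print(ratios)
```

The first element of each pair would drive the pin height and head size, and the second would drive the head colour.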

Another blog entry notes that a Reuters photographer has been doctoring photos from Beirut to show more smoke than was actually there.


Rick Sherman: The Data Doghouse is writing about

Is the BI Software Market Maturing?

IDC just published its latest report "Worldwide Business Intelligence Tools 2005 Vendor Shares" on the state of the business intelligence industry. According to IDC's research, worldwide software revenue grew 11.5% in 2005 to $5.7 billion.


Copyright © 2006 KDnuggets.