| KDnuggets : News : 2009 : n18 : item33 | |
Publications

Subject: Book Review: Beautiful Data

Peter Jackson's Blog.

This collection, edited by Toby Segaran and Jeff Hammerbacher and published by O’Reilly, contains 20 recent papers on the theme of helping data tell its own story through an array of data management, data mining, data analysis, data visualization, and data reconciliation techniques. I’ll start by reviewing what I thought were the standouts, and then summarize the rest. ...

"Data Finds Data" takes us beyond search to a realm in which relationships between key data items are discovered in real time and immediately relayed to the right users.

"Portable Data in Real Time" addresses some of the issues inherent in sharing data between applications, e.g., the Flickr/Friendfeed example, where user behavior needs to propagate between different social media sites.

"Surfacing the Deep Web" describes work at Google to make data behind Web forms amenable to search by automatic query generation.

"Natural Language Corpus Data" describes some simple experiments with a trillion-word content set created by Google and now available through the Linguistic Data Consortium. Simple Python programs are used to build a language model of the corpus, i.e., a probability distribution over the words and phrases occurring in the documents. This model can then be applied to common problems like spell correction and spam detection.
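To make the language-model idea a little more concrete, here is a minimal Python sketch of a unigram model used for spell correction. It is not the code from the chapter: the word counts are a tiny made-up stand-in for real corpus data, and the names `COUNTS`, `probability`, `edits1`, and `correct` are hypothetical choices for this illustration. The approach shown (pick the most probable known word within one edit of the input) is one common, simple way to do it, assumed here for the sake of example.

```python
from collections import Counter

# Toy word counts standing in for real corpus data (made up for illustration).
COUNTS = Counter({"the": 500, "data": 120, "tell": 30,
                  "story": 25, "spell": 10, "spelling": 8})
TOTAL = sum(COUNTS.values())
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def probability(word):
    """Unigram probability: the word's count divided by the total count."""
    return COUNTS[word] / TOTAL


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Return word if known, else the most probable known word one edit away."""
    if word in COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in COUNTS]
    # Fall back to the input unchanged when no known candidate exists.
    return max(candidates, key=probability) if candidates else word
```

Calling `correct("teh")` on this toy model looks up every string one edit away from the input and keeps the known word with the highest probability. With real corpus counts the same structure applies, only the table of counts is far larger.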
Copyright © 2009 KDnuggets.