- Measuring the scalability of SQL and NoSQL systems - May 31, 2011.
Interview with the designers of the Yahoo! YCSB benchmark, which measured four different systems: Cassandra, HBase, PNUTS (Yahoo! cloud system) and an implementation of a sharded MySQL.
- SAS keeps its high-performance edge amidst converging architectures - May 26, 2011.
Host Eric Kavanagh (of the newly launched Inside Analysis) talks with the SAS CTO about what's happening in the industry, where it's all heading, and what's new at SAS.
- KDnuggets 11:n13, Top analytics/data mining tools; The *Decline* effect? - May 25, 2011.
Latest news on data mining and analytics, including Features (6) | Courses, Webcasts, Meetings (3) | Software (3) | Jobs (9) | Academic (2) | Competitions (6) | Publications (13) | News Briefs (7) | CFP (16)
- Brand new KNIME Press announces first eBook - May 21, 2011.
Helps new KNIME users learn, through concise, hands-on examples and exercises, how to produce practical results quickly.
- ReactiveSearch presents: A day in the (hard) life of Thomas - May 21, 2011.
Here is a funny video, a day in the life of a data analyst, from the makers of the Grapheur Data Mining and Interactive Visualization tool.
- Data Mining Research Blog celebrating five years! - May 20, 2011.
Here are five interesting milestones along the way.
- Why you can't really anonymize your data - May 19, 2011.
The anonymization process is an illusion. There are now so many different public datasets to cross-reference that any set of records with a non-trivial amount of information on someone's actions has a good chance of matching identifiable public records.
- On CART and Cross-Validation, Data Mining - May 18, 2011.
Historic video: Richard Carson interviews CART founding fathers Leo Breiman, Jerome Friedman, Richard Olshen and Charles Stone on CART and Cross-Validation
- 2011 Data Scientist Summit Summary - May 14, 2011.
Reflections on the 2011 Data Scientist Summit from Ryan Rosario and David Smith.
- Podcast: The 'Decline Effect' and Scientific Truth - May 14, 2011.
Surprising and exciting scientific findings capture our attention and captivate the press. But what if, at some point after a finding has been soundly established, it starts to disappear?
- Podcast: Two Cautionary Data Tales - May 14, 2011.
Data doesn't always expose and explain; it can also lead us astray. OTM producer Jamie York looks at two times in the recent past when an overreliance on data has had disastrous consequences.
- Podcast: The Personal Data Revolution - May 14, 2011.
The average person can now collect and analyze unprecedented amounts of data about themselves. What was once the province of extreme athletes and dieters has been democratized, and the resulting movement is called 'The Quantified Self.'
- Podcast: Data Journalism - May 14, 2011.
The immense amounts of data collected by local, state and federal government agencies can be an incredibly valuable trove for enterprising journalists. It can also be a pointless slog.
- McKinsey: New Ways to Exploit Raw Data May Bring Surge of Innovation - May 13, 2011.
Estimates the potential benefits from deploying data-harvesting technologies and skills, such as $300B in value to the health-care system and increasing profit margins by 60% for American retailers.
- inSCIght Scientific Podcast: Kaggle, Competitions for Data Scientists - May 12, 2011.
The latest episode, "Hacking Education: crowd sourcing for the win!", with Kaggle CEO Anthony Goldbloom, discusses competitions for developers and data scientists.
- Data Mining Poll Data Over the Years - May 12, 2011.
Anne Milley investigates the KDnuggets Data Mining Tools Polls over the past 10 years. See what she finds and what happens to the ratio of commercial, open-source and own code.
- Grab Bag: Frequently-Asked Data Mining Questions and Answers - May 11, 2011.
Some of the best interactions from Tim Graettinger's Q&A sessions at the end of his data mining "nuts and bolts" webinar.
- KDnuggets 11:n12, Data Mining Tools Poll; Largest dataset analyzed; KDD-2011 - May 11, 2011.
Latest news on data mining & analytics, including Features (12) | Courses (5) | Webcasts (3) | Software (3) | Jobs (14) | Academic (1) | Competitions (1) | Publications (5) | NewsBriefs (5) | CFP (19)
- Poll results: Largest dataset analyzed - May 10, 2011.
Globally, 21% of data miners worked with terabyte or larger datasets, compared with 30% in the US/Canada. The median was in the 10-20 GB range.
- Steve Miller on Data Science - May 7, 2011.
It almost seemed that data science suffered from "not invented here" syndrome. So I determined right there and then to further investigate the differences between BI and DS.
- The Silicon Jungle - May 5, 2011.
A fictionalized account of data mining and machine learning in today's largest internet companies, written for a general audience. The book focuses on data safety, scientific responsibility, and how data can be used constructively as well as misused when not handled carefully.
- Top Names vs Professions on LinkedIn - May 4, 2011.
The LinkedIn Blog has an interesting analysis of the top names for different professions.