Features
- Poll: Analytics/Data Mining Meetings you attended in 2010? - Jan 4, 2011. Please vote on www.kdnuggets.com
- Additions to KDnuggets Directory in December - Jan 4, 2011. Additions to KDnuggets in Blogs, Companies, Competitions, Datasets, Meetings, Publications, Software, and Websites sections
- Access PAW DC Session Videos Now - Jan 4, 2011. on-demand access to the videos of PAW Washington DC, October 2010, including over 30 sessions and keynotes that you may view at your convenience.
- The Truth Wears Off - Dec 29, 2010. Many rigorously proved scientific results start shrinking in later studies. What went wrong? (My guess - widespread data overfitting and confirmation bias).
- Top Conferences in Data Mining - Jan 4, 2011. top conferences by citations are KDD, ICDE, IEEE ICDM, according to Microsoft Asia Academic Search.
- Most viewed items for Dec 26 - Jan 1 - Jan 3, 2011. Book: Mining of Massive Datasets (free download); New Book: Data Mining for Business Applications; Top jobs: Sr. SW Engineer, Data Mining at Polyvore, Mountain View; Sr. Test Lead - Bing at Microsoft.
- Most viewed items for Dec 19-25 - Dec 27, 2010. Book: Mining of Massive Datasets (free download); $3M Heritage Health Data Analysis Prize; Top jobs: Software/Algorithms Engineer at Efficient Frontier, Sunnyvale, CA; Research Scientist in Large Scale Data Analytics at Ricoh Innovations, Menlo Park, CA.
Courses (see also All Courses)
- Learn How Experts Mine Data, Feb 14-18, Orlando - Jan 4, 2011. Register by Jan 17 and save! Learn how to leverage your existing data via an intensive course series. Feb 14-18 in Orlando.
Webcasts (see also All Webcasts)
- Data Mining: Failure to Launch [ Free Webinar ] - Jan 4, 2011. a recent industry survey reports that 51% of data mining projects either never left the ground or did not bring value. Attend this webinar to learn how to succeed with data mining.
Software (see also All Software)
- Award for Math Model that Relates Biomarkers of Asthma with Clinical Outcomes - Dec 28, 2010. The Seeker is looking for a collaboration partner to write a 2-4 page proposal for building a mathematical model that relates biomarkers of asthma with clinical outcomes.
Jobs (see also All Jobs)
- Predictive Modeling Analyst at RTI International, Research Triangle Park, NC - Jan 4, 2011. a seasoned mathematical modeler to work with other researchers on its Research Triangle Park campus on health-related projects, predominantly in the areas of substance abuse, violence, chronic diseases and HIV.
- Senior Software Engineer, Data Mining at Polyvore, Mountain View, CA - Dec 30, 2010. identify patterns in user behavior data that will drive key product initiatives and business decisions; bring algorithmic innovation that will dramatically improve our search quality, recommendations, and personalization, and build production systems on a massive scale.
- Senior Research Scientist/Engineer Web/ML/NLP at Bestofmedia, Grenoble, France - Dec 22, 2010. At the heart of the internal R&D department, you contribute to the innovation process in an integrated team of world-class scientists and engineers. You will work on advanced technology and science that underlie the Web of today and tomorrow.
- Software/Algorithms Engineer at Efficient Frontier, Sunnyvale, CA - Dec 21, 2010. with knowledge of statistical modeling concepts to develop and deploy large-scale online ad targeting systems.
Academic/Research positions
- 2011-2012 Herman Goldstine Memorial Fellowship in Mathematical Sciences at IBM T. J. Watson Research Center, Yorktown Heights, NY - Jan 2, 2011. The fellowship provides scientists of outstanding ability an opportunity to advance their scholarship as resident department members at the Research Center. Areas of research include algorithms, data mining, statistics, dynamical systems, and more.
Audio/Video
- New and old Data stores - Jan 3, 2011. The panel discussed the pros and cons of new data stores with respect to classical relational databases. The panel was held at ICOODB Frankfurt on September 29, 2010.
- Predictive analytics videos from Zementis - Dec 30, 2010. The first video shows how easy it is to deploy and execute a predictive model in ADAPA. The second video focuses on PMML, an open standard for representing data mining models.
- Graph Identification and Privacy in Social Networks - Dec 24, 2010. Lise Getoor's talk at Google looks at methods that extract graphs from noisy input data. Results show that on several well-known social media sites, one can recover sensitive information.
Publications
- A Data Mining Method for Moderating Outliers, Instead of Discarding Them - Jan 4, 2011. the statistical community has not addressed uniting the outlier-detection methodology and the "reason for the existence" of the outlier.
- Open Data: Why the Crowd Can Be Your Best Analytics Tool - Jan 3, 2011. From the bounty of data emerged "data science" and a plethora of new tools to deal with the size and speed of information. Today we are seeing crowdsourcing increasingly commoditize data, and projects like OpenStreetMap replacing the NAVTEQs of the world.
- AI Defeats the Hivemind - Jan 2, 2011. How a machine learning algorithm (Naive Bayes) beat the assembled masses of Mechanical Turk.
- New Book: Data Mining for Business Applications - Dec 29, 2010. This book contains extended versions of workshop papers from 2005 to 2008 on data mining for business applications. Areas covered include methodological issues and research challenges, typical problems, and the emerging applications.
- Exploring Twitter Hashtags - Dec 29, 2010. Using a dataset of 29 million messages, Jan Poeschko explores relations among the hashtags with respect to co-occurrences. He classifies hashtags into five intuitive classes, using a machine-learning approach.
- Xindong Wu: 10 Years of Data Mining Research (ICDM'10 Keynote) - Dec 23, 2010. ICDM'10 keynote reviewed past activities, discussed current achievements, and presented research challenges for the future.
- Book: Reactive Business Intelligence - Dec 22, 2010. Combining data mining, modelling and visualization (based on the authors' Grapheur software), this book would be of interest to analytic professionals.
- Much Faster Bootstraps Using SAS® - Dec 22, 2010. We compare 7 bootstrap algorithms in SAS; our best one is ~80x faster than the built-in SAS procedure (Proc SurveySelect).
- Wanted: Data Scientists to Turn Information Into Gold - Dec 22, 2010. there was a 200 percent increase from 2008 to today in searches for executives with sophisticated data mining or data analytics capabilities.
News Briefs
- Data mining: How coaches gauge success by glancing at a box score - Jan 3, 2011. The coach looks at a few variables on the stat sheet after each contest. Those numbers routinely spell out whether his team's performance was aligned with his basketball philosophy.
- DiscoverText web-based Analytic Software - special offer for Educators - Jan 3, 2011. Colleges and Universities can purchase a 1-year Educational Enterprise License for DiscoverText for only $999.
- SocialFlow Tries to Crack Science of Twitter - Jan 2, 2011. The New York-based startup claims to have cracked the puzzle of real-time conversation, and boasts that it can take the guesswork out of what to say and when to say it on Twitter, Facebook and Google.
- Polyvore Launches Style Analytics - Dec 31, 2010. measures key trends, the popularity of brands and the amount of engagement users have with fashion brands.
- McDonald's, CBS, & Microsoft Mine Data from Web Ads, Class Claims - Dec 29, 2010. Defendants acted in concert with Interclick, mining consumers' web browser histories for entries of particular relevance to defendants' respective, customized advertising campaigns, the complaint states.
- Your Tweets Could be Worth Millions ... or not - Dec 28, 2010. Trading stocks based on data collected through Twitter is either sheer genius or abject stupidity.
- Government Seeks Predictive-Modeling Apps to Fight Fraud in Medicare, Medicaid - Dec 25, 2010. Federal government will implement predictive-modeling software to fight fraudulent claims for Medicare and Medicaid and the Children's Health Insurance program.
- SAS Institute brings 100 jobs to Cary - Dec 24, 2010. The positions will focus on providing data analysis for state and local government groups that are deciding budgets for 2011.
- Microsoft's Dryad technology to take on Google's MapReduce - Dec 24, 2010. developed by Microsoft Research, Dryad is a platform for running programs across multiple servers.
- Computers That Trade on the News - Dec 23, 2010. Math-loving traders are using powerful computers to speed-read news reports, editorials, company Web sites, blog posts and even Twitter messages - and then letting the machines decide what it all means for the markets.
CFP - Calls for Papers (see also All CFP)
- ICML-2011, due Jan 14
- Workshop on Behavior Informatics at PAKDD2011, due Jan 15
- AgentMining, due Jan 15
- Agents and Data Mining Interaction, due Jan 30
- KDD-2011: Knowledge Discovery and Data Mining Workshop proposals, due Feb 1
- KDD-2011: Knowledge Discovery and Data Mining, due Feb 11
Quote
Scientists have misled themselves into thinking that if you collect enormous amounts of data you are bound to get the right answer. You are not bound to get the right answer unless you are enormously smart. You can narrow down your questions; but enormous data sets often consist of enormous numbers of small sets of data, none of which by themselves are enough to solve the thing you are interested in, and they fit together in some complicated way.
Bradley Efron
(Thanks to Tom Fawcett)