Features
- New Poll: Is anonymization of large datasets still possible? - Mar 23, 2010. Netflix recently cancelled the second Netflix Prize due to privacy concerns. With so much personal information online, do you think it is still possible for companies like Netflix to anonymize and release large datasets for research and competitions?
- Poll Results: Sports Analytics Are Useful - Mar 23, 2010. Sports analytics merely increase the likelihood of making a more educated, better-informed decision; they do not guarantee one.
- KDD Cup 2010: Educational Data Mining - Mar 23, 2010. This year's challenge asks you to predict student performance on mathematical problems from logs of student interaction with Intelligent Tutoring Systems.
- KDD-2010 Workshops: Calls for Papers - Mar 22, 2010. KDD-2010 will have 4 full-day workshops (Mining and Learning with Graphs; Large-scale Data Mining; Useful Patterns; Social Media Analytics) and 5 half-day ones (KDD Cup 2010; BIOKDD10; MDMKDD 2010; ADKDD'10; Human Computation).
- Next Netflix Prize cancelled due to privacy concerns - Mar 22, 2010. After the FTC expressed concerns about Netflix members' privacy and a lawsuit was filed pertaining to the sequel, Netflix decided to cancel the Netflix Prize sequel.
- Most viewed items for week Mar 14-20 - Mar 21, 2010. Top news: Clarabridge Self Service Text Analytics; Stanford online graduate education; Poll Results: Data Miner Salary by Region. Top jobs: Data Mining Engineer at eBay; Scientist at ID Analytics.
- Most viewed items for week Mar 7-13 - Mar 14, 2010. News: Stanford University online graduate education; Poll Results: Data Miner Salary by Region. Jobs: Statistician / Data Analyst at LoopNet, San Francisco, CA.
- Predictive Analytics World: Save-the-Date and Call-for-Speakers - Mar 9, 2010. Save the date for the next PAW: Oct 19-20, 2010 in Washington, DC. Speaker proposal deadline: April 16, 2010.
Webcasts (see also All Webcasts)
- Kxen Webinar: Social Network Analysis, Apr 1 - Mar 18, 2010. Social network analysis can boost your marketing performance! Your customers are telling you how to market more effectively, but are you listening?
Software (see also All Software)
- Data Applied new data mining and visualization capabilities - Mar 23, 2010. Founded by ex-Microsoft engineers, the company leverages Silverlight technology and a web-based API to bring data mining within reach of any web-enabled user or application.
- 3rd Annual Rexer Analytics Data Miner Survey - Summary - Mar 19, 2010. The most commonly used algorithms are regression, decision trees, and cluster analysis. The top challenges facing data miners are dirty data, explaining data mining to others, and difficult access to data. Users of IBM SPSS Modeler, Statistica, and RapidMiner are the most satisfied with their software.
- Weka User Survey - Win Amazon.com Gift Card - Mar 11, 2010. The University of Waikato, home of Weka, would like to learn more about the people who use the Weka machine learning software. Complete this 4-minute survey for a chance to win a $200 Amazon gift card.
- Two social-bookmarking studies: recommendation and tag-based ranking - Mar 22, 2010. The GiveALink.org project invites you to participate in two online user studies related to tag recommendation and tag-based ranking.
- Pyriel learns classification rules which maximize AUC (open-source) - Mar 18, 2010. I previously published a paper, "PRIE: A system for generating rulelists to maximize ROC performance" (Data Mining and Knowledge Discovery, October 2008). I have just released Pyriel, an open-source implementation of this system.
- Centrifuge Releases New Data Visualization Software - Mar 18, 2010. The Centrifuge approach to data visualization brings together three innovations in analysis (Interactive Data Visualization, Unified Data Views, and Collaborative Analysis) to identify important insights and hidden patterns in your data.
- Google Public Data Explorer - Mar 11, 2010. The new Google Labs tool offers a visual way to look at and analyze large public data sets on a variety of popular search topics.
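The Pyriel item above refers to rule lists chosen to maximize AUC, the area under the ROC curve. As a rough illustration of the quantity being optimized (not of PRIE or Pyriel themselves, whose internals are not described here), the sketch below computes AUC as the fraction of (positive, negative) pairs that a scorer ranks in the right order, counting ties as half. The function name `auc` and the toy scores are hypothetical.

```python
from itertools import product


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels use 1 for positive and 0 for negative; a tie counts as half a
    correct pair. This is O(P * N), which is fine for small illustrations.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p, n in product(pos, neg):
        if p > n:
            correct += 1.0
        elif p == n:
            correct += 0.5
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical classifier scores: one negative (0.7) outranks one positive (0.6).
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
    labels = [1, 1, 0, 1, 0, 0]
    print(auc(scores, labels))
```

In this toy example 8 of the 9 positive/negative pairs are ordered correctly, so the AUC is 8/9. A rule-list learner that "maximizes AUC" is, in effect, searching for an ordering of examples that pushes this pairwise fraction toward 1.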
Jobs (see also All Jobs)
- Analytics Engineer at Eventbrite, San Francisco, CA - Mar 23, 2010. We are looking for a talented software engineer with a solid background in data mining and knowledge of Internet commerce and social networks.
- Scientist at ID Analytics, San Diego, CA - Mar 11, 2010. Part of a team responsible for the development of the company's advanced technologies, particularly those involving statistical score modeling, score model development, score model deployment, large-scale database analysis, and statistical algorithms.
- Senior Data Engineer at ID Analytics, San Diego, CA - Mar 11, 2010. Responsible for the extraction, verification, processing, cleansing, analysis, and deployment of client data, third-party data sources, and internal data sources.
- Data Mining Engineer at eBay, San Jose, CA - Mar 10, 2010. TnS (Trust and Safety) applications proactively prevent and catch fraud, enforce eBay policies, and collect and mine data that will help build future Trust and Safety strategies. We build real-time machine learning applications that process hundreds of millions of transactions a day, learning from terabytes of historical data.
Academic/Research positions
- 2 PhD positions in Stream/Sequence Mining at Southern Methodist University, Dallas, TX - Mar 10, 2010. Participate in the group's research activities, centered on data stream mining, genetic sequence mining, and hurricane forecasting.
Publications
- Why The Next Big Thing Is, In Fact, A Really Big Thing - Mar 21, 2010. In my view, big data is the next big thing. I identified five net new possibilities that big data presents: 1. Answer formerly unanswerable questions; 2. New questions; ...
- Moving On With Analytics - Mar 16, 2010. But about that first book, how did it hold up over time? Many speaking engagements later, Davenport sounded just a bit deflated at the overall progress, but not very surprised.
- We're so good at medical studies that most of them are wrong - Mar 16, 2010. A survey of the recent medical literature found that 95 percent of the results of observational studies on human health failed replication when tested using a rigorous, double-blind trial. Given massive datasets and the ability to perform multiple tests, many researchers fall into the trap of finding "significant" results that are due to random chance.
- Norman Nie: Open Source is Opening Data to Predictive Analytics - Mar 13, 2010. Revolutions in science have often been preceded by revolutions in measurement. Just as the microscope transformed biology by exposing germs, and the electron microscope changed physics, all these data are turning the social sciences upside down.
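The medical-studies item above describes how running many tests produces "significant" results by chance alone. A minimal simulation makes the mechanism concrete. It is a hypothetical sketch, not the method of the survey: every comparison draws both groups from the same N(0, 1) distribution, so any result with p < alpha is a false positive, and a two-sample z-test with known unit variance stands in for whatever tests the original studies used. The names `two_sided_z_pvalue` and `count_false_positives` are illustrative.

```python
import math
import random


def two_sided_z_pvalue(a, b):
    """Two-sample z-test p-value, assuming both groups have known variance 1."""
    diff = sum(a) / len(a) - sum(b) / len(b)
    z = diff / math.sqrt(1.0 / len(a) + 1.0 / len(b))
    # Two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2.0))


def count_false_positives(num_tests=1000, n=50, alpha=0.05, seed=0):
    """Run many tests where the null hypothesis is TRUE; count 'significant' ones."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_tests):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_z_pvalue(a, b) < alpha:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_false_positives()
    print(f"{hits} of 1000 null comparisons were 'significant' at alpha = 0.05")
```

Because p-values are uniformly distributed when the null hypothesis holds, roughly 5% of the 1000 comparisons (around 50) should come out "significant" even though no real effect exists anywhere. A researcher who runs many such tests and reports only the hits will publish noise.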
News Briefs
- Google Analytics to allow users to opt out - Mar 23, 2010. Google is developing a global browser-based plug-in that will allow users to opt out of being tracked by Google Analytics.
- Former Yahoo execs launch nPario analytics agency - Mar 23, 2010. The company wants to help clients better understand and market to consumer commercial intent through optimal data management and data mining products and services.
- Wendy's Selects Clarabridge Text Analytics for Customer Feedback - Mar 22, 2010. Wendy's Customer Feedback Program will use Clarabridge's solution to automatically collect and analyze, in real time, close to 500,000 text-based customer comments per year.
- Pegasystems buys CRM vendor Chordiant - Mar 22, 2010. Pegasystems plans to buy Chordiant, adding CRM (customer relationship management) and analytics capabilities to its BPM (business process management) products.
- Telligent Announces Enhancements to Analytics Software - Mar 22, 2010. Telligent Analytics is a social analytics program that measures user engagement through several channels, including Web analytics layered over deep social analytics, identification of key users, and customizable conversation analysis.
- Patent: Retail Data Mining Using Co-Occurrence Consistency - Mar 22, 2010. The invention uses techniques from statistics, information theory, and graph theory to quantify and discover patterns in relationships between entities, such as products and customers, as evidenced by purchase behavior.
- Saffron Technology Launches Saffron Natural Intelligence Platform Version 8.0 - Mar 22, 2010. Enterprise-Scalable Associative Memory Store is First Streaming Data Analytics & Experience Management Solution for Business.
- Foursquare Analytics for Businesses - Mar 21, 2010. Foursquare, a location-based social network, plans to distribute a free analytics tool and dashboard that gives businesses information and statistics about their visitors.
- Turiya Media targets game publishers with behavioral data mining - Mar 21, 2010. Turiya Media's Leafnode product helps game publishers better retain and monetize their customers by mining and analyzing behavioral data.
- Sports teams, leagues tap SAS® Analytics to boost profits - Mar 18, 2010. San Francisco 49ers, MLB.com, Jacksonville Jaguars, and Carolina Hurricanes sign with SAS®.
- New Free Data Mining Tool for Game Developers - Mar 14, 2010. Game developers have a new tool in their arsenal courtesy of Jenkins Software: an instrumentation and data analysis suite called RakNet.
- TIBCO rolls out Spotfire 3.1 with spotlight on predictive analytics - Mar 12, 2010. Spotfire 3.1 allows any enterprise decision-maker to run "what-if" scenarios on demand and find new insights in complex data.
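The co-occurrence patent item above mentions quantifying product relationships from purchase behavior using information theory and graph theory. The patent's own "consistency" measure is not described here, so the sketch below swaps in a common information-theoretic measure, pointwise mutual information (PMI), as an assumption: each pair of products bought together becomes a weighted edge in a product graph, with positive weights for pairs that co-occur more often than chance would predict. The function name `pmi_edges` and the basket data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_edges(baskets):
    """Return {(a, b): PMI} for every product pair that co-occurs in some basket.

    PMI(a, b) = log(P(a, b) / (P(a) * P(b))), where each probability is the
    fraction of baskets containing the item (or both items). Pairs are keyed
    in sorted order, e.g. ("bread", "butter").
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore repeated items within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    edges = {}
    for (a, b), c_ab in pair_counts.items():
        p_ab = c_ab / n
        p_a = item_counts[a] / n
        p_b = item_counts[b] / n
        edges[(a, b)] = math.log(p_ab / (p_a * p_b))
    return edges


if __name__ == "__main__":
    baskets = [
        ["bread", "butter"],
        ["bread", "butter", "milk"],
        ["milk", "cereal"],
        ["milk", "cereal"],
        ["bread", "jam"],
    ]
    for pair, score in sorted(pmi_edges(baskets).items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 3))
```

With these toy baskets, bread and butter (and cereal and milk) get positive scores because they are bought together more often than their individual popularity predicts, while bread and milk get a negative score. Thresholding the edge weights yields a product graph on which graph-theoretic pattern discovery could then operate.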
CFP - Calls for Papers (see also All CFP)
- Machine Learning for Signal Processing, due Apr 1
- Intelligent Learning Systems in Banking and Finance, due Apr 10
- Int. Conf. on Data Engineering and Management, due Apr 20
- ECML PKDD 2010, due Apr 23
- KDD-10 workshop: Human Computation, due May 3
- KDD-10 workshop: Large-scale Data Mining: Theory and Applications, due May 4
- BIOKDD10, due May 4
- KDD-10 workshop: Useful Patterns, due May 4
- KDD-10 workshop: Mining and Learning with Graphs, due May 7
- KDD-10 workshop: Social Media Analytics, due May 7
- ICML-2010 Workshop on Budgeted Learning, due May 8
- KDD-10 workshop: Audience Intelligence for Online Advertising, due May 10
- ECML PKDD 2010 Demos, due May 14
- Data Mining for a Sustainable World, due May 21
- Future Internet and Society: A Complex Systems Perspective, due Jun 1
- KDD CUP 2010 workshop, due Jun 1
- SIGSPATIAL: Geographic Information Systems, due Jun 17
- Computational Intelligence in Knowledge Discovery, due Jun 26
Quote
Empirical evidence is that 80-90% of the claims made by epidemiologists are false; their claims do not replicate when retested under rigorous conditions. The net effect of ignoring multiple testing is to exploit randomness.
S. Stanley Young, National Institute of Statistical Sciences
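Not ignoring multiple testing means tightening the significance threshold to account for how many tests were run. The quote does not name a method; the sketch below uses the Holm-Bonferroni step-down procedure as one standard choice, which controls the family-wise error rate (the chance of even one false positive across all tests) at alpha. The function name `holm_reject` and the example p-values are hypothetical.

```python
def holm_reject(pvalues, alpha=0.05):
    """Holm-Bonferroni step-down procedure.

    Returns a list of booleans in the same order as pvalues; True means the
    hypothesis is rejected while keeping the family-wise error rate <= alpha.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared with alpha / (m - k + 1).
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first failure; all larger p-values are retained
    return reject


if __name__ == "__main__":
    pvalues = [0.01, 0.04, 0.03, 0.005]
    print(holm_reject(pvalues))
```

Taken one at a time, all four p-values above fall below 0.05. After the Holm adjustment only 0.005 and 0.01 survive, giving `[True, False, False, True]`: the 0.03 result fails its threshold of 0.05 / 2 = 0.025, and the procedure stops there.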