KDnuggets™ News 11:n17, July 13
Features (7) |
Courses, Events (1) |
Webcasts (1) |
Software (4) |
Jobs (5) |
Competitions (3) |
Meetings (3) |
Publications (6) |
NewsBriefs (5) |
CFP (9) |
- New Poll: Text Analytics Use? - Jul 12, 2011.
How much did you use text analytics in the past 12 months, and how much do you plan to use it over the next 12 months? Please vote.
- Poll results: Vacation length by region - Jul 12, 2011.
European data miners have the longest vacations, averaging 4.3 weeks, while Latin Americans have the shortest. Interestingly, academic researchers report less vacation time than those in industry.
- Predictive Analytics World October 2011 in New York - Jul 12, 2011.
Download the Conf. Preview Guide; IBM Watson, Jeopardy Champ Slayer: New Keynote from David A. Ferrucci, IBM Watson Project Leader.
- Most viewed items for Jul 3-9 - Jul 10, 2011.
New Book: Data Mining: Concepts ...; Distr. Evo. Algs Python library (DEAP);
Top jobs: Data Mining Analyst at Waterfront International, Toronto; Statistician/Algorithms Scientist at Guardian Analytics, Los Altos, CA;
- Most viewed news, jobs in June - Jul 8, 2011.
LexisNexis open-sources its Hadoop killer; What do Data Miners Need to Learn?
Top jobs: Scientist, Predictive Analytics at Disney Interactive; Data Analyst - Windows Live Intelligence at Microsoft
- Additions to KDnuggets Directory in June - Jul 8, 2011.
New datasets, education, meetings, publications, software, solutions, websites
- Most viewed items for Jun 26 - Jul 2 - Jul 5, 2011.
Supreme Court strikes ban on pharma data mining; Dark matter in a new light with competition;
Top jobs: Statistician/Algorithms Scientist at Guardian Analytics; Data Ops Engineer at WaPo Labs.
- Dice - A Random Algorithm Library for Data Mining - Jul 12, 2011.
supports Random Decision Tree, a fast and general algorithm for classification, ranking, regression, and multi-label classification. The source code is in Java.
- Mulan: an open-source Java library for learning from multi-label data - Jul 9, 2011.
includes algorithms for Classification, Ranking, Feature Selection, Evaluation, and more. Each item of a multi-label dataset can be a member of multiple categories.
- DEAP: a library for Distributed Evolutionary Algorithms in Python - Jul 7, 2011.
EAP uses both the object-oriented and functional programming paradigms in Python to make development simple and beautiful. DTM provides tools to distribute workload evenly on a cluster or LAN of workstations, based on MPI and TCP communication managers.
- cloudnumbers.com offers instant access to High Performance Computing - Jul 5, 2011.
provides an intuitive platform that enables everyone to run his or her time-consuming analysis on a cluster with more than 1000 CPUs.
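The multi-label setting that Mulan targets, where each item can belong to several categories at once, can be pictured as a binary indicator matrix. The Python sketch below is only an illustration with made-up items and labels; it does not use Mulan's Java API, and the function name `to_indicator_matrix` is hypothetical.

```python
# Hypothetical toy data: each item maps to the set of categories it belongs to.
items = {
    "doc1": {"sports", "politics"},
    "doc2": {"music"},
    "doc3": {"sports", "music", "travel"},
}


def to_indicator_matrix(labelled):
    """Return (sorted label list, {item: 0/1 row}) for a multi-label dataset."""
    # Collect every label that appears anywhere in the dataset.
    labels = sorted(set().union(*labelled.values()))
    # One row per item: 1 if the item carries that label, else 0.
    rows = {
        name: [1 if label in cats else 0 for label in labels]
        for name, cats in labelled.items()
    }
    return labels, rows


labels, rows = to_indicator_matrix(items)
for name, row in rows.items():
    print(name, row)
```

A row with more than one 1 is what distinguishes multi-label data from ordinary single-label classification, where each row would contain exactly one 1.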
- Data Scientist - Performance Analyst at Collective, New York, NY - Jul 12, 2011.
Perform rigorous analyses of data quality, business rules effectiveness and performance potential for the buy side (advertiser), sell side (publisher, exchange) and audience (data provider) dimensions of our business
- Software Engineer, Data Analytics/Modeling at Guardian Analytics, Los Altos, CA - Jul 11, 2011.
Excel at understanding and manipulating large amounts of data, and have strong SW skills. Derive satisfaction from puzzling out complex data problems. Be enthusiastic about preventing Internet banking fraud
- Computational Biologist - Proteomics Data Analyst - Analytics - RnD at Dow Agro, Indianapolis, IN - Jul 10, 2011.
to assist in the design of experiments, integration and analyses of biological datasets. The candidate will interact with scientists across Dow AgroSciences R&D.
- Data Mining Analyst at Waterfront International, Toronto, Canada - Jul 6, 2011.
WIL is a financial consulting firm specializing in developing computer-based statistical trading strategies.
- Senior Analyst, User Engagement & Monetization Experimentation at Demand Media, Santa Monica, CA - Jun 30, 2011.
Apply data mining and statistics to measure and understand the search user experience and feature engagement. Design and implement statistical experiments to maximize site performance and profit as well as minimize and manage risk of revenue loss.
- Wikipedia for Kaggle Participants - Jul 9, 2011.
Tips from a Wikipedia editor/admin on how best to analyze Wikipedia data for the ICDM 2011 Kaggle data-mining challenge: use data from 10 years of Wikipedia edits to predict future edit rates.
- ICDM 2011 Data Mining Contest - Jun 30, 2011.
The challenge is to develop methods that can predict future editing activity on Wikipedia. Prizes for top finishers total 10K.
- Competition Shines Light on Dark Matter - Jun 29, 2011.
in less than a week, a PhD student created an algorithm that outperformed the state-of-the-art algorithms for mapping dark matter.
- Boston Predictive Analytics Meetup - Jul 11, 2011.
Covering business applications, Web Analytics, R, Recommender Systems, Machine Learning, Google Analytics, Data Visualization, Social Media / Text Analytics, and related topics.
- KDD 2011 (Aug 21-24, San Diego) Program - Jul 11, 2011.
state-of-the-art research, invited talks from industry and academic leaders (including Peter Norvig, David Haussler, and Judea Pearl), tutorials on social media analytics, Internet ad systems, Hadoop, and much more. Register today!
- Data Mining Meetup, San Francisco area - Jul 10, 2011.
host and highlight events on a wide range of data mining topics
- Gartner Market Share Analysis for BI and Analytics - Jul 10, 2011.
Total BI Market in 2010 was estimated at $10.5B. BI platforms made up 63.7% of BI software revenue, while CPM suites and analytic applications accounted for 20.6% and 15.7% of the total revenue, respectively.
- Ben Shneiderman talk on Information Visualization for Knowledge Discovery - Jul 9, 2011.
this 1-hr talk was part of the Distinguished Lecture Series at UMD.
- New Books: Music Data Mining, more - KDnuggets discount - Jul 8, 2011.
20% off on Music Data Mining; A First Course in Machine Learning; Machine Learning and Knowledge Discovery for Engineering Systems Health Management from CRC Press
- New Book: Data Mining: Concepts, Models and Techniques - Jul 5, 2011.
The goal of this book is to provide, in a friendly way, both theoretical concepts and, especially, practical techniques of this exciting field, ready to be applied in real-world situations.
- Gartner: Pattern-Based Strategy - Jun 30, 2011.
As information sources continue to grow, a Gartner report outlines three topics for matching technology with business interest: volume, variety, and velocity.
- New Book: Machine Learning for Law Enforcement, Security, Intelligence - Jun 29, 2011.
a guide for doing forensic investigations using neural networks, text extraction, and rules to interrogate the evidence for fraud detection, cybersecurity, and intelligence.
- Sequilab: a virtual research lab for life scientists doing sequence analysis - Jul 11, 2011.
a sequence-profiling portal with Web 2.0 interactivity features that lets users link NCBI-BLAST results directly to an inventory of sequence analysis tools
- Datameer and MapR Partner to Promote Big Data Analytics - Jul 8, 2011.
The partnership will focus on Hadoop-based analytics and will include a variety of joint technology and marketing efforts.
- IDC: World's data to grow 50-fold by 2020 - Jul 8, 2011.
Much more data is created about people's activities than they create directly via email, photos, and downloading. In 2011, about 1.8 zettabytes (1.8T gigabytes) of data will be created.
- Global BI tools market generates over $4bn revenue in second half 2010 - Jul 6, 2011.
The top 2 BI tools vendors for the full year were SAP and IBM, each accounting for more than $1bn in software revenue during 2010.
- Twitter Acquires Social Analytics Company BackType - Jul 6, 2011.
The team and IP will move to the Twitter platform group. BackType started as a comment tracker but turned into a social analytics platform.
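The unit conversion in the IDC item above (1.8 zettabytes written as 1.8T gigabytes) can be checked with a short calculation. This is a sketch assuming SI decimal units (1 ZB = 10^21 bytes, 1 GB = 10^9 bytes); the function name `zettabytes_to_gigabytes` is hypothetical.

```python
from fractions import Fraction

# SI (decimal) definitions, assumed here; binary units (ZiB, GiB) would differ.
BYTES_PER_ZETTABYTE = 10**21
BYTES_PER_GIGABYTE = 10**9


def zettabytes_to_gigabytes(zb):
    """Convert zettabytes to gigabytes exactly, avoiding float rounding."""
    return Fraction(zb) * BYTES_PER_ZETTABYTE / BYTES_PER_GIGABYTE


gigabytes = zettabytes_to_gigabytes("1.8")
print(f"{int(gigabytes):,} GB")  # 1.8 trillion gigabytes
```

Passing the amount as a string lets `Fraction` parse 1.8 exactly rather than as a binary floating-point approximation.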
CFP - Calls for Papers
- Stream Data Mining 2011: Special Session on Stream Data Mining, due Jul 15
- MMIS-11: The Fifth International Workshop on Mining Multiple Information Sources, due Jul 23
- DDDM 2011: Domain-Driven Data Mining, due Jul 23
- DMCS 2011: Data Mining Case Studies and Data Mining Practice Prize, due Jul 23
- PAKDD 2012 workshops: The 16th Pacific-Asia Conference on Knowledge Discovery and Data Mining, workshop proposals, due Aug 28
- PAKDD 2012: The 16th Pacific-Asia Conference on Knowledge Discovery and Data Mining, due Sep 25
- EGC 2012: 12th Francophone Int. Conf. on Knowledge Discovery and Management (call for papers, workshops, software demos), due Oct 7
- EGC 2012 workshops: 12th Francophone Int. Conf. on Knowledge Discovery and Management Workshops, due Oct 14
- PAKDD 2012 tutorials: The 16th Pacific-Asia Conference on Knowledge Discovery and Data Mining, tutorial proposals, due Nov 13
"In the scientific study of random processes, the drunkard's walk is the archetype ... we're continually nudged in this direction and then that one by random events. As a result, although statistical regularities can be found in social data, the future of particular individuals is impossible to predict."
thanks to Steve Miller in Information Management