KDnuggets™ News 14:n12, May 21
Features (11) | Software (4) | Opinions (9) | News (7) | Webcasts (1) | Courses (3) | Meetings and Reports (10) | Jobs (14) | Publications (11) | Tweets (6) | CFP (25) | Quote
Features
- New Poll: Analytics, Data Mining, Data Science Software Used? - May 20, 2014.
Please vote in our well-known annual KDnuggets Software Poll: What Analytics, Big Data, Data Mining, Data Science software have you used in the past 12 months for a real project?
- Exclusive: Tamr at the New Frontier of Big Data Curation - May 19, 2014.
Our exclusive profile of Tamr (formerly Data Tamer), the latest startup from the legendary Michael Stonebraker, which emerged from stealth mode to address the new field of Big Data Curation.
- Guide to Data Science Cheat Sheets - May 12, 2014.
Selection of the most useful Data Science cheat sheets, covering SQL, Python (including NumPy, SciPy and Pandas), R (including Regression, Time Series, Data Mining), MATLAB, and more.
- Top 100 Startup Experts to Follow on Twitter - May 17, 2014.
The list of Top 100 Startup Experts to Follow on Twitter is headed by @kdnuggets. Check our tweets on Analytics, Big Data, Data Mining, and Data Science startups and acquisitions under the hashtag #BigDataCo.
- Cartoon: Data Visualization meets 3-D Printer - May 11, 2014.
New KDnuggets Cartoon looks at what happens when Data Visualization meets 3-D Printer.
- MassTLC Big Data Meeting Delivers Insights, Perspective - May 8, 2014.
Summit highlights: Digitization and Datification - a love story, Strategies for creating a competitive advantage in the #BigData world, Boston open data, Balancing privacy and governance, and the most widely used #BigData tool in the future.
- Stacking the Deck: The Next Wave of Opportunity in Big Data - May 20, 2014.
A leading venture capitalist explains why the Big Data infrastructure market is mostly mature and where the next big area of Big Data opportunity lies.
- Code for India 2014 Global Hack-a-thon - Building a Better India through Innovative Solutions - May 19, 2014.
Non-stop 24 hours of coding at the Code for India 2014 hackathon leads to creative solutions for major social problems of India through interesting software applications.
- Poll Results: Data Types/Sources Analyzed - May 17, 2014.
Trends in data sources for data mining: table data dominates, followed by time series and text; audio and JSON grow in popularity, while itemsets decline; 70% access DB engines, but only 20% access NoSQL stores; Hadoop and MongoDB are used more for text; Europe is lagging in NoSQL usage.
- PAW: Predictive Analytics World Chicago, Expert-led Workshops - May 20, 2014.
Discover best practices and sharpen your skills by attending one of our expert-led workshops: The Best and Worst of Predictive Analytics, R for Predictive Modeling - Hands-on, and Advanced Methods Hands-on.
- Watch: Basics of Machine Learning - May 14, 2014.
Watch a series on machine learning, going from basics like Naive Bayes, Decision Trees, Generalization and Overfitting, to more complex topics like Hierarchical Agglomerative Clustering.
Software
- BabelNet 2.5: Very Large Multilingual Encyclopedic Dictionary and Semantic Network - May 19, 2014.
BabelNet 2.5 covers 50 languages, and offers seamless integration of WordNet, Open Multilingual WordNet, Wikipedia, OmegaWiki, Wikidata (NEW), and Wiktionary (NEW). Check upcoming BabelNet workshops.
- Uppd8: An Engine for the Wisdom of Crowds - May 15, 2014.
What people think matters. Uppd8 focuses on crowd sentiment analysis and provides tag-scored data based on different user types. Basic services will be provided for free.
- ClearStory - The Fastest, Simplest Way to Analyze Data - May 8, 2014.
ClearStory is modernizing how diverse data is accessed, merged, and analyzed, and how insights are consumed by analysts and business users. Try ClearStory today.
- Spotlight: RapidMiner New Predictive Analytics Platform-as-a-Service - May 7, 2014.
We examine the newly announced RapidMiner Platform-as-a-Service, installed on AWS and managed by RapidMiner experts.
Opinions and Interviews
- Interview: Dale Russell, CTO, Talksum on Winning the IE Big Data Startup Award - May 20, 2014.
We discuss the Talksum data stream router and cross-domain networking with real-time data management using data streams.
- Exclusive Interview: Michael O'Connell, Chief Data Scientist, TIBCO on How to Lead in Big Data - May 19, 2014.
We discuss Big Data vs. Fast Data, Data Visualization trends, the Jaspersoft acquisition, factors differentiating future leaders of Big Data, and more.
- Interview: Gary Shorter, Quintiles on Future of Healthcare and Big Data Skills - May 16, 2014.
We discuss how Big Data is shaping the future of the Healthcare industry and advice for a career in Analytics.
- Interview: Gary Shorter, Director of Data Science, Quintiles on Big Data for Healthcare - May 15, 2014.
We discuss rising medical costs, how Big Data can help, key features of Quintiles Infosario, and Topological Data Analysis.
- Interview: Prateek Jain, Director of Engineering, eHarmony on Fast Search and Sharding - May 14, 2014.
We discuss Big Data architecture, fast multi-attribute searches, database sharding and scaling challenges at eHarmony.
- Interview: George Corugedo, CTO, RedPoint on Big Data Trends and Important Skills - May 13, 2014.
We discuss the key trends in the Big Data industry, important skills for Data Science practitioners, and more.
- Interview: George Corugedo, CTO, RedPoint on YARN and Customer Analytics - May 12, 2014.
We discuss the significance of YARN for the Hadoop 2.0 platform, unique benefits of the RedPoint Convergent Marketing Platform, and Master Key Management for Customer Analytics.
- Interview: Arijit Sengupta, CEO, BeyondCore on Advanced Analytics and Big Data - May 9, 2014.
We discuss traditional analytics vs. modern analytics, avoiding over-simplification, human-technology interaction for Big Data, challenges in democratizing analytics and more.
- Interview: Xinghua Lou (Microsoft) on Mining Clinical Notes and Big Data in Healthcare - May 7, 2014.
We discuss data mining of cancer clinical data, LDA topic model, challenges in mining clinical notes, big data in healthcare and more.
News
- Top stories for May 11-17 - May 19, 2014.
Guide to Data Science Cheat Sheets; Watch: Basics of Machine Learning; Cartoon: Data Visualization meets 3-D Printer; Social Media and Web Analytics Innovation Summit 2014 Highlights.
- Top stories for May 4-10 - May 11, 2014.
Data Scientists Not Required with Alteryx Analytics 9.0; 9 Free Books for Learning Data Mining and Data Analysis; Exclusive Interview: Todd Holloway, Data Science Lead, Trulia; Did Target really predict teen pregnancy?
- Predict Soccer World Cup 2014 Winner, Get Prizes from RapidMiner - May 16, 2014.
Use a free edition of RapidMiner to have fun and bring sports predictions to another level by making a prediction of Soccer (Futbol) World Cup 2014, which starts on June 12 in Brazil.
- KDD Cup 2014 - Predicting Excitement at DonorsChoose.org - May 16, 2014.
Predict which DonorsChoose.org projects will be exciting. The 2014 edition of KDD Cup, the first data mining competition, is on Kaggle. Submissions due June 15.
- NineSigma Big Data Analytics RFP - May 9, 2014.
NineSigma is seeking proposals for mining user browsing/operations history, social networking services, and sensing devices to improve personalization and recommendation of products. Submit by May 23, 2014.
- April 2014 Analytics, Big Data, Data Mining Acquisitions and Startups Activity - May 8, 2014.
April 2014 acquisitions, startups, and company activity in Analytics, Big Data, Data Mining, and Data Science: Experfy, Dunnhumby, NexGraph, Fundbox, FICO, Gnip, Fliptop, InBloom, Jaspersoft, and more.
- Did Target Really Predict a Teen's Pregnancy? The Inside Story - May 7, 2014.
We examine the origin and the facts behind this explosive story, the importance of headlines, and how unsubstantiated assumptions gain traction and mainstream attention and help create myths around Predictive Analytics.
Webcasts and Webinars
- Upcoming Webcasts on Analytics, Big Data, Data Science - May 19 and beyond - May 19, 2014.
Data Mining: FTL; Deep Learning with H2O; Purchase history to Customer Projects; Apache Hadoop, Hive, Kafka, Solr; Python for Big Data Analytics, and more.
Courses
- Northwestern Online MS in Predictive Analytics - May 15, 2014.
Prepare for a leadership-level career, learn from top faculty and industry experts, and earn your analytics degree online. Fall Quarter application deadline July 15.
- Vendor-Neutral Hands-On Training in Data Mining [Denver-CO, July | Wash-DC, Sep] - May 14, 2014.
Successful analytics in the big data era does not start with data and software, but with immersive hands-on training and goal-driven strategy. Get this training from The Modeling Agency.
- BPDM 2014: Broadening Participation in Data Mining Program - May 14, 2014.
The BPDM Program, held at KDD-2014, aims to foster mentorship, guidance, and connections of minority and underrepresented groups in Data Mining/Data Science by providing scholarships to interact with and learn from senior researchers. Apply by June 12.
Meetings and Reports
- PAW: Predictive Analytics World Chicago, Expert-led Workshops - May 20, 2014.
Discover best practices and sharpen your skills by attending one of our expert-led workshops: The Best and Worst of Predictive Analytics, R for Predictive Modeling - Hands-on, and Advanced Methods Hands-on.
- Resource-aware Machine Learning – Summer School 2014, Germany - May 16, 2014.
Summer school in Dortmund, Germany covers Machine Learning with Constrained Resources including topics like detecting astro particles using smartphones. Applications are due by June 30.
- Do Analytics as well as Google, Johnson&Johnson, and AT&T - May 13, 2014.
Attend the Useful Business Analytics Summit in Boston and learn how the leading companies do analytics. Early reg by May 16.
- KDD 2014 Workshops – the latest in Data Mining and Data Science Research - May 12, 2014.
KDD 2014 workshops are the forum for the latest data mining and data science research. Workshop topics include Data Science for Social Good, Big Data Discovery and Curation, Big Data Analytics for Bio/Health Informatics, Stream Mining, Data Ethics, Sports Analytics, and more. Submission dates from late May to late June.
- Sentiment Analysis Innovation Summit 2014: Day 2 Highlights - May 15, 2014.
Highlights from the presentations by opinion mining experts from Fujitsu, FindiLike and Stanford University on Day 2 of Sentiment Analysis Innovation Summit 2014 in San Francisco.
- Sentiment Analysis Innovation Summit 2014: Day 1 Highlights - May 14, 2014.
Highlights from the presentations by opinion mining experts from Twitter, eBay and Samsung on Day 1 of Sentiment Analysis Innovation Summit 2014 in San Francisco.
- Social Media and Web Analytics Innovation Summit 2014: Day 2 Highlights - May 15, 2014.
Highlights from the presentations by analytics experts from YouTube, Evernote and Wikia on day 2 of Social Media & Web Analytics Innovation Summit 2014 in San Francisco.
- Social Media and Web Analytics Innovation Summit 2014: Day 1 Highlights - May 14, 2014.
Highlights from the presentations by experts from Google, CapitalOne, StubHub and Social Media Research Foundation on day 1 of Social Media & Web Analytics Innovation Summit 2014 in San Francisco.
- Big Data BootCamp Santa Clara: Highlights of talks on Days 1-2 - May 9, 2014.
Highlights from the presentations by big data technology practitioners from Caspida, Datastax, ElephantScale, Hortonworks, MapR and Qubole at Big Data Bootcamp 2014 in Santa Clara.
- Big Data BootCamp: Highlights of talks on Day 3 - May 12, 2014.
Highlights from the presentations by big data technology practitioners from Hortonworks, Intel, Rackspace, SciSpike, and Yahoo at Big Data Bootcamp 2014 in Santa Clara.
Jobs
- Apple: Data Mining Scientist - May 19, 2014.
Outstanding data mining scientist to help design, develop, and field data mining solutions that have direct and measurable impact on Apple.
- RichRelevance: Marketing and Merchandising Analyst - May 17, 2014.
Data integration, report automation, and ad hoc analysis using the RichRelevance multi-petabyte database containing event-level data on hundreds of millions of shoppers and millions of products.
- FirstFuel: Data Scientist - May 16, 2014.
FirstFuel Software is using energy analytics to help utilities and government agencies deliver scalable energy efficiency. Data Scientist will develop state of the art Statistical/Machine Learning algorithms and deploy them to a scalable, secure, cloud-based architecture.
- Best Practice Partners: Principal Consultant, Healthcare Economics - May 16, 2014.
Lead the development of the client healthcare economics and risk adjustment services to assist in validating and ensuring compliance and accuracy of high-risk clinical profiles.
- Amazon, Customer Segmentation and Targeting: Machine Learning/Research Scientists (All Levels) - May 14, 2014.
Work on the world's richest collection of online shopping and in-device data to segment and target customers via email, social, mobile and display to support sales, advertising and loyalty applications.
- Akanoo: Sr. Software Engineer - Data Mining Scientist - May 13, 2014.
Explore a dynamic, young business, work alongside bright minds, research and develop predictive models, and optimize algorithms for behavioral on-site targeting.
- eSpark: Data Scientist (Make an impact on a social problem) - May 10, 2014.
Help build the next generation of tools to help students succeed in school and in life. Using data and teacher insight, we curate the best learning resources to create a personalized experience for each student.
- Large Investment Firm: Data Scientist / Statistician / Programmer - May 9, 2014.
Use both analytical and creative approaches to solve unstructured questions related to quantifying the risks and drivers of growth of publicly traded companies.
- Macy's: Director of Finance and Loyalty Analysis - May 9, 2014.
Establishing measurements and analytics to evaluate the impact of Loyalty initiatives on both credit and third party customers, developing strategies to report and analyze customer behavior.
- Amazon (Europe): Business Analysts - May 9, 2014.
Support different Amazon businesses, develop innovative solutions for growth, diverse work with opportunity to grow in technical and non-technical directions. Relocation and visa support.
- Enova: Sr. Data Scientist - May 8, 2014.
Responsible for automating current processes and researching systems in data management while supporting the Credit Risk Analytics, Business Analytics, Fraud Analytics and Marketing Analytics teams.
- dunnhumby: Director, Advanced Analytics & Predictive Modeling - May 8, 2014.
Lead a team of 3 Directors and 20 analysts, modelers and production specialists, balancing innovation and daily production to deliver targeted, relevant offers that build customer loyalty and retention.
- JCA: Consultant, Development Expertise - May 7, 2014.
JCA provides strategic consulting to the world's leading nonprofits and helps them work smarter. The Consultant provides overall direction to the project team, manages the client relationship, and helps identify new business opportunities.
- CollegeBoard: Data Scientist - May 7, 2014.
Help make better decisions using existing data for clients like SAT and Advanced Placement, with projects like segmentation and targeting, analysis and interpretation of survey data, and data visualization.
Publications
- CIOReview Top 100 Most Promising Big Data Companies - May 17, 2014.
Top 100 Most Promising Big Data Companies according to CIO Review, from 7Segments to OpenBI to Zeta Interactive.
- Big Data Landscape, v 3.0, analyzed - May 15, 2014.
We analyze the Big Data Landscape and identify the most popular market segments in Analytics, Infrastructure, Applications, Open Source, and Data Sources categories. It is still early - only 4.5% of companies had exits.
- Has Predictive Analytics Crossed The Chasm? - May 15, 2014.
Recent study highlights the increasing market perception that Predictive Analytics leads to competitive advantage. The report also outlines current trends and challenges for Predictive Analytics.
- New Book: Analytics in a Big Data World - The Essential Guide to Data Science - May 13, 2014.
For organizations looking to enhance their capabilities via data analytics, this book is the go-to reference for applying Data Science to make the right business decisions.
- Media Industry Embracing Analytics for Innovation and Competitive Edge - May 13, 2014.
Survey results highlight the importance of Analytics capability in the media industry and consumer beliefs on privacy vs. personalization benefits.
- Data Analytics Handbook p. 3, Interviews with Research Leaders and Academics, Free Download - May 13, 2014.
Part 3 features interviews with research leaders and academics, including Hal Varian (Chief Economist, Google), Gregory Piatetsky (Editor, KDnuggets), and Analytics Thought Leader Tom Davenport (Professor, Babson College). Free download.
- Forrester Research: Transform Your Organization with Strong Data Management - May 13, 2014.
New Forrester Research report shows how to build a more elastic and flexible data management practice to meet the new data demands. Free download compliments of Lavastorm Analytics.
- Healthcare Analytics: Identifying Leaders and Key Trends - May 12, 2014.
We review a recently released report on Healthcare perceptions of BI/Analytics and share key insights into who is leading healthcare analytics in different categories and what the dominant trends are.
- Data Mining for Statisticians - May 10, 2014.
New video series from Salford Systems presents an approach to data mining from a statistical point of view.
- White House Report on Big Data: Opportunities and Values - May 9, 2014.
We summarize the key findings in the recently released White House report on Big Data, highlight the key opportunities and concerns, and list the recommendations made to the President.
- JMP White Paper: Advantages of Bootstrap Forest for Yield Analysis - May 7, 2014.
This white paper highlights practical examples of how to use partitioning techniques for semiconductor manufacturing data. These methods also have wider applicability.
Top Tweets
- Top KDnuggets tweets, May 16-18 - May 19, 2014.
Great find! Intro. to Data Science, v2 (170 pages), free download; Why code written by scientists gets ugly; A Statistician's View on #BigData and Data Science - updated; CIOReview Top 100 Most Promising Big Data Companies.
- Top KDnuggets Tweets, May 14-15 - May 16, 2014.
Facebook Network analysis, visualization is easier with httr from R wizard; Cloudera Live offers a new way to start with #Hadoop - No downloads; Watch: Basics of Machine Learning; BigML Machine Learning platform Spring Release.
- Top KDnuggets tweets, May 12-13 - May 14, 2014.
Guide to Data Science Cheat Sheets; Clever hack: How to analyze Facebook Networks using R; Very useful - Introduction to #SQL for Data Scientists; Planning a late career shift to Analytics/Data Science? Be prepared.
- Top KDnuggets tweets, May 9-11 - May 12, 2014.
Data Mining for Statisticians; For teachers (and students) of #MachineLearning - Slides for LIONbook; Build a word cloud using R text mining tools - step-by-step; Graph Theory: Key to Understanding #BigData - graphs are not just for Google or eBay.
- Top KDnuggets tweets, May 7-8 - May 9, 2014.
30 Simple Tools for Data and Geo-Visualization: iCharts, Fusion, Modest Maps, ...; Did Target Really Predict a Teen's Pregnancy? The Inside Story; Sense, new Data Science startup, builds a Data Science Platform of the Future; Analytics Experts on #BigData Misconceptions.
- Top KDnuggets tweets, May 5-6 - May 7, 2014.
xkcd looks at Love and Statistics: Why it is important to label your axes; Analytics job applicants - avoid these common mistakes; Stanford Online Courses: Education + Advanced Skills = Awesome Career; Landmark for AI: computer system solves Algebra word problems.
CFP - Calls for Papers
- ECML/PKDD 2015 J0525: ECML/PKDD 2015 Journal track, May 25 submission, due May 25
- DS-2014: 17th Int. Conf. on Discovery Science, due May 26
- EUDM: 8th European Conf. on Data Mining 2014, due Jun 2
- DSAA2014: 2014 Int. Conf. on Data Science and Advanced Analytics, due Jun 5
- BIOKDD'14: 13th Int. Workshop on Data Mining in Bioinformatics, due Jun 8
- ECML/PKDD 2014-PhD: ECML/PKDD 2014 PhD session, due Jun 11
- LMCE 2014: First Int. Workshop on Learning over Multiple Contexts, due Jun 20
- IDEA: KDD 2014 Workshop on Interactive Data Exploration and Analysis, due Jun 20
- BigCHat: KDD 2014 Workshop on Connected Health at Big Data Era, due Jun 23
- KDDBHI: Big Data Analytic Technology for Bioinformatics and Health Informatics, due Jun 23
- KDIR 2014: Int. Conf. on Knowledge Discovery and Information Retrieval, due Jun 23
- ICDM 2014: IEEE Int. Conf. on Data Mining, due Jun 24
- ECML/PKDD 2015 J0629: ECML/PKDD 2015 Journal track, Jun 29 submission, due Jun 29
- IEEE BigData 2014-I: 2014 IEEE Int. Conf. on Big Data, Industry track, due Jul 1
- IEEE BigData 2014: 2014 IEEE Int. Conf. on Big Data, due Jul 1
- IEEE GrC 2014: The 2014 IEEE Int. Conf. on Granular Computing, due Jul 20
- CI 2014: Climate Informatics Workshop 2014, due Jul 25
- ECML/PKDD 2015 J0727: ECML/PKDD 2015 Journal track, July 27 submission, due Jul 27
- AusDM 2014: 12th Australasian Data Mining Conf., due Jul 28
- ACM DEV 2014: Int. forum for research in the design and implementation of information and communication technologies, due Aug 1
- DINA 2014: Workshop on Data Integration and Applications, due Aug 1
- TELCO 2014: The 1st Int. Workshop on Telco Data-driven Innovations, due Aug 1
- IClaNov: Incremental Classification, concept drift and Novelty detection, due Aug 1
- ECML/PKDD 2015 J0831: ECML/PKDD 2015 Journal track, Aug 31 submission, due Aug 31
- TIR: Time and Information Retrieval, a special issue of Information Processing and Management, due Sep 8