KDnuggets™ News 14:n15, Jun 18

Features (6) | Software (3) | Opinions (6) | News (2) | Webcasts (1) | Courses (1) | Jobs (7) | Academic (1) | Publications (2) | Tweets (3) | CFP (6) | Quote

Features


  • KDnuggets Analytics, Data Mining, Data Science Software Poll - Analyzed - Jun 17, 2014.
    We analyze the results of the KDnuggets Software Poll, including correlations between tools, and relationships between commercial, free, and Hadoop/Big Data tools. We identify a potential capability gap. Download the anonymized data and analyze it yourself.
  • Cartoon: Big Data and World Cup Football - Jun 17, 2014.
    A new KDnuggets cartoon takes a fresh look at Big Data insights and the 2014 Soccer World Cup. What should a player do when Big Data predicts his behavior?
  • The Cardinal Sin of Data Mining and Data Science: Overfitting - Jun 14, 2014.
    Overfitting leads to the public losing trust in research findings, many of which turn out to be false. We examine some famous examples, including "the decline effect" and Miss America age, and suggest approaches for avoiding overfitting.
  • Profile: KDnuggets Serves Analytics and Big Data Fields - Jun 11, 2014.
    A profile of KDnuggets, including an overview, history, and present highlights, is featured on the homepage of INFORMS, a major society for Analytics and Optimization (until June 23, 2014).
  • INFORMS: CAP® Analytics Certification - Jun 17, 2014.
    Maximize your analytics career with CAP® Analytics certification and show that you are well-qualified. Apply free of charge and take the CAP exam at over 700 testing centers worldwide.
  • Huge Big Data Poster and Reference - Jun 12, 2014.
    A really Big poster "Do You Know Big Data" includes: What it is, Leading tools, What is a Data Scientist, What questions should we ask of databases, Visual techniques, Statistical algorithms, Privacy, and more.

Software


  • YARN is All the Rage at Hadoop Summit 2014 - Jun 12, 2014.
    Apache YARN, which enables much broader types of computations than MapReduce, is quickly becoming an integral part of Hadoop projects. We review best-practice considerations for a YARN cluster.
  • DLib: Library for Machine Learning - Jun 10, 2014.
    DLib is an open source C++ library implementing a variety of machine learning algorithms, including classification, regression, clustering, data transformation, and structured prediction.
  • Request: Apache UIMA Research Partnership in EU - Jun 11, 2014.
    Looking for any EU university department currently working with Apache UIMA to develop text analysis software and interested in a research partnership.

Opinions and Interviews

News

  • Top stories for Jun 8-14 - Jun 15, 2014.
    KDnuggets 15th Annual Analytics, Data Mining, Data Science Software Poll: RapidMiner Continues To Lead; Data Lakes vs Data Warehouses; The First Law of Data Science: Do Umbrellas Cause Rain? Huge Big Data Poster and Reference.
  • AlgoMost contest: Predicting future company acquisitions - Jun 11, 2014.
    Develop an algorithm to predict which companies are most likely to be acquired in the current fiscal year.

Webcasts and Webinars

Jobs

  • DST: Big Data Platform Architect - Jun 16, 2014.
    Support Machine Learning Engineers and Data Scientists by implementing analytic and sampling jobs in MapReduce, and build new products by developing large-scale ETL processes and high-performance architectures.
  • DST: Machine Learning Scientist - Jun 16, 2014.
    Work closely with Data Scientists to identify and tune analytic algorithms, and work closely with Data Platform Engineers to deploy analytic jobs against Hadoop and other massive data stores.
  • SAP: Senior Data Scientist - Jun 16, 2014.
    Apply machine learning to glean real time intelligence from the largest trading partner network on the planet. Work on internet scale optimization problems and solve complex business problems with a smart and passionate team.
  • CollegeBoard: Research Analyst II - Jun 14, 2014.
    Join our Research division to support statistical analysis, data manipulation, mapping, visualization, and research.
  • AirProducts: LEAD SYSTEMS ENGINEER - Applied Statistics and Data Sciences - Jun 13, 2014.
    Do technical studies involving process/product development or a business decision through the combination of modeling, optimization and computational data-based techniques.
  • Thomson Reuters: Data Scientist (Data Innovation Lab) - Jun 13, 2014.
    Seeking a passionate Scientist as one of the founders of the Data Innovation Lab. Looking for a proven record of building data-driven insights to address complex business problems.
  • LivingSocial: Data Scientist - Jun 11, 2014.
    Delivering insightful solutions to complex problems to impact the next generation of products at LivingSocial. Analyzing enormous volumes of data on customers, merchant offers and channels, and developing advanced statistical algorithms to solve unique problems.

Academic/Research positions

  • DePaul School of Computing: Lecturer in Data Science - Jun 17, 2014.
    Ideal candidates will have expertise and teaching experience in data science with an emphasis in computational statistics, data mining, data visualization, pattern recognition or machine learning.


Top Tweets

  • Top KDnuggets tweets, Jun 13-15 - Jun 16, 2014.
    Book: Data Classification: Algorithms and Applications; Top 10 Data Analysis Tools for Business; #BigData companies to watch selected by top analytics experts; The Cardinal Sin of Data Mining and Data Science: Overfitting.
  • Top KDnuggets tweets, Jun 11-12 - Jun 13, 2014.
    Huge Big Data Poster and Reference; "Data science" misses half the equation: you also need "decision science"; Proposed ethical guidelines for Twitter data mining: clear objectives, protect anonymity; Great talk at Google! John Ioannidis on why most published research is wrong.
  • Top KDnuggets tweets, Jun 9-10 - Jun 11, 2014.
    Also - The First Law of Data Science: Do Umbrellas Cause Rain? Tell Your Kids to be Data Scientists - Not Doctors; DLib: Library for Machine Learning.

CFP - Calls for Papers


Quote

Researchers too frequently commit the Cardinal sin of Data Mining - Overfitting the data. Gregory Piatetsky, 2014.