Top Analytics and Big Data trends ahead of Strata Hadoop NYC Conference

Top trends in Analytics and Big Data named by our readers ahead of Strata + Hadoop NYC conference include Deep Learning, Apache Spark, democratization of analytics, and real-time processing of Big Data.

By Gregory Piatetsky, @kdnuggets, Aug 14, 2014.

As a media partner for the Strata 2014 Conference, KDnuggets raffled a free, 2-day conference pass among our subscribers to

Strata Conference + Hadoop World,
Tools and Techniques that make data Work

Oct 15-17, 2014.
New York, NY, USA

Strata + Hadoop is one of the most important, and probably the largest, conferences on Big Data.

Check "Why You Should Attend" to see why!

Congratulations to Sima Yazdani from Cisco who won the KDnuggets pass!

All other readers can get a 20% discount code on a regular registration with code: KDNG.

As part of the raffle, we asked our readers to name the 2-3 most important trends in Analytics, Data Science, and Big Data in 2014-2015, and here are the answers.

The most popular trends were:
  • Deep Learning
  • Spark, moving beyond MapReduce
  • Democratization of analytics, making it easier to use and understand
  • Real-time processing of Big Data

Compared with the responses to Top Trends in Analytics and Big Data ahead of Strata 2014 Santa Clara, a similar raffle KDnuggets conducted before the Feb 2014 Strata Santa Clara conference, only one topic is new: "Deep Learning". The other 3 topics - Spark, democratization of analytics, and real-time processing - were also mentioned in February.

Here are the selected responses.

  • Job market (where to find people with these skills)
  • Text data/semantic analysis
  • The challenge of communicating complex analyses to non-technical clients/partners

  • Deep Learning
  • In Memory Databases
  • Apache Mahout

  • emergence of graph databases and graph-based tools such as Neo4j, GraphX, and GraphLab as a standard way to process and store increasingly diverse data
  • real-time distributed cloud data processing
  • YARN for Hadoop 2 and, especially, the emergence of Spark features that complement NoSQL and MySQL in the cloud

  • Summarizing a lot of data into simple indicators (similar to metrics in Lean Analytics, and it will be extended to the Big Data analytics area)
  • Learning algorithms (deep learning algorithms are soaring nowadays, and more people and businesses will be interested in them)
  • more Spark cases

  • Deep Learning
  • Recommendation engine
  • Big Data applied in Healthcare

  • Performance and speed of aggregating Big Data
  • increasing demand for visualization of Big Data

  • data curation
  • data integration

  • Creating faster, more agile ways to develop more complex algorithms - the search for deeper meaning.
  • Conducting more advanced analysis on larger data sets.
  • How to make the life of the data scientist easier. Focus on the data, less on the coding.

  • Real Time Analytics
  • Securing big data clusters and components for enterprise
  • Further evolution beyond MapReduce

  • bringing the power of complex analysis to everyday employees
  • more and more people trying to make use of machine learning methods

  • Text and data analytics
  • Analyzing related Social activities (stream from many sources yet related to specific topics)
  • Building applications using Semantic Technology stack

  • Consolidating Hadoop clusters with relational and other DBs in "data lakes."
  • Visual data mining continuing to be used more than analytical data mining.

  • Industrializing Analytics: moving the process of predictive modeling from a cottage industry to a production-line approach, using tools with interchangeable components instead of custom-integrated code.
  • The "Hype Curve" for "Big Data": while there is real promise in using more detailed and differentiated data, IT departments are drowning trying to deploy analytics, and some of the analytics have turned out not to be that actionable or impressive. Sometimes this is due to deterministic outcomes that look awesome only because of data or process quality issues, or to operational inefficiencies in legacy system integrations that make getting the "last mile" to real time almost impossible.

  • Real-time processing of data - real-time streaming + real-time analysis
    There has been a huge requirement for Hadoop to become more real-time than before. Real-time processing involves both real-time streaming and real-time analysis. This means you have to get the data as fast as possible, process it, and have it available for delivery through various analytical and reporting tools.
  • Security - a big key component of Hadoop that is starting to materialize is security. Basic security has been available for quite a while with Kerberos, but people have asked for better authentication and access controls than before. RBAC and other mechanisms are becoming more important, and audit control and tracking have also gained importance when applying Master Data Management standards to Hadoop.
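The real-time pipeline described above (ingest events as fast as possible, process them, make results available for reporting) can be sketched, independently of any specific Hadoop tooling, as a sliding-window aggregation. This is an illustrative sketch, not a reference to any particular streaming framework; the class and method names are hypothetical:

```python
from collections import deque
from time import time

class SlidingWindowCounter:
    """Counts events per key over the last `window_seconds`:
    an illustrative sketch of real-time stream analysis."""

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, key) pairs, oldest first

    def ingest(self, key, timestamp=None):
        # "Get the data as fast as possible": record and return immediately.
        self.events.append((timestamp if timestamp is not None else time(), key))

    def counts(self, now=None):
        # Process: evict events older than the window, then aggregate.
        now = now if now is not None else time()
        while self.events and self.events[0][0] < now - self.window_seconds:
            self.events.popleft()
        result = {}
        for _, key in self.events:
            result[key] = result.get(key, 0) + 1
        return result  # ready for delivery to analytical/reporting tools

counter = SlidingWindowCounter(window_seconds=60)
counter.ingest("page_view", timestamp=0)
counter.ingest("click", timestamp=30)
counter.ingest("page_view", timestamp=55)
print(counter.counts(now=70))  # the event at t=0 falls outside the 60s window
```

Real systems distribute this across a cluster (e.g. with Spark Streaming, mentioned by several respondents), but the core loop, continuous ingestion plus windowed aggregation, is the same.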

  • Data science will spread to more and more companies.
  • Educational institutions will increase their output of Data Scientists.

  • Predictive Analytics
  • Parquet + HBase + HAWQ + Gemfire would replace Spark
  • Real Time Analytics

  • The rising importance of unsupervised approaches to deriving insights from data. Currently, most of the focus in Machine Learning and Data Mining has been on supervised approaches such as classification. The trend is that people want to derive insights to solve business problems even when they have little or no "labeled data".
  • The practice of data mining is going to move towards specific business problems that may require the development of techniques that are not universal. Currently, most data scientists learn a repertoire of well-known techniques and seek to cast business problems into terms where these techniques are directly applicable (when you have a hammer, you go looking for nails to pound). This approach is limited and will not work very well going forward. Rather, we will go back to solving problems in situ and then seek to learn from practice and generalize.
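Clustering is the canonical example of the unsupervised approach described above: grouping data without any labels or target variable. A minimal 1-D k-means sketch in plain Python (illustrative only, assumes k >= 2 and at least k points) shows the idea:

```python
def kmeans_1d(values, k, iterations=20):
    """Minimal 1-D k-means: partitions unlabeled numbers into k groups
    with no 'labeled data', illustrating an unsupervised approach."""
    # Initialize centroids spread across the sorted data.
    data = sorted(values)
    centroids = [data[i * (len(data) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters

# Two obvious groups, discovered without any labels:
print(kmeans_1d([1.0, 1.2, 0.9, 10.0, 10.3, 9.8], k=2))
# → [[0.9, 1.0, 1.2], [9.8, 10.0, 10.3]]
```

The algorithm never sees a "correct answer" for any point; structure emerges purely from the distances between the data themselves, which is exactly what makes such approaches attractive when labeled data is scarce.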