Favorite 2015 Schmarzo Big Data Blogs

A top Big Data influencer lists, outlines, and summarizes his favorite blog posts of 2015. Gain some additional insight into various data science topics with some of these great entries.



#6 Why Do I Need A Data Lake? The data lake is a powerful big data architecture that leverages the economics of big data to store, manage and analyze data at lower cost than traditional data warehouse technologies. The key to maximizing the value of your big data initiatives is the analytics hub and spoke service architecture.

Data Lake

The hub of the architecture is the data lake:

  • Centralized, singular, schema-less data store with raw data
  • Mechanism for rapid ingestion of data with appropriate latency
  • Ability to map data across sources and provide visibility and security to users
  • Catalog to find and retrieve data
  • Costing model of centralized service
  • Ability to manage security, permissions and data masking
  • Supports self-provisioning
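
To make a few of the hub capabilities above concrete (raw schema-less storage, ingestion, and a catalog), here is a minimal in-memory sketch in Python. It is only an illustration, not how any real data lake product works: the `DataLake` class, its method names, and the sample datasets are all hypothetical, and security, masking, and costing are left out.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataLake:
    """Toy in-memory data lake (hypothetical): raw records plus a catalog."""

    raw: dict = field(default_factory=dict)      # dataset name -> list of raw JSON strings
    catalog: dict = field(default_factory=dict)  # dataset name -> metadata

    def ingest(self, name, records, source, tags=()):
        """Store records as-is (no schema enforced) and register them in the catalog."""
        self.raw.setdefault(name, []).extend(json.dumps(r) for r in records)
        entry = self.catalog.setdefault(name, {"source": source, "tags": set(), "count": 0})
        entry["tags"].update(tags)
        entry["count"] = len(self.raw[name])
        entry["last_ingested"] = datetime.now(timezone.utc).isoformat()

    def find(self, tag):
        """Catalog lookup: names of the datasets that carry the given tag."""
        return sorted(n for n, meta in self.catalog.items() if tag in meta["tags"])

    def read(self, name, fields):
        """Schema-on-read: project the requested fields; missing ones become None."""
        return [{f: rec.get(f) for f in fields} for rec in map(json.loads, self.raw[name])]


# Made-up sample data from two sources with different shapes
lake = DataLake()
lake.ingest(
    "web_clicks",
    [{"user": "u1", "page": "/home"}, {"user": "u2", "page": "/pricing", "ref": "ad"}],
    source="web",
    tags={"customer", "clickstream"},
)
lake.ingest("pos_sales", [{"user": "u1", "amount": 42.5}], source="stores", tags={"customer", "sales"})

customer_datasets = lake.find("customer")
clicks = lake.read("web_clicks", ["user", "ref"])
```

The point of the sketch is that structure is applied when the data is read, not when it is stored, so records with different fields can land in the same store and still be found through the catalog.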

The spokes of the architecture are the analytic use cases:

  • Ability to perform analytics (data scientist)
      ◦ Analytics sandbox (HDFS, Hadoop, Spark, Hive, HBase)
      ◦ Data engineering tools (Elasticsearch, MapReduce, YARN, HAWQ, SQL)
      ◦ Analytical tools (SAS, R, Mahout, MADlib, H2O)
      ◦ Visualization tools (Tableau, DataRPM, ggplot2)
  • Ability to exploit analytics (application development)
      ◦ 3rd platform application (mobile app development, web site app development)
      ◦ Analytics exposed as services to applications (APIs)
      ◦ Integrate in-memory and/or in-database scoring and recommendations into business process and operational systems

#5 In Big Data, Are You Using Refrigerators or Stoves? This blog really challenges how organizations are positioning and selling big data. Too many “experts” are over-emphasizing the big data technology aspects and ignoring the really hard work – understanding what business opportunities exist and how the organization is trying to address them with data and analytics.

My University of San Francisco MBA class finished their Big Data MBA course. We used our trusty “thinking like a data scientist” process to teach our students how to identify a business opportunity, and then how to drive the cross-organizational collaboration needed to come up with ideas that they can turn into actions using data and analytics.

My co-teacher, the ever talented and energetic Professor Mouwafac Sidaoui, and I asked our students: “What employer wouldn’t want an employee who can excel at doing that?”

#4 Big Data Fails: How to Avoid Them. This isn’t exactly a blog. This is an interview that I had with Jessica Davis (InformationWeek) that nicely summarizes many of the keys to big data success. The article actually makes me sound smart (and that’s no small task!). To quote the article:

The companies that run into the most trouble [with big data] are those in which data is in silos, and the thinking about that data is also in silos. For instance, in a banking company there may be a checking account silo and a mortgage silo, and the owners of each group aren’t accustomed to thinking about the whole customer who consumes both services.

Companies that can get past that limitation in their thinking are more likely to be successful with their big data initiatives.

And that example also shows an important factor in successful big data initiatives – collaboration among groups who may not normally collaborate with each other. It relies on team members with different areas of expertise working well together.

The places where we are seeing success is where the business people and the IT people like each other.
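
The banking example in the quote can be sketched in a few lines of Python. This is only a hedged illustration of "thinking about the whole customer": the `checking` and `mortgage` records, the `customer_id` key, and the `whole_customer_view` helper are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical records held in two separate product silos
checking = [
    {"customer_id": "c1", "balance": 1200.0},
    {"customer_id": "c2", "balance": 300.0},
]
mortgage = [
    {"customer_id": "c1", "principal": 250000.0},
]


def whole_customer_view(**silos):
    """Merge per-product silos into one record per customer."""
    view = defaultdict(dict)
    for product, records in silos.items():
        for rec in records:
            # Keep everything except the join key under the product's name
            details = {k: v for k, v in rec.items() if k != "customer_id"}
            view[rec["customer_id"]][product] = details
    return dict(view)


customers = whole_customer_view(checking=checking, mortgage=mortgage)

# Customers who consume both services, invisible from inside either silo alone
both = sorted(
    cid for cid, products in customers.items()
    if {"checking", "mortgage"}.issubset(products)
)
```

In practice the hard part is organizational (agreeing on a shared customer key and sharing the data at all), which is the collaboration point the interview makes.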

#3 Creativity Is A Team Activity in Big Data. This could have very easily been my favorite blog. It certainly turned out to be one of my most popular blogs.

The potential of big data is only limited by the creative thinking of your business stakeholders. Maybe the biggest inhibitor to creative thinking is the baggage about data and analytics that we have picked up over the years. Organizations need to embrace the power of “thinking differently,” especially with respect to:

  • Data as a strategic asset to be gathered, enriched and shared, versus data as a cost to be minimized
  • The potential of predictive (what is likely to happen) and prescriptive (what should I do) questions, versus just mechanically answering descriptive (what happened) questions
  • The power of data science to quantify those variables and metrics that are better predictors of performance, versus business intelligence that just reports on what happened while monitoring current business performance
  • Building analytic profiles at the individual (human, machine) level to uncover individual behaviors, tendencies, propensities, interests, passions, associations and affiliations that can lead to specific actionable insights, versus relying on aggregated data to uncover general market trends
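
The last point above, individual analytic profiles versus aggregated data, can be illustrated with a small Python sketch. The customers, the spend figures, and the simple "last minus first" trend measure are all hypothetical choices for the example, not a recommended scoring method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical monthly spend per customer, oldest month first
transactions = [
    ("alice", 100.0), ("alice", 80.0), ("alice", 40.0),
    ("bob", 50.0), ("bob", 70.0), ("bob", 110.0),
]

# Aggregated view: one number for the whole customer base
overall_average = mean(amount for _, amount in transactions)

# Individual view: group the same data by customer
history = defaultdict(list)
for customer, amount in transactions:
    history[customer].append(amount)

profiles = {
    customer: {
        "average_spend": mean(amounts),
        "trend": amounts[-1] - amounts[0],  # crude tendency: last month minus first
    }
    for customer, amounts in history.items()
}

# A specific, actionable insight that the aggregate hides
declining = sorted(c for c, p in profiles.items() if p["trend"] < 0)
```

Here the aggregate average looks stable, while the per-customer profiles show one customer's spend falling and the other's rising, which is the kind of individual behavior that aggregated reporting tends to average away.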

Evolution of the analytic question

#2 Thinking Like A Data Scientist series. This is probably unfair because this was a four-blog series, but these are my favorite blogs from 2015.

The 8-step “Thinking Like A Data Scientist” process is an enabler for organizations that want to get the most out of both their data…and their people. It drives organizational alignment around an organization’s key business initiatives and uncovers where and how big data and data science can optimize key business processes, uncover new monetization opportunities and deliver a more compelling customer experience.

#1 Big Data MBA Textbook: Driving Business Strategies with Data Science. Clearly #1 for me was the release of my second Big Data book. This book was written as a textbook to use as part of the class I teach at the University of San Francisco School of Management, but I hope that others can use this textbook to advance big data and data science as business disciplines for tomorrow’s business leaders.

I hope that 2016 is as productive, and given the number of Big Data Vision Workshops that I have to facilitate, I bet it will be!

Bio: William (Bill) Schmarzo, the "Dean of Big Data," is the CTO of EIM Service Line at EMC. An avid blogger, Bill speaks frequently on the use and application of big data and advanced analytics to drive an organization’s key business initiatives.

Original. Reposted with permission.
