Interview: Michael Brodie on Industry Lessons, Knowledge Discovery, and Future Trends

The last part of our exclusive interview focuses on Industry Lessons, Knowledge Discovery, Privacy Issues, Expected Technical Developments in the next 5 years, and more.

By Gregory Piatetsky, May 6, 2014.

Here are Part 1 and Part 2 of the interview with Michael Brodie. This is the third and final part of the interview.

Dr. Michael L. Brodie has served as Chief Scientist of a Fortune 20 company, an Advisory Board member of leading national and international research organizations, and an invited speaker and lecturer. In his role as Chief Scientist Dr. Brodie has researched and analyzed challenges and opportunities in advanced technology, architecture, and methodologies for Information Technology strategies. He has guided advanced deployments of emergent technologies at industrial scale, most recently Cloud Computing and Big Data. In his Advisory Board roles Dr. Brodie addresses current and emergent strategic challenges and opportunities that are central to the charter and success of the organizations. As an invited speaker Dr. Brodie has presented compelling visions, challenges, and strategies for our emerging Digital Universe in over 100 keynote speeches in over 30 countries and in over 100 books and articles.

GP: Q9. Around 1989 when you were a manager at GTE Labs and I was a member of technical staff there, you were somewhat skeptical of the idea I proposed for research into Knowledge Discovery in Databases (then called KDD or Data Mining, and more recently Predictive Analytics, and Data Science). The field has progressed significantly since then. From your point of view, what are the main successes and disappointments of KDD/Data Mining/Predictive Analytics and can Data Science become an actual science?

MB: My current research concerns the scientific and philosophical underpinnings of Big Data and Data Science. With Big Data we are undergoing a fundamental shift in thinking and in computing. Big Data is a marvelous tool to investigate What – correlations or patterns that suggest that things might have occurred or will occur.

Big Data’s weakness is that it says nothing about Why – causation or why a phenomenon occurred or will occur.

A pernicious aspect of What is the biases that we bring to it. On a personal note, my biased recall of 1989 was how marvelous your ideas were and the amazing potential of data mining. I accept your view that I was skeptical and not as enthusiastic as I recall. You see, I modified reality to fit my desire to be on the winning side, which I was not then. Hence, what we think that we think may bear little resemblance to reality or, more precisely, other people’s reality. As Richard Feynman said,

“The first principle is that you must not fool yourself - and you are the easiest person to fool.”

That said, I see the main successes of this trend as a nascent trajectory along the lines of Big Data, Data Analytics, Business Intelligence, Data Science, and whatever the current trendy term is. The World of What is phenomenal – machines proposing potential correlations that are beyond our ability to identify. Humans consider seven plus or minus two variables at a time, a rather simple model, while models such as machine learning can consider millions or billions of variables at a time. Yet 95% (or even 99.99999%) of the resulting correlations may be meaningless. For example, ~99% of credit card transactions are legitimate and less than 1% are fraudulent, yet that 1% can kill the profits of a bank. So precision and outlier cases, called anomalies in science, can matter. So it pays to search for apparently anomalous behavior – as it is happening!
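To make the credit card arithmetic concrete, here is a minimal sketch in Python. The data set (10,000 transactions, exactly 1% fraudulent) and the function names are hypothetical, chosen only for illustration. It shows why overall accuracy can look excellent while every anomaly is missed.

```python
def evaluate(labels, predictions):
    """Return (accuracy, fraud_recall) for binary labels where 1 means fraud."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    fraud_total = sum(1 for y in labels if y == 1)
    fraud_caught = sum(
        1 for y, p in zip(labels, predictions) if y == 1 and p == 1
    )
    accuracy = correct / len(labels)
    # Recall on the fraud class: share of fraudulent transactions detected.
    fraud_recall = fraud_caught / fraud_total if fraud_total else 0.0
    return accuracy, fraud_recall


# Hypothetical data: 100 fraudulent and 9,900 legitimate transactions.
labels = [1] * 100 + [0] * 9900

# A "model" that labels every transaction as legitimate.
always_legitimate = [0] * len(labels)

accuracy, fraud_recall = evaluate(labels, always_legitimate)
print(f"accuracy={accuracy:.2%}, fraud recall={fraud_recall:.2%}")
```

Under these assumptions the do-nothing model is right about 99% of the time yet detects no fraud at all, which is why the rare cases need their own measure.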

We have already seen massive benefits of Big Data in the stock market, electoral predictions, marketing success, and many more that underlie the Big Data explosion. Yet there is a potential Big Data Winter ahead if people blindly apply Big Data and, more specifically, machine learning. The failures concern limited models of phenomena and the human tendency toward bias. People can and do use What (Big Data, etc.) to support their biases and limited models, e.g., to support the claim of the absence of climate change or the lack of human impact on it, rather than letting the data speak to suggest directions and models that we may never have thought of. As it has always been, it takes courage to change from a discrete world of top-down models [I know how this works!] to an ambiguous, probabilistic world [What possible ways does this work?].

Those are natural successes and limitations of an emerging field. The direction, opportunities, and changes are profound. I experience a mix of fear and tingles thinking of asking the data to speak. I hope that I can be open to what it says and can distinguish s..t from Shinola.

I call the vision Computing Reality. It may be the Next Generation of Computing.

GP: Q10. In your very insightful report of the White House-MIT Big Data Privacy Workshop, you have a quote “Big data has rendered obsolete the current approach to protecting privacy and civil liberties”. Will people get used to much less privacy (as the digitally-savvy younger people seem to be) or will government regulation and/or technology be able to protect privacy? How will this play in US vs. Europe vs. other regions of the world?

MB: As an undergraduate at the University of Toronto, I was extremely fortunate to have had Kelly Gotlieb, the Father of Computing in Canada, as a mentor. I was a student in his 1971 course, Computers and Society, later to become the first book on the topic. Kelly and the issues, including privacy, have resonated with me throughout my career. Kelly observed that privacy, like many other cultural norms, varies over time. So yes, privacy will fluctuate from Alan Westin’s notion of determining how your personal information is communicated to the Facebook-esque "Get over it".

While personal privacy is undergoing significant change, disclosure of information assets that are part of the digital economy or of government or corporate strategy may have very significant impacts on our economy and democracy. Hence, this raises issues of security, protection, and cultural and social issues too complex to be treated here.

However, there are a number of very smart people looking at various aspects. The quote you cite is from Craig Mundie (Privacy Pragmatism: Focus on Data Use, Not Data Collection, Foreign Affairs, March/April 2014), who explores the changes that Big Data brings and debates the balance of economic versus privacy issues.

Very smart folks, like Butler Lampson and Mike Stonebraker, are commenting on practical solutions to this age-old problem. Their arguments are along the following lines. Due to the massive scale of Big Data, and what I call Computing Reality, previous top-down solutions for security, such as anticipating and preventing security breaches, will simply not scale to Big Data. They must be augmented with new approaches, including bottom-up solutions such as Stonebraker’s logging to detect and stem previously unanticipated security breaches and Weitzner’s accountable systems.
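As a rough illustration of the bottom-up idea, the Python sketch below scans an access log after the fact and flags users whose activity is far above the typical level. It is not Stonebraker's or Weitzner's design; the log format, the `flag_unusual_users` name, the user names, and the `factor` threshold are all assumptions made for this example.

```python
from collections import Counter
from statistics import median


def flag_unusual_users(access_log, factor=5.0):
    """Flag users whose access count exceeds `factor` times the median count.

    `access_log` is an iterable of (user, record_id) pairs. The format and
    the threshold rule are hypothetical, for illustration only.
    """
    counts = Counter(user for user, _record in access_log)
    if not counts:
        return []
    typical = median(counts.values())
    return sorted(user for user, n in counts.items() if n > factor * typical)


# Hypothetical log: three ordinary users and one bulk reader.
log = [
    ("alice", "r1"), ("alice", "r2"),
    ("bob", "r3"),
    ("carol", "r4"), ("carol", "r5"),
]
log += [("mallory", f"r{i}") for i in range(100)]

flagged = flag_unusual_users(log)
print(flagged)
```

Nothing here anticipates a specific attack in advance. The check only looks at what was recorded, which is the sense in which detection from logs complements prevention.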

To beat the Heartbleed bug and others like it, “Organizations need to be able to detect attackers and issues well after they have made it through their gates, find them, and stop them before damage can occur,” Gazit, a leading cyber security expert, said recently. “The only way to achieve such a laser-precision level of detection is through the use of hyper-dimensional big data analytics, deploying it as part of the very core of the defense mechanisms.”

“Big data” has rendered obsolete the current approach to protecting privacy and civil liberties.

Hence, Big Data requires a shift from a focus on top-down methods of controlling data generation and collection to a focus on data usage. Not only do top-down methods not scale, but tightly restricting data collection and retention could rob society of a hugely valuable resource [Craig Mundie, see above]. Adequate, let alone complete, solutions will take years to develop.

GP: Q11. What interesting technical developments do you expect in Database and Cloud Technology in the next 5 years?

MB: The Big Picture is called Computing Reality, in which we model the world from whatever reasonable perspectives emerge from the data and are appropriate, e.g., have veracity, and make decisions symbiotically, with machines and people collaborating to optimize resources while achieving measures of veracity for each result.

One subspace of this world is what we currently know with high levels of confidence, the type of information that we store in relational databases. Another encompassing space is what we know but forgot or don’t want to remember (unknown knowns), and a third is what we speculate but do not know (known unknowns); these are all the hypotheses that we make but do not know in science, business, and life.

The rest of the data space – the unknown unknowns - is infinite, otherwise learning would be at an end. That is the space of discovery.

I am investigating Computing Reality to explore the entire space with the objective of accelerating Scientific Discovery. This is practically interesting because very little of our world is discrete, bounded, finite, or involves a single version of truth, yet that is the world of most computing. With Computing Reality we hope to be far more pragmatic and realistic. This is technically and theoretically interesting because we have almost no mathematical or computing models in these areas. Those that exist are just emerging or massively complex. How cool is that? You see what old retired guys get to do?

GP: Q12. What do you like to do in your free time? What recent book did you like?

MB: Free time – what a concept! My yoga teacher, Lynne, recommended that I should try to do nothing one day, and I will. I will. Soon. Really. Life is such a blast; it’s hard to keep still.

My activities include the gym (4 times a week); hiking/climbing ~75 mountains in the USA, Nepal, Greece, Italy, France, Switzerland, and even Australia; 42 of the 48 4,000 footers in NH (most with Mike Stonebraker); cooking (daily and special occasions with my son Justin, an amazing chef and brewer, when he’s not doing his PhD), travel, and my garden; all of these – except the gym and garden – with family and close friends.

(Michael Brodie and his son on a peak in New Hampshire)


Very cool Big Data Books

    The Signal and the Noise: Why So Many Predictions Fail-but Some Don't, by Nate Silver, Penguin Press

    Big Data: A Revolution That Will Transform How We Live, Work, and Think, by Viktor Mayer-Schönberger, Kenneth Cukier, Houghton Mifflin Harcourt

Real books

    Ken Follett’s The Pillars of the Earth; Century Trilogy (Fall of Giants, Winter of the World and Edge of Eternity)

    Henning Mankell’s The Fifth Woman (A Kurt Wallander Mystery)

GP: You just returned from Doha, Qatar where you were advising the Qatar Computing Research Institute (QCRI) - quite far from Silicon Valley, New York, or Boston. What is happening there and what computing research are they doing?

MB: This was my first visit to Qatar, and it was remarkable culturally and intellectually. Culturally I saw spectacular results of hydrocarbon wealth and vision, e.g., amazing architecture emerging from the desert. Intellectually I saw the beginnings of Qatar’s National Vision 2030 to transform Qatar’s economy from hydrocarbon-based to knowledge-based.

One step in this direction by the Qatar Foundation was to create the Qatar Computing Research Institute (QCRI). In less than three years QCRI has established the beginnings of a world-class computer science research group seeded with world-class researchers in strategically important areas such as Social Computing, Data Analysis, Cyber Security, and Arabic Language Technologies (e.g., Machine Learning and Translation) amongst others. Each group already has multiple publications over several years in the leading conferences in their areas, e.g., SIGMOD and VLDB for Data Analysis. I spent my time reviewing what I consider to be some of the most challenging issues in Big Data.