Interview: Michael Brodie – We Can’t Rely on Machines

Michael Brodie, a leading database researcher, is convinced that Big Data has more potential than the hype suggests, but also more risks.



Some people are warning that Big Data threatens humanity. What is your opinion?

There is an organization called “Future of Life” (http://futureoflife.org). It was created recently by very famous scientists and entrepreneurs with a strong commitment to technology, like Stephen Hawking, the famous physicist from Cambridge, Bill Gates from Microsoft and Elon Musk, the CEO of Tesla. Their vision is to limit the risks of automation and Big Data so that they won’t negatively impact humanity. The media have dramatized that by stating that these people were saying “Artificial Intelligence may end life as we know it on the planet”. But of course they didn’t say that. Their objective is to safeguard life and develop optimistic visions of the future in order to mitigate existential risks facing humanity from Artificial Intelligence. The nature of the threat is rather that we in artificial intelligence don’t really understand what it does. We see the outcomes and they seem in many cases to be quite positive. But when the most advanced research institutes, like those at MIT, talk about automating thinking, they are talking about automating relatively simple human activities. Sure, there is a lot of success, absolutely. But it is like climbing a mountain to get to the moon. When you reach the top of the mountain you see you are closer to the moon but you need to find a new way.

So there is nothing to be afraid of?

Those of us working in Big Data should think about how we can improve the things that we do. Do we know that the results we get are correct or complete or efficient? I’m concerned most about correctness and completeness. How do we understand what a machine really does if we can only keep fewer than ten ideas in our head at one time? A machine can handle billions of variables. I was at the White House last month; Big Data is one of the predominant policies of the US government and of 45 other governments around the world. How can it be a government policy in England, in America, when we really don’t understand it? If machines and algorithms are making important decisions like running trains or airplanes or choosing medicines to prescribe for patients, do we understand the potential for bad behaviour? At the moment, in the Big Data analysis field, the vast majority of the practitioners and consumers don’t even realize the nature of the threat. But I, like many people in artificial intelligence, understand the threat and believe that it can be managed.

How can we manage it?

So far we haven’t seen a big focus on addressing risks. The errors so far haven’t been in areas that are very important. For example, Big Data and Big Data Analytics have been used a lot in marketing and language translation. If you use Google on a daily basis, you notice advertisements that refer to something you searched for previously. If Google serves you the wrong ad, that’s not going to change the world. If Google Translate misinterprets a phrase, it may annoy the customer, but no specific harm may come of it. But there are more significant actions that might threaten an individual or a company.

What are you thinking of?

For example, what if you get an automated report that says “a company is doing very well, you should invest in it” and you do, but then lose your money? If “Algorithmic Responsibility” were a reality you could probably take legal action against whoever sold you the report, whether a machine generated it or not. Currently, that may not be possible. A more severe case would be if a medical treatment plan produced by automated personalized medicine caused harm rather than resulting in a cure. That is why such results are considered advisory to doctors.

Many people think numbers don’t lie and that algorithms are neutral. But they aren’t. It depends on the kind of data you use and how you do the modeling.

The simplest way to characterize the significance of algorithms is by noting that the risks they pose have prompted the international legal community, and specifically the US legal community, to propose a set of laws called “Algorithmic Responsibility”. This means: I don’t care how you arrived at the decision you’re providing me, whether you used a machine or a human or both – you’re still liable for the answer. For example, when a self-driving car has an accident, who is liable? Our society now realizes that as it becomes more data-driven we can’t rely on machines. It’s the human who has to take the responsibility. In my many years of experience in this area I have seldom seen an application where the machine can uniformly produce a concrete answer that the human completely accepts. We need a balance of human and machine intelligence. Let me conclude with a warning: Big Data is an example of the increasing use of algorithms in our lives. They are used to make decisions about what products we buy, the jobs we get, the people we meet, the loans we get, and much more. Algorithms are merely code, written mostly by people but, with machine learning and Big Data, increasingly by machines. Do algorithms discriminate? Cynthia Dwork, a researcher at Microsoft, has voiced the concern that algorithms learn to discriminate. She asks who is responsible when they do, and what the trade-offs between fairness and privacy are.

Others see the benefits, especially the potential impact on the quality of life and health care. How can Big Data help in personalized medicine?

Now that we are beginning to collect data in a massive and consistent way we can tell more and more about what really happened to patients: what drugs they took, what the impact of the drug was and so on. That’s on the very individual, personal level. However, there is an even greater possibility: if we collect detailed medical data like DNA, medical procedures and their outcomes, prescriptions by doctors and so forth from millions of people, we not only get the information of one patient – let’s call him Fred – but within the millions we can probably find thousands of people that are similar to Fred. That’s called population health. And if they had a disease that Fred now has, you can look at their behaviour and make recommendations like: “Fred, you should probably do this because people like you have been successful with this in the past.” Without personalized medicine doctors say: “In general this treatment has had this outcome on the general population.” But that’s not Fred. Fred has diabetes, he is 46 and he has only one leg, so he may have a very different prognosis from all the rest of the people who take that drug. The ideal or ethical outcome of personalized medicine is to improve the health care of people by prescribing better treatments for them. So that’s the good side, for individuals, of the US government’s Precision Medicine Initiative: “Delivering the right treatments at the right time to the right person”. The side that the government certainly sees is that each of the four leading chronic diseases in America costs us approximately 200 billion dollars a year. So if you can increase the health of those patients you can reduce these costs dramatically, at least by half.

Are the big pharma industries really interested in personalized medicine? Aren’t they more interested in big profits?

There is a disruption coming in pharmaceuticals: their customer base is changing from mass markets for big-ticket drugs, which are increasingly saturated, to more focused markets using Big Data. I have advised some big pharma companies, some here in Switzerland. They are going into micro-markets, which could be a new source of income. But it would mean changing manufacturing, testing or clinical trials – a lot of things would change. But Big Data can also help them to discover new drugs in ways that are dramatically more effective, faster in turnaround and cheaper to produce. An example: the IBM Watson Program was used by Baylor Medicine. They discovered two potential cancer drugs. These drugs must still undergo clinical trials. But in months, rather than years, they found two drugs to stimulate what they call kinases that might cure cancer. Typically those kinds of discoveries take five to ten years and cost billions of dollars.

The ZHAW Datalab: The Data Science Laboratory (Datalab) at ZHAW is an interdisciplinary platform of five institutes and centres to transform deep data science know-how into innovative research projects and vibrant teaching in Switzerland.

Recently, at the “Second Swiss Conference on Data Science”, organized by ZHAW with 190 participants, Jean Marc Piveteau, President of ZHAW, acknowledged its role: “The Datalab is an important player in Switzerland at the interface of applied research and innovation. Data Science is a primary field for Universities of Applied Sciences like ZHAW”, he said. The mission of ZHAW is the transfer of knowledge into applications, to support the innovation process and the success of new technologies on the market. “In line with our strategy, in line with our mission, Datalab is present in the field of Data Science – a separate discipline but interdisciplinary”.

Further information: www.zhaw.ch/datalab