Interview: Peter Alvaro, UC Berkeley, on Managing Asynchrony and Partial Failure

We discuss the challenges in simultaneously managing asynchrony and partial failure, the problem of composition, research motivation, trends and more.

Peter AlvaroPeter Alvaro is a doctorate candidate at the University of California, Berkeley, where he is working with Joe Hellerstein and the BOOM team. His research interests lie at the intersection of databases, distributed systems and programming languages: in particular, he wants to know how the lessons from the first may be incorporated into the third, and to what degree this mitigates the difficulties of the second. To this end, his team is designing a collection of declarative languages for specifying and implementing complex distributed systems in a data-centric manner. Before joining the PhD program, Peter worked at

Recently, Peter delivered a keynote at RICON 2014 on Outwards from the Middle of the Maze: Using Lineage to find Bugs at Scale (Video, Slides).

First part of interview.

Here is second and last part of my interview with him:

Anmol Rajpurohit: Q5. Why is it so hard to simultaneously tackle the problems of asynchrony and partial failure? What are your recommendations on managing them?

Peter Alvaro: Let’s look at a really simple example (in an arguably unrealistic environment). Consider two processes A and B; A sends a stream of 10 messages to B, and B computes some function over that stream, producing some output. If we assume reliable, ordered delivery, the programmer of the system needs only to consider the correctness of the function with respect to a single execution. Now assume that messages can be lost or reordered (with just two endpoints, we can easily prevent this in practice using TCP, but we want our approach to generalize to unconstrained topologies). How many executions does the programmer need to consider to ensure that B produces the correct outcome? A quick back-of-the-envelope calculation tells us that in the absence of semantic knowledge of the function, even if no messages are lost we need to consider 10! (10 factorial, or ~3.6M) delivery order schedules -- does B produce a correct outcome for all of them? Of course, it is worse than that: we need to consider every possible failure pattern. There are 210 = 1024 of them -- for each, we need to consider every delivery order!
Asynchrony Partial Failure
Common sense tells us that for most programs, many of these executions are not “interestingly” different. For example, if our function is commutative, we do not need to consider the reorderings at all. But how do we know if our program is commutative? If it is not, what will it cost us to rule out the nondeterminism in delivery order (in this toy example, just TCP, but for arbitrary topologies, consensus)? And even if we know the function is commutative, what about those 1024 possible executions under failure? Does A always reliably retry when messages are lost? Are these retries always safe -- that is, is the function applied at B idempotent? These are very difficult questions that distributed programmer must answer over and over again.

As I argued in the talk, these questions are too difficult to answer on our own: we need tool support! Blazes and LDFI attempt to answer these questions in a general way.

AR: Q6. What do you mean by "Composition is the last hard problem"?

PA: At a high level, a key conceit of the talk was that the NoSQL community has, as Eric Brewer argued in his 2012 RICON keynote, rejected the “top-down,” user-centric guarantees of transactional databases in favor of the “bottom-up,” developer-centric ethos of operating systems (simple, reusable components). That community has had impressive successes delivering reusable, distributed data management components. But it has also sacrificed something immensely valuable: a contract with the application programmer to hide system-specific failure modes.
Composition Problem
What can we do to restore that contract? Part of the solution will involve shifting our focus from that *state* of components to the *data in motion* through components: that is, how components change data, not how it changes them.

AR: Q7. How and when did you got the motivation to work on the interaction of programming languages and databases? What motivated you to go back to school after working at

PA: At, we were maintaining a large-scale data warehouse on Oracle RAC, and stretching the limits of the technology. It was taking too Ask Logolong for website impression and click logs to flow from a massively partitioned set of logging servers through ETL, summarization and presentation. I was working on two projects in parallel: a “real-time” in-memory database that received multicast logs directly from the site middle tier and evaluated continuous queries against them, and a massively-parallel distributed query engine that distributed ad-hoc aggregation queries over the collection of logging servers, harnessing their computational power. Both were painstakingly implemented in C.

It began to dawn on me that the systems I was implementing (essentially a massively-parallel query framework and a streaming database) were both essentially engines that converted declarative queries into one-shot, “bespoke” distributed systems. In fact, most of the distributed systems that we implement, use and maintain are just queries -- they describe, at a high level, how data changes as it flows through a network over time. Doesn’t a static webserver just implement a join between a table of pages and a stream of requests? Doesn’t a dynamic webserver do the same thing, but also join a stream of parameters and perhaps call a user-defined function on the result? And looking deeper, aren’t protocols queries too? They describe the messages a process sends as a view over the stream of messages it has received.

When I came to berkeley (see my answer to #3) I was delighted to find that Professor Hellerstein’s group was already working on this problem.

AR: Q8. What other research problems are you interested in working on (may be, after your PhD)?

PA: There is so much important work to be done. Here are just a few things that I can’t wait to get started on.

DebuggersCurrent debuggers, like a majority of programming languages, still reflect a sequential, single-site model of computation. Just as I explored disorderly programming as a way of raising the level of abstraction in the *implementation* of distributed systems, I’d like to dive into disorderly debugging. I am not sure yet what the ideal disorderly debugger looks like, but I am sure it doesn’t look anything like GDB. The lineage diagrams produced to
“explain” bad outcomes in LDFI are a step in the right direction.

I spent the first few years of my PhD on language design and implementation, and I think the time is ripe for another round of that. The right language for a particular domain involves hiding the details that don’t matter and bringing into focus the details that do. The early generation of declarative networking languages postulated that the detail that mattered was *data*. Dedalus showed that *time* matters, and programmers should think hard about it! Blazes’ analysis indicates that time only matters when computations are sensitive to *order*.

Tools like Blazes and LDFI were created to make it easier for programmers to build large-scale data-intensive systems. There are many more such systems to build: I can’t wait to get started on the next one.

AR: Q9. Which of the current trends in Cloud and Big Data seem the most interesting to you?

PA: The future will hold lots of changes in the memory hierarchy for which we will need to adjust, but one thing seems pretty obvious: distances can always increase, while the speed of light remains constant. It will keep getting faster to access data that is “close” -- local data will be “closer” than ever. But data is likely to continue to be more distributed (due both to efforts to exploit parallelism and to ensure reliability), and distance matters. Cloud Big Data TrendsKeeping up with this trend -- exploiting the close data, keeping the pipes and processors full -- will involve doing a better job not of avoiding communication (which will become increasingly necessary) but of avoiding situations in which *local* computation needs to wait for that communication. Minimizing how much distributed systems wait must begin with understanding exactly *when* and *why* they wait. Bloom and Blazes provide a framework for reasoning about coordination requirements using program analysis. These sorts of “coordination analyses” are compelling because they are invariant to scale, which could range from the distance between processor cache levels to the distance between planets. Pale Fire

AR: Q10. What was the last book that you read and liked? What do you like to do when you are not working?

PA: Pale Fire by Nabokov, and Henderson the Rain King by Bellow. Alas, I don’t have as much time for pleasure reading as I once did. In fact, I used to spend nearly all of my time reading: I got my Bachelor’s degree in English Literature -- hence all of the references to Joyce and Stoppard in my talk.

When I am not working I am spending time with my wife Severine and my daughter Beatrice, who is starting kindergarten next year.