KDnuggets, July 2014 (14:n35)

Interview: Leo Meyerovich, Graphistry on Browser-based Interactive Big Data Visualization

We discuss the merits of the Superconductor architecture, how it compares with current JavaScript visualization libraries, use cases, future plans, the launch of Graphistry, visualization trends, and more.

Leo Meyerovich co-founded Graphistry in early 2014. Previously, he researched programming language design at UC Berkeley and Brown University. His PhD introduced the first multicore web browser (3 PLDI SRC awards) and led to browser parallelization at Mozilla, Samsung, Google, Microsoft Research, and Qualcomm. Leo also performed the largest-scale analysis of programming language adoption and its social underpinnings (OOPSLA best paper) and, with security researchers at Google, Microsoft, and Brown University, designed several secure web scripting languages.

Earlier, he designed Flapjax, the first functional reactive language for highly concurrent web software (OOPSLA best paper). His research was supported by the first Qualcomm Innovation Fellowship (winner among 50 Ph.D. teams at Berkeley and Stanford), the NSF GRFP, and grants from Samsung, Nokia, Microsoft, NVIDIA, Intel, and others.

Here is my interview with him:

Anmol Rajpurohit: Q1. What is "Superconductor"? What are the key ideas behind Superconductor?

Leo Meyerovich: Superconductor is a language we developed at UC Berkeley for visualizing big data sets in the browser.

The problem is that software is moving to the web, but it’s hard to scale interactive JavaScript visualizations beyond a few thousand data points. We needed a browser-friendly way to easily handle orders of magnitude more data.

You load it in as just another JavaScript library, write layouts in its CSS-like language, and script interactions with normal JavaScript. Underneath, Superconductor takes care of generating high-performance code that leverages multicore and GPU hardware.

AR: Q2. What are the key aspects of Superconductor architecture that give it such unique capabilities?

LM: Superconductor makes several architectural leaps over JavaScript libraries like D3 and what browsers try to do natively:

  • Parallel JavaScript. Superconductor uses new JavaScript APIs that expose multicore and GPU hardware: web workers, WebGL, and WebCL. Browser internals barely take advantage of parallel hardware, and JavaScript libraries like ThreeJS only do so for the last computation step of a visualization: rendering. Superconductor optimizes the entire pipeline.
  • Optimizing Compiler. Superconductor’s sequential performance is closer to hand-optimized C than to naïve JavaScript, and parallel processors blaze right through it. Consider iOS: one of its scaling bottlenecks is the constraint solver that ships alongside a visualization to compute the layout at runtime. Superconductor’s compile-time optimizer instead specializes the visualization’s layout solver down to a few map/reduce calls. Furthermore, the emitted code looks more like tight Fortran array loops than arbitrary JavaScript.
  • CSS Extensions. CSS is too inexpressive for describing visualizations like tree maps, yet writing that code in JavaScript makes effective automatic parallelization difficult. Our solution was to extend CSS with constraints and rendering commands. We realized that attribute grammars describe this extended model of computation, and we built a new kind of schedule synthesizer that automatically schedules layout code into a sequence of GPU-friendly map and reduce calls. Instead of asking users to modify their browser’s CSS engine, we run it all in parallel JavaScript.
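To make the attribute-grammar idea more concrete, here is a minimal sequential sketch of a two-pass tree layout of the kind such a schedule produces. A bottom-up pass computes each node's width from its children (a synthesized attribute), then a top-down pass assigns x positions from the parent and the siblings to the left (an inherited attribute). The tree is stored as flat arrays and swept one level at a time, so all nodes in a level are independent and could in principle be handled by a parallel map or segmented reduction. This is not Superconductor's generated code or its CSS-like syntax; the function name `layout`, the left-to-right stacking rule, and the breadth-first array encoding are illustrative assumptions.

```python
from collections import defaultdict


def layout(parent, leaf_width):
    """Hypothetical two-pass layout: children are stacked left to right.

    parent[i] is node i's parent index (-1 for the root, which must be node 0).
    Nodes are assumed to be numbered in breadth-first order, so a parent always
    precedes its children and siblings appear in left-to-right order.
    leaf_width[i] is the intrinsic width of leaf i (ignored for internal nodes).
    Returns (width, x) as flat lists indexed by node.
    """
    n = len(parent)

    # Group node indices by depth; each pass below sweeps one level at a time.
    depth = [0] * n
    for i in range(1, n):
        depth[i] = depth[parent[i]] + 1
    levels = defaultdict(list)
    for i in range(n):
        levels[depth[i]].append(i)
    max_depth = max(levels)

    is_internal = [False] * n
    for i in range(1, n):
        is_internal[parent[i]] = True

    # Pass 1, bottom-up (synthesized attribute): width = sum of children's widths.
    # Nodes on one level are independent, so a parallel version could run this
    # inner loop as a segmented reduction keyed by parent; here it is sequential.
    width = [0 if is_internal[i] else leaf_width[i] for i in range(n)]
    for d in range(max_depth, 0, -1):
        for i in levels[d]:
            width[parent[i]] += width[i]

    # Pass 2, top-down (inherited attribute): x = parent's x + widths of the
    # siblings to the left, i.e. an exclusive prefix sum over each sibling group.
    x = [0] * n
    for d in range(1, max_depth + 1):
        offset = {}  # running offset per parent within this level
        for i in levels[d]:
            p = parent[i]
            x[i] = x[p] + offset.get(p, 0)
            offset[p] = offset.get(p, 0) + width[i]
    return width, x


# Root 0 has children 1 and 2; node 1 has leaf children 3 and 4.
width, x = layout([-1, 0, 0, 1, 1], [0, 0, 5, 2, 3])
print(width)  # [10, 5, 5, 2, 3]
print(x)      # [0, 0, 5, 0, 2]
```

In this sketch the order of the passes is fixed before any data arrives, so no constraint solving happens at runtime; that fixed sequence of traversals is the kind of decision the answer attributes to a compile-time schedule synthesizer.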

AR: Q3. What are the best use cases of Superconductor that you have seen so far? What other kind of business problems will find great value in Superconductor?

LM: At Graphistry, we are building tools for exploring time series and graph data. Our initial focus is monitoring and analyzing data from hardware sensors and software services.

One of my favorite early experiments was for exploring election fraud. A tree map showed 100,000+ polling stations, and sliders let you filter on demographics like voter turnout, which is a key fraud indicator. We knew there was suspicious activity, but this revealed where in the pipeline it happened. We’ve also seen a lot of excitement about opportunities in genetics (especially heatmaps and circos plots) and finance (more fraud heatmaps, large correlation matrices, and a lot of time series).

AR: Q4. What are the future plans of the Superconductor team? When do you expect Superconductor to be available to the public?

LM: The WebCL+WebGL backend is already available under the permissive BSD3 open source license. We want to release the web workers + WebGL backend later this year. The technology is important enough that we’ve started Graphistry as a company to support and accelerate even more aggressive next steps.

AR: Q5. How and when did you get inspired to launch Graphistry.com? What are the next plans?

LM: We’ve been working in this space at Berkeley for 6 years, and started Graphistry, Inc. a few months ago to support making even bigger leaps. I can’t say much, but Superconductor is going to the cloud. The result is more flexibility, including running even on tiny devices, and scaling beyond today’s already impressive ~1 million data points to 10-100 million.

I invite people who need help with their graph and time series data to join our private beta. This year’s focus is on giving internal dashboard users significantly better visibility. By year-end, we will tackle web-scale public deployments.

AR: Q6. What current trends in the Data Visualization space are of the most interest to you?

LM: Visual analytics is scaling from tiny charts to deep exploration. With Superconductor, we’ve made leaps in getting the raw performance needed for that. However, as we’ve found in our work at Graphistry, a full solution also requires scaling the visual and interactive design. We see a lot of eye candy that plots big data sets directly onto maps, but in practice, that mostly produces a population heatmap. In contrast, smarter designs like force-directed layouts reveal clusters, and putting graph mining algorithms at users’ fingertips enables exploring even more nuanced relationships.
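For readers unfamiliar with force-directed layouts, the following minimal sketch shows the basic idea in the style of Fruchterman and Reingold: every pair of nodes repels, every edge acts as a spring, and a cooling "temperature" caps how far a node moves per step. Densely connected groups settle close together and away from other groups, which is why such layouts expose clusters. This is a generic textbook-style sketch, not Graphistry's algorithm; the function names `force_layout` and `mean_distance`, the circular starting positions, and the constants are assumptions chosen for reproducibility. The all-pairs repulsion loop is O(n²), which is the part large-graph tools typically accelerate or approximate.

```python
import math


def force_layout(n, edges, iterations=300, k=1.0, t0=0.1):
    """Hypothetical spring-electrical layout in the Fruchterman-Reingold style.

    n: number of nodes; edges: list of (i, j) pairs.
    k: ideal edge length; t0: initial temperature (max move per step).
    Returns a list of [x, y] positions.
    """
    # Deterministic start: nodes evenly spaced on a unit circle.
    pos = [[math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
           for i in range(n)]
    for step in range(iterations):
        disp = [[0.0, 0.0] for _ in range(n)]
        # Repulsion between every pair of nodes, magnitude k^2 / d (O(n^2)).
        for i in range(n):
            for j in range(i + 1, n):
                dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[i][0] += dx / d * f
                disp[i][1] += dy / d * f
                disp[j][0] -= dx / d * f
                disp[j][1] -= dy / d * f
        # Attraction along each edge, magnitude d^2 / k.
        for i, j in edges:
            dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[i][0] -= dx / d * f
            disp[i][1] -= dy / d * f
            disp[j][0] += dx / d * f
            disp[j][1] += dy / d * f
        # Move each node along its net force, capped by a linearly cooling temperature.
        temp = t0 * (1 - step / iterations)
        for i in range(n):
            length = max(math.hypot(disp[i][0], disp[i][1]), 1e-9)
            move = min(length, temp)
            pos[i][0] += disp[i][0] / length * move
            pos[i][1] += disp[i][1] / length * move
    return pos


def mean_distance(pos, pairs):
    """Average Euclidean distance over the given node pairs."""
    return sum(math.dist(pos[i], pos[j]) for i, j in pairs) / len(pairs)


# Two triangles with no edges between them.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
pos = force_layout(6, edges)
within = mean_distance(pos, edges)
between = mean_distance(pos, [(i, j) for i in range(3) for j in range(3, 6)])
print(within < between)  # True
```

Plotting the same six nodes by a fixed attribute (as a map does with location) would not separate the two groups; the layout above places them by connectivity instead.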

AR: Q7. What motivated you to work on Visualization problems? What is the best advice you have got in your career?

LM: I switched from designing interactive media to building programming languages as a way to take the pain out of going from idea to code. I repeatedly hit that same roadblock as a scientist exploring quantitative data using Python, R, and Excel. When we realized that my Ph.D. work on parallelizing web browsers could be applied to scaling data exploration, I leapt at the chance.

The transcript of "You and Your Research" by mathematician and computer science pioneer Richard Hamming is spot on. He advises picking the most interesting problem that you can. The trick is that, to be interesting, a problem has to be both important to society and tractable for yourself. To succeed, surround yourself with amazing people whom you will learn from and collaborate with. The transcript is worth reading.

AR: Q8. What advice would you give to data science students and researchers who are just starting to work in this area?

LM: Collaborate with industrial teams already doing it on real data sets, at true scale, and with big goals in mind. For data visualization in particular, stay up to date with people pushing the field like Jeff Heer and Carlos Scheidegger.

AR: Q9. What was the last book that you read and liked? What do you like to do when you are not working?

LM: Not the last book, but definitely the one I’d recommend for anyone building high-impact technologies: Diffusion of Innovations by Everett Rogers.

Running a company is intense, so I have to make time to read, write, and code. The Bay Area is wonderful if you love food and running.