Beautiful Python Visualizations: An Interview with Bryan Van de Ven, Bokeh Core Developer
Read this insightful interview with Bokeh's core developer, Bryan Van de Ven, and gain an understanding of what Bokeh is, when and why you should use it, and what makes Bryan a great fit for helming this project.
The core developer of Bokeh was kind enough to give us some of his time recently in order to shed some additional light on the project he helms for our readers. I won't spoil anything by shoehorning any summarized info here; instead, read on to get some insight into both Bokeh and Bryan Van de Ven.
You can find the Bokeh project here. Bryan's Twitter can be found here, and his LinkedIn is here.
Matthew Mayo: Hi Bryan. Thanks for taking some time to speak with KDnuggets. How about you start by introducing yourself to our audience.
Bryan Van de Ven: Sure, my name is Bryan Van de Ven. I currently work for Continuum Analytics, where I have been since it was founded in 2012. I am grateful for the opportunity it has provided to contribute to OSS projects such as Conda and Bokeh. There's a lot of (justified) talk about the "sustainability problem" in Open Source and I'd like to think we are helping to explore robust ways of providing tangible and meaningful support to OSS development.
I have read that, aside from Bokeh (which we will get to momentarily), you have previously worked on other data visualization projects, notably Chaco. What attracted you to this space?
As individuals we have our own ways of perceiving, organizing, and thinking about things. As long as I can recall, I've personally been a somewhat visually and spatially oriented person. When I am navigating somewhere, I have what I'd call a "Google Map in my head" and it was surprising for me to eventually learn that not everyone "sees" this kind of information in the same way I do. But as a result, for example, the best way for me to put different pieces of a complex system in relation to one another and understand those relations is often a simple notional drawing. Then I can think about the relations as a map, which comes more easily to me.
So I think the pump was primed simply by virtue of this visually oriented thought process. I'm not sure if it's cause or effect (or both), but as a kid I loved playing with Logo and Fractint, and as an adult I ended up in professional roles that involved creating various kinds of quantitative visual displays. Somewhere along the way, some folks also introduced me to Tufte and Cleveland and Bertin and The Grammar of Graphics. All of that pretty much cemented my interest in developing effective visualization tools.
Let's talk about Bokeh. How about the one-sentence description first?
How would you characterize Bokeh as it relates to other similar projects in the Python stack? For example, would it be correct to call Bokeh a replacement for Matplotlib? An upgrade? More comparable to Seaborn? Something different altogether?
I think "different" is the right word. As an example: until recently Bokeh lacked PNG and SVG export capabilities, which made MPL the go-to for academic publishing needs. Bokeh has recently added PNG and SVG exports, and I think they will cover many use-cases. But there are still probably instances where MPL is a better choice, if you have very precise or specific needs. Beyond that I think it's mostly a matter of taste and which sort of API a person prefers.
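For readers who want to try the PNG and SVG export Bryan mentions, here is a minimal sketch. It assumes a recent Bokeh 3.x release, where `bokeh.io.export_png` and `bokeh.io.export_svg` exist; names may differ in the older releases current when this interview took place. Both functions drive a headless browser through Selenium, so they also need `selenium` and a browser driver (e.g. geckodriver or chromedriver) installed. The helper names `make_plot` and `export_static` are hypothetical, chosen for illustration.

```python
from bokeh.io import export_png, export_svg
from bokeh.plotting import figure


def make_plot():
    """Build a small line plot (hypothetical helper for this sketch)."""
    p = figure(title="Static export example", width=400, height=300)
    p.line([1, 2, 3, 4], [3, 1, 4, 2], line_width=2)
    # SVG export needs the SVG output backend; the default "canvas"
    # backend renders to an HTML5 canvas instead.
    p.output_backend = "svg"
    return p


def export_static(p, stem):
    """Write <stem>.png and <stem>.svg (hypothetical helper).

    Requires selenium plus a browser driver, since Bokeh renders the
    plot in a headless browser before saving it.
    """
    png_path = export_png(p, filename=f"{stem}.png")
    svg_paths = export_svg(p, filename=f"{stem}.svg")
    return png_path, svg_paths
```

With the browser tooling in place, `export_static(make_plot(), "example")` would produce `example.png` and `example.svg` in the working directory.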
Bokeh's "plotting" API is lower-level than Seaborn. HoloViews is a recent project for high-level exploratory data analysis in Python that can also generate beautiful visualizations using Bokeh. HoloViews is definitely the "officially endorsed" very high-level API on top of Bokeh (to replace the old bokeh.charts), so I think the combination of Bokeh+HoloViews is directly comparable to Seaborn in terms of capability. But again, I think the best word is "different". HoloViews' approach is extremely declarative, and whether a person likes or prefers this style is mostly a matter of taste.
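To make "lower-level" concrete: in Seaborn, coloring points by group is a single `hue=` argument, whereas with `bokeh.plotting` you assemble the pieces yourself (a data source, a glyph, a color mapping, an interactive tool). The sketch below assumes a recent Bokeh 3.x release; the toy data and column names are made up for illustration.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure
from bokeh.transform import factor_cmap

# Toy data, invented for this sketch: two groups of points.
source = ColumnDataSource(data=dict(
    x=[1.0, 2.0, 3.0, 1.5, 2.5, 3.5],
    y=[2.0, 3.5, 3.0, 1.0, 1.5, 2.5],
    group=["a", "a", "a", "b", "b", "b"],
))

p = figure(title="Grouped scatter, assembled by hand", width=500, height=350)

# Map each categorical "group" value to a color explicitly,
# and let the legend be built from the same column.
p.scatter(
    "x", "y", source=source, size=10,
    color=factor_cmap("group", palette=["#1f77b4", "#ff7f0e"], factors=["a", "b"]),
    legend_field="group",
)

# Interactivity is also added explicitly: hovering shows each point's values.
p.add_tools(HoverTool(tooltips=[("group", "@group"), ("x", "@x"), ("y", "@y")]))
```

A declarative layer such as HoloViews is meant to hide most of this assembly, which is the trade-off Bryan describes.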
This is Bokeh (Source).
What does Bokeh do better than other similar projects?
Basically what I think Bokeh does best is to allow people to create sophisticated data visualizations in browsers while staying where they are already comfortable and productive (i.e., Python or R).
So I think if you are looking to create interactive visualizations in web browsers, including in Jupyter notebooks, then Bokeh (or Bokeh+HoloViews) is a compelling choice. If you're looking to connect the incredible constellation of PyData tools (e.g. NumPy, SciPy, Pandas, sklearn, etc.) to scalable and deployable web "data apps" with a minimum of code and even less mucking with "web tech", then I think Bokeh is the clear choice.
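As a small illustration of staying in Python while targeting the browser, the sketch below computes a rolling mean with Pandas, hands the DataFrame straight to Bokeh, and produces a standalone HTML page without writing any JavaScript. It assumes a recent Bokeh 3.x release; the random-walk data is synthetic and the page title is arbitrary.

```python
import numpy as np
import pandas as pd
from bokeh.embed import file_html
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.resources import CDN

# Synthetic data for the sketch: a 30-day random walk plus a Pandas rolling mean.
rng = np.random.default_rng(0)
df = pd.DataFrame({"day": np.arange(30), "value": rng.normal(size=30).cumsum()})
df["rolling_mean"] = df["value"].rolling(window=5).mean()

# A DataFrame can be passed directly; its columns become data source columns.
source = ColumnDataSource(df)

p = figure(title="Random walk with 5-day rolling mean", width=600, height=300)
p.line("day", "value", source=source, legend_label="value")
p.line("day", "rolling_mean", source=source, color="orange",
       line_width=2, legend_label="rolling mean")

# Render a complete HTML document that loads BokehJS from the CDN.
html = file_html(p, CDN, "Bokeh + pandas sketch")
```

Writing `html` to a file and opening it in a browser gives an interactive (pannable, zoomable) plot; in a Jupyter notebook, `bokeh.io.show(p)` after `output_notebook()` would display it inline instead.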
Are there many other developers contributing to Bokeh? Is there opportunity for others to get involved?
The number of "dedicated core devs", i.e. people funded by Continuum or other sources to work directly on Bokeh, goes up and down with time. Right now there are 2-3 people spending a majority of their time on Bokeh, but it's been as high as 7 or 8 in the past.
However, we have tried to make Bokeh extremely welcoming to new contributors, and I'm happy to report that more than a few people have tweeted "I just made my first ever OSS contribution and it was to Bokeh!" At present GitHub lists 242 total contributors to Bokeh. Many of those are one-time or small PRs, but those are actually extremely valuable. The effort (or distraction from other tasks) involved in creating a new PR yourself vs reviewing a PR someone else makes can be pretty substantial, so any PR helps lighten the load of the core devs and is always appreciated.
That said, we'd love to have more people become involved with Bokeh in substantial or long term ways. Part of the onus for making that happen is on us, and we've recently been working to open up our core dev discussions better by moving them to a public gitter chat channel. So if anyone is interested in getting more involved, please come by gitter.im/bokeh/bokeh-dev and give us a holler.
You instruct a DataCamp course on data visualization with Python. Can you tell us a bit about that? What can people expect from it?
The DataCamp course covers the basics of using the bokeh.plotting API, a selection of more advanced topics, and most importantly, practice creating Bokeh server applications. I think the course is a great way for people to get into using Bokeh, especially if lecture presentation plus exercises matches your learning style. In any case I think it's a good jumping-off point that can help anyone get better oriented to the Bokeh community: where to go for further questions, some context to help use the documentation better, etc.
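To give a flavor of what a Bokeh server application looks like, here is a minimal sketch of a typical app script: a slider whose `on_change` callback recomputes the data behind a plot. It is not material from the DataCamp course, and it assumes a recent Bokeh 3.x release. Saved as, say, `app.py` (a hypothetical filename), it would be launched with `bokeh serve --show app.py`, after which moving the slider updates the curve live in the browser.

```python
import numpy as np
from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, Slider
from bokeh.plotting import figure

# Data lives in a ColumnDataSource so the server can push updates to the browser.
x = np.linspace(0, 4 * np.pi, 200)
source = ColumnDataSource(data=dict(x=x, y=np.sin(x)))

plot = figure(title="sin(f * x)", width=600, height=300)
plot.line("x", "y", source=source, line_width=2)

freq = Slider(start=0.1, end=5.0, value=1.0, step=0.1, title="Frequency")


def update(attr, old, new):
    # Runs in Python on the server whenever the slider's "value" changes;
    # replacing source.data re-renders the line in the browser.
    source.data = dict(x=x, y=np.sin(new * x))


freq.on_change("value", update)

# Under `bokeh serve`, curdoc() is the document for the current browser session.
curdoc().add_root(column(freq, plot))
```

The key design point is that the callback is ordinary Python, so it can call into NumPy, Pandas, or scikit-learn directly; the server takes care of synchronizing the result with the browser.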
What is one piece of overlooked advice you would give individuals getting into a "data science" -- or related -- career?
I'm definitely *not* the right person to dispense advice in this area, because I am definitely not a scientist, data or otherwise. I think it's really valuable to introspect your own strengths and weaknesses realistically, and although I ended up in grad school for physics for a time, ultimately I am most productive as a tinkerer and a toolmaker, not an explorer. So my meta-advice is to find and follow great voices like Hilary Mason, John Myles White, Lorena Barba and others, and see what they have to say.
Do you have any last words to share with our audience?
Just a (hopefully) handy list of resources people can turn to!
Example Apps: https://demo.bokehplots.com
Mailing List: https://groups.google.com/a/continuum.io/forum/#!forum/bokeh
Gitter Chat: https://gitter.im/bokeh/bokeh
Thanks again for taking the time to speak with us, Bryan!