Caravel: Airbnb’s data exploration platform
For data exploration, discovery, and collaborative analytics, Airbnb has built and open sourced a data exploration and dashboarding platform named Caravel. It allows data exploration through rich visualizations while performing fast and intuitive “slicing and dicing” of your dataset.
By Maxime Beauchemin, Airbnb.
At Airbnb, we love data, and we like to think that analytics belongs everywhere. For us to be data-driven, we need data to be fluid, fast flowing, and crystal clear.
As a vector for data exploration, discovery, and collaborative analytics, we have built and are now open sourcing a data exploration and dashboarding platform named Caravel. Caravel allows data exploration through rich visualizations while performing fast and intuitive “slicing and dicing” against just about any dataset.
Data explorers can easily travel through multi-dimensional datasets while creating and sharing “slices”, and assemble them in interactive dashboards.
Data exploration at the speed of thought
“Data visualization is effective because it shifts the balance between perception and cognition to take fuller advantage of the brain’s abilities.” — Stephen Few
It takes very little delay, maybe 10 to 30 seconds, to break someone’s cognitive flow. Caravel keeps your thinking loop spinning by providing a fluid query interface and enforcing fast query times. Slicing, dicing, drilling down, and pivoting across visualizations allow users to explore multi-dimensional data spaces effectively.
The codeless approach to data navigation brings everyone on board, democratizing access to data. On one end of the spectrum, less technical users find an easy interface to query data. On the other end, advanced users gain velocity and can easily share the content they create.
Data scientists, engineers and other data wizards can still use Tableau, R, Jupyter, Airpal, Excel, and other means to interact with data, but Caravel is gaining mind share internally as a frictionless and intuitive vehicle for sharing data and ideas.
- A rich and extensible set of visualizations including basic charts as well as sunburst, parallel coordinates, heatmap, force directed layouts, world map, pivot table, word cloud, Sankey diagram, and more!
- Create and share interactive dashboards as collections of visualizations
- Flexible authentication and authorization, with support for LDAP, OpenID, OAuth, Remote User, and more. Granular permissions and role management allow administrators to define very clearly who gets access to which feature and/or which dataset
- A thin semantic layer that defines how datasets should be exposed and lets you enrich them by adding SQL expressions and metrics
- Connectivity to most SQL-speaking databases, as well as support for querying Druid.io for fast realtime analytics
- A smooth learning curve: users can be trained in minutes and get value instantaneously
- Flexible data caching, with cascading timeout parameters by report, table and database to relieve your databases from heavy load and to make important dashboards load quickly
- Customizable and hackable! You can brand and skin Caravel with your own bootstrap theme, create CSS templates for your dashboards and modify the controls for specific visualizations
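To make the cascading cache timeouts from the list above more concrete, here is a minimal sketch of how such a cascade could resolve: the most specific setting wins, and broader levels act as fallbacks. The function name, parameter names, and the default value are hypothetical and chosen for illustration only; this is not Caravel's actual code.

```python
from typing import Optional


def resolve_cache_timeout(
    report_timeout: Optional[int],
    table_timeout: Optional[int],
    database_timeout: Optional[int],
    default_timeout: int = 0,
) -> int:
    """Return the first timeout (in seconds) that is explicitly set.

    The cascade goes from most specific (report) to least specific
    (database). None means "not configured at this level".
    All names here are hypothetical.
    """
    for timeout in (report_timeout, table_timeout, database_timeout):
        if timeout is not None:
            return timeout
    return default_timeout


# A report without its own setting inherits the table-level value.
inherited = resolve_cache_timeout(None, 600, 3600)

# A report-level value overrides everything below it.
overridden = resolve_cache_timeout(60, 600, 3600)
```

Under this scheme, an administrator could set a long timeout on a slow database once and shorten it only for the few reports that need fresher data.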
Caravel should work just as well in your environment as it does in ours. The query layer was written using SQLAlchemy, a SQL toolkit that allows authoring queries that can be translated to most SQL dialects out there.
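As a rough illustration of what that translation looks like, the sketch below builds one query with SQLAlchemy's expression language and compiles it for three dialects. It assumes SQLAlchemy 1.4 or later; the `bookings` table and its columns are made up for the example, and the code is not taken from Caravel.

```python
from sqlalchemy import column, func, select, table
from sqlalchemy.dialects import mysql, postgresql, sqlite

# Hypothetical table and columns, used only for illustration.
bookings = table("bookings", column("market"), column("nights"))

# One dialect-neutral query: total nights per market, top 10 markets.
total_nights = func.sum(bookings.c.nights).label("total_nights")
query = (
    select(bookings.c.market, total_nights)
    .group_by(bookings.c.market)
    .order_by(total_nights.desc())
    .limit(10)
)


def render(statement, dialect) -> str:
    """Compile a statement into the SQL string for a given dialect."""
    return str(statement.compile(dialect=dialect))


# The same query object rendered for three different backends.
rendered = {
    "postgresql": render(query, postgresql.dialect()),
    "mysql": render(query, mysql.dialect()),
    "sqlite": render(query, sqlite.dialect()),
}

for name, sql in rendered.items():
    print(f"-- {name}\n{sql}\n")
```

Compiling against a dialect object does not require a live connection or a database driver, which makes this a convenient way to inspect what would be sent to each backend.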
Beyond the SQL world, Caravel is designed to harness the power of Druid.io. Druid is an open source, fast, column-oriented, realtime, distributed data store. Coupling the two together accelerates analysis cycles by taking delays out of the equation.
A thin semantic layer
Caravel allows you to manage a thin layer to enrich your datasets’ metadata. This simple layer defines how your dataset is exposed to the user and is composed of:
- Descriptions, definitions, and verbose names for your dimensions and metrics that provide context while exploring datasets
- Calculated fields and metrics. For instance, ratios, distinct counts, and anything else that can be expressed through SQL
- Simple parameters that define how fields are exposed in the UI
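The idea behind such a layer can be sketched in a few lines: metrics are named SQL expressions with a verbose name, and a query is assembled from whichever metrics the user picks. The `Metric` class, the `build_query` helper, and the `reservations` table below are all hypothetical, and an in-memory SQLite database stands in for a real one; Caravel's own implementation may differ.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A named SQL aggregate with user-facing metadata (hypothetical)."""
    name: str
    verbose_name: str
    expression: str


# Metric definitions: a count, a distinct count, and a ratio.
METRICS = {
    "bookings": Metric("bookings", "Bookings", "COUNT(*)"),
    "guests": Metric("guests", "Distinct guests", "COUNT(DISTINCT guest_id)"),
    "cancel_rate": Metric(
        "cancel_rate", "Cancellation rate", "1.0 * SUM(cancelled) / COUNT(*)"
    ),
}


def build_query(table_name: str, dimension: str, metric_names: list) -> str:
    """Assemble a GROUP BY query from trusted, predefined metric expressions.

    Names are interpolated directly, so they must come from the curated
    definitions above, never from raw user input.
    """
    select_list = [dimension] + [
        f"{METRICS[name].expression} AS {name}" for name in metric_names
    ]
    return (
        f"SELECT {', '.join(select_list)} FROM {table_name} "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )


# Small in-memory dataset to run the generated query against.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reservations (market TEXT, guest_id INTEGER, cancelled INTEGER)"
)
conn.executemany(
    "INSERT INTO reservations VALUES (?, ?, ?)",
    [("Paris", 1, 0), ("Paris", 1, 1), ("Paris", 2, 0), ("Tokyo", 3, 0)],
)

sql = build_query("reservations", "market", ["bookings", "guests", "cancel_rate"])
rows = conn.execute(sql).fetchall()
```

Because each metric is defined once and reused everywhere, every chart that shows a cancellation rate computes it the same way.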
We’ve made taking Caravel for a test drive very easy. After the simple installation process, you’ll get Caravel loaded with a nice set of dashboards, charts, and datasets that you can explore and interact with. The next logical step is to connect to your local databases and start visualizing them.
A bright future
Caravel started as a hackathon project less than a year ago. While the project is already solid, it’s still young and gaining momentum. Look forward to more interactivity in dashboards, support for a growing number of visualizations, a set of training videos, more social features like tags, comments, usage information, chart annotations, and much more!
We’re planning on releasing the data visualizations and controls exposed in Caravel as reusable React components. This modular approach will make these building blocks available to application developers. At Airbnb, we have many use cases for rich and interactive visualizations as part of internal applications; for example, our A/B testing framework, anomaly detection framework, and user session explorer. It would be great to share the same components across all of these applications.
Join the community and find pointers to resources on Caravel’s GitHub repository!
Bio: Maxime Beauchemin recently joined Airbnb as a data engineer developing tools to help streamline and automate data-engineering processes. Previously, at Facebook, he developed analytics-as-a-service frameworks around engagement and growth-metrics computation, anomaly detection, and cohort analysis. You can read more about his projects on his blog, Digital Artifacts.