TabPy: Combining Python and Tableau
This article demonstrates how to get started using Python in Tableau.
By Bima Putra Pratama, Data Scientist
Can we integrate the power of Python calculations with Tableau?
That question encouraged me to start exploring the possibility of using Python calculations in Tableau, and I ended up with TabPy.
So, what is TabPy? How can we use TabPy to integrate Python and Tableau?
In this article, I will introduce TabPy and go through an example of how we can use it.
TabPy is an Analytics Extension from Tableau that enables users to execute Python scripts and saved functions from Tableau. Using TabPy, Tableau can run Python scripts on the fly and display the results as a visualization. Users can control the data sent to TabPy by interacting with their Tableau worksheets, dashboards, or stories through parameters.
You can read more about TabPy in its official GitHub repository:
tableau/TabPy: Execute Python code on the fly and display results in Tableau visualizations.
I assume you already have Python installed on your system. If you don't, download the installer from https://www.python.org/ and install it first.
Next, we can install TabPy as a Python package using pip:
pip install tabpy
Once the installation succeeds, we can start the service with the following command:
tabpy
If all goes well, TabPy starts and prints its startup log in the terminal.
By default, the service runs on localhost at port 9004. You can also verify this by opening http://localhost:9004 in your web browser.
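Besides the browser, a short script can confirm that the server is reachable. The sketch below uses only Python's standard library and assumes TabPy's default address (localhost, port 9004), no authentication, and TabPy's `/info` endpoint, which returns a JSON description of the server. The helper name `is_tabpy_running` is hypothetical.

```python
import http.client
import json
import urllib.error
import urllib.request


def is_tabpy_running(host: str = "localhost", port: int = 9004, timeout: float = 2.0) -> bool:
    """Return True if a TabPy server answers GET /info with JSON, else False.

    Hypothetical helper: assumes TabPy's default host/port and no authentication.
    """
    url = f"http://{host}:{port}/info"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            json.load(response)  # /info replies with a JSON description of the server
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply
        return False


if __name__ == "__main__":
    print("TabPy reachable:", is_tabpy_running())
```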
Now, let’s switch to Tableau and set up the connection. I am using Tableau Desktop version 2020.3.0, but the steps should be similar in earlier versions.
First, go to Help, then choose Settings and Performance and select Manage Analytics Extension Connection.
Then set the Server and Port (localhost and 9004 for a default TabPy setup). You can leave the Sign in with a username and password option off, since we haven't set up credentials for our TabPy service.
Once done, click Test Connection. If it succeeds, Tableau confirms that the connection was successful.
Congratulations! Tableau is now connected to TabPy and ready to use.
There are two ways to run Python calculations:
- Write code directly as Tableau calculated fields. The code is then executed on the fly on the TabPy server.
- Deploy a function to the TabPy server, where it is reachable as a REST API endpoint.
In this article, I will only show the first method: writing code directly as Tableau calculated fields.
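Under the hood, a calculated field that uses a SCRIPT_ function is sent to TabPy's `/evaluate` endpoint as JSON: the script text plus a `data` object in which each argument becomes a list named `_arg1`, `_arg2`, and so on. The sketch below builds such a payload and posts it with the standard library. The helper names are hypothetical, and the sketch assumes a TabPy server without authentication on the default port.

```python
import json
import urllib.request


def build_evaluate_payload(script: str, *args: list) -> dict:
    """Build a request body in the shape TabPy's /evaluate endpoint expects.

    Each positional argument becomes _arg1, _arg2, ... just like the
    expressions passed after the script text in SCRIPT_REAL().
    """
    data = {f"_arg{i}": list(values) for i, values in enumerate(args, start=1)}
    return {"data": data, "script": script}


def evaluate(payload: dict, host: str = "localhost", port: int = 9004) -> object:
    """POST the payload to a running TabPy server and return the decoded JSON result."""
    request = urllib.request.Request(
        f"http://{host}:{port}/evaluate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# The same script text you would place inside SCRIPT_REAL("...", <arg1>, <arg2>)
payload = build_evaluate_payload("return [a + b for a, b in zip(_arg1, _arg2)]", [1, 2], [3, 4])
# evaluate(payload)  # requires a running TabPy server
```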
As an example, we will perform clustering on the Airbnb dataset that is publicly available on the Tableau website. We will cluster each zip code based on its housing characteristics using several popular clustering algorithms.
Step 1 Importing Data
In the first step, let’s import our dataset into Tableau. This dataset has 13 columns.
As our primary goal is to see how to use TabPy, we will not focus on building the best possible model. Thus, we will only use the following variables from this dataset to perform clustering:
- The median number of beds in each zip code
- The average price in each zip code
- The median number of ratings in each zip code
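In Tableau, these per-zip-code values come from MEDIAN() and AVG() aggregations on the fields in the view. To check the numbers outside Tableau, the same feature table can be sketched in pandas. The column names `zipcode`, `beds`, `price`, and `rating` are assumptions, so adjust them to the actual headers in the downloaded file.

```python
import pandas as pd


def zipcode_features(listings: pd.DataFrame) -> pd.DataFrame:
    """Aggregate listing-level rows into one row of clustering features per zip code.

    Column names (zipcode, beds, price, rating) are assumed, not taken from the dataset.
    """
    return (
        listings.groupby("zipcode")
        .agg(
            median_beds=("beds", "median"),
            avg_price=("price", "mean"),
            median_rating=("rating", "median"),
        )
        .reset_index()
    )


if __name__ == "__main__":
    # Toy rows for illustration only
    sample = pd.DataFrame(
        {
            "zipcode": ["78701", "78701", "78702"],
            "beds": [1, 3, 2],
            "price": [100.0, 200.0, 80.0],
            "rating": [90, 94, 88],
        }
    )
    print(zipcode_features(sample))
```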
Step 2 Create Control Parameters
We need to create two parameters to select the clustering method and the number of clusters:
- Cluster Numbers
- Clustering Algorithm
Step 3 Create a Script
We will create a Python script as a calculated field in Tableau.
You can then insert the clustering script into the calculated field.
The code is wrapped in Tableau's SCRIPT_REAL() function and does the following:
- Import the required Python libraries.
- Scale the features with StandardScaler.
- Combine the scaled features and handle null values.
- Check which algorithm was selected in the Clustering Algorithm parameter and run it.
- Return the clustering results as a list.
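The steps above can be sketched as a plain Python function that is easy to test outside Tableau. This is a minimal sketch, not the author's script. It assumes scikit-learn and NumPy are installed in TabPy's Python environment and that the Clustering Algorithm parameter takes the hypothetical values "KMeans", "Agglomerative", and "GaussianMixture". It also fills nulls with 0 after scaling (the feature mean), which is one reasonable choice among several.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def cluster_zipcodes(beds, price, rating, n_clusters, algorithm):
    """Cluster zip codes on three features and return one label per row.

    Sketch of the steps described above; the algorithm names are hypothetical
    parameter values, not the ones used in the original workbook.
    """
    # Combine the features into an (n_rows, 3) matrix; None becomes NaN
    features = np.column_stack(
        [np.asarray(col, dtype=float) for col in (beds, price, rating)]
    )
    # Standardize each feature; StandardScaler ignores NaN when fitting and keeps it in the output
    scaled = StandardScaler().fit_transform(features)
    # Replace missing values with 0, i.e. the feature mean after scaling
    scaled = np.nan_to_num(scaled, nan=0.0)

    # Pick the algorithm chosen by the (hypothetical) parameter value
    if algorithm == "KMeans":
        labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(scaled)
    elif algorithm == "Agglomerative":
        labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(scaled)
    elif algorithm == "GaussianMixture":
        labels = GaussianMixture(n_components=n_clusters, random_state=0).fit_predict(scaled)
    else:
        raise ValueError(f"Unknown clustering algorithm: {algorithm!r}")

    # SCRIPT_REAL expects one number per input row
    return [float(label) for label in labels]
```

In the workbook, the five inputs would come from the arguments of SCRIPT_REAL(): the three aggregated fields plus the two parameters. Because Tableau passes every argument as a list, the scalar parameters would be read from their first elements, for example `_arg4[0]` and `_arg5[0]`.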
Then we convert the results to the String data type so that Tableau treats them as categorical data.
One more thing to note: the table calculation has to be computed along Zipcode, so change the Default Table Calculation of this field to Zipcode for the code to work.
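This setting matters because a SCRIPT_ function is a table calculation, and Tableau sends one request to TabPy per partition. Computing along Zipcode puts every zip code into a single call, which clustering needs; if the view were partitioned by Zipcode instead, each call would receive a single row and there would be nothing to cluster. The standard-library sketch below only simulates that batching on toy rows; it does not call Tableau, and the function name is hypothetical.

```python
from collections import defaultdict


def batches_sent_to_tabpy(rows, partition_fields):
    """Simulate how a table calculation splits rows into separate script calls.

    Rows that share values on partition_fields go into the same call; with no
    partitioning fields, every row goes into one call.
    """
    batches = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in partition_fields)
        batches[key].append(row)
    return list(batches.values())


rows = [{"zipcode": z} for z in ("78701", "78702", "78703", "78704")]

# Compute Using Zipcode: Zipcode is the addressing field, so one call gets all rows
print([len(b) for b in batches_sent_to_tabpy(rows, [])])           # [4]
# Partitioned by Zipcode: one call per row, too few points to cluster
print([len(b) for b in batches_sent_to_tabpy(rows, ["zipcode"])])  # [1, 1, 1, 1]
```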
Step 4 Visualize Results
Now it’s time to visualize the results. I use Zipcode to create a map that shows the clustering results, and we can use the parameter to change the number of clusters.
Let’s celebrate getting to this point! If you followed the steps, you have successfully integrated Python and Tableau. This integration is a first step toward more advanced use cases that combine Tableau and Python.
I’m looking forward to seeing what you build with this integration!
About the Author
Bima Putra Pratama is a Data Scientist with a Tableau Desktop Specialist certification who is always eager to expand his knowledge and skills. He graduated as a Mining Engineer and began his Data Science journey through various online programs from HarvardX, IBM, Udacity, and others. Currently, he is helping DANA Indonesia build a cashless society in Indonesia.
If you have any feedback or any topics to be discussed, please reach out to Bima via LinkedIn. I’m happy to connect with you!
Original. Reposted with permission.