Choropleth Maps in R
Choropleth maps provide a simple, easy-to-understand way to visualize a measurement across geographical areas, such as states or countries.
Improving visualization
We will use functions from the ‘ggplot2’ and ‘ggmap’ packages to improve the visual appeal of the maps we created earlier.
Let’s begin with a basic plot and then improve it step by step, adding one feature in each subsequent plot.
# Plot 1
library(ggplot2)
library(ggmap)   # provides theme_nothing()
# coord_map() also needs the 'mapproj' package to be installed
ggplot() +
  geom_polygon(data = Map_plot,
               aes(x = long, y = lat, group = group, fill = score),
               color = "black", size = 0.5) +
  coord_map() +
  scale_fill_distiller(name = "Score") +
  theme_nothing(legend = TRUE) +
  labs(title = "Score in India - Distribution by State")
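Plot 1 expects Map_plot to be a "fortified" data frame prepared earlier in the article: one row per polygon vertex, with long, lat and group columns plus the value to map. The minimal sketch below builds a stand-in from two hypothetical square regions with made-up scores, so the structure that geom_polygon() relies on can be seen without the shapefile. To depend on ggplot2 alone, it substitutes theme_void() and coord_equal() for ggmap's theme_nothing() and coord_map().

```r
library(ggplot2)

# Hypothetical toy "fortified" data: one row per polygon vertex.
# group tells geom_polygon() which vertices form the same polygon.
make_square <- function(x0, id, score) {
  data.frame(long  = x0 + c(0, 1, 1, 0),
             lat   = c(0, 0, 1, 1),
             group = paste0(id, ".1"),
             id    = id,
             score = score)   # made-up value shared by the whole region
}
toy_map <- rbind(make_square(0, "A", 400), make_square(1, "B", 1600))

p <- ggplot() +
  geom_polygon(data = toy_map,
               aes(x = long, y = lat, group = group, fill = score),
               color = "black") +
  coord_equal() +                       # stand-in for coord_map()
  scale_fill_distiller(name = "Score") +
  theme_void() +                        # stand-in for theme_nothing()
  labs(title = "Toy choropleth")
print(p)
```

Every vertex of a region carries the same value, which is what makes each polygon a single color.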
Let’s make the map more colorful so that the distribution stands out. The function display.brewer.all() from the ‘RColorBrewer’ package displays all of RColorBrewer’s color palettes, so we can choose the one we like.
# Check the available color palettes
library(RColorBrewer)
display.brewer.all()
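To pick a palette programmatically rather than visually, RColorBrewer exposes the same catalogue as the data frame brewer.pal.info. The short sketch below lists palettes by category. Sequential and diverging palettes have a built-in color order, which generally suits a continuous measurement better than a qualitative palette such as "Set3"; treat that as a design guideline rather than a rule.

```r
library(RColorBrewer)

# brewer.pal.info describes every RColorBrewer palette: its maximum
# number of colors and its category ("seq" = sequential,
# "div" = diverging, "qual" = qualitative).
info <- brewer.pal.info
print(table(info$category))

# Sequential palettes run from light to dark, which suits an ordered
# measurement such as a score or population.
seq_palettes <- rownames(info)[info$category == "seq"]
print(seq_palettes)

# Look up the category of a specific palette
print(info["Set3", "category"])
```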
Next, we change the color palette and add more breaks to the legend.
# Plot 2
library(scales)  # provides pretty_breaks()
ggplot() +
  geom_polygon(data = Map_plot,
               aes(x = long, y = lat, group = group, fill = score),
               color = "darkblue", size = 1) +
  coord_map() +
  scale_fill_distiller(name = "Score", palette = "Set3",
                       breaks = pretty_breaks(n = 7)) +
  theme_nothing(legend = TRUE) +
  labs(title = "Score in India - Distribution by State")
pretty_breaks() is a function in the ‘scales’ package that lets us set the approximate number of breaks shown in the legend. In the map above, we get 7 breaks, from 400 to 1600 at intervals of 200, while the previous map showed only 3.
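pretty_breaks() is a function factory: it returns a function that ggplot2 later calls with the range of the data. The sketch below calls that function directly on an assumed score range of 400 to 1600, taken from the legend described above, and shows that n is a target rather than a guarantee.

```r
library(scales)

# pretty_breaks() returns a function; ggplot2 calls that function with
# the range of the data to compute the legend breaks.
seven_breaks <- pretty_breaks(n = 7)

# Assumed score range of 400 to 1600, as in the legend described above
brks <- seven_breaks(c(400, 1600))
print(brks)
# values: 400, 600, 800, 1000, 1200, 1400, 1600

# n is only a target: the same request on another range can return
# a different number of breaks
print(seven_breaks(c(0, 1)))
```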
Now, add the state names to the graph to make it more appealing and illustrative.
# Plot 3
ggplot() +
  geom_polygon(data = Map_plot,
               aes(x = long, y = lat, group = group, fill = score),
               color = "darkblue", size = 1) +
  coord_map() +
  scale_fill_distiller(name = "Score", palette = "Set3",
                       breaks = pretty_breaks(n = 7)) +
  theme_nothing(legend = TRUE) +
  labs(title = "Score in India - Distribution by State") +
  geom_text(data = name_lat_lon, aes(long, lat, label = NAME_1), size = 2)
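Plot 3 reads label positions from name_lat_lon, a data frame built earlier in the article with one row per state (long, lat and NAME_1). One simple way to produce such a table, offered here as an assumption rather than the article's own method, is to average each state's vertex coordinates with aggregate(). This is only a rough approximation of the polygon centroid. For real shapefiles, a function such as sf::st_point_on_surface() returns points guaranteed to lie inside each shape.

```r
# Hypothetical toy vertex data standing in for a fortified map:
# two rectangular regions with a NAME_1 column for the state name.
toy_map <- data.frame(
  long   = c(0, 1, 1, 0,   1, 3, 3, 1),
  lat    = c(0, 0, 1, 1,   0, 0, 2, 2),
  NAME_1 = rep(c("State A", "State B"), each = 4)
)

# Average the vertex coordinates of each state to get a rough label
# position. This can land outside irregular or multi-part shapes.
name_lat_lon_sketch <- aggregate(cbind(long, lat) ~ NAME_1,
                                 data = toy_map, FUN = mean)
print(name_lat_lon_sketch)
```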
Display external data on choropleth maps
We will now import external data and create choropleth maps for those data points. The dataset we use provides the following information for all 36 states and union territories of India:
- ID
- State or union territory
- Population (2011 Census)
- Decadal growth (2001–2011)
- Area (sq km)
- Density (population per sq km)
- Sex ratio
The ID of each state matches the ID assigned in the Merged_data data frame created earlier, so the two can be joined on this column.
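Because the merge joins on this ID, it is worth checking that the two sets of IDs line up before merging. The minimal sketch below uses hypothetical ID vectors (in the article these would be the id column of fortify_shape and d1$ID). The IDs are compared as character because fortify() typically stores polygon IDs as character, and setdiff() in each direction shows which regions would end up with NA values and which CSV rows a left join would drop.

```r
# Hypothetical stand-ins for fortify_shape$id and d1$ID
shape_ids <- c("1", "2", "3", "4")
csv_ids   <- c(1, 2, 3, 5)

# Compare as character, since fortified ids are usually character
missing_in_csv   <- setdiff(shape_ids, as.character(csv_ids))
missing_in_shape <- setdiff(as.character(csv_ids), shape_ids)

print(missing_in_csv)    # regions that would get NA after merge(..., all.x = TRUE)
print(missing_in_shape)  # CSV rows that would be dropped by that merge
```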
d1 <- read.csv(file.choose(), header = TRUE)  # choose the CSV file interactively
head(d1)
#>   ID    State.or.union.territory Population..2011.Census. Decadal.growth..2001.2011. Area..km.sq. Density..population.per.sq.km.
#> 1  1 Andaman and Nicobar Islands                   379944                      0.067         8249                       827.1412
#> 2  2              Andhra Pradesh                 49386799                      0.111       162968                       365.1876
#> 3  3           Arunachal Pradesh                  1382611                      0.259        83743                      1102.3931
#> 4  4                       Assam                 31169272                      0.169        78438                      1029.2471
#> 5  5                       Bihar                103804637                      0.251        94163                       235.5190
#> 6  6                  Chandigarh                  1055450                      0.171          114                       554.6676
#>   Sex.ratio
#> 1       908
#> 2       946
#> 3       916
#> 4       947
#> 5       931
#> 6       995
# Keep the columns under shorter names, ready to merge with the map
state_data2 <- data.frame(id = d1$ID,
                          NAME_1 = d1$State.or.union.territory,
                          pop = d1$Population..2011.Census.,
                          growth = d1$Decadal.growth..2001.2011.,
                          area = d1$Area..km.sq.,
                          pop_density = d1$Density..population.per.sq.km.,
                          sex_ratio = d1$Sex.ratio)
head(state_data2)
#>   id                      NAME_1       pop growth   area pop_density sex_ratio
#> 1  1 Andaman and Nicobar Islands    379944  0.067   8249    827.1412       908
#> 2  2              Andhra Pradesh  49386799  0.111 162968    365.1876       946
#> 3  3           Arunachal Pradesh   1382611  0.259  83743   1102.3931       916
#> 4  4                       Assam  31169272  0.169  78438   1029.2471       947
#> 5  5                       Bihar 103804637  0.251  94163    235.5190       931
#> 6  6                  Chandigarh   1055450  0.171    114    554.6676       995
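The dotted column names in d1 come from read.csv(), which by default (check.names = TRUE) runs the header through make.names() and turns each space or bracket into a dot; passing check.names = FALSE keeps the original headers instead. The sketch below reproduces this on header strings reconstructed from the column list above (an assumption; the actual CSV headers may differ slightly, for example by using an en dash in the year range) and shows a lookup-based way to get the same short names as state_data2.

```r
# Assumed header strings, reconstructed from the column list above
raw_headers <- c("ID", "State or union territory", "Population (2011 Census)",
                 "Decadal growth (2001-2011)", "Area (km sq)",
                 "Density (population per sq km)", "Sex ratio")
dotted <- make.names(raw_headers)   # what read.csv() does by default
print(dotted)

# A named lookup vector maps each dotted name to the short name used
# in state_data2, so columns can be renamed without retyping them.
short_names <- c("id", "NAME_1", "pop", "growth", "area",
                 "pop_density", "sex_ratio")
lookup <- setNames(short_names, dotted)

# Toy one-row data frame with the dotted column names
toy <- as.data.frame(setNames(as.list(1:7), dotted))
names(toy) <- unname(lookup[names(toy)])
print(names(toy))
```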
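# Merge the fortified shapefile with the state data, keeping every map vertex
merged_data2 <- merge(fortify_shape, state_data2, by = "id", all.x = TRUE)
# merge() sorts the rows by id, so restore the original vertex order
map_plot2 <- merged_data2[order(merged_data2$order), ]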
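The reordering step matters because merge() sorts its result by the key column, which scrambles the vertex sequence that geom_polygon() draws in. The sketch below demonstrates this on hypothetical toy data with two triangular regions. It uses the order column, which fortify() records for each vertex, to restore the drawing sequence.

```r
# Hypothetical toy fortified data: region "2" comes first, and order
# records the vertex drawing sequence.
fortified <- data.frame(
  id    = c("2", "2", "2", "1", "1", "1"),
  long  = c(5, 6, 5, 0, 1, 0),
  lat   = c(0, 0, 1, 0, 0, 1),
  order = 1:6
)
values <- data.frame(id = c(1, 2), pop = c(100, 200))  # made-up values

# Left join: every vertex is kept, but merge() sorts the rows by id
merged <- merge(fortified, values, by = "id", all.x = TRUE)
print(merged$id)

# Restore the original vertex sequence before plotting
restored <- merged[order(merged$order), ]
print(restored$order)
```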
ggplot() +
  geom_polygon(data = map_plot2,
               aes(x = long, y = lat, group = group, fill = pop/1000),
               color = "darkblue", size = 0.5) +
  coord_map() +
  scale_fill_distiller(name = "Population (in '000)", palette = "Set3") +
  theme_nothing(legend = TRUE) +
  labs(title = "Population in India") +
  geom_text(data = name_lat_lon, aes(long, lat, label = NAME_1), size = 2)
To present all five measures side by side in a single chart, we can use the grid.arrange() function from the ‘gridExtra’ package, which arranges multiple plots on one page. First, we create the five maps we want to show, and then pass them to grid.arrange().
# Plotting multiple maps at once
plot1 <- ggplot() +
  geom_polygon(data = map_plot2,
               aes(x = long, y = lat, group = group, fill = pop/1000),
               color = "darkblue", size = 0.5) +
  coord_map() +
  scale_fill_distiller(name = "Population (in '000)", palette = "Set3") +
  theme_nothing(legend = TRUE) +
  labs(title = "Population in India")
plot2 <- ggplot() +
  geom_polygon(data = map_plot2,
               aes(x = long, y = lat, group = group, fill = growth*100),
               color = "darkblue", size = 0.5) +
  coord_map() +
  scale_fill_distiller(name = "Decadal Growth (in %)", palette = "Set3") +
  theme_nothing(legend = TRUE) +
  labs(title = "Decadal growth (in %) in India")
plot3 <- ggplot() +
  geom_polygon(data = map_plot2,
               aes(x = long, y = lat, group = group, fill = area/1000),
               color = "darkblue", size = 0.25) +
  coord_map() +
  scale_fill_distiller(name = "Area (in '000 sq km)", palette = "Set3") +
  theme_nothing(legend = TRUE) +
  labs(title = "Area (in '000 sq km) in India")
plot4 <- ggplot() +
  geom_polygon(data = map_plot2,
               aes(x = long, y = lat, group = group, fill = pop_density),
               color = "darkblue", size = 0.25) +
  coord_map() +
  scale_fill_distiller(name = "Population Density", palette = "Set3") +
  theme_nothing(legend = TRUE) +
  labs(title = "Population Density in India")
plot5 <- ggplot() +
  geom_polygon(data = map_plot2,
               aes(x = long, y = lat, group = group, fill = sex_ratio),
               color = "darkblue", size = 0.25) +
  coord_map() +
  scale_fill_distiller(name = "Sex Ratio", palette = "Set3") +
  theme_nothing(legend = TRUE) +
  labs(title = "Sex Ratio (females per 1,000 males) in India")
Finally, we load the ‘gridExtra’ library and call grid.arrange() to present all five maps at once.
library(gridExtra)
grid.arrange(plot1, plot2, plot3, plot4, plot5)
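The five plotting blocks differ only in the column, legend and title, so a small helper function can remove the repetition. The sketch below is a hypothetical refactoring, not the article's code: make_map() and the three-region toy data are illustrative assumptions, theme_void() and coord_equal() stand in for ggmap's theme_nothing() and coord_map(), and grid.arrange() receives the plots as a list through its grobs argument, with ncol = 2.

```r
library(ggplot2)
library(gridExtra)

# Hypothetical toy map: three square regions with made-up measures
toy <- do.call(rbind, lapply(1:3, function(i) {
  data.frame(long = (i - 1) + c(0, 1, 1, 0),
             lat = c(0, 0, 1, 1),
             group = paste0("r", i),
             pop = i * 100,
             sex_ratio = 900 + i * 20)
}))

# Hypothetical helper: one choropleth for the column named in `var`
make_map <- function(data, var, legend, title) {
  ggplot(data) +
    geom_polygon(aes(x = long, y = lat, group = group, fill = .data[[var]]),
                 color = "darkblue") +
    coord_equal() +
    scale_fill_distiller(name = legend, palette = "Blues") +
    theme_void() +
    labs(title = title)
}

plots <- list(
  make_map(toy, "pop", "Population", "Population (toy)"),
  make_map(toy, "sex_ratio", "Sex ratio", "Sex ratio (toy)")
)

# grid.arrange() draws the plots and invisibly returns the layout
g <- grid.arrange(grobs = plots, ncol = 2)
```

With the article's data, a call might look like make_map(map_plot2, "pop_density", "Population Density", "Population Density in India"). Scaled measures such as pop/1000 would first need their own column, for example a hypothetical map_plot2$pop_k <- map_plot2$pop / 1000.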
The examples above show how flexibly and conveniently choropleth maps present a measurement across geographical regions. I have used the map of India as the base geographical region; the same process can be applied to any other geography and dataset.
After going through the article, I am sure you will agree with the point I made at the start: choropleth maps are among the best bets when we want to leave a strong impression on the audience in 15 seconds. Don’t you agree?
Bio: This article was contributed by Perceptive Analytics. Chaitanya Sagar, Vishnu Reddy and Saneesh Veetil contributed to this article. Perceptive Analytics provides data analytics, data visualization, business intelligence and reporting services to e-commerce, retail, healthcare and pharmaceutical industries. Our client roster includes Fortune 500 and NYSE listed companies in the USA and India.