
KDnuggets Home » News » 2018 » Mar » Tutorials, Overviews » Choropleth Maps in R ( 18:n11 )

# Choropleth Maps in R

Choropleth maps provide a simple, easy-to-read way to visualize a measurement across different geographical areas, be they states or countries.

## Improving visualization

We will use some of the functions of the packages ‘ggplot2’ and ‘ggmap’ to improve the visual appeal of the maps that we have created.

Let’s begin by creating our first plot and then improve on it in the subsequent plots by adding more features.

```r
# Plot 1: basic choropleth of score by state
library(ggplot2)  # plotting functions
library(ggmap)    # provides theme_nothing()

ggplot() +
  geom_polygon(data = Map_plot,
               aes(x = long, y = lat, group = group, fill = score),
               color = "black", size = 0.5) +
  coord_map() +   # needs the 'mapproj' package to be installed
  scale_fill_distiller(name = "Score") +
  theme_nothing(legend = TRUE) +
  labs(title = "Score in India - Distribution by State")
```

Let’s make our map a little more colorful so that it shows the distribution clearly. The function display.brewer.all() from the package ‘RColorBrewer’ displays all the color palettes that the package offers. We can choose the one we like.

```r
# Check color palettes
library(RColorBrewer)
display.brewer.all()
```

Now, change the color palette and refine the legend by adding more breaks.

```r
# Plot 2: new palette and more legend breaks
library(scales)  # provides pretty_breaks()

ggplot() +
  geom_polygon(data = Map_plot,
               aes(x = long, y = lat, group = group, fill = score),
               color = "darkblue", size = 1) +
  coord_map() +
  scale_fill_distiller(name = "Score", palette = "Set3",
                       breaks = pretty_breaks(n = 7)) +
  theme_nothing(legend = TRUE) +
  labs(title = "Score in India - Distribution by State")
```

pretty_breaks() is a function in the ‘scales’ package that lets us set the approximate number of breaks we want to see in the legend. In the above map, we have 7 breaks from 400 to 1600 at an interval of 200, while the preceding map had only 3 breaks.
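To see where those legend values come from: per the ‘scales’ documentation, pretty_breaks() relies on base R's pretty() algorithm, so the breaks can be previewed without drawing a map. The sketch below assumes the scores span roughly 400 to 1600, as the legend described above suggests; the actual range of the score column may differ.

```r
# Preview the legend breaks that n = 7 yields for an assumed score range.
# The 400-1600 range is an assumption taken from the legend described above.
score_range <- c(400, 1600)

# pretty() treats n as a target, not a guarantee: it prefers "round" step
# sizes, so the number of breaks returned can differ from n for other ranges.
legend_breaks <- pretty(score_range, n = 7)
print(legend_breaks)

# Step between consecutive breaks
print(unique(diff(legend_breaks)))
```

Because n is only a target, it is worth checking the legend after changing either n or the data.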

Now, add the state names to the graph to make it more appealing and illustrative.

```r
# Plot 3: add state names as labels
ggplot() +
  geom_polygon(data = Map_plot,
               aes(x = long, y = lat, group = group, fill = score),
               color = "darkblue", size = 1) +
  coord_map() +
  scale_fill_distiller(name = "Score", palette = "Set3",
                       breaks = pretty_breaks(n = 7)) +
  theme_nothing(legend = TRUE) +
  labs(title = "Score in India - Distribution by State") +
  geom_text(data = name_lat_lon, aes(long, lat, label = NAME_1), size = 2)
```

## Display external data on choropleth maps

We will now import external data and create choropleth maps for those data points. The dataset we are using provides the following information for all 36 states and union territories of India:

• ID
• State or union territory
• Population (2011 Census)
• Decadal growth (2001-2011)
• Area (km sq)
• Density (population per sq km)
• Sex ratio

The ID of each state is the same as the ID that was assigned in the Merged_data created earlier.

```r
# Read the external CSV (a file-picker dialog opens) and preview the first rows
d1 <- read.csv(file.choose(), header = TRUE)
head(d1)
#>   ID    State.or.union.territory Population..2011.Census. Decadal.growth..2001.2011.
#> 1  1 Andaman and Nicobar Islands                   379944                      0.067
#> 2  2              Andhra Pradesh                 49386799                      0.111
#> 3  3           Arunachal Pradesh                  1382611                      0.259
#> 4  4                       Assam                 31169272                      0.169
#> 5  5                       Bihar                103804637                      0.251
#> 6  6                  Chandigarh                  1055450                      0.171
#>   Area..km.sq. Density..population.per.sq.km. Sex.ratio
#> 1         8249                       827.1412       908
#> 2       162968                       365.1876       946
#> 3        83743                      1102.3931       916
#> 4        78438                      1029.2471       947
#> 5        94163                       235.5190       931
#> 6          114                       554.6676       995
```
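The dotted column names in this output are not what the CSV file itself contains. By default read.csv() passes the headers through make.names(), which replaces spaces, parentheses and hyphens with dots. A small sketch of that conversion follows; the header strings are an assumption inferred from the column names above, not taken from the actual file.

```r
# Assumed original CSV headers (hypothetical; inferred from the output above)
raw_headers <- c("ID",
                 "State or union territory",
                 "Population (2011 Census)",
                 "Decadal growth (2001-2011)",
                 "Area (km sq)",
                 "Density (population per sq km)",
                 "Sex ratio")

# read.csv() applies this conversion when check.names = TRUE (the default)
clean_headers <- make.names(raw_headers)
print(clean_headers)
```

Passing check.names = FALSE to read.csv() would keep the headers as written, at the cost of needing backticks to refer to them.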

```r
# Merging with external source: keep the columns we need under short names
state_data2 <- data.frame(
  id          = d1$ID,
  NAME_1      = d1$State.or.union.territory,
  pop         = d1$Population..2011.Census.,
  growth      = d1$Decadal.growth..2001.2011.,
  area        = d1$Area..km.sq.,
  pop_density = d1$Density..population.per.sq.km.,
  sex_ratio   = d1$Sex.ratio
)
head(state_data2)
#>   id                      NAME_1       pop growth   area pop_density sex_ratio
#> 1  1 Andaman and Nicobar Islands    379944  0.067   8249    827.1412       908
#> 2  2              Andhra Pradesh  49386799  0.111 162968    365.1876       946
#> 3  3           Arunachal Pradesh   1382611  0.259  83743   1102.3931       916
#> 4  4                       Assam  31169272  0.169  78438   1029.2471       947
#> 5  5                       Bihar 103804637  0.251  94163    235.5190       931
#> 6  6                  Chandigarh   1055450  0.171    114    554.6676       995
```
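Since the merge that follows joins on id, it can be worth confirming first that the ids in the shape data and in the external table line up. The sketch below uses small made-up stand-ins for fortify_shape and state_data2 (all toy values are hypothetical) to show one way to run that check in base R.

```r
# Toy stand-ins (hypothetical values) for the objects used in the article
toy_shape  <- data.frame(id = c("1", "1", "2", "2", "3", "3"),
                         order = 1:6)
toy_states <- data.frame(id = c(1, 2, 4),
                         pop = c(379944, 49386799, 31169272))

# Compare ids as character, since the two sources may store them differently
shape_ids <- unique(as.character(toy_shape$id))
state_ids <- as.character(toy_states$id)

# Polygon ids with no row in the external data: these polygons would get
# NA for every measure after a left merge
missing_in_states <- setdiff(shape_ids, state_ids)

# External rows that match no polygon: these would never be drawn
unused_states <- setdiff(state_ids, shape_ids)

# Duplicate ids in the external data would duplicate polygon vertices
has_duplicates <- anyDuplicated(state_ids) > 0

print(missing_in_states)
print(unused_states)
print(has_duplicates)
```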

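```r
# Merge the fortified shape data with the state-level data, then restore
# the original vertex order so that the polygons are drawn correctly
merged_data2 <- merge(fortify_shape, state_data2, by = "id", all.x = TRUE)
map_plot2 <- merged_data2[order(merged_data2$order), ]
```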
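The re-ordering step matters because merge() sorts its result by the key column, which can scramble the sequence in which polygon vertices are drawn. A toy illustration with made-up data follows; the values are hypothetical and only the column names follow the article.

```r
# Toy fortified shape: two polygons whose ids are not in sorted order
toy_shape <- data.frame(
  id    = c("2", "2", "2", "1", "1", "1"),
  order = 1:6,                      # drawing sequence of the vertices
  long  = c(0, 1, 1, 2, 3, 3),
  lat   = c(0, 0, 1, 0, 0, 1)
)
toy_states <- data.frame(id = c("1", "2"), pop = c(100, 200))

# merge() sorts by "id", so the vertices of polygon "1" now come first
merged <- merge(toy_shape, toy_states, by = "id", all.x = TRUE)
print(is.unsorted(merged$order))    # TRUE means the drawing order was disturbed

# Sorting by the 'order' column restores the original vertex sequence
restored <- merged[order(merged$order), ]
print(restored$order)
```

Indexing with order() on the same data frame that is being subset, as here, also avoids depending on any other object being present in the workspace.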

```r
# Population map (fill is scaled to thousands)
ggplot() +
  geom_polygon(data = map_plot2,
               aes(x = long, y = lat, group = group, fill = pop/1000),
               color = "darkblue", size = 0.5) +
  coord_map() +
  scale_fill_distiller(name = "Population (in '000)", palette = "Set3") +
  theme_nothing(legend = TRUE) +
  labs(title = "Population in India") +
  geom_text(data = name_lat_lon, aes(long, lat, label = NAME_1), size = 2)
```

To show all five measures side by side in a single chart, we can use the function grid.arrange() of the package ‘gridExtra’, which lays out multiple plots at once. First, we create the five maps that we want to show, and then we pass them to the function.

```r
# Plotting multiple maps at once
plot1 <- ggplot() +
  geom_polygon(data = map_plot2,
               aes(x = long, y = lat, group = group, fill = pop/1000),
               color = "darkblue", size = 0.5) +
  coord_map() +
  scale_fill_distiller(name = "Population (in '000)", palette = "Set3") +
  theme_nothing(legend = TRUE) +
  labs(title = "Population in India")

plot2 <- ggplot() +
  geom_polygon(data = map_plot2,
               aes(x = long, y = lat, group = group, fill = growth*100),
               color = "darkblue", size = 0.5) +
  coord_map() +
  scale_fill_distiller(name = "Decadal Growth (in %)", palette = "Set3") +
  theme_nothing(legend = TRUE) +
  labs(title = "Decadal growth (in %) in India")

plot3 <- ggplot() +
  geom_polygon(data = map_plot2,
               aes(x = long, y = lat, group = group, fill = area/1000),
               color = "darkblue", size = 0.25) +
  coord_map() +
  scale_fill_distiller(name = "Area (in '000 Sq Km)", palette = "Set3") +
  theme_nothing(legend = TRUE) +
  labs(title = "Area (in '000 sq km) in India")

plot4 <- ggplot() +
  geom_polygon(data = map_plot2,
               aes(x = long, y = lat, group = group, fill = pop_density),
               color = "darkblue", size = 0.25) +
  coord_map() +
  scale_fill_distiller(name = "Population Density", palette = "Set3") +
  theme_nothing(legend = TRUE) +
  labs(title = "Population Density in India")

plot5 <- ggplot() +
  geom_polygon(data = map_plot2,
               aes(x = long, y = lat, group = group, fill = sex_ratio),
               color = "darkblue", size = 0.25) +
  coord_map() +
  scale_fill_distiller(name = "Sex Ratio", palette = "Set3") +
  theme_nothing(legend = TRUE) +
  labs(title = "Sex Ratio (per '000 males) in India")
```
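The five blocks above differ only in the fill column, the legend name and the title, so the same maps could also be produced by a small helper. This is a sketch rather than the authors' approach: make_map() is a hypothetical function, the two-square toy data stands in for map_plot2, and theme_void() and coord_quickmap() from ‘ggplot2’ replace theme_nothing() and coord_map() so that the example runs without ‘ggmap’ or ‘mapproj’.

```r
library(ggplot2)

# Toy stand-in for map_plot2: two unit squares with made-up measures
toy_map <- data.frame(
  long      = c(0, 1, 1, 0,  1, 2, 2, 1),
  lat       = c(0, 0, 1, 1,  0, 0, 1, 1),
  group     = rep(c("a", "b"), each = 4),
  pop       = rep(c(380, 49387), each = 4),
  sex_ratio = rep(c(908, 946), each = 4)
)

# Hypothetical helper: draw one choropleth for the column named in fill_col
make_map <- function(data, fill_col, legend, title) {
  ggplot() +
    geom_polygon(data = data,
                 aes(x = long, y = lat, group = group,
                     fill = .data[[fill_col]]),
                 color = "darkblue") +
    coord_quickmap() +
    scale_fill_distiller(name = legend, palette = "Set3") +
    theme_void() +
    labs(title = title)
}

# One specification per map: legend name, then title
specs <- list(
  pop       = c("Population", "Population in India"),
  sex_ratio = c("Sex Ratio", "Sex Ratio in India")
)

# Map() builds a list of ggplot objects named after the fill columns
plots <- Map(function(col, s) make_map(toy_map, col, s[1], s[2]),
             names(specs), specs)
```

With the real data, a call might look like make_map(map_plot2, "sex_ratio", "Sex Ratio", "Sex Ratio (per '000 males) in India"). Scaled measures such as pop/1000 would first need to be added to the data frame as columns of their own.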

Load the library ‘gridExtra’ and call the function grid.arrange() to present all five maps at once.

```r
library(gridExtra)
grid.arrange(plot1, plot2, plot3, plot4, plot5)
```

The above examples show the flexibility and convenience that choropleth maps offer in presenting a measurement on a geographical base. I have used the map of India as the base geographical region; the same process can be applied to any other geographical base and data.

After going through the article, I am sure you will agree with the point I started with: choropleth maps are the best bet when we want to leave a strong impression on the audience in 15 seconds. Don’t you agree?

Bio: This article was contributed by Perceptive Analytics. Chaitanya Sagar, Vishnu Reddy and Saneesh Veetil contributed to this article. Perceptive Analytics provides data analytics, data visualization, business intelligence and reporting services to e-commerce, retail, healthcare and pharmaceutical industries. Our client roster includes Fortune 500 and NYSE listed companies in the USA and India.
