A data scientist looks at the six Star Wars movies to extract the social networks, both within each film and across the whole Star Wars universe. The network structure reveals some surprising differences between the movies and shows who the central character actually is.
The individual networks again show that the prequel trilogy has more characters and more interactions overall. The original episodes have fewer characters, but they interact more with each other.
George Lucas said:
It really is the story of the tragedy of Darth Vader, and it starts when he’s nine, and it ends when he’s dead. (source)
But is Darth Vader/Anakin really the central character? Let’s use some methods from network analysis to see who is really important in the stories and their social structures.
I computed two measures of importance in the networks for each of the films:
Degree centrality – this is simply the number of connections the node has in the network. In the Star Wars movies, this corresponds to the total number of scenes where each character speaks.
Betweenness – this measure looks at how many shortest paths in the network lead through the node. For example, imagine you are Leia and you want to send a message to Greedo – the shortest way to send it is via Han Solo, because he interacted with both Leia and Greedo. On the other hand, if you want to send a message to Luke, you don’t have to go through Han, because Leia knows Luke directly. The betweenness centrality for Han is computed from the number of shortest paths between all other characters that pass through him.
Both measures show how important a character is in the network. The degree centrality shows how many people each character interacts with directly. The betweenness relates more to how integral each of the characters is to the story. Characters with high betweenness connect different areas of the social network.
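To make the two definitions concrete, here is a minimal Python sketch that computes both measures on the tiny Leia/Han/Greedo/Luke example. It is not the code used for the analysis in this post. The names `EDGES`, `build_graph`, `degree_centrality` and `betweenness_centrality` are made up for illustration, the network is assumed to be undirected and unweighted, and the betweenness is left unnormalized (whether the tables in this post use the same convention is not stated). The shortest-path counting follows Brandes' algorithm.

```python
from collections import defaultdict, deque

# Toy network built only from the interactions named in the example:
# Leia knows Han and Luke, and Han interacted with Greedo.
EDGES = [("LEIA", "HAN"), ("HAN", "GREEDO"), ("LEIA", "LUKE")]


def build_graph(edges):
    """Turn a list of undirected edges into an adjacency dictionary."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree_centrality(graph):
    """Degree = number of distinct neighbours of each node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def betweenness_centrality(graph):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search that also counts shortest paths.
        dist = {source: 0}
        n_paths = defaultdict(float)
        n_paths[source] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating the share of
        # shortest paths from `source` that pass through each node.
        dependency = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1.0 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Every unordered pair was counted once from each endpoint.
    return {node: s / 2.0 for node, s in score.items()}


graph = build_graph(EDGES)
print(degree_centrality(graph))
print(betweenness_centrality(graph))
```

In this toy network Han and Leia each get a betweenness of 2.0, because every message to Greedo has to pass through Han and every message to Luke has to pass through Leia, while Luke and Greedo sit at the ends of the chain and score 0.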
For both measures, higher values mean more importance. Here are the top 5 characters for each movie:
Episode I
Rank  Name         Degree
1.    QUI-GON      26
2.    ANAKIN       23
3.    JAR JAR      19
4.    R2-D2        19
5.    PADME        18

Rank  Name         Betweenness
1.    QUI-GON      91.7
2.    JAR JAR      46.6
3.    EMPEROR      41.8
4.    R2-D2        30.9
5.    NUTE GUNRAY  27.2
Episode II
Rank  Name         Degree
1.    ANAKIN       21
2.    OBI-WAN      19
3.    PADME        17
4.    YODA         10
5.    MACE WINDU   10

Rank  Name         Betweenness
1.    OBI-WAN      64.7
2.    PADME        56.5
3.    MACE WINDU   12.7
4.    JAR JAR      8.3
5.    EMPEROR      6.8
Episode III
Rank  Name         Degree
1.    ANAKIN       14
2.    OBI-WAN      13
3.    BAIL ORGANA  12
4.    EMPEROR      11
5.    PADME        10

Rank  Name         Betweenness
1.    OBI-WAN      22.7
2.    EMPEROR      19.0
3.    PADME        8.0
4.    R2-D2        6.7
5.    BAIL ORGANA  4.5
It seems that Anakin is overall the most connected character in the first three films, based on his degree. He is, however, not very integral to the relations in the films! His betweenness score is so small that he never makes it into the top 5 characters. This means that the other characters interact directly with each other rather than through Anakin. How do the same measures look for the original trilogy?
Episode IV
Rank  Name         Degree
1.    LUKE         15
2.    LEIA         12
3.    C-3PO        10
4.    CHEWBACCA    9
5.    HAN          8

Rank  Name         Betweenness
1.    LUKE         32.7
2.    LEIA         19.7
3.    HAN          15.0
4.    C-3PO        13.2
5.    CHEWBACCA    8.0
Episode V
Rank  Name         Degree
1.    LUKE         12
2.    DARTH VADER  12
3.    HAN          11
4.    R2-D2        11
5.    C-3PO        10

Rank  Name         Betweenness
1.    LUKE         25.2
2.    DARTH VADER  11.3
3.    LEIA         9.7
4.    HAN          6.7
5.    R2-D2        4.5
Episode VI
Rank  Name         Degree
1.    LUKE         15
2.    R2-D2        12
3.    C-3PO        11
4.    LEIA         9
5.    HAN          9

Rank  Name         Betweenness
1.    LUKE         24.3
2.    C-3PO        23.0
3.    DARTH VADER  18.5
4.    CHEWBACCA    16.0
5.    LANDO        5.5
Here, both centrality measures show very similar results – Luke is the most central character in all three films under both measures. The order of characters based on the two measures is almost the same.
The centrality analysis quantifies some of the things we could see from the social networks. The prequel trilogy has more complex social structures, with more interconnected characters. As a result, Anakin is not that central to the story – some of the storylines happen alongside Anakin’s story, or involve Anakin only on the side. The original trilogy, on the other hand, has a more tight-knit structure. There is a smaller number of central characters and they bind the story together – this results in agreement between the degree and betweenness centrality measures.
Perhaps this is part of the reason why the original trilogy is more popular – the plots are more consistent and driven by the main characters. The prequels have a more decentralized structure and no clear hero. Although the stories are linked by Anakin, he is not binding the other characters together.
How do the measures look for the full social network built from all the episodes together? I looked at two variants of the network. In the first one, Anakin and Darth Vader appear as two separate individuals; in the second, I merged them into a single person.
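Merging two characters into one node can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used for the joint networks here: the function names `merge_characters` and `degrees` are hypothetical, the edges in `toy_edges` are invented for the example, and duplicate edges created by the merge are assumed to collapse into a single connection.

```python
def merge_characters(edges, alias, canonical):
    """Rename `alias` to `canonical` in an undirected edge list.

    Duplicate edges created by the merge collapse into one, and any
    edge that would connect the merged node to itself is dropped.
    """
    merged = set()
    for a, b in edges:
        a = canonical if a == alias else a
        b = canonical if b == alias else b
        if a != b:
            merged.add(tuple(sorted((a, b))))
    return merged


def degrees(edges):
    """Count how many edges touch each character."""
    counts = {}
    for a, b in edges:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts


# Invented toy edges, for illustration only.
toy_edges = [
    ("ANAKIN", "OBI-WAN"),
    ("ANAKIN", "PADME"),
    ("DARTH VADER", "OBI-WAN"),
    ("DARTH VADER", "LUKE"),
    ("DARTH VADER", "EMPEROR"),
]

fused = merge_characters(toy_edges, alias="ANAKIN", canonical="DARTH VADER")
print(degrees(toy_edges))
print(degrees(fused))
```

In the toy data, Obi-Wan is connected to both identities, so the fused node ends up with four connections rather than five. The merged degree is the size of the combined neighbourhood, not the sum of the two separate degrees.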
Joint network 1
Anakin and Darth Vader separated
Rank  Name         Degree
1.    ANAKIN       42
2.    R2-D2        41
3.    OBI-WAN      37
4.    PADME        34
5.    C-3PO        31

Rank  Name         Betweenness
1.    OBI-WAN      370.4
2.    PADME        237.3
3.    R2-D2        236.7
4.    C-3PO        222.9
5.    LUKE         194.4
Joint network 2
Anakin and Darth Vader fused
Rank  Name         Degree
1.    DARTH VADER  59
2.    R2-D2        41
3.    OBI-WAN      37
4.    PADME        34
5.    C-3PO        31

Rank  Name         Betweenness
1.    OBI-WAN      348.7
2.    C-3PO        303.1
3.    DARTH VADER  241.5
4.    R2-D2        227.6
5.    PADME        226.2
If we look at Anakin and Darth Vader separately, Anakin is still the most connected character, but he’s not central to the network. If we merge them, things improve a bit: Darth Vader/Anakin is now the third most important character in terms of betweenness. Overall, the social networks seem to show that the Star Wars movies are actually linked together by Obi-Wan Kenobi rather than Darth Vader.