Human Dynamics – Data Mining Mobile Phone Usage

Mobile phone usage contains a gold mine of insights. We examine what was learned about human social connections from the first-ever extensive study of social interactions in Mexico.

Guest blog by Carlos Sarraute (Grandata), May 6, 2014.

Mobile phone usage provides a wealth of information that can be used to better understand the demographic structure of a population and to fill gaps with respect to basic questions: for example, how does mobile phone usage differ between genders, or across age groups? At Grandata, we have a research team specialized in studying Human Dynamics. In this case, we focused on the population of Mexican mobile phone users.

1. Observational study

Our first approach was to explore the data in order to gain insights. We performed (to our knowledge) the first extensive study of social interactions in the country of Mexico focusing on gender and age, based on mobile phone usage. The ability to analyze the communications between tens of millions of people allowed us to make strong inferences and detect subtle properties of the social network.

Regarding gender, we made some interesting observations: (i) a gender homophily in the communication network (i.e. men tend to talk more with men, and women with women); (ii) an asymmetry between genders (men talk more when they make outgoing calls, and women talk more when receiving incoming calls), possibly reflecting a difference of roles in Mexican society.

It would be interesting to see how these differences change in other regions of the world like Europe or the United States.
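To make the notion of gender homophily concrete, here is a minimal sketch of how it could be measured on a call log. It is an illustration, not the analysis pipeline used in the study: the function names, the `(caller, callee)` edge format and the toy data are all assumptions. The idea is to compare the observed share of same-gender calls with the share one would expect if caller and callee genders were independent.

```python
from collections import Counter

def gender_mix(calls, gender):
    """Count calls per (caller gender, callee gender) pair.

    `calls` is an iterable of (caller, callee) pairs and `gender` maps
    each user to a label such as "M" or "F" (hypothetical input format).
    """
    counts = Counter()
    for caller, callee in calls:
        counts[(gender[caller], gender[callee])] += 1
    return counts

def same_gender_share(counts):
    """Observed fraction of calls whose two endpoints share a gender."""
    total = sum(counts.values())
    same = sum(n for (a, b), n in counts.items() if a == b)
    return same / total if total else 0.0

def expected_same_gender_share(counts):
    """Same-gender fraction expected if caller and callee genders were independent."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    outgoing, incoming = Counter(), Counter()
    for (a, b), n in counts.items():
        outgoing[a] += n
        incoming[b] += n
    return sum(outgoing[g] * incoming[g] for g in outgoing) / total ** 2

# Toy data, invented purely for illustration.
toy_gender = {"a": "M", "b": "M", "c": "F", "d": "F"}
toy_calls = [("a", "b"), ("b", "a"), ("c", "d"), ("d", "c"), ("a", "c"), ("d", "b")]
toy_counts = gender_mix(toy_calls, toy_gender)
print(same_gender_share(toy_counts), expected_same_gender_share(toy_counts))
```

An observed share clearly above the expected one points to homophily. The asymmetry between outgoing and incoming calls could be examined in the same way by summing call durations instead of counting calls.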

We also compared communication habits across age groups, and found statistically significant differences. We observed a strong age homophily in the social network (see below the communication matrix according to the users' age).

[Figure: Communication Matrix]

The clearly marked diagonal shows that users have a strong tendency to communicate with people of their own age. This preference can also be seen in the next figure, which shows the number of links according to the age difference between users. The number of links decreases with the age difference, except around the value d = 21, where an interesting inflection point can be observed (possibly relating to different generations, i.e. parents and children).

[Figure: Number of links vs age difference]

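Both age figures can be derived from the same two inputs: a list of links and each user's age. The sketch below shows one way to do it, under the assumption that links are undirected `(u, v)` pairs and that ages are whole years. The function names and the toy data are hypothetical.

```python
from collections import Counter

def age_pair_counts(links, age):
    """Count links per (age, age) cell of a symmetric communication matrix.

    Each undirected link adds one to cell (a, b) and, when the two ages
    differ, one to the mirrored cell (b, a).
    """
    cells = Counter()
    for u, v in links:
        a, b = age[u], age[v]
        cells[(a, b)] += 1
        if a != b:
            cells[(b, a)] += 1
    return cells

def links_by_age_difference(links, age):
    """Count links per absolute age difference d = |age(u) - age(v)|."""
    return Counter(abs(age[u] - age[v]) for u, v in links)

# Toy data, invented purely for illustration.
toy_age = {"u1": 25, "u2": 25, "u3": 27, "u4": 46, "u5": 48}
toy_links = [("u1", "u2"), ("u1", "u3"), ("u2", "u3"),
             ("u1", "u4"), ("u3", "u5"), ("u4", "u5")]

for d, n in sorted(links_by_age_difference(toy_links, toy_age).items()):
    print(d, n)
```

On real data, a heavy diagonal in the output of `age_pair_counts` corresponds to the age homophily described above, and a bump in `links_by_age_difference` around d = 21 would correspond to the generational effect.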
2. A new predictive algorithm

Based on these results, we set to work on developing a novel methodology to predict demographic features (namely age and gender) by leveraging individual calling patterns, as well as the structure of the communication graph.

As a first approach, we used a set of standard Machine Learning tools based on node features. However, these techniques cannot harness the topological information of the network or exploit the correlations between the users' communications.

To leverage this information, we developed a purely graph-based algorithm inspired by a reaction-diffusion process, and showed that with this methodology we could predict the age category for a significant set of nodes in the network. Finally, we combined both Machine Learning techniques and the reaction-diffusion algorithm. Our experiments showed that the combined method increases our predictive power on a real-world dataset with millions of users.
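The post does not spell out the algorithm, so the following is only a generic sketch of the kind of process the phrase "reaction-diffusion" suggests, not the method actually used. Each node holds a probability vector over age categories. At every step it averages its neighbors' vectors (the diffusion part) and is pulled back toward its own initial vector (standing in for the reaction part). The update rule, the `retain` weight, the function names and the toy graph are all assumptions.

```python
def diffuse_labels(neighbors, seeds, n_categories, steps=20, retain=0.5):
    """Spread known age categories over a graph by repeated averaging.

    neighbors: dict mapping each node to a list of adjacent nodes
    seeds: dict mapping nodes with a known label to a category index
    retain: weight of the node's initial vector at each step (assumed value)
    """
    uniform = [1.0 / n_categories] * n_categories
    initial = {}
    for node in neighbors:
        if node in seeds:
            one_hot = [0.0] * n_categories
            one_hot[seeds[node]] = 1.0
            initial[node] = one_hot
        else:
            initial[node] = list(uniform)

    state = {node: list(vec) for node, vec in initial.items()}
    for _ in range(steps):
        new_state = {}
        for node, nbrs in neighbors.items():
            if nbrs:
                # Diffusion: mean of the neighbors' current vectors.
                mean = [sum(state[m][k] for m in nbrs) / len(nbrs)
                        for k in range(n_categories)]
            else:
                mean = state[node]
            # Pull back toward the node's own initial information.
            new_state[node] = [retain * initial[node][k] + (1 - retain) * mean[k]
                               for k in range(n_categories)]
        state = new_state
    return state

def predict(state):
    """Pick the most probable category for every node."""
    return {node: max(range(len(vec)), key=vec.__getitem__)
            for node, vec in state.items()}

# Toy chain a-b-c-d-e-f with one labeled node at each end (invented data).
toy_neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
                 "d": ["c", "e"], "e": ["d", "f"], "f": ["e"]}
toy_seeds = {"a": 0, "f": 1}
print(predict(diffuse_labels(toy_neighbors, toy_seeds, n_categories=2)))
```

One plausible way to combine this with node-feature classifiers, again an assumption rather than something the post states, would be to initialise unlabeled nodes with the classifier's predicted probabilities instead of a uniform vector.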

Our new method allows us to predict demographic features such as age and gender with high precision. This in turn has numerous applications, from market research and segmentation to the possibility of targeted campaigns (such as health campaigns for women).

(For more details, see Human Mobility and Predictability enriched by Social Phenomena Information (extended abstract), by Nicolas Ponieman, Alejo Salles, Carlos Sarraute, 2013.)

Carlos Sarraute, PhD, is Director of Research at Grandata, a company that integrates first-party and telco partner data to understand key market trends, predict customer behavior, and deliver impressive business results.