Interview: Marc Smith, Chief Social Scientist, Connected Action, on Why We Need Open Tools for Social Networks
We discuss NodeXL impact stories, upcoming NodeXL features, the importance of an open environment, the future of social media analytics, advice for novice researchers, and more.

Smith's research focuses on computer-mediated collective action: the ways group dynamics change when they take place in and through social cyberspaces. While at Microsoft Research, he founded the Community Technologies Group and led the development of the "Netscan" web application and data mining engine.
Smith received a B.S. in International Area Studies from Drexel University in Philadelphia in 1988, an M.Phil. in social theory from Cambridge University in 1990, and a Ph.D. in Sociology from UCLA in 2001. He is an adjunct lecturer at the College of Information Studies at the University of Maryland. Smith is also a Distinguished Visiting Scholar at the Media-X Program at Stanford University.
Here is my interview with him:
Anmol Rajpurohit: Q1. Who are the primary users of NodeXL? What are some of the most memorable success stories you have heard so far from NodeXL users?

MS: Some notable users and success stories come from a variety of scholars and disciplines. Professor Diane Cline at George Washington University is a scholar of antiquity with a specialty in the life of Alexander the Great. She has been able to quickly master NodeXL and use it to create some of the first maps of Alexander's social network. He may not have had a Facebook page, but Alexander did have a web of connections and relationships that are now easier for scholars to understand.
Business professor Scott Dempwolf at the University of Maryland uses NodeXL to map the connections formed when people author patents together. These networks reveal clusters of innovations that define a regional economic specialty.
Lee Rainie, director of the Pew Research Internet Project, used NodeXL to map a range of Twitter social media networks. Along with researchers from the Social Media Research Foundation, including myself, the research team documented the existence of six distinct patterns of network structures that regularly occur in Twitter topic streams. These six network types can help people understand the conversations in which they participate and recognize the patterns of conversations that they want to emulate.
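Editor's sketch: the patent co-authorship mapping mentioned above can be illustrated in a few lines of Python. This is only a minimal sketch under stated assumptions, not Dempwolf's method or anything NodeXL does internally. The patent records, inventor names, and the helper functions `build_coinventor_graph` and `find_clusters` are all hypothetical, and plain connected components stand in for whatever clustering a real analysis would use.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data: patent id -> list of inventors (not real records).
patents = {
    "P1": ["Ana", "Ben"],
    "P2": ["Ben", "Cara"],
    "P3": ["Dev", "Eli"],
    "P4": ["Ana", "Cara"],
    "P5": ["Fay"],
}


def build_coinventor_graph(patents):
    """Return an adjacency map: inventor -> {co-inventor: shared patent count}."""
    graph = defaultdict(lambda: defaultdict(int))
    for inventors in patents.values():
        unique = sorted(set(inventors))
        for name in unique:
            graph[name]  # touch the key so solo inventors appear as isolated nodes
        # Every pair of inventors on the same patent gets (or strengthens) an edge.
        for a, b in combinations(unique, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def find_clusters(graph):
    """Group inventors into connected components with an iterative depth-first search."""
    seen = set()
    clusters = []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            members.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(members))
    return clusters


graph = build_coinventor_graph(patents)
clusters = find_clusters(graph)
print(clusters)
```

With this toy data, Ana, Ben, and Cara end up in one group, Dev and Eli in another, and Fay alone. Connected components are the crudest possible notion of a "cluster"; they are used here only to keep the sketch short and dependency-free.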
AR: Q2. What is the next set of NodeXL features that you are working on?
MS: We are focusing on improved data importers from services like Twitter, Facebook, YouTube, Sina Weibo, and beyond. There is a lot of social media network data on the Internet and we'd like to make it easy for end-users to extract those networks without requiring any programming skills.
- We are working on simplifying use of NodeXL. It is still a challenge for new users to get their first network created and we want to reduce and remove those obstacles. Look for simpler interfaces that lead users step by step through the collection, analysis, visualization and publication of network analysis.
- We will update the web interface for the NodeXL Graph Gallery to enable more interaction with the data without needing to download and run NodeXL. This will make much of the data consumption task platform independent.
AR: Q3. Why do you believe it is important to promote "Open Tools, Open Data, Open Scholarship" for Social Networks? How does it impact innovation? How is the Social Media Research Foundation working towards this goal?

MS: An open approach is important to facilitate collaboration with academic and commercial contributors. Open data and scholarship are important to ensure that many scholars can access the data needed to do research and that research is widely available to those who can benefit from it. With a quarter of humanity living in cyberspace, it is time to properly document, map and study this new terrain. Open tools help make that possible.
AR: Q4. Of the many commercial social network analysis tools available, are there any that you particularly admire?

AR: Q5. What do you personally think about the future of Social Media Analytics? Your predictions?
MS: Social media analytics is not just for analysts. Most of us spend a lot of our time in social media; it's where our people are. But there is far more social media data than any human can consume, so we need to prioritize and filter our feeds. Analytic tools will become mainstream as people reach for tools that resolve the torrent of posts and messages into a focused image that reveals the key people, groups, topics, and bridges. Social media analytics might "disappear" at that point, becoming a normal part of our interfaces to social data. Visualization will be a big part of that, a necessary method of bringing analytic insights to people with limited quantitative training or skills.
So "Data Science" may soon follow the path traveled by "Desktop Publishing" - software will simplify complex processes so that casual end users can do 80% of what they need for themselves.
AR: Q6. Based on your consulting experience, what advice would you give to students and researchers aspiring for a successful career in Social Media Analytics?
MS: Software development and database skills are very useful, but I expect that tools like NodeXL point the way to a time when 80% of what we now call "data science" is done by casual end-users, in the same way that "desktop publishing" enabled anyone with some text to create professional-looking results. So technical skills alone will not be enough; insight into social processes and structure will be a big way to differentiate.

AR: Q7. On a personal note, if you finished your to-do list on a weekday, what would you do? What book (or article) did you read recently and would strongly recommend?
MS: If I have free time, I like to hang out with my family; we walk our dogs around the hills overlooking San Francisco Bay. My favorite recent read was James Gleick's "The Information: A History, a Theory, a Flood", which provides a grand overview of the rise of information technology and the way it shapes our worldview.