Interview: Marc Smith, Chief Social Scientist, Connected Action, on Why We Need Open Tools for Social Networks

We discuss NodeXL impact stories, upcoming NodeXL features, importance of an open environment, future of social media analytics, advice for novice researchers and more.

Marc A. Smith is a sociologist specializing in the social organization of online communities and computer-mediated interaction. Smith leads the Connected Action consulting group and lives and works in Silicon Valley, California. Smith co-founded and directs the Social Media Research Foundation, a non-profit devoted to open tools, data, and scholarship related to social media research.

Smith's research focuses on computer-mediated collective action: the ways group dynamics change when they take place in and through social cyberspaces. While at Microsoft Research, he founded the Community Technologies Group and led the development of the "Netscan" web application and data mining engine.

Smith received a B.S. in International Area Studies from Drexel University in Philadelphia in 1988, an M.Phil. in social theory from Cambridge University in 1990, and a Ph.D. in Sociology from UCLA in 2001. He is an adjunct lecturer at the College of Information Studies at the University of Maryland.  Smith is also a Distinguished Visiting Scholar at the Media-X Program at Stanford University.

Here is my interview with him:

Anmol Rajpurohit: Q1. Who are the primary users of NodeXL? What are some of the most memorable success stories you have heard so far from NodeXL users?

Marc Smith: NodeXL is for anyone who is interested in networks, social networks and particularly social media networks.  Our users are often scholars, researchers, students, managers, and analysts who are interested in understanding the shape, structure, and key positions within a connected structure.  Since societies are connected structures, people interested in understanding organizations, groups, enterprises, and markets are often interested in networks.  The challenge has been that network analysis tools have been difficult to use; most network analysis tools are programming libraries or are too complex for casual use.  This has meant that network analysis has been out of reach for many.  We have focused on creating a network analysis tool built for ease of use, automation, and reporting that highlights the key features of interest in a connected structure.  By integrating with Excel, we reach people where they often are already working.  With NodeXL, if you can make a pie chart you can now make a network chart.

Some notable users and success stories come from a variety of scholars and disciplines.  Professor Diane Cline at George Washington University is a scholar of antiquity with a specialty in the life of Alexander the Great.  She has been able to quickly master NodeXL and use it to create some of the first maps of Alexander's social network.  He may not have had a Facebook page, but Alexander did have a web of connections and relationships that are now easier for scholars to understand.

Business professor Scott Dempwolf at the University of Maryland uses NodeXL to map the connections formed when people author patents together.  These networks reveal clusters of innovations that define a regional economic specialty.

Lee Rainie, director of the Pew Research Internet Project, used NodeXL to map a range of Twitter social media networks.  Along with researchers from the Social Media Research Foundation, including myself, the research team documented the existence of six distinct patterns of network structures that regularly occur in Twitter topic streams. These six network types can help people understand the conversations in which they participate and recognize the patterns of conversations that they want to emulate.

AR: Q2. What is the next set of NodeXL features that you are working on?

MS: We are focusing on improved data importers from services like Twitter, Facebook, YouTube, Sina Weibo, and beyond.  There is a lot of social media network data on the Internet and we'd like to make it easy for end-users to extract those networks without requiring any programming skills.
  • We are working on simplifying use of NodeXL.  It is still a challenge for new users to get their first network created and we want to reduce and remove those obstacles.  Look for simpler interfaces that lead users step by step through the collection, analysis, visualization and publication of network analysis.
  • We will update the web interface for the NodeXL Graph Gallery to enable more interaction with the data without needing to download and run NodeXL.  This will make much of the data consumption task platform independent.

AR: Q3. Why do you believe it is important to promote "Open Tools, Open Data, Open Scholarship" for Social Networks? How does it impact innovation? How is the Social Media Research Foundation working towards this goal?

MS: Like Mozilla's commitment to the Firefox web browser, we share the sense that some tools are better if they are open and free.  NodeXL enables a larger group of people to better understand the structures and dynamics of social media - a place where hundreds of millions of people now spend a great deal of time.  We sometimes refer to NodeXL as a "point-and-shoot digital camera for crowds in cyberspace". Since we want to create as many pictures of social media networks as possible, to better document the range of variation in these populations, we want as many people as possible to get and use the tool.

An open approach is important to facilitate collaboration with academic and commercial contributors.  Open data and scholarship are important to ensure that many scholars can access the data needed to do research and that research is widely available to those who can benefit from it.  With a quarter of humanity living in cyberspace, it is time to properly document, map and study this new terrain.  Open tools help make that possible.

AR: Q4. Of the many commercial social network analysis tools available, are there any that you particularly admire? 

MS: There are many very powerful and sophisticated network tools available.  Of them, I am always very impressed by Gephi, which is a free and open Java application that can generate very beautiful network visualizations.

AR: Q5. What do you personally think about the future of Social Media Analytics? Your predictions?

MS: Social media analytics is not just for analysts.  Most of us spend a lot of our time in social media; it's where our people are.  But there is far more social media data than any human can consume so we need to prioritize and filter our feeds. Analytic tools will become mainstream as people reach for tools that resolve the torrent of posts and messages into a focused image that reveals the key people, groups, topics, and bridges.  Social media analytics might "disappear" at that point, becoming a normal part of our interfaces to social data.  Visualization will be a big part of that, a necessary method of bringing analytic insights to people with limited quantitative training or skills.

So "Data Science" may soon follow the path traveled by "Desktop Publishing" - software will simplify complex processes so that casual end users can do 80% of what they need for themselves.

AR: Q6. Based on your consulting experience, what advice would you give to students and researchers aspiring for a successful career in Social Media Analytics?

MS: Software development and database skills are very useful, but I expect that tools like NodeXL point the way to a time when 80% of what we now call "data science" is done by casual end-users in the same way that "desktop publishing" enabled anyone with some text to create professional looking results.  So technical skills alone will not be enough; insight into social processes and structure will be a big way to differentiate.

AR: Q7. On a personal note, if you ran out of items on your to-do list on a weekday, what would you do? What book (or article) did you read recently and would strongly recommend?

MS: If I have free time I like to hang out with my family; we walk our dogs around the hills overlooking San Francisco Bay.  My favorite recent read was James Gleick's "The Information: A History, a Theory, a Flood" which provides a grand overview of the rise of information technology and the way it shapes our worldview.