Exclusive Interview: David Stringfellow, Chief Economist, Office of the Utah State Auditor
Tags: Crowdsourcing, David Stringfellow, Economist, Government, Privacy, Public Policy, Skills, Utah
We discuss analytics for public policy decisions, the responsibilities of Utah's Chief Data Officer, crowdsourcing analytics to resolve government problems, and the most important skills for data science practitioners.
David recently delivered a talk at the Big Data Innovation Summit 2014, held in Santa Clara, on “Public Policy and the Complexity of Analytics”. He mentioned that data science, paired with the science of complexity, provides an opportunity for economists and all public sector managers to better understand how the world works. He also stated that large-scale data analytics will have a prominent role in creating more useful tools to better guide decision making in government.
Here is my interview with him:
Anmol Rajpurohit: Q1. What are your thoughts on the current usage of Analytics for Public Policy decisions at Federal and State level? What would you consider as good benchmarks to assess the preparedness of governments to use Analytics effectively?
David Stringfellow: There are pockets of excellent analytics within both the federal and state governments. But I don’t think a critical mass has been reached within governments to broadly and systematically adopt enterprise-wide policy that will lead to more data-driven public policy decisions. I believe these decisions will be made over the next five years as the value of the extensive collections of government data becomes glaringly obvious.
Two good benchmarks to separate the leaders from the laggards in the government sector are 1) the extent to which data is made available to both the public and the data community in useful ways, and 2) whether a government employs the right human capital in executive positions, with both subject matter expertise and fluency in data use. Examples include websites that open up the government checkbook to inspection (USA Spending, Transparent Utah, and USPIRG). A good GUI is useful to a curious citizen, but of at least equal import is the ability of power data users to download the same data in bulk for true analytics. Governments will need Chief Data Scientists (in function if not title) to marshal the resources to truly leverage the value of the data governments maintain and drive more efficiency.
AR: Q2. What does a typical day look like for the Chief Data Officer of the State of Utah? What kinds of problems do you work on?
DS: My typical day, in roughly equal measure, involves absorbing subject matter expertise, wrangling data, and communicating methods and results. Government is in many lines of business. Some of the things I have studied in the last few months include: how to optimize the supply chain management systems for our liquor stores, how to systematically detect fraud in administering hundreds of millions of dollars in benefit payments, and how to redesign our human resource policies around the use and management of sick leave for a workforce of 20,000 employees.
Over the last year, I have gained access to a dozen or more information systems and pulled billions of records out of widely divergent systems siloed across many departments and agencies. After transforming, exploring, and wrestling the data into a useful form, I write code to build models to detect problems and find efficiencies. I validate the models with field research or through discussion with subject matter experts. I also speak with Utah’s elected policy makers and agency management and write reports to communicate the research findings and recommendations for improving a wide swath of public policy.
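The fraud- and anomaly-detection work described above can be illustrated with a minimal sketch. This is a hypothetical example of flagging unusual payments with a simple z-score test; the function name, sample data, and threshold are all made up for illustration and are not the actual models or data used by the State of Utah.

```python
# Minimal anomaly-detection sketch: flag payments that sit far above
# the rest of the distribution as candidates for manual review.
from statistics import mean, stdev

def flag_outliers(payments, threshold=2.0):
    """Return indices of payments more than `threshold` sample
    standard deviations above the mean."""
    mu = mean(payments)
    sigma = stdev(payments)
    if sigma == 0:          # all payments identical: nothing to flag
        return []
    return [i for i, p in enumerate(payments)
            if (p - mu) / sigma > threshold]

# Seven routine payments and one suspicious spike (index 7).
payments = [120, 115, 130, 125, 118, 122, 119, 2400]
print(flag_outliers(payments))  # → [7]
```

In practice a model like this would only be a first pass; as the interview notes, flagged cases are then validated with field research or discussion with subject matter experts.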
AR: Q3. What are the most memorable "aha" experiences you have had when Analytics significantly impacted Public Policy?
DS: My most memorable “aha” experience with analytics changing public policy was when the Utah Legislature passed a bill in 2007, without a dissenting vote, to reform the state’s individual income tax to a single-rate system. Changing the amount of tax people pay is extremely contentious, especially when raising taxes on 10% of taxpayers. I spent several years building the microsimulation model of the tax system to predict the effects of thousands of versions of potential reform. Policy makers used the results of the model to narrow reform based on the effects they desired. It was the most intense data-driven policy making I have witnessed. People had more trust in the process when they could see their suggestions working through the model.
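The core idea of a tax microsimulation can be sketched in a few lines: apply both the current schedule and a proposed reform to a sample of taxpayer records, then compare liabilities and total revenue. The brackets, rates, and incomes below are invented for illustration only; they are not the actual Utah tax schedules or taxpayer data.

```python
# Illustrative microsimulation sketch: compare a tiered tax schedule
# with a proposed flat-rate reform across a sample of taxpayers.

def tiered_tax(income):
    """Hypothetical two-bracket schedule: 3% up to $50k, 6% above."""
    if income <= 50_000:
        return income * 0.03
    return 50_000 * 0.03 + (income - 50_000) * 0.06

def flat_tax(income, rate=0.05):
    """Hypothetical single-rate reform."""
    return income * rate

incomes = [20_000, 45_000, 60_000, 90_000, 150_000]

# Per-taxpayer effect of the reform: who pays more, who pays less.
for income in incomes:
    old, new = tiered_tax(income), flat_tax(income)
    print(f"income {income:>7}: old {old:8.2f}  new {new:8.2f}  "
          f"change {new - old:+8.2f}")

# Aggregate revenue effect across the sample.
total_old = sum(tiered_tax(i) for i in incomes)
total_new = sum(flat_tax(i) for i in incomes)
print(f"total revenue: old {total_old:.2f}, new {total_new:.2f}")
```

A real model of this kind runs over millions of actual returns and thousands of candidate schedules, which is what lets policy makers narrow reform options by the effects they desire, as described above.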
AR: Q4. Do you think some of the Government problems can be resolved through making the data publicly available and leveraging crowd-sourcing?
DS: Yes, I think making privacy-screened data publicly available can help resolve contentious policy problems. Through crowdsourcing, everyone would have access to the foundational data, and we would be better able to coalesce around facts that are universally accepted. It is not a panacea, and will not eliminate all disagreements or varying interpretations of the facts, but it will form a shared foundation, which should increase the probability of finding common ground and agreement from which to build compromise.
AR: Q5. What is your opinion on the privacy concerns regarding Big Data?
DS: There are privacy concerns. They will only grow with time. Government is behind the curve in making rules for Big Data. In coming years there will almost surely be more regulation regarding how private data can be collected, aggregated, and used.
AR: Q6. What skills do you think are the most important for practitioners in the field of Data Science?
DS: I am not sure it is a skill, but I think the most important disposition to have as an agent in the Data Science field is curiosity about the world. My interactions with some government employees compared with many of my talented research analysts over the last decade leads me to believe it is not an easily learned or exercised attribute. But curiosity only takes a person so far. It is wasted if it does not lead to subject matter expertise and a deep grasp of domain knowledge.
As far as skills, I think it is necessary for people to grasp a range of data architectures and to be familiar with a few different programming languages while truly mastering at least one. Always be aware of the context of an analysis to better communicate the methods and results to people who are not data scientists. Decision makers want a relevant story and plain-language recommendations, not the really great math, impressive programming acrobatics, or even beautifully artistic visualizations of data worthy of a nice frame.
AR: Q7. On a personal note, are there any good books that you’re reading lately, and would like to recommend?
DS: I’ve been reading Why Does the World Exist? by Jim Holt, an exploration of the philosophical underpinnings of reality and an outline of how people have approached the question through time.