KDnuggets Exclusive: Interview with Rayid Ghani, Chief Scientist Obama 2012 Campaign
Rayid Ghani was a leading analytics professional before he joined the Obama reelection campaign as Chief Scientist. KDnuggets asks him about the role of analytics in the campaign, the keys to success, the Big Data bubble, and advice for aspiring data scientists.
I have known Rayid Ghani for a number of years as a leading analytics professional and an active participant in KDD conferences, and he will be chairing the Industry/Government track at the forthcoming KDD-2013 conference in Chicago. Rayid left a senior position at Accenture in 2011 to join the Obama reelection campaign. Now that he has had time to decompress after the election, I was pleased that he found time to answer questions for KDnuggets.
Gregory PS: Congratulations on the success of the Obama 2012 campaign. Many stories published since have emphasized the well-run analytics operation. How important, in your opinion, were analytics to winning the election?
Rayid Ghani: Analytics were extremely important but by no means sufficient to win the election.
1: We had an excellent candidate who gets all the credit for winning the election. Analytics certainly contributed but couldn't have helped much without Barack Obama's leadership.
2: We had an excellent team of staff and volunteers who did the hard work on the ground - organizing, making calls, knocking on doors, all of which were critical in making the analytics actionable and the campaign succeed.
3: We had leadership that made sure analytics was embedded in every part of the campaign and that the decisions that were made in the campaign were informed by analytics.
GPS: Where did analytics make the biggest contribution - in pre-election analysis, customizing messages, fund-raising, getting out the vote, elsewhere?
RG: All of them. We were part of every campaign function: fund-raising, recruiting and mobilizing volunteers, messaging, polling, social media, TV ads, online ads, persuasion efforts, and getting out the vote. We informed almost every department in the campaign and helped them be more efficient and effective.
GPS: Can you share some interesting examples of non-trivial analytics findings? For example, it was reported that the most effective email subject line for the Obama campaign was "Hey".
RG: I think the key to the analytics operation was that we weren't really going after 'insights'; we were focusing on operationalizing our models and analysis. What made us successful was that we embedded analytics in processes across the campaign and in tools that people were using to get their jobs done. By completing the 'loop' and increasing automation, we were able to deploy fairly sophisticated analytics that improved over time and were used by the campaign.
GPS: What were the most popular analytic/data mining tools of your team? Did you use Hadoop and cloud computing?
RG: We used a combination of commercial, open-source, and internally built tools. Our analytics engineering team developed a common data platform which was augmented by custom tools on top for specialized applications. Hadoop was part of it, and so was cloud computing (as well as more traditional physical servers) but we also used a lot of R as well as commercial tools such as KXEN.
GPS: What data mining methods did you find most useful - decision trees, naive Bayes, SVM, neural nets, clustering, social network analysis, ...?
RG: We experimented with a variety of approaches and used different methods for different kinds of models - regression, SVMs, graph analysis, decision trees. The key, as usual, was to balance effectiveness, efficiency, speed, and ease of explainability.
GPS: How did you use Facebook and other social networks as part of modeling?
RG: We used Facebook for a few different purposes:
- We used Facebook to reach young voters who were hard to reach using traditional channels such as phone, direct mail, and door-to-door canvassing.
- We built models using data from users who authorized our Facebook app, which allowed us to ask our supporters to contact their friends for specific reasons (voter registration, volunteering, going to vote, etc.). Our hypothesis was that having supporters ask their own friends was more effective than us asking those friends directly by broadcasting on our Facebook page.
- We also used Facebook to determine people's interests and send them messages that were relevant to them, increasing their likelihood of taking action.
GPS: What were the biggest differences between campaign analytics and your previous work at Accenture?
RG: At a very technical level, the work wasn't very different. At every other level, it was completely different. I had spent the past 10 years working in an R&D lab. This time, there was a very specific deadline after which nothing mattered. There was no room for "we can make it better in the next version". We only had 18 months to recruit a team, come up with ideas, get our data infrastructure right, build some prototypes, do a lot of testing, and get things out in production.
Some of the work that was done was certainly new R&D. Other times, we had to just get things done that weren't terribly interesting but critical for winning. Having an excellent and dedicated analytics team really helped us achieve all of that.
GPS: Now that the campaign is over, what are your plans?
RG: Now that I've had a chance to catch up on sleep and reconnect with people who used to be my friends, I'm trying to figure that out. I'm interested in helping nonprofits and socially progressive organizations do more with their data and make better data-driven decisions. I'm also interested in using machine learning and data mining for the social sciences as well as public policy. I think I still need some time to think through different options and decide what's next.
GPS: What was a good book that you read recently and enjoyed?
RG: There really wasn't much time for reading during the campaign - in fact, it was one of the things I missed the most during the campaign (other than sleep, sunlight, traveling, and interacting with people outside the office). Now that I am reading again, I have started Robert Caro's latest installment in his Lyndon Johnson biography series, The Passage of Power. If you haven't heard of these books, you should certainly check them out. Caro has spent a large part of his life researching Lyndon Johnson and the books are a treat to read.
GPS: There is a lot of hype now around "Big Data". Should we expect a Big Data bubble soon or is Big Data still in the growth stage?
RG: I'm always skeptical of hype, especially when it's around an area that I'm part of. You always fear that hype will lead to unrealistic expectations which will lead to disappointment, and then a slow death. That being said, I think the technological and business trends in the past several years have made it possible for data analytics to make a significant impact in most organizations.
GPS: "Data Scientist" has been declared the sexiest job of the 21st century in Harvard Business Review. Do you agree and what is your advice to aspiring data scientists?
RG: I think the most important thing 'data scientists' can do is get the science part right. It's easy to be a data hacker, and you do see some immediate benefits just playing around with data, but unless you get the science part right, you can't make sustainable impact.
My advice for aspiring data scientists would be to understand the "why" behind the tools they're using, and to understand not only what those tools are capable of but what their limitations are as well. Second, I would advise them to focus not only on the technical aspect of their work but also on communicating its capabilities and, more importantly, its impact. The more you communicate, the better prepared your clients/users will be to use your work in the most effective way.
Brief Bio: Rayid Ghani was the Chief Scientist at the Obama for America 2012 campaign, focusing on analytics, technology, and data. His work focused on improving different functions of the campaign, including fundraising and volunteer and voter mobilization, using analytics, social media, and machine learning. Before joining the campaign, Rayid was a Senior Research Scientist and Director of Analytics research at Accenture Labs, where he led a technology research team focused on applied R&D in analytics, machine learning, and data mining for large-scale & emerging business problems in various industries including healthcare, retail & CPG, manufacturing, intelligence, and financial services. In addition, Rayid serves as an adviser to several start-ups in analytics, is an active organizer of and participant in academic and industry analytics conferences, and publishes regularly in machine learning and data mining conferences and journals.