Interview: Alessandro Gagliardi, Glassdoor on the Indispensable Skills for Data Scientists

We discuss Analytics at Glassdoor, key lessons learned, the major factors affecting job satisfaction, the challenges of working with Twitter data, and the indispensable components of a Data Science education.

Alessandro Gagliardi is a Senior Data Scientist at Glassdoor and an instructor of Data Science at General Assembly in San Francisco. He also mentors at SlideRule and has been helping develop the Data Science curriculum at GalvanizeU. At Glassdoor, he uses big data and machine learning to analyze user-generated content such as salaries and employer reviews. Before that, he worked at Path, analyzing terabytes of customer activity logs to provide business insights for product development.

Alessandro received his B.A. in Computer Science from UCSC and pursued a Ph.D. in Behavioral and Neural Science at Rutgers until moving back to California in 2010. He taught neuroscience and psychology at USF and CIIS before returning to industry as a Data Scientist in 2011. Since then, he has been bringing his knowledge of biological computation to the field of data science, all while training the next generation of Data Scientists.

Update (April 6, 2014): Alessandro is no longer working at Glassdoor. Currently, he is Lead Professor of Data Science for GalvanizeU's Master of Engineering in Big Data.

Here is my interview with him:

Anmol Rajpurohit: Q1. What does Glassdoor do? How important is Analytics at Glassdoor?

Alessandro Gagliardi: Glassdoor helps people find jobs and companies they'll love. We do this by giving job seekers access to every open job listing and pairing it with information on reviews, salaries, benefits, and interviews, so they can find a job that not only matches their skills but also pays well and is a great cultural fit. We are global: we have users and content in 190 countries and are actively expanding.

Analytics is core to what we do. In fact, Glassdoor's primary product offering is data in the form of reviews and salaries. We use analytics to support internal business decisions (such as A/B testing, strategy, and pricing), to drive data products (such as Reviews Highlights and Salary Medians), and to provide Employer Insights (such as how a company's ratings vary by job function and office location).

AR: Q2. What are the typical problems that the Data Science team at Glassdoor works on?

AG: It varies a lot. Our projects range from detecting fraudulent reviews and identifying predictors of salaries to supporting A/B tests and reporting on the health of the company. There really is no "typical problem", which is part of what makes being a data scientist at Glassdoor so interesting!

AR: Q3. What were the key lessons that you learned from the experience of extrapolating from the known to the unknown (for example, the cases with insufficient salary data)?

AG: One lesson might be this: don't let the perfect be the enemy of the good. Your predictions will never be perfect, and there will always be ways to improve them: drawing on more outside data, including other factors, and so on.

It's a lot easier to come up with ways to improve a model than it is to actually execute on those improvements, which means that, left unchecked, a project like that can grow out of control. It's important to check in, set expectations that this will be a continuous and gradual process, congratulate yourself on the progress you've made, and ship early.

AR: Q4. Besides salary, what other factors play a crucial role in job satisfaction?

AG: So many factors, really. Besides compensation, employees care most about Career Opportunities, Work-Life Balance, Commute, Company Culture, and Benefits Package. In fact, one of the Data Science projects I worked on involved extracting emotional information from employer reviews. I saw a lot of nuance in how people described their work environment, which had nothing to do with salary. One interesting example was how often I came across the word "love" in reviews of a company that had a very low rating on our site. This seemed paradoxical until I looked at what the employees were actually saying. It was a service organization, and they all said that they loved the people they were serving, but the organizational structure was just intolerable. I doubt increasing salary would have helped much in a case like that.

AR: Q5. A lot of your research is based on social conversations data from Twitter. What are the most underrated challenges of working with Twitter data?

AG: Working with natural language is very challenging to begin with. Twitter presents an added challenge because it is so ephemeral: it's a moving target. For example, in the course I teach at General Assembly, I have my students sample the Twitter feed and then produce some statistics on it. Even though all the students are sampling the same feed at roughly the same time, their results often diverge hugely. Making inferences from a sample to the population with Twitter data is particularly dangerous.
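
One way to see why sample statistics can diverge on a moving target is a toy simulation. The sketch below is our own illustration, not the General Assembly assignment and not the Twitter API: it assumes a hypothetical stream in which one topic's share of tweets jumps from 1% to 20% during a five-minute burst, and it has several simulated "students" sample that stream for five minutes each, starting a few minutes apart. The function names (topic_share_at, sample_stream) and all rates are invented for the example.

```python
import random


def topic_share_at(minute):
    """Hypothetical share of tweets mentioning one topic at a given minute.

    Assumption for illustration: a burst (e.g. breaking news) lifts the
    share from 1% to 20% during minutes 10-14.
    """
    return 0.20 if 10 <= minute < 15 else 0.01


def sample_stream(start_minute, minutes, tweets_per_minute, rng):
    """Sample `tweets_per_minute` simulated tweets per minute over `minutes`
    minutes, starting at `start_minute`; return the observed topic share."""
    hits = 0
    total = 0
    for minute in range(start_minute, start_minute + minutes):
        p = topic_share_at(minute)
        for _ in range(tweets_per_minute):
            # Each simulated tweet mentions the topic with probability p.
            hits += rng.random() < p
            total += 1
    return hits / total


rng = random.Random(42)
# Five hypothetical students each sample for 5 minutes, starting a few minutes apart.
for start in (4, 7, 10, 13, 16):
    share = sample_stream(start, minutes=5, tweets_per_minute=100, rng=rng)
    print(f"start minute {start:2d}: estimated topic share = {share:.1%}")
```

Starting only a few minutes apart, the simulated students report anything from roughly 1% to roughly 20% for the same topic, even though each of them sampled correctly. Real Twitter data can drift in many more ways than a single burst; the toy only shows why one snapshot may not generalize to the population.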

AR: Q6. Based on your experience as an instructor at General Assembly, what do you consider as the indispensable components of a good Data Science curriculum?

AG: I see many data science curricula putting a lot of focus on machine learning. Machine learning is certainly important, but it is far from the be-all and end-all of data science. Big data is, of course, a big topic (and growing!), and I think anyone calling themselves a data scientist needs at least a passing familiarity with big data techniques. That said, big data can also suffer from too much hype. Sometimes small data is all you need to get the job done.

Two other skills I often see neglected are SQL and statistics. SQL is not used much in academia, so many academics-turned-data-scientists find themselves disoriented by tasks that their colleagues would find elementary. But relational databases are here to stay, and being able to use them effectively is definitely a required skill for anyone who would call themselves a data scientist. Statistics matter too, and that goes beyond simply knowing how to run a chi-square test. A data scientist needs to cultivate an intuition about probability and statistics so they know when not to believe their computer when it says something is significant at p < .001.
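
To illustrate one situation where that skepticism is warranted, here is a minimal sketch, our own illustration rather than anything from Glassdoor or the General Assembly course: with a large enough sample, even a negligible difference clears p < .001. It assumes a hypothetical A/B test with 10 million users per arm and conversion rates of 10.00% versus 10.05%, and it uses a standard two-sided two-proportion z-test with a pooled standard error, written with only the Python standard library. The function name two_proportion_z_test is ours.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Uses the pooled standard error; returns (z, p_value).
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical A/B test: 10 million users per arm, conversion 10.00% vs 10.05%.
n = 10_000_000
z, p = two_proportion_z_test(1_000_000, n, 1_005_000, n)
print(f"difference = {1_005_000 / n - 1_000_000 / n:.4%}, z = {z:.2f}, p = {p:.5f}")
```

With these hypothetical counts the test reports z of roughly 3.7 and p well below .001, yet the difference is a twentieth of a percentage point. Significance only says the difference is unlikely to be exactly zero; whether it is large enough to matter, and whether the test's assumptions (independent users, a sample size fixed in advance) actually hold, is a separate judgment.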

The second part of the interview will be published soon.