Interview: Conal Sathi, Data Scientist, Slice on Creating Value from Mining Shoppers’ e-Receipts
We discuss the relevance of the "Purchase Graph", the Slice platform, analytical insights from mining all activity around a customer's purchase, experimentation strategy, the experience of working as a data scientist and more.
Here is my interview with him:
Anmol Rajpurohit: Q1. Can you please explain the term "Purchase Graph"? Why is it so relevant for online retail companies?
Conal Sathi: In this age, graphs are becoming quite popular for exploring a domain. Google popularized the ‘information graph,’
This is crucial not just for online retail companies, but for all kinds of companies. If you understand the affinities between products, you have opportunities to advertise, cross-sell and up-sell. I’m sure most users are not fully aware of the entire catalog from all retail companies, so using the purchase graph, you can hint at what purchases the users might be interested in and what they’re like (and in an automated way!).
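To make the idea of product affinities concrete, here is a minimal sketch of a co-purchase graph in Python. It is an illustration under stated assumptions, not Slice's implementation: the sample orders, the function names `build_purchase_graph` and `related_products`, and the choice of raw co-occurrence counts as edge weights are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_purchase_graph(orders):
    """Count how often each pair of products appears in the same order.

    Nodes are product names; an edge (a, b) is weighted by the number of
    orders that contain both products. (Assumed weighting, for illustration.)
    """
    edges = Counter()
    for items in orders:
        # sorted(set(...)) gives every unordered pair one canonical key
        for a, b in combinations(sorted(set(items)), 2):
            edges[(a, b)] += 1
    return edges


def related_products(edges, product, top_n=3):
    """Return up to top_n products most often bought together with `product`."""
    scores = Counter()
    for (a, b), weight in edges.items():
        if a == product:
            scores[b] += weight
        elif b == product:
            scores[a] += weight
    return [name for name, _ in scores.most_common(top_n)]


# Hypothetical orders, each a list of purchased items
orders = [
    ["tent", "sleeping bag", "lantern"],
    ["tent", "sleeping bag"],
    ["lantern", "batteries"],
    ["tent", "lantern"],
]

graph = build_purchase_graph(orders)
print(related_products(graph, "tent"))
```

A shopper who buys a tent could then be shown the items most strongly connected to it, which is the kind of cross-sell hint described above. A production system would likely normalize for item popularity rather than rely on raw counts.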
AR: Q2. Which purchase attributes play a role in the Purchase Graph: total bill amount, number of items in the cart, returned items, etc.? What kind of insights can a user get by analyzing their purchase graph built on the Slice platform?
CS: A Slice user can learn a lot about their own shopping habits - how much they spend, what they buy the most of--and we are currently working on new features
The way we really bring our data to life, though, is through our partners, who can tap into the Slice API to create new features and experiences. Slice captures all activity around a consumer’s purchase--what they bought, how much they paid, how they paid, where they live and where they shipped it-- over time, and how these behaviors change. We enable our partners--retailers, web publishers and service providers--to harness this valuable data to create personalized customer experiences, such as product recommendations and targeting offers based on a visitor’s unique shopping history. And we’re actively working on new products to bring even more insights to light, which we plan to launch in the coming months. Stay tuned!
AR: Q3. Is there any plan in future to integrate information about offline shopping (for example, users scanning and uploading their paper receipts), in order to make the analytics more valuable?
AR: Q4. A key component of all e-commerce analytics is experimentation such as A/B testing. What kind of experimentation does Slice do in order to improve its understanding of users' e-commerce activities? What are some of the best insights you have achieved from such experimentation?
CS: In the machine learning team, we build models for prediction and categorizing data. As we iterate on these models, we establish metrics from the start, so as we experiment with new features and algorithms, we learn which ideas help and which ideas don’t help.
It’s interesting to see that the ideas we intuitively think should improve the models are not always the ones that actually improve them. This is why metrics are so important. That said, you cannot just blindly rely on metrics. Sometimes you need to look deep into the data and understand what’s going on as you make changes to the algorithms.
AR: Q5. Do you observe any hesitation amongst users towards Slice because of privacy concerns? Are users comfortable sharing their email inbox, given the immensely private nature of information stored in emails?
CS: We have not, and here’s why -- we are clear with our customers that our technology only identifies and analyzes the receipts in their inboxes--and nothing else--and that we never, ever release their personally-identifiable information, full stop.
AR: Q6. What motivated you to work in data analytics? What aspects of your job do you like the most and what are the aspects that you do not enjoy much?
CS: What I like most about my job is dealing with such a unique, high-definition data set – a longitudinal, cross-merchant purchase graph – as well as working with smart people in a small startup that moves quickly.
What I don’t enjoy? Picking restaurants to eat at on University Ave in Palo Alto! There are just too many great options. Also, I do not enjoy when my coworkers (rarely) beat me in ping pong--which is pretty serious at Slice! Seriously speaking though, sometimes data sets can be very noisy and messy, especially when the data comes from so many sources. This makes the problem both fascinating and sometimes frustrating, as it is harder to build algorithms and training sets that can generalize. But that is what makes what we are doing so valuable and so challenging.
AR: Q7. In your Data Science career so far, what is the best advice that you have got? Why is it so important?
CS: The best advice I got was that before iterating on a machine learning engine, make sure to establish metrics. What should the engine do and what is most important? Once you figure that out, quantify it. This will allow you to iterate quickly and help you understand whether a new feature or algorithmic change has improved the engine or not.
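As a rough illustration of this advice, the Python sketch below fixes a metric and a held-out set first, then scores two model variants against them. Everything in it is assumed for the example: the `HELD_OUT` data, the `baseline` and `candidate` categorizers, and the choice of accuracy as the metric are hypothetical and do not come from Slice.

```python
def accuracy(predict, examples):
    """Fraction of held-out examples whose predicted label matches the true one."""
    correct = sum(1 for text, label in examples if predict(text) == label)
    return correct / len(examples)


# Hypothetical held-out set of (receipt line, category) pairs, fixed up front
HELD_OUT = [
    ("usb cable 2m", "electronics"),
    ("running shoes", "apparel"),
    ("wireless mouse", "electronics"),
    ("cotton t-shirt", "apparel"),
]


def baseline(text):
    """Naive starting point: always predict the same category."""
    return "electronics"


def candidate(text):
    """Proposed change: a small keyword rule for apparel items."""
    apparel_words = {"shoes", "t-shirt"}
    return "apparel" if apparel_words & set(text.split()) else "electronics"


baseline_score = accuracy(baseline, HELD_OUT)
candidate_score = accuracy(candidate, HELD_OUT)

# The same metric on the same data makes the two variants directly comparable
print(f"baseline:  {baseline_score:.2f}")
print(f"candidate: {candidate_score:.2f}")
print("keep change" if candidate_score > baseline_score else "reject change")
```

Because the metric and the evaluation data are settled before any changes are made, each new feature or algorithmic tweak can be accepted or rejected on the same footing.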
AR: Q8. What was the last book that you read and liked? What do you like to do when you are not working?
CS: The last book I read was The Tipping Point by Malcolm Gladwell. It was fascinating to read a social scientist discuss how epidemics and ideas spread. While reading, I couldn’t help but turn his problem into a graph with nodes and edges. When I am not working, I like to sing and play the piano, as well as hike and bike.