Interview: Taylor Phillips, Square on Why Finance Needs Machine Learning and Data Science
Tags: Data Science, Finance, Fraud Detection, Machine Learning, Real-time, Recommendation, Skills, Square, Taylor Phillips
We discuss the role of data science at Square, common machine learning use cases, transition to real-time architecture, major challenges, expectations from data science, key qualities for data scientists, and more.

Here is my interview with him:
Anmol Rajpurohit: Q1. What does Square do? What role does Data Science play in your firm's strategy and day-to-day operations?
Taylor Phillips:

AR: Q2. What are the most common Machine Learning (ML) use cases at Square?
TP: Risk, fraud and underwriting are the bread-and-butter use cases for ML at Square.

AR: Q3. What was the motivation behind transitioning from a batch, offline ML pipeline to a highly available, real-time architecture? What was the key learning from this experience?
TP: This isn't a decision to be taken lightly. At our smaller scale, batch offline worked great - it's a

The key learning is that building a real-time architecture to move money is hard. You need engineers who can build reliable systems, data scientists who can build innovative models, and the magical people who span both to glue everything together and make sure nothing is lost in translation.
AR: Q4. In your experience, what are the biggest challenges in Machine Learning and other Data Science projects?

TP: In practice, there are a lot of conflicting goals with data:
- Engineers working with data care about different things (e.g. data immutability and data temporality) than the engineers shipping the products that generate the data.
- Easy access to the data is essential, but as data gets larger, the need for different types of data storage arises (e.g. cold and hot storage), which introduces derivative challenges like adopting new technologies and maintaining data consistency across stores.
- When your data scales up, so do your model training and evaluation times. Keeping those times as low as possible is important to enable data scientists to try out ideas and iterate quickly.
- Cutting edge data tools offer new features, but old tools are reliable and time-tested.
- Ad-hoc data exploration requires different methodologies from automating the stuff you know you need. The former wants speed and random access while the latter wants repeatability and stability - very different engineering optimization problems.
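
To make the first point about data immutability and data temporality concrete, here is a minimal Python sketch of an append-only event log with an "as of" lookup. It is an illustration only, not a description of Square's systems: the `Event` and `EventLog` classes, the `as_of` method and the `merchant_42.chargeback_rate` key are all hypothetical names chosen for the example. The idea it shows is that past facts are never overwritten, so a model can be trained on exactly what was known at a given moment.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """An immutable fact: at time `ts`, `key` took `value`."""
    ts: int       # event time, e.g. seconds since epoch (assumed unit)
    key: str      # hypothetical feature name
    value: float


@dataclass
class EventLog:
    """Append-only log: existing events are never updated or deleted."""
    events: dict = field(default_factory=dict)  # key -> list of Events ordered by ts

    def append(self, event: Event) -> None:
        history = self.events.setdefault(event.key, [])
        if history and event.ts < history[-1].ts:
            raise ValueError("events must be appended in time order")
        history.append(event)

    def as_of(self, key: str, ts: int):
        """Return the latest value of `key` known at time `ts`, or None."""
        history = self.events.get(key, [])
        # Index of the first event strictly after `ts`.
        idx = bisect_right([e.ts for e in history], ts)
        return history[idx - 1].value if idx else None


log = EventLog()
log.append(Event(ts=100, key="merchant_42.chargeback_rate", value=0.01))
log.append(Event(ts=200, key="merchant_42.chargeback_rate", value=0.05))

print(log.as_of("merchant_42.chargeback_rate", 150))  # 0.01
print(log.as_of("merchant_42.chargeback_rate", 250))  # 0.05
print(log.as_of("merchant_42.chargeback_rate", 50))   # None
```

A product database would typically keep only the latest value (0.05 here), which is all the product needs. A model trained on events from time 150 needs the value that was visible then (0.01), which is why the two groups of engineers can end up wanting different things from the same data.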
AR: Q5. What are your major recommendations to data scientists working in the financial domain on problems such as loss prevention and risk management?
TP: Focusing on driving down the same error metric for an extended period of time is going to yield diminishing returns. Sure, at huge scale tweaking a button color can make

AR: Q6. How do you think the expectations from Data Science have evolved over time? Where do you see them headed in the future?
TP: Data Science is a loaded term and can mean a lot of different things depending on the context. Off the top of my head, I’d break it into a few different roles:
- Data Science Research - Explores data and prototypes new features and models to improve business metrics and provide new insights. They use tools like R, MATLAB and Python.
- Data Engineer - Focuses on obtaining and maintaining the data in a variety of usable forms. They own the data pipelines (e.g. Kafka, Flume) and data storage (e.g. HDFS, MySQL).
- Data Science Engineer - Implements the features and models and makes them go live in production. These are the unicorns who bridge the gap between research and practice.
AR: Q7. What key qualities do you look for when interviewing for Data Science related positions on your team?
TP: The best hires are the people who can do the ML work and the engineering work, but that’s a lot to ask for. They can have a massive impact very quickly because they are able to single-handedly prototype new ideas and then do the engineering legwork to ship them to production.
Hiring pure R or Python hackers can be great, but the ability to write real code and work with engineers is essential. Likewise, hiring pure software engineers can be great, but basic skills in math, stats and ML go a long way.
AR: Q8. What are your favorite books on Data Science? What do you like to do when you are not working?
TP: My favorite place to learn new things related to data is Mike Bostock’s homepage. I had the privilege of sitting next to Mike at Square for a bit - his work is elegant, inspiring and often accessible to non-techies.
Outside of work, I love automating things I enjoy, like online poker and video games. Besides that, it’s great to get out in the sun and behave like humans used to!