Interview: Piero Ferrante, BCBS on Why Healthcare is Rich in Data but Poor in Information

We discuss the role of analytics in healthcare payer firms, major challenges in leveraging healthcare data, the shift to value-based payments, personal motivation towards analytics, career advice, and more.

Piero Ferrante is a practicing data scientist with a background in healthcare and applied machine learning. He holds a BS in Finance and Management Information Systems from the University of Delaware, an MS in Predictive Analytics from Northwestern University, and is a professor of Text Analytics at Rockhurst University. Piero is currently the Manager of Analytics & Insights at Blue Cross and Blue Shield of Kansas City where he has oversight over cross-organizational analytical initiatives. He also helps organize a local Data Science Meetup in Kansas City with over 190 members (

Anmol Rajpurohit: Q1. What role does Analytics play in Healthcare Payer firms? What are the typical insights that the business is looking for?

Piero Ferrante: Right now there’s a lot of uncertainty on the payer side, especially with the advent of ACA. With slimmer margins and greater uncertainty comes the need to evolve. In this sense, evolution means more widespread adoption of analytics and data-driven decision making. As far as the insights are concerned, it really runs the gamut from more conventional applications involving targeted marketing and cross-sell/up-sell opportunities to developing clinical classification algorithms that can be used to identify undiagnosed conditions and things like Monte Carlo simulations that will help us understand a wide range of scenarios for different books of business.

AR: Q2. Healthcare is often referred to as a sector rich in data and weak in information. Do you agree? What are the biggest challenges in leveraging healthcare data?

PF: That’s fair to say. I define analytics as the transformation of raw data into information and insights with the intent of facilitating better decision support. With that in mind, I don’t think there’s enough systematic transformation (or data mining) taking place and decisions might still be made based on gut instincts and pattern recognition. The fact that the healthcare industry hasn’t historically been as technology-driven as the financial or the retail industries doesn’t help, but that is beginning to change.

One of the biggest challenges boils down to thinking of data as an asset as opposed to a liability.

As technology and security evolve, we should be challenging ourselves to evolve policies and procedures as well. Of course another big challenge is the sharing of data between various players in the healthcare arena such as payers, EMR providers and hospitals, pharmaceutical companies, and other research-based organizations.

AR: Q3. One of the most alarming problems with the current healthcare system is rising medical costs. How can Big Data play a role in solving this problem? Will Big Data promote the shift from service-based payments to value-based payments?

PF: Depending on how you define Big Data, it may or may not help solve the problem. If you define it narrowly as having copious amounts of structured or unstructured data that can typically be described by a handful of words that start with the letter V, then I’d say the answer is no. On the other hand, if you define Big Data more generally as a movement geared towards making more thoughtful use of all sorts of data by leveraging technology like the Hadoop ecosystem and various machine learning libraries in a more meaningful way, then I’d be inclined to agree.

A lot of education and critical thinking is still needed on both the payer side and the provider side before things can change radically; however, the right types of questions around the cost and quality of care are being asked now. The Patient Centered Medical Home (PCMH) model holds a lot of promise.

AR: Q4. What are some of the most remarkable use cases of text mining in healthcare? Are there any unique challenges in dealing with healthcare text data such as clinical notes?

PF: Although still in their infancy, a few compelling use cases come to mind. Text mining clinicians’ progress notes and customer service calls for topics and sentiment are two solid examples. From a behavioral health perspective, it’s not a stretch to imagine how much more can be learned about a patient with these types of data points. At the end of the day, it all comes down to enabling a higher standard of care. More conventional use cases such as mining customer service calls for sentiment certainly have their place too.

AR: Q5. Tell us about your work on text mining corporate earnings calls for sentiment analysis and making predictions about a given stock's future performance. What approach did you take for your research? What did you observe? What other applications do you see for your research findings?

PF: I initially sought to make sentiment analysis more digestible by measuring positive, negative, and neutral sentiment where words expressing modality can have a multiplicative effect and subsequently building favorability indexes that can be referenced through various individual-based, role-based, and topic-based hierarchies. Given a structured corpus of calls, one could quickly gauge sentiment for a specific individual, external analyst, internal executive (by title/role), industry, or call topic with relative ease. From there, you can train models to make predictions or extract information to enrich other models. For example, if a certain analyst is always overwhelmingly negative on calls then you can ding him or her by weighting the negative input accordingly. Conversely, if an executive is consistently, unjustifiably optimistic then you could weight that as well. Right now everything is done in Pandas (I prototyped in the PyData ecosystem) so I’ll be working on formalizing the data model in the near future.
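Editor's sketch: the idea described in the answer above can be illustrated roughly as follows. This is not Piero's implementation, which the interview does not show. The tiny lexicons, the multiplier values, the sample utterances, and the function names `score_utterance`, `favorability_index`, and `bias_adjusted` are all hypothetical, and subtracting each speaker's average score is just one assumed way of "weighting" a habitually negative analyst or an overly optimistic executive. It uses only the Python standard library rather than Pandas to stay self-contained.

```python
from collections import defaultdict

# Hypothetical lexicons, invented for illustration only.
POLARITY = {"strong": 1.0, "growth": 1.0, "improved": 1.0,
            "weak": -1.0, "decline": -1.0, "loss": -1.0}
MODALITY = {"definitely": 1.5, "will": 1.25, "might": 0.5, "could": 0.5}


def score_utterance(text):
    """Sum word polarities; a modal word scales the next polar word it precedes."""
    multiplier = 1.0
    total = 0.0
    for raw in text.lower().split():
        word = raw.strip(".,;:!?")
        if word in MODALITY:
            multiplier = MODALITY[word]   # held until the next polar word
        elif word in POLARITY:
            total += multiplier * POLARITY[word]
            multiplier = 1.0              # reset after it has been applied
    return total


def favorability_index(utterances, key):
    """Mean utterance score grouped by a field such as 'speaker' or 'role'."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for u in utterances:
        sums[u[key]] += score_utterance(u["text"])
        counts[u[key]] += 1
    return {k: sums[k] / counts[k] for k in sums}


def bias_adjusted(utterances):
    """Score each utterance relative to its speaker's own average (assumed weighting)."""
    baseline = favorability_index(utterances, "speaker")
    return [score_utterance(u["text"]) - baseline[u["speaker"]] for u in utterances]


# Made-up sample data, not taken from any real earnings call.
calls = [
    {"speaker": "Analyst A", "role": "analyst",
     "text": "Margins look weak and we might see a decline."},
    {"speaker": "Analyst A", "role": "analyst",
     "text": "This will be a loss."},
    {"speaker": "CFO", "role": "executive",
     "text": "We definitely expect strong growth."},
]

by_role = favorability_index(calls, "role")
adjusted = bias_adjusted(calls)
```

Grouping by `"role"` versus `"speaker"` is meant to mirror the role-based and individual-based hierarchies mentioned above; a topic-based index would only need a `"topic"` field on each utterance.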

AR: Q6. What motivated you towards Analytics? What are the most underrated challenges of working on Text Mining and Sentiment Analysis?

PF: Right out of college I was a finance guy and fortunately I took an advisor’s advice (before data science was a thing) and picked up an MIS (Management Information Systems) minor as well because 2008 wasn’t the best time to hit the ground running in New York City with a finance degree. In any case, I got involved in the healthcare industry as an operations analyst. I quickly realized that data was EVERYTHING and that I could automate away the majority of my work which led me further down the paths of databases and programming. From there on out, the problems that I was addressing required a deeper understanding of statistics and eventually other forms of predictive modeling and machine learning became necessary. So I went back to school.

Being of a mathematical mindset that’s laced with a hefty dose of pragmatism has served me well. I think about deriving insights like an engineer thinks about building a bridge or a doctor thinks about performing surgery; given my skill set, this is how I’m best suited to make contributions and create value. Just like in most exercises involving some form of predictive modeling, text mining requires a lot of data preparation that will essentially make or break your models and/or insights.

AR: Q7. Based on your experience of being involved with academia as well as industry, what advice would you give to Data Science students aspiring to a long-term career in Analytics?

PF: First, you absolutely need to love what you do because if you don’t then the data preparation and feature engineering aspects of it will wear thin. Another valuable piece of advice that was given to me is the idea of not letting perfect be the enemy of good. Sometimes going to market with something that’s good can be better than letting an opportunity pass while perfecting something that would have been great. Last, I’d stress the importance of ongoing education and deepening your analytical toolbox from a software, theory, and application perspective.

AR: Q8. If you ran out of items on your to-do list on a weekday, what would you do? Are there any good books that you have been reading lately and would like to recommend?

PF: I have a lot of pet projects that I’m always working on and I’m trying to be more involved with a handful of not-for-profit and charitable organizations by volunteering. I recently helped Big Brothers Big Sisters of Greater Kansas City optimize its donation bin placement around the city using a very limited dataset that I enriched with Census data and built some models on. I also try to keep up with data science related news on Twitter, Kaggle, FastML, The Practical Quant, Flowing Data (for visualization inspiration), Scikit-Learn/R/Spark/H2O/Hadoop releases, white papers, and of course KDnuggets!