Interview: Xia Wang, AstraZeneca on Unraveling Patient Treatment Journey by NLP on Clinical Notes

We discuss Analytics at AstraZeneca, prominent use cases, how NLP helped in understanding the patient treatment journey in diabetes, data sources, insights, and more.

Xia Wang currently holds a principal scientist position in the Biomedical and Health Informatics group within the AstraZeneca clinical development unit. Xia has a long track record of successfully applying novel informatics solutions to support medicines development, including predictive translational safety analyses, and of leveraging real-world-evidence-based analytics in clinical study design, epidemiology observational research, health economics and outcomes, comparative effectiveness and marketing research.

Prior to stepping into the clinical domain, Xia was with the AstraZeneca innovational medicines unit, focusing on informatics and computational modeling to accelerate candidate drug identification and optimization in the early discovery phases. Xia holds a Ph.D. in computational chemistry and has extensive training in broad areas of informatics.

Here is my interview with her:

Anmol Rajpurohit: Q1. Can you describe your role at AstraZeneca? What are the prominent use cases of Analytics in Medicine Development?

Xia Wang: At AstraZeneca, I belong to a group known as the Advanced Analytics Center (AAC), which sits in our clinical research organization. The mission of AAC is to transform drug development decision making through applied data science. There are three skill groups in AAC: Informatics, Statistical Innovation and Scientific Computing Solutions.

My role as a Principal Informatician always starts with understanding the key challenges in the life cycle of the development of a medicine. This involves focusing on identifying the most relevant data and implementing the most appropriate analytical methodologies, in order to reveal the insights and make the decisions that can meet those challenges.

In my opinion, the most prominent use cases of Analytics here are those that show that new medicines are effective, safe and prescribed to the right patient. Examples include the analytics of clinical trials data to obtain the approval of a new medicine, comparative effectiveness analytics to compare the benefit of existing treatments in clinical practice versus a newly approved medicine, and the analytics of patient populations to identify the common characteristics of a group of patients who can benefit the most from a certain medicine.

AR: Q2. What was the motivation behind the pilot work on utilizing Natural Language Processing (NLP) to explore the understanding of patient treatment journey in diabetes?

XW: At AstraZeneca, science and patients are at the heart of everything we do. Diabetes is one of the main therapeutic areas in our clinical research where we believe we can make the most difference. So from the scientific perspective we are very keen to gain an in-depth understanding of the patient treatment journeys observed in clinical practice, and of how diabetic patients can benefit most from various treatment options. Such detailed information can help reveal the ultimate complexity and decision-making in treating diabetic patients. However, it is not usually captured in the structured data elements of the healthcare databases currently available to Life Sciences companies.

We believe the newly available, de-identified digital patient charts, which contain clinical documentation directly from physicians, are a promising way to help address these challenges. This pilot focused on two fundamental questions: 1) whether we can identify the events of interest along the diabetic patient treatment journey from the clinical note sources and 2) how well NLP technology performs in retrieving this meaningful information.

AR: Q3. What were the key sources of data for your study? Were there any data sets that were desired but not accessible?

XW: This pilot study evaluated two databases with clinical notes available. The tertiary care center database evaluated initially as part of this pilot contained a small percentage of new onset diabetic cases. Further, the ambulatory care database from thousands of separate outpatient physicians was also evaluated to supplement the overall picture of diabetes care and decision making in treatments.

Ideally, for this pilot we would have liked to include de-identified electronic health records (EHR), a more structured data format that can be linked with the clinical documentation. However, the longitudinal coverage that such structured data provided was less than desirable at the time this study was conducted.

AR: Q4. What were the key insights from this research? Any recommendations or future directions?

XW: Through this pilot project, we were able to benchmark and validate our understanding of how NLP technology can be used to retrieve the diabetic patient treatment journey from clinical documentation. We showed that the sequence of events of interest and their relative dates can be extracted efficiently into structured data tables with high accuracy, as validated via chart review on a set of sample notes.

At the same time, limitations in the type and extent of information captured in clinical notes were observed. So in the future, longitudinal coverage of the patient's life plus integration of structured data from clinical notes will be essential to provide a comprehensive picture of the treatment journey for diabetic patients.

Second part of the interview