KDnuggets Home » News » 2015 » Feb » Opinions, Interviews, Reports » Interview: Nicholas Marko, Geisinger on the Groundwork for Big Data Success ( 15:n07 )

Interview: Nicholas Marko, Geisinger on the Groundwork for Big Data Success


We discuss Big Data & Analytics at Geisinger Health System, CDO challenges, and the impact of Big Data on executive-level decision making.



Nicholas Marko, MD is Chief Data Officer for Geisinger Health System in Danville, PA. He is a data scientist with expertise in predictive analytics, organizational data strategy, data-driven collaboration, and data science consulting. He heads the Department of Data Science & Engineering in Geisinger’s Division of Applied Research and Clinical Informatics (DARCI) and co-directs the High Performance Computing Center in the organization’s Institute for Advanced Applications (IAA). His specific academic interests include integration of heterogeneous data sources and application of advanced mathematical methods and modeling strategies to generate measurable value for organizations in the healthcare sector and beyond.

Dr. Marko is also a practicing neurosurgeon and serves as Geisinger Medical Center’s Director of Neurosurgical Oncology. His clinical practice focuses on surgical management of patients with malignant brain and spine tumors.

Here is my interview with him:

Anmol Rajpurohit: Q1. How does Geisinger Health System leverage Big Data & Analytics?

Nicholas Marko: This is a broad question, and one that I could talk about for hours! Let me give you a quick overview. The Geisinger Health System has had an electronic medical record in place since the late 1990s, so we have about 20 years' worth of comprehensive clinical data available for our very stable patient population of approximately 3 million patients. Geisinger also owns an insurance provider, the Geisinger Health Plan, through which many of our patients are insured. This creates a unique opportunity to study both clinical and claims/administrative data, and we are always looking for ways to capitalize on this opportunity to help improve the quality and efficiency of care delivery. Add to that another 19 primary data sources (operational data, billing data, etc.), and you have quite an array of information available for us to use to these ends.

In that context, Geisinger is always trying to improve the tool set that we apply to these analytic tasks. We have had a dedicated enterprise data warehouse (EDW) in place for about 8 years, designed to make data available to clinicians, investigators, and operational personnel. As more data sources have come online, as the variability of data types and structures has evolved, and as unstructured text data, genomics data, and imaging data have entered the mix, we have continued to move towards a true “big data” stack for data management and analytics. Our current project in this domain is an aggressive effort to build out an EDW that uses the Apache big data stack (Hadoop, etc.). I anticipate that within one year the bulk of our analytics will be coming from that environment, although our relational database structure will still be very much in the mix.

Our analytics are varied and cover basically all parts of the spectrum. Obviously we support a lot of routine business needs, including thousands of reporting and dashboarding functions that are executed regularly. The more exciting part lies beyond that, where we are working with newer data types and more modern strategies to try to extract the most value from our data. We have a comprehensive effort in imaging analytics, and our genomics analytic system is benefiting from a new agreement that is helping us generate sequence data on 100,000 patients over the next 2 years. We are also using more sensor and IoT-type data, including RFID and patient tracking tools, and analyzing this data for predictive purposes (rather than just routine operations) is an active interest of ours. I personally head our Data Science Department, where we are mixing research with application as we study how the structure and fundamental nature of various types of healthcare data affect downstream predictive analytics. We keep a very practical component of this in play by trying to readily translate our most interesting results into immediate clinical application.

We also have a dedicated high performance computing (HPC) environment that we use for the genomics work, the imaging analytics, and some of the more advanced data science applications. This supports a lot of our machine learning work as well as some of the more computationally intensive approaches to knowledge discovery (graph analytics, topological analysis, etc.). We are also working to hybridize this HPC environment with our evolving big data stack, because I believe that this type of hybrid computing environment will give us the best of both worlds. So there is a lot going on with data and analytics here at Geisinger, and we've got our fingers in just about everything that's out there at the moment, trying to find the best ways to connect this with the world of patient care.

AR: Q2. In the CDO role, what are the problems that you typically work on?

NM: Again, the question is broad and the position continues to evolve. One big part of my mission is to get data into the hands of the end-users who need it to do their operational work and also to be innovative.

In any large organization there can be process-related issues, administrative practices, etc. that can slow this process down, and I see one of my most important roles as working to streamline this. It’s all about being an advocate for the data and for the people who need it.

Another big part is working closely with our IT folks to help ensure that our infrastructure is the best that it can be from the data use perspective. IT traditionally focuses a lot on the technical side of infrastructure, and they do that part very well. But as we move toward a truly information-driven ecosystem, it is also important to bring an end-user and patient-centric perspective to those discussions and decisions. The right architecture gets data to the point of use quickly and efficiently, and we recognize that designing these architectures is no longer solely an IT function.

The third major part of my role is enterprise data strategy. I chair our Enterprise Data Strategy (EDS) Steering Committee and work with leadership across our organization to help set a course for how the system conceptualizes, works with, and manages its data.

This is critically important in a data-driven culture, because not having a strategy for your data can be as bad as not having a strategy for business development, etc.

Geisinger has recognized the value and the importance of its data for a long time, and developing, executing, and maintaining a strong cross-institutional effort in data strategy is a critical part of that commitment.

AR: Q3. In the last few years, how have technological advances and the increasing popularity of Big Data affected executive decision-making?

NM: As in all institutions, this varies a lot with the individual executives and their preferences for understanding and consuming data. My part of this equation is to make sure that the best quality, most comprehensive data is always readily available to institutional leaders who need to use this data to make decisions. This has an architectural component as well as an information access and availability component. Our efforts in data-driven executive decision-making are helped by the fact that data has been a part of our culture for nearly 20 years. Our executive leadership is used to asking for the data and to making it an important part of their strategic process, in general.

Second part of the interview
