
Data Science vs Addiction: Estimating Opioid Abuse by Location

Data science can help find the optimal locations for drug treatment facilities, even in the face of major data challenges.

By Dan Putler, Alteryx.

While developing an application to optimally locate opioid treatment facilities, our team at Alteryx needed to know the number of individuals abusing opioids at a fine-grained geographic level, but that data is not available. However, there is sufficient data to predict this number for each census tract using the three-step approach we present below.

The Three-Step Approach

  1. Using survey data (in this case from the 2016 National Survey on Drug Use and Health), estimate a model that links the probability of abusing opioids to the socioeconomic and demographic characteristics of each individual. These characteristics are also available for small geographic areas.
  2. Develop estimates of the number of individuals that fall into each unique socioeconomic and demographic group in a census tract, using census tract summary data and the Public Use Microdata Sample from the American Community Survey together with a statistical method known as iterative proportional fitting.
  3. Multiply the estimated number of individuals in each group in a census tract by the probability that an individual with that socioeconomic and demographic profile abuses opioids, and then sum these values across all the groups in a census tract.
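Steps 2 and 3 can be illustrated with a minimal sketch. It is not the code used in the project: it assumes only two dimensions (age group by employment status), and the seed counts, tract totals, and probabilities are made-up numbers chosen for illustration. The function name `ipf` is likewise hypothetical.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Two-dimensional iterative proportional fitting.

    Rescales a seed table until its row and column sums match the targets.
    """
    table = [row[:] for row in seed]
    for _ in range(max_iter):
        # Row step: scale each row so it sums to its target.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [v * target / total for v in table[i]]
        # Column step: scale each column so it sums to its target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns now match exactly, so convergence is judged on the rows.
        worst = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if worst < tol:
            break
    return table


# Hypothetical seed: microdata counts by age group (rows) and
# employment status (columns) for the larger area containing the tract.
seed = [[40.0, 10.0], [30.0, 20.0]]
# Hypothetical tract summary totals for each dimension (both sum to 1,000).
tract_age_totals = [600.0, 400.0]
tract_employment_totals = [700.0, 300.0]
fitted = ipf(seed, tract_age_totals, tract_employment_totals)

# Step 3: multiply each group's count by its (hypothetical) model
# probability of abusing opioids, then sum over the groups in the tract.
abuse_prob = [[0.02, 0.06], [0.01, 0.03]]
expected_abusers = sum(
    n * p
    for counts, probs in zip(fitted, abuse_prob)
    for n, p in zip(counts, probs)
)
```

The fitted table reproduces the tract's marginal totals while keeping the association between the two dimensions found in the seed. A real application would fit more than two dimensions at once.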

Based on a gradient boosted model (trained using R’s gbm package), we find that socioeconomic and demographic factors are good indicators of the probability an individual abuses opioids. The important predictors (all of which had the expected effect), in order of importance, are:

  1. Race and ethnicity (non-Hispanic whites have the highest probability, while Asians have the lowest probability)
  2. Gender and age group (women and men age 65 and over have the lowest probability, while men age 26 to 34 have the highest)
  3. Employment status (those who are not working due to a disability have the highest probability, while those who are in school or a training program full-time have the lowest)
  4. Marital status (those who are married have the lowest probability, while those who are divorced or separated have the highest)
  5. Income (the probability decreases as income increases)
  6. Educational attainment (the probability decreases as educational attainment increases)

The relative importance plot for the variables is given below.

Fig. 1: relative importance of predictors for opioid addiction

Examining Data Validity

How can we determine the validity of the three-step approach when we don’t know the actual truth? We do this by examining whether our estimates of opioid abuse are an effective predictor of a “downstream” measure, in this case mortality rates due to drug overdoses. Drug overdose death rates are available for many counties (census tract estimates can be aggregated to provide county-level estimates), and opioid use is the most important cause of drug overdose deaths. Across the counties, the Pearson correlation between the two variables is 0.65, a value that is both high and highly statistically significant. Given this, we have strong confidence in our estimates based on the three-step approach.
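The validation check can be sketched as follows. This is a minimal illustration, not the original analysis: the tract estimates and county overdose death rates are invented, and the helper names `county_rates` and `pearson` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def county_rates(tracts):
    """Aggregate (county, expected_abusers, adults) tract rows to county rates."""
    abusers = defaultdict(float)
    adults = defaultdict(float)
    for county, expected, population in tracts:
        abusers[county] += expected
        adults[county] += population
    return {county: abusers[county] / adults[county] for county in abusers}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical tract-level estimates: (county, expected abusers, adults).
tracts = [
    ("A", 30.0, 1000.0),
    ("A", 50.0, 1000.0),
    ("B", 20.0, 2000.0),
    ("C", 90.0, 1500.0),
    ("D", 10.0, 1000.0),
]
# Hypothetical overdose deaths per 100,000 residents for the same counties.
death_rates = {"A": 30.0, "B": 12.0, "C": 41.0, "D": 15.0}

rates = county_rates(tracts)
counties = sorted(rates)
r = pearson([rates[c] for c in counties], [death_rates[c] for c in counties])
```

Only counties with a published death rate can enter the comparison, so a real check would first restrict both series to the counties they share.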

A choropleth map for West Virginia, which gives the estimated percentage of adults who abuse opioids, is shown below and provides the basis of our opioid facility location application.

Fig. 2: West Virginia: estimated percentage of adults who abuse opioids

The Optimization App

The optimization app is focused on three states: Indiana, Ohio, and West Virginia. These states all have comparatively high opioid overdose death rates, and are contiguous with one another.

To create the application, we can rely on an Alteryx feature known as a “Location Optimizer Macro”. In the app, a user is asked how many additional treatment facilities they would like to add to the existing set of treatment facilities, which combination of the three states (all three, any two of them, or only a single state) should be considered for locating the new treatment centers, and how the underlying map data should be displayed (the choices are to show the number of expected adults who abuse opioids at either the census tract or county level).

The app searches for the user-provided number of locations that maximize the expected number of adults who abuse opioids that are within a ten-mile radius of the selected sites, and who are not currently within a ten-mile radius of any existing site. The ten-mile radius is somewhat arbitrary (it seemed reasonable to us since it would represent a fairly short travel time), and the app can easily be altered to make this a parameter that is under a user's control. The optimization method is heuristic in nature, relying on an evolutionary algorithm. For the application, we configure it to continue to look for a better solution until a stopping rule is met. The output of the app is shown in the figure below.
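The objective the app maximizes can be written down in a few lines. The sketch below is not the Location Optimizer Macro, whose internals are not described here. It pairs that objective with a very simple mutate-and-keep evolutionary search. It also assumes a fixed number of generations as the stopping rule, and the coordinates and expected counts are invented. The names `miles`, `newly_covered`, and `evolve` are hypothetical.

```python
import math
import random

RADIUS_MILES = 10.0  # the ten-mile radius used in the app


def miles(a, b):
    """Great-circle (haversine) distance in miles between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(h))


def newly_covered(sites, tracts, existing):
    """Expected abusers near a chosen site and not near any existing site."""
    total = 0.0
    for point, expected in tracts:
        if any(miles(point, e) <= RADIUS_MILES for e in existing):
            continue  # already served by an existing facility
        if any(miles(point, s) <= RADIUS_MILES for s in sites):
            total += expected
    return total


def evolve(candidates, tracts, existing, k, generations=200, seed=0):
    """Swap one site per generation; keep the child if it is not worse."""
    rng = random.Random(seed)
    best = rng.sample(candidates, k)
    best_score = newly_covered(best, tracts, existing)
    for _ in range(generations):  # assumed stopping rule: fixed budget
        replacement = rng.choice(candidates)
        if replacement in best:
            continue
        child = best[:]
        child[rng.randrange(k)] = replacement
        score = newly_covered(child, tracts, existing)
        if score >= best_score:
            best, best_score = child, score
    return best, best_score


# Hypothetical tract centroids with expected numbers of adults abusing opioids.
tracts = [
    ((38.00, -81.00), 120.0),
    ((38.05, -81.00), 80.0),
    ((38.50, -81.00), 200.0),
    ((39.00, -81.00), 50.0),
]
existing = [(39.00, -81.00)]  # hypothetical existing facility
candidates = [(38.02, -81.00), (38.50, -81.00), (39.00, -81.00)]

best_sites, best_score = evolve(candidates, tracts, existing, k=2)
```

Skipping tracts that an existing facility already reaches is what steers new sites toward unserved areas. Making the radius a user-controlled parameter would only require passing it into `newly_covered` instead of reading the module-level constant.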

Fig. 3: West Virginia locations that maximize access to people likely abusing opioids.

Bio: Dr. Dan Putler (@AskDrDan) is the Chief Data Scientist at Alteryx, where he has been responsible for developing and implementing the product road map for predictive analytics. He has over 30 years of experience in developing predictive analytics models for companies and organizations that cover many industry verticals, ranging from the performing arts to B2B financial services.