DataScience.com Releases Python Package for Interpreting the Decision-Making Processes of Predictive Models
DataScience.com's new Python library, Skater, uses a combination of model interpretation algorithms to identify how models leverage data to make predictions.
DataScience.com has released a beta version of Skater, its new Python library for interpreting predictive models. Skater uses a combination of algorithms to explain the relationships between the data that go into a model and the predictions it makes, allowing users to assess a model’s performance and identify key features.
As machine learning becomes increasingly popular in business applications — such as scoring the creditworthiness of loan applicants, recommending relevant content to viewers, and predicting the amount of money online shoppers will spend — interpretation is also becoming an indispensable part of the model-building process. Skater provides a common framework for describing predictive models regardless of the algorithm used to build them, giving data science practitioners the freedom to use the technique of their choice without worrying about its complexity.
“In many cases, a data scientist will use simple modeling techniques like linear regression or decision trees because the resulting model is easy to interpret,” said DataScience.com Chief Strategy Officer William Merchan. “In effect, he or she is sacrificing performance for interpretability; for example, neural networks or ensembles are harder to explain but produce highly accurate predictions. Skater aims to eliminate this compromise.”
Skater features model-agnostic partial dependence plots, a type of visualization that describes the modeled relationship between a predictor and a target, and variable importance, a measure of the degree to which features drive predictions. It also improves upon existing methods for model interpretation like Local Interpretable Model-Agnostic Explanations (LIME). Skater allows these methods to be applied to any machine learning model — from ensembles to neural nets — whether it is available locally or deployed as an API.
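To make the two ideas above concrete, here is a minimal sketch of a model-agnostic partial dependence curve and a perturbation-based variable importance score. It is not Skater's API or implementation; the function names, the toy model, and the choice to measure importance as the mean absolute change in predictions are all assumptions made for illustration. The only requirement is a `predict` callable, which could just as well wrap a call to a deployed model.

```python
import random
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        # Overwrite one column, leave every other feature as observed.
        modified = [row[:feature] + [value] + row[feature + 1:] for row in rows]
        curve.append(mean(predict(modified)))
    return curve


def perturbation_importance(predict, rows, feature, seed=0):
    """Mean absolute change in predictions after shuffling one column."""
    rng = random.Random(seed)
    column = [row[feature] for row in rows]
    rng.shuffle(column)
    shuffled = [row[:feature] + [v] + row[feature + 1:]
                for row, v in zip(rows, column)]
    baseline = predict(rows)
    perturbed = predict(shuffled)
    return mean([abs(a - b) for a, b in zip(baseline, perturbed)])


# Hypothetical stand-in for any model: it uses feature 0 and ignores feature 1.
def toy_predict(rows):
    return [3.0 * row[0] + 0.0 * row[1] for row in rows]


rows = [[float(i), float(i % 3)] for i in range(10)]

pd_curve = partial_dependence(toy_predict, rows, feature=0, grid=[0.0, 1.0, 2.0])
importance_used = perturbation_importance(toy_predict, rows, feature=0)
importance_unused = perturbation_importance(toy_predict, rows, feature=1)

print(pd_curve)            # [0.0, 3.0, 6.0]
print(importance_unused)   # 0.0, because the toy model never reads feature 1
```

Because both functions only call `predict`, the same code applies whether the model is a linear regression, an ensemble, or a neural network.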
With Skater, data science practitioners can:
- Evaluate the behavior of a model on a complete dataset or on a single prediction: Skater allows for model interpretation at both the global and local levels by leveraging and improving upon a combination of existing techniques, including partial dependence plots, relative variable importance, and LIME.
- Identify latent feature interactions and build domain knowledge: Practitioners can use Skater to understand how features relate to one another in specific use cases — such as how a credit risk model uses a bank customer’s credit history, checking account status, or number of existing credit lines to approve or deny his or her application for a credit card — and then use that information to inform future analyses.
- Measure how a model’s performance changes once it is deployed in a production environment: Skater supports interpretation of deployed models, giving practitioners the opportunity to measure how feature interactions change across different model versions.
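As an illustration of what interpreting a single prediction can look like, the sketch below fits a proximity-weighted linear surrogate around one instance, in the spirit of LIME. It is a simplified variant written for this example, not LIME itself and not Skater's implementation: it omits feature selection and discretization, and the function names, sampling scale, kernel, and toy model are assumptions.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def local_linear_explanation(predict, instance, n_samples=200, scale=0.5, seed=0):
    """Return (intercept, per-feature slopes) of a local weighted linear fit."""
    rng = random.Random(seed)
    # Sample perturbed copies of the instance and query the model on them.
    samples = [[v + rng.gauss(0.0, scale) for v in instance]
               for _ in range(n_samples)]
    targets = predict(samples)
    # Samples closer to the instance get more weight (Gaussian kernel).
    weights = [
        math.exp(-sum((s - v) ** 2 for s, v in zip(sample, instance))
                 / (2 * scale ** 2))
        for sample in samples
    ]
    design = [[1.0] + sample for sample in samples]
    k = len(instance) + 1
    # Weighted least squares via the normal equations.
    xtwx = [[sum(w * row[i] * row[j] for w, row in zip(weights, design))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * row[i] * y for w, row, y in zip(weights, design, targets))
            for i in range(k)]
    coefs = solve_linear_system(xtwx, xtwy)
    return coefs[0], coefs[1:]


# Hypothetical stand-in for any model, local or behind an API.
def toy_predict(rows):
    return [2.0 * row[0] - 1.0 * row[1] + 5.0 for row in rows]


intercept, slopes = local_linear_explanation(toy_predict, [1.0, 2.0])
print([round(s, 6) for s in slopes])  # approximately [2.0, -1.0]
```

For a linear toy model the surrogate recovers the true coefficients; for a nonlinear model the slopes would only approximate how each feature moves the prediction near the chosen instance.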
“One of the key features of DataScience.com’s enterprise data science platform is the ability to deploy models behind a REST API to make them instantly available for integration with dashboards or real-time applications,” Merchan added. “Skater is helping us take that one step further by making it possible to explain the complicated models deployed in our platform — or anywhere — in a way that is understandable to both data science practitioners and, ultimately, non-technical stakeholders.”
The Skater package is available through GitHub and can be easily installed from PyPI using pip.
For more information, visit www.datascience.com.