New-Age Machine Learning Algorithms in Retail Lending

We review the application of new-age Machine Learning algorithms for better Customer Analytics in Lending and Credit Risk Assessment.

By Jayesh Ametha

More than a decade ago, when I joined a large US Credit Cards company, I was surprised to see that Predictive Analytics was limited to multivariate regression and logistic models. This was in contrast to previous stints at Start-Ups funded by NASA / NIST, where a broader set of Machine Learning (ML) methods, including SVMs, NNs, Random Forests and Gradient Boosted Trees, was regularly applied.


There were a number of good reasons for using the simpler models in Retail Lending. Firstly, Decision Frameworks were already in place that made input feature selection a relatively simple exercise. For example, for Credit Decisioning, one could think in terms of the 5Cs of Credit (Character, Capacity, Capital, Collateral, Conditions), and search for Data variables that catered to them. It wasn’t as hard as using Deep Learning to create features from raw Images. Secondly, the relationship of the target variable with the inputs was not complex; for example, Credit risk has a smooth inverse relationship with Income. One did not really need a Radial Basis Function to transform Income into a higher dimensional space, as one did for SVM-based Image classification. Thirdly, unlike today, Training and Deployment Platforms were not amenable to complex methods. Finally, a commonly stated reason was Model explainability (though experienced users of advanced ML models will find this debatable).

With time, the above-mentioned ML methods started getting explored as open source packages became common and Data started coming in different forms. However, the primary value for a Business came from identifying new powerful Data that could significantly improve Customer level Decisions. While Alternate Data Sources will always be a focus area, there are certain specific business problems which can be handled much better with new ML Algorithms that have become prevalent only in the last few years. Here, we discuss three such algorithms, focusing on their application in Retail Lending.


LSTM Networks:

Sequence data from Bank Deposit, Loan or Credit Card transactions can be used to generate powerful insights and actions. Some example use-cases:

  • Credit Risk: Understanding how a Consumer’s Credit history and Transaction volume / profile has changed through time can help make better Credit decisions for new Applicants as well as existing Customers
  • Fraud Detection: Identifying specific sequences of Transactions that could signal Fraud or Money Laundering can be used as a Trigger for blocking credit access and conducting an investigation
  • Churn Prediction: Understanding how Transaction volumes and profile have changed through time can help identify which Customers may be about to attrite, and take Retention steps
  • Product Up-Sell / X-Sell: Looking at Transaction sequences to assess if a Customer has had a Life-event or “Graduated” for a potential change in Products or Terms
  • Customer Service: Customer Interactions (Assisted or Bot-Driven) can be improved through models that remember and learn from past engagements

Common Statistical and ML algorithms are not well structured to handle this type of data. While Statisticians have traditionally created features (for example, Averages over different time windows) that try to capture some of the trend information, LSTM (Long Short-Term Memory) networks are a class of recurrent neural networks that are specifically built to learn from sequence data.
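To make the idea of learning from a sequence concrete, here is a minimal sketch of the forward pass of a single LSTM layer over a Customer's monthly history. It is an illustration only: the weights are random and untrained, the two input features (scaled spend and payment ratio) and the six months of values are made up, and the function name `lstm_forward` is hypothetical. In practice one would train such a network with a library like Keras or PyTorch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(sequence, W, U, b):
    """Run one LSTM layer over a sequence and return the final hidden state.

    sequence: array of shape (T, n_features), one row per time step
    W: (4 * hidden, n_features) input weights
    U: (4 * hidden, hidden) recurrent weights
    b: (4 * hidden,) biases
    Stacked gate order: input, forget, candidate, output.
    """
    hidden = U.shape[1]
    h = np.zeros(hidden)  # hidden state (short-term summary)
    c = np.zeros(hidden)  # cell state (long-term memory)
    for x_t in sequence:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[0:hidden])                 # how much new information to write
        f = sigmoid(z[hidden:2 * hidden])        # how much old memory to keep
        g = np.tanh(z[2 * hidden:3 * hidden])    # candidate values to write
        o = sigmoid(z[3 * hidden:4 * hidden])    # how much memory to expose
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

# Assumption: random, untrained weights, used only to show the mechanics.
rng = np.random.default_rng(0)
n_features, hidden = 2, 4
W = rng.normal(scale=0.1, size=(4 * hidden, n_features))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

# Made-up history: six months of (scaled spend, scaled payment ratio).
monthly = np.array([
    [0.2, 1.0], [0.3, 1.0], [0.5, 0.8],
    [0.7, 0.6], [0.9, 0.4], [1.0, 0.2],
])

h_final = lstm_forward(monthly, W, U, b)          # summary of the whole sequence
h_reversed = lstm_forward(monthly[::-1], W, U, b)  # same months, opposite order
```

The final hidden state `h_final` is a fixed-length summary that a downstream Credit Risk or Churn model could consume. Unlike a simple Average over the window, it depends on the order of the months, which is why `h_reversed` differs from `h_final`.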

Matrix Factorization:

Recommender Systems have been popularized by their use in Retail (Amazon), Web Streaming (Netflix) and Knowledge Sharing (Quora). Many of their implementations use Matrix Factorization, which is a traditional Linear Algebra formulation that was made feasible through faster computational capabilities. Following are a couple of use-cases that apply to Retail Lending:

  • Next Best Offers: This is to determine what to offer next to an existing Customer. Traditionally, this has been solved by building a large set of response propensity models, one for each prospective product and each Customer segment. “Matrix Factorization” can help replace them with a single elegant model, especially when there are many products to choose from.
  • Missing Data: One of the key pre-processing steps before Modeling is to address missing values in a Dataset (although Tree-based algorithms do handle it as part of their learning itself). Matrix Factorization can be used to learn “look-alike” patterns from the overall dataset and fill in the missing values before applying non-tree ML algorithms.
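Both use-cases above can be sketched with one small example. The code below factorizes a Customer-by-Product response matrix into two low-rank matrices, fitting only the observed cells, and then uses the reconstruction to score offers that were never made and to fill in the missing cells. Everything here is an assumption for illustration: the 6x5 response matrix is made up, the function name `factorize` and its hyperparameters are hypothetical, and plain gradient descent stands in for the more scalable solvers that production recommender libraries use.

```python
import numpy as np

def factorize(R, observed, rank=2, lr=0.05, reg=0.01, epochs=3000, seed=0):
    """Fit R ~ P @ Q.T on observed cells only, using gradient descent."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = R.shape
    P = rng.normal(scale=0.1, size=(n_rows, rank))  # customer factors
    Q = rng.normal(scale=0.1, size=(n_cols, rank))  # product factors
    R_filled = np.nan_to_num(R)      # placeholder zeros; masked out below
    mask = observed.astype(float)
    for _ in range(epochs):
        err = mask * (R_filled - P @ Q.T)  # error on observed cells only
        P_step = err @ Q - reg * P
        Q_step = err.T @ P - reg * Q
        P += lr * P_step
        Q += lr * Q_step
    return P, Q

# Made-up data: 1 = accepted offer, 0 = declined, nan = never offered.
nan = np.nan
R = np.array([
    [1,   1,   0,   0,   1],
    [0,   0,   1,   1,   0],
    [1,   nan, 0,   0,   nan],
    [nan, 0,   1,   1,   0],
    [1,   1,   0,   nan, 1],
    [0,   0,   nan, 1,   nan],
])
observed = ~np.isnan(R)

P, Q = factorize(R, observed)
scores = P @ Q.T

# Missing Data: keep observed values, fill the gaps with the reconstruction.
imputed = np.where(observed, np.nan_to_num(R), scores)

# Next Best Offer: best-scoring product not yet offered to this customer.
customer = 5
candidates = np.where(observed[customer], -np.inf, scores[customer])
next_best = int(np.argmax(candidates))

rmse = float(np.sqrt(np.mean((scores - np.nan_to_num(R))[observed] ** 2)))
```

In this toy data, customer 5 behaves like customers 1 and 3, so the model scores the never-offered product 2 well above product 4. A single factorization therefore plays the role of many separate propensity models.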

Deep Learning:

It’s a no-brainer that Deep Learning has been the most visible new-age Machine Learning algorithm developed in the last 5 years, with marked success in generating insights from Large Unstructured Datasets of Images, Audio and Text. Some example use-cases for Retail Lending:

  • Voice to Text: Off-the-shelf Deep Learning software can convert a Customer’s Audio to Text, which can then be used with other ML methods for Automated Intelligent Customer service
  • Social Listening: Applying Deep Learning on Unstructured Text Data from Social feeds and Customer Logs can help in understanding the 4Cs of Company (Brand assessment, Product feedback), Competitor (Benchmarking, Strategy changes), Customer (Sentiment Analysis) and Climate (Market trends)
  • Customer Segmentation: Unsupervised Clustering is commonly used to segment and profile the Customer Base to better understand and develop strategy for each segment. Its training suffers from the “Curse of Dimensionality”. While methods like PCA are applied to solve for this, Deep Learning can be an alternative to create more advanced lower dimensional features
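For the Customer Segmentation case, one common way to get lower dimensional features from a neural network is an autoencoder, which is trained to compress its input and then reconstruct it. The sketch below is a deliberately tiny version with a tanh encoder, a linear decoder and plain gradient descent, applied to synthetic data with six features and two made-up segments. The function name `train_autoencoder` and all settings are assumptions for illustration. A real application would use a deeper network built in a Deep Learning framework.

```python
import numpy as np

def train_autoencoder(X, code_dim=2, lr=0.1, epochs=2000, seed=0):
    """Train a one-hidden-layer autoencoder (tanh encoder, linear decoder)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, code_dim))
    b_enc = np.zeros(code_dim)
    W_dec = rng.normal(scale=0.1, size=(code_dim, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        # Forward pass: compress to the code, then reconstruct.
        code = np.tanh(X @ W_enc + b_enc)
        X_hat = code @ W_dec + b_dec
        diff = X_hat - X
        losses.append(float(np.mean(diff ** 2)))
        # Backward pass for the mean squared reconstruction error.
        grad_out = 2.0 * diff / diff.size
        grad_W_dec = code.T @ grad_out
        grad_b_dec = grad_out.sum(axis=0)
        grad_pre = (grad_out @ W_dec.T) * (1.0 - code ** 2)
        grad_W_enc = X.T @ grad_pre
        grad_b_enc = grad_pre.sum(axis=0)
        W_enc -= lr * grad_W_enc
        b_enc -= lr * grad_b_enc
        W_dec -= lr * grad_W_dec
        b_dec -= lr * grad_b_dec

    def encode(data):
        return np.tanh(data @ W_enc + b_enc)

    return encode, losses

# Synthetic data: two made-up customer segments described by six features.
rng = np.random.default_rng(1)
segment_a = rng.normal(loc=0.0, size=(50, 6))
segment_b = rng.normal(loc=3.0, size=(50, 6))
X = np.vstack([segment_a, segment_b])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each feature

encode, losses = train_autoencoder(X)
codes = encode(X)  # 2-dimensional features for each customer
```

The two-column `codes` array can then be passed to a Clustering algorithm such as k-means in place of the original features, in the same way PCA components would be.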

Modern end-to-end Big Data Platforms available today provide the computational power to train new-age ML algorithms and streamline their deployment. Variable Importance measures like Partial Dependence and Distance to Decision Boundary can help in Model Explainability. It is still important to use the appropriate technique for a given analytical problem and avoid needless complexity. Model Robustness, Incremental Business value, Customer experience, Implementation and Governance should be considered paramount. Machine Learning and the deluge of Alternate Data Sources have certainly paved the way for more exciting “Modeling” times in Retail Lending.
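As an illustration of the Partial Dependence measure mentioned above, the sketch below forces one input to each value on a grid for every Customer, and averages the model's predictions at each value. The scoring function is a toy logistic model with made-up coefficients standing in for any fitted model, the data is synthetic, and the names `predict_default_prob` and `partial_dependence` are hypothetical.

```python
import numpy as np

def predict_default_prob(X, coef, intercept):
    """Toy logistic scorer standing in for any fitted model."""
    return 1.0 / (1.0 + np.exp(-(X @ coef + intercept)))

def partial_dependence(predict, X, feature_idx, grid):
    """Average prediction when one feature is forced to each grid value."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value  # overwrite the feature for all rows
        averages.append(predict(X_mod).mean())
    return np.array(averages)

# Synthetic data. Columns: scaled income, scaled credit utilization.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))

# Assumption: made-up coefficients (risk falls with income, rises with utilization).
coef = np.array([-1.2, 0.8])
intercept = -1.0

income_grid = np.linspace(-2.0, 2.0, 9)
pdp_income = partial_dependence(
    lambda data: predict_default_prob(data, coef, intercept),
    X, 0, income_grid,
)
```

Plotting `pdp_income` against `income_grid` shows how the average predicted risk moves with Income while the other inputs keep their observed values. The same function works unchanged for a more complex model, since it only needs a `predict` callable.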

Bio: Jayesh Ametha is a Retail Banking Professional with 15+ years in Business Strategy, Credit Risk and Advanced Analytics.