Challenges in Machine Learning for Trust

With the explosive growth in the number of transactions, fraud can no longer be detected manually, and Machine Learning-based methods are required. We examine the main challenges of using Machine Learning for Trust.



By Chirag Mahapatra, Trooly

Trust is the bedrock of our human social system. We need it for trade, politics, human interaction and much more. Hillary Clinton lost the 2016 US election [1] for the lack of it. At the same time, more and more companies are building machine learning models for trust. Here are some examples of automated trust computations:

  • Finance companies calculate credit scores for loan and credit card decisions.
  • Peer-to-peer companies decide whom to onboard to their platforms.
  • Tech companies detect whether content on their sites is fake.
  • Law enforcement agencies use models to identify potential criminal activity.

Yet as trust is becoming increasingly crucial, it is becoming harder to build systems for automated trust decisions.

Why do we need automated trust?

The number of decisions is far too large for any enterprise to handle manually. Uber had 40 million monthly riders worldwide as of October 2016 [2]. Over 174 million Americans have a credit card [3]. We also want to make decisions at a more granular level: not only do we want to check that a credit card user is trustworthy, we want to ensure every one of their transactions is safe. And we want to run different types of checks: fraud, prostitution, solvency, etc. No single human can be an expert in all of these fields. With a conservative estimate of 174 million users averaging one transaction a day and checks across 3 verticals, we would need 522 million checks a day. Assuming one person can do a check per minute over an 8-hour working day, i.e. 480 checks per person per day, we would need over a million people (522 million / 480) to do manual checks. And that is just for one industry. Suffice it to say, we need the help of automated agents.
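
As a sanity check, the back-of-the-envelope arithmetic above is easy to reproduce in a few lines of Python (the inputs are the article's rough estimates, not measured data):

    # Back-of-the-envelope estimate of the manual-review headcount
    # implied by the figures above; all inputs are rough estimates.
    users = 174_000_000          # US credit card holders [3]
    transactions_per_user = 1    # conservative: one transaction per day
    verticals = 3                # e.g. fraud, prostitution, solvency
    checks_per_day = users * transactions_per_user * verticals

    minutes_per_check = 1
    working_minutes_per_day = 8 * 60          # an 8-hour working day
    checks_per_reviewer = working_minutes_per_day // minutes_per_check

    reviewers_needed = checks_per_day / checks_per_reviewer
    print(f"{checks_per_day:,} checks/day -> {reviewers_needed:,.0f} reviewers")
    # 522,000,000 checks/day -> 1,087,500 reviewers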

Why is it hard?

Machine Learning and Deep Learning have made tremendous progress in the last decade. There have been significant advances in computer vision, natural language processing and speech recognition, which have led to products like self-driving cars and virtual assistants. One of the reasons for these advances is the availability of large labeled datasets. However, it is difficult to create large datasets for trust and risk. When two laypeople look at an image containing a cat, they will unambiguously agree that there is a cat in it. The same is not true for trust, where it takes an expert to tell whether a user's details are fraudulent, and even experts sometimes disagree on the labels.
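
One way to quantify this labeling problem is to measure inter-annotator agreement between experts. Below is a minimal sketch using Cohen's kappa; the labels are invented purely for illustration:

    # Cohen's kappa measures how much two fraud experts agree on the
    # same cases, corrected for agreement by chance. The labels below
    # are made up for illustration.
    from sklearn.metrics import cohen_kappa_score

    expert_a = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]  # 1 = fraudulent
    expert_b = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]

    kappa = cohen_kappa_score(expert_a, expert_b)
    print(f"Cohen's kappa: {kappa:.2f}")  # 0.40 here, far from 1.0

A kappa well below 1.0 means the "ground truth" itself is noisy, which caps how well any model trained on it can do.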

A second challenge is the lack of open-source datasets and models to build upon. There are many datasets for image recognition, such as MNIST, CIFAR and ImageNet [4]. Numerous researchers have worked on these datasets and built on each other's work to reach the current state of the art. There are no comparable datasets for trust.

Impact on business

A key challenge in building models for trust and risk is understanding the impact they will have on the business. For example, requiring users to provide more evidence of their credentials will probably reduce incidents. However, it will also degrade the user experience, which might reduce usage of the platform. Finding the appropriate trade-off becomes the key.
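
In practice this trade-off often surfaces as the choice of a decision threshold: a lower threshold blocks more fraud but adds friction for more good users. Here is a minimal sketch of picking the threshold that minimizes total expected cost; the cost figures and model scores are assumptions for illustration only:

    import numpy as np

    # Assumed business costs (illustrative, not real figures).
    COST_MISSED_FRAUD = 100.0  # loss per fraudulent event let through
    COST_FRICTION = 1.0        # cost of challenging a legitimate user

    y_true = np.array([0, 0, 1, 0, 1, 0, 0, 1, 0, 0])  # 1 = fraud
    scores = np.array([0.05, 0.2, 0.9, 0.1, 0.6, 0.3,
                       0.15, 0.8, 0.4, 0.25])          # model scores

    def total_cost(threshold):
        flagged = scores >= threshold
        missed = np.sum((y_true == 1) & ~flagged)   # fraud let through
        friction = np.sum((y_true == 0) & flagged)  # good users challenged
        return missed * COST_MISSED_FRAUD + friction * COST_FRICTION

    best = min(np.linspace(0, 1, 101), key=total_cost)
    print(f"best threshold: {best:.2f}, cost: {total_cost(best):.1f}")

Changing the ratio of the two costs moves the optimal threshold, which is exactly the business trade-off described above.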

Regulatory aspects

Finally, depending on the use case, there are many regulatory hurdles around what data can be used in models. For example, it is illegal to use gender- or race-based information in models (and rightly so!). There is also often a requirement that models be interpretable, meaning a human should be able to explain why a model made a given decision. This can rule out certain algorithms because they are hard to interpret.
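
Interpretable model families make this requirement easier to satisfy. Below is a minimal sketch, with hypothetical feature names and data, of how a logistic regression decision can be explained as a sum of per-feature contributions:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical features; a real system would use many more.
    features = ["account_age_days", "num_chargebacks", "txn_amount_usd"]
    X = np.array([[400, 0, 25], [2, 3, 900], [30, 1, 150], [700, 0, 60]])
    y = np.array([0, 1, 1, 0])  # 1 = fraudulent

    model = LogisticRegression(max_iter=1000).fit(X, y)

    # Each feature's contribution to the log-odds of a single decision
    # is its coefficient times its value, so the decision can be read
    # off and explained to a human or a regulator.
    x_new = np.array([5, 2, 500])
    for name, contrib in zip(features, model.coef_[0] * x_new):
        print(f"{name:>20}: {contrib:+.2f}")
    print(f"{'intercept':>20}: {model.intercept_[0]:+.2f}")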

Conclusion

Machine Learning for trust is hard, yet it is one of the most exciting fields to work in. There is a real thrill when your algorithm correctly predicts a 'bad' event. As bad actors improve their methods, it is of the utmost importance to keep innovating on these models in a data-driven and safe way. While a lot has been done in the last few years, I am keen to see how this field develops over the next 20 years.

References

  1. Why Hillary Clinton lost the election: the economy, trust and a weak message, https://www.theguardian.com/us-news/2016/nov/09/hillary-clinton-election-president-loss
  2. Uber now has 40 million monthly riders worldwide, http://fortune.com/2016/10/20/uber-app-riders/
  3. Credit card ownership statistics, http://www.creditcards.com/credit-card-news/ownership-statistics.php
  4. Deep learning datasets, http://deeplearning.net/datasets/

Disclaimer: The views here are solely my own and do not express the views or opinions of my employer.


Bio: Chirag Mahapatra is a software engineer at Trooly, where he works on building Machine Learning models for Trust and Risk systems.
