AI for Fun & Profit: Using the new Genie Cognitive Computing Platform for P2P Lending
This tutorial uses the recently released Genie (an acronym for General Evolving Networked Intelligence Engine) platform to learn from P2P (peer-to-peer) loan data. Experts and non-experts alike can leverage Genie to analyze Big Data, recognize objects, events, and patterns, and more.
Data Analysis Platform
Let's jump into finding key indicators in this data set. Genie's "Information Analyzer" link provides automation that does most of the hard work for us.
Select the target primitives we want processed, and click "Process Kbs". This will process all the knowledge bases ("KBs") associated with that primitive's Cognitive Processor:
When the progress bar completes, the reports will appear in the dropdown menu under "Available Reports". Sticking with "P1", let's review its analysis.
The report contains 4 sections. The first section, titled "P1 Vitals", contains an "Information Score" and an "Information Profile":
The "Information Score" is a quantification of the amount of useful or actionable information in the knowledge bases. (Note that IA's proprietary actionable-information measure is not the often-used Shannon information value, which is information entropy and is chiefly useful for comparing how much data different communication channels can carry.)
The "Information Profile" illustrates graphically the quality and type of information in the data.
(It turns out that P1's solution contained the highest Information Score. P2 and P3 had 376.70, P4 had 93.04.) More about both the Information Score and Information Profile can be found here.
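For contrast with the Information Score, here is a minimal Python sketch of the Shannon entropy mentioned above, computed over the empirical distribution of a column of values. The `shannon_entropy` helper and the sample grades are hypothetical and for illustration only; this is not how Genie computes its Information Score, which is proprietary.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Return the Shannon entropy, in bits, of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    # H = -sum(p * log2(p)) over the observed relative frequencies p.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Hypothetical loan grades, not taken from the Lending Club data.
grades = ["A", "B", "B", "C", "C", "C", "C", "D"]
print(shannon_entropy(grades))  # 1.75 bits
```

Entropy measures only how spread out the values are. It says nothing about whether a value helps separate paybacks from defaults, which is the distinction the note above draws.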
The next two sections list the key indicators associated with loan paybacks ("10 Most Positive Useful Symbols"),
and defaults ("10 Most Negative Useful Symbols"),
as provided by the "affinity" score. (The "affinity" score is derived directly from each record's utility scores.)
Clearly, a loan grade of "A", FICO scores in the 700s, or short (36-month) loan terms result in the most paybacks. (Should we only lend to "grade|A" applicants? We could, but they only come up 0.52% of the time!)
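The trade-off between how predictive a symbol is and how often it appears can be checked by hand. The sketch below counts the share of records containing a symbol and the payback rate among those records. The `symbol_stats` helper, the record layout, the "paid" field, and symbols such as "term|36" are assumptions made for illustration; this is a plain frequency count, not Genie's affinity score.

```python
def symbol_stats(records, symbol):
    """Return (share of records containing symbol, payback rate among them)."""
    matching = [r for r in records if symbol in r["symbols"]]
    if not matching:
        # The symbol never occurs, so no payback rate can be computed.
        return 0.0, None
    frequency = len(matching) / len(records)
    payback_rate = sum(r["paid"] for r in matching) / len(matching)
    return frequency, payback_rate

# Hypothetical records; each holds a set of "key|value" symbols and an outcome.
records = [
    {"symbols": {"grade|A", "term|36"}, "paid": True},
    {"symbols": {"grade|C", "term|60"}, "paid": False},
    {"symbols": {"grade|B", "term|36"}, "paid": True},
    {"symbols": {"grade|C", "term|36"}, "paid": True},
]

frequency, payback_rate = symbol_stats(records, "grade|A")
print(f"grade|A: in {frequency:.2%} of records, payback rate {payback_rate:.0%}")
```

A symbol with a high payback rate but a very low frequency, like "grade|A" in the report, identifies good loans reliably but rarely gives us one to fund.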
Lastly, look at the "Least Useful Symbols" list. Here, we see the data that hasn't provided any definitive answers to our question of loan defaults.
For example, a loan application with the word "community" in the employment title is as likely to default as it is to be paid off. Perhaps there haven't been enough examples yet to discriminate; there were only 6 samples of it in this data set.
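To see why 6 samples settle so little, we can put a confidence interval around the observed payback rate. The sketch below uses the standard Wilson score interval for a binomial proportion, and assumes that 3 of the 6 "community" loans were paid back, which is what an even split implies. The `wilson_interval` helper is ours, not part of Genie.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Return the Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Assumed split: 3 paid back out of 6 samples.
low, high = wilson_interval(3, 6)
print(f"payback rate plausibly between {low:.2f} and {high:.2f}")
```

With so few samples the interval spans roughly 0.19 to 0.81, so the data cannot yet tell a risky symbol from a safe one.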
If we're curious about any one of these, we can dig deeper by using the key-value pairs separated by "|", called a "door" in our parlance. For example, let's find out if employment title plays any role in credit risk. Type (or copy and paste) "emp_title" into the "Review by key?" field in the right-hand menu:
Results for Most Positive and Most Negative now display the set of data with the "emp_title" key. For example,
shows that applicants with "manager" in their title often pay back their loans.
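Conceptually, reviewing by key amounts to keeping only the symbols whose key part matches. A minimal Python sketch of that idea follows; the `review_by_key` helper and the symbol list are hypothetical, and Genie's own implementation may differ.

```python
def review_by_key(symbols, key, separator="|"):
    """Return the symbols whose key (the part before the separator) equals key."""
    return [s for s in symbols if s.split(separator, 1)[0] == key]

# Hypothetical symbols in the "key|value" form used by the reports.
symbols = ["grade|A", "emp_title|manager", "emp_title|community", "term|36"]
print(review_by_key(symbols, "emp_title"))
# ['emp_title|manager', 'emp_title|community']
```

Comparing the whole key, rather than testing for a prefix, keeps a key such as "emp_title" from also matching a longer key that merely starts with the same characters.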
These insights about loan risk enable us to go back and filter out some noise from the original data set, increasing the "signal to noise ratio" ("SNR").
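One way to act on this, sketched below in Python, is to strip the symbols flagged as least useful from each record before retraining. The `drop_noise` helper, the record layout, and the choice of noisy symbols are assumptions for illustration; in practice the list would come from the "Least Useful Symbols" report.

```python
def drop_noise(records, noisy_symbols):
    """Return copies of the records with the listed symbols removed."""
    noisy = set(noisy_symbols)
    return [[s for s in record if s not in noisy] for record in records]

# Hypothetical records, each a list of "key|value" symbols.
records = [
    ["grade|A", "emp_title|community", "term|36"],
    ["grade|C", "emp_title|manager", "term|60"],
]
cleaned = drop_noise(records, ["emp_title|community"])
print(cleaned)
# [['grade|A', 'term|36'], ['grade|C', 'emp_title|manager', 'term|60']]
```

The original records are left unmodified, so the filtered and unfiltered data sets can be trained and compared side by side.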
This test used a small percentage of the full data set. More interesting indicators are revealed using a larger set. If we only want to find key indicators, then we can skip testing completely. We can run these reports after training the genie with the data. (Tip: If you're lazy, set the "Fraction of dataset to reserve for training" to 1, and "Number of times to run the test" to 1. All the data will be used for training. After training is completed, go to the "Information Analyzer".)
Genie's learning is open-ended. You can always teach it more at any time without having to retrain it from scratch. This is especially useful for data sets like Lending Club's, which release new data annually.
This genie's genome and test result files are available on GitHub. At this point, you can experiment and share your own solutions and test results.
In summary, we built and deployed a custom cognitive computer within minutes that analyzed our data and gave us insights into a subject we had no previous knowledge about. We tested the AI's solutions to pick the ones aligned with our needs. We used Genie's built-in information analysis to discover key indicators and noise within the data. Unlike other machine learning solutions, we didn't have to build any models to do this. We can continue teaching our genie in real-time. As our environment changes, the genie can adapt by learning or evolving. Armed with this confidence, we can use this genie to make predictions about loan defaults on new data, either through the Live Interface or the API.
(A special thanks goes to David McGoveran for his review and edits of this article.)
About: Intelligent Artifacts is the creator of Genie, a cloud-based or on-premises environment for quickly creating and deploying machine intelligence. Experts and non-experts alike can point and click their way through creating powerful artificial intelligence agents to conquer any machine learning problem, then use the Genie APIs to interact with their projects. Short on time or bandwidth? Soon developers will have the option to swing by the Genie Bazaar and buy a preconfigured and tested Genie, already well on its way to solving many common machine learning problems.