AI for Fun & Profit: Using the new Genie Cognitive Computing Platform for P2P Lending
This tutorial uses the recently released Genie (an acronym for General Evolving Networked Intelligence Engine) platform to learn from P2P (peer-to-peer) loan data. Experts and non-experts alike can leverage Genie to analyze Big Data, recognize objects, events, and patterns, and more.
By Sevak Avakians, Intelligent Artifacts.
Last month, the Genie Cognitive Computing Factory™ ("Genie") was released for public use. Genie (an acronym for General Evolving Networked Intelligence Engine) lets experts and non-experts alike create artificial intelligence agents (called “genies”) to analyze Big Data, recognize objects, events, and patterns (a.k.a. “classify data”), make predictions & decisions, take appropriate adaptive actions, and evolve within the genie’s environment.
Genies have many unique features, more than we can cover in this article. Instead, we'll explore a specific application: how a genie can learn from P2P (peer-to-peer) loan data; how I can reap the benefits of machine intelligence in loan risk, a field in which I have only limited, superficial knowledge; and how a genie can help me identify what's important in the data sets. I won't go into an actual prediction using the Live Interface because that is covered in great detail in Genie's documentation.
For this, we won’t be using Genie’s API, so there will be no programming. We’ll use a genie’s “bottle”, i.e. the standard deployment object that has built-in security, a web interface, backtesting, monitoring and management, and information analysis facilities.
The general steps for using Genie are:
- Convert your data into Genie's data format. The details of doing this are found in the documentation. This simple step is what makes Genie compatible with your existing environment or current solutions. You can include all the available data fields. We'll let the genie figure out which fields are important at prediction time. We'll also use the built-in information analysis platform to identify key indicators and noise. We can use this to remove unnecessary data fields from our original data, increasing the performance of our solutions.
- Create a genie with one or more potential solutions. Each solution consists of at least one Cognitive Processor embedded within a "primitive". (More on this below.) The Cognitive Processor systematically extracts information from observed data, classifies it, makes predictions & decisions, and takes actions (if allowed!), regardless of the problem domain. Variations to this process are provided through configurable parameters. For more complex scenarios, the input data may require pre-processing or manipulation. This is done using data "manipulatives", i.e. atomic operators that can be connected in various topologies.
- Test the solutions. We'll use the built-in backtesting platform to see how well our solutions perform. It'll also help us understand what to tweak to make the solutions better.
- Iterate on the solutions until we're satisfied. We can adjust variables by hand, or use the built-in genie "Evolver" engine to automatically breed and mutate solutions. Genie configuration files, called "genomes", are JSON objects that can be downloaded. So, we can also create our own custom algorithms to produce new genies.
- Deploy into production. Generally, an appropriate application layer is provided as a custom interface between users and genies. Examples are mobile or web apps as a front end, tied to a data collection system on the back end, both of which may interface with one or more genies.
The Data Set
For convenience, we've already converted a portion of the data set to Genie's data format. The files are available on GitHub.
These are simply zip files renamed with a “.gdz” extension to remind us that they’re already in Genie’s data format. We’ll use these zipped files for backtesting against our solution, allowing us to validate and improve it very rapidly.
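Since a “.gdz” file is described as an ordinary zip archive under a different extension, Python's standard `zipfile` module should be able to open one directly. The following is a minimal sketch under that assumption. The function name `list_gdz_members` and the demo file names are made up for illustration, and nothing is assumed about how Genie lays out the records inside the archive.

```python
import os
import tempfile
import zipfile


def list_gdz_members(path):
    """Return the names of the files stored inside a .gdz archive."""
    # zipfile looks at the file's contents, not its extension,
    # so a zip renamed to ".gdz" opens like any other zip.
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


# Demo with a throwaway archive (hypothetical contents, not real Genie data).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.gdz")
    with zipfile.ZipFile(demo_path, "w") as archive:
        archive.writestr("records.txt", "placeholder contents")
    print(list_gdz_members(demo_path))  # ['records.txt']
```

Pointing `list_gdz_members` at one of the downloaded files is a quick way to peek inside before uploading it for backtesting.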
Making a Genie
To create a genie, we need to use Intelligent Artifacts’ Genie Factory. (It’s free to create genies. Deploying a genie to a bottle costs $0.04 per hour, with a minimum charge of $5. These charges cover the costs of a cloud-based VM.)
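To get a feel for what a deployment might cost, here is a small arithmetic sketch. It assumes the $5 minimum acts as a floor on the total bill, which is one plausible reading of the pricing above rather than a confirmed billing rule. The function name `bottle_cost` is made up for illustration.

```python
HOURLY_RATE_USD = 0.04     # rate quoted in the article
MINIMUM_CHARGE_USD = 5.00  # minimum quoted in the article


def bottle_cost(hours):
    """Estimated charge for running a bottle for `hours` hours.

    Assumption: the minimum is a floor on the total, not a deposit.
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return max(MINIMUM_CHARGE_USD, HOURLY_RATE_USD * hours)


print(f"{bottle_cost(24):.2f}")       # 5.00  (one day: 0.96, below the minimum)
print(f"{bottle_cost(24 * 30):.2f}")  # 28.80 (thirty days)
```

Under this reading, the minimum covers roughly the first 125 hours of runtime, since 125 × $0.04 = $5.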
After signing up for the Genie Factory, go to the Genie Creator page. Here, we configure our cognitive computer by tweaking parameters and changing how Genie operators are interconnected (i.e., their topologies). All of these elements of a genie can be configured either manually or automatically, with genetic algorithms that discover values and topologies that produce more desirable behavior.
The blue dot represents a single Cognitive Processor, and is called a “primitive”. The primitive extracts information from the input data and makes that information available to the user through the web interface and/or API. Genies can have one or more primitives. For our use case, we’ll throw a few in and wire them slightly differently, to demonstrate the idea of multiple solutions. Let’s use four primitives in total: right-click on the canvas and choose “Add Primitive” three times to add to the one already there.
“P1” will be our initial guess at a solution. It’s a primitive with nothing attached to it and uses only the default values for parameters. We can start off any problem this same way.
For “P2”, we’ll attach a “scalarShifter” manipulative by dragging from the right-hand menu and dropping it onto the “P2” node. Manipulatives are atomic operators on the data and Genie comes with a library of pre-built manipulatives. The “scalarShifter” manipulative shifts the incoming scalar value by some amount. Right-click on the manipulative and choose “Edit Properties” to see that amount. By default, this has a “value” field of “5”.
What does this mean? Genie’s data format consists of three field types: vectors, strings, and scalars. Scalar values are used to provide the cognitive processors with a “utility” score. Basically, this indicates the usefulness ("good" vs. "bad") of a record or sequence of events. In our data set, we’ve chosen a scalar value of “100” to indicate that a loan has been repaid. A scalar value of “-100” indicates that the loan defaulted (i.e. was not repaid). Loans whose term has not yet finished, and whose outcome is therefore unknown, have been dropped from our data set. The choice of "100" and "-100" is arbitrary and you can choose other values that are more meaningful to you.
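The labelling rule just described can be sketched in a few lines of Python. This only illustrates the rule and is not Genie's conversion code. The status strings ("repaid", "defaulted", "in_progress") and the record layout are hypothetical stand-ins for whatever your loan data actually uses.

```python
REPAID_UTILITY = 100      # value chosen in the article
DEFAULTED_UTILITY = -100  # value chosen in the article


def loan_utility(status):
    """Map a loan outcome to a utility score; None means "drop the record"."""
    if status == "repaid":
        return REPAID_UTILITY
    if status == "defaulted":
        return DEFAULTED_UTILITY
    return None  # outcome not known yet


# Hypothetical records, for illustration only.
loans = [
    {"id": 1, "status": "repaid"},
    {"id": 2, "status": "defaulted"},
    {"id": 3, "status": "in_progress"},
]

labelled = [
    (loan["id"], loan_utility(loan["status"]))
    for loan in loans
    if loan_utility(loan["status"]) is not None
]
print(labelled)  # [(1, 100), (2, -100)]
```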
What if we are personally more or less risk averse? The “scalarShifter” manipulative lets us change that utility score without having to re-process the data set. I’m fairly risk-averse, so I choose to shift that utility value more towards the negative, say “-40”. I change that value and the alleles ("alleles" are a list of possible replacement values used to mutate the solution) to stay in the range I find acceptable. I can also control how a genie evolves by changing other genetic parameters like the mutability (the probability of a mutation between the values) and volatility (the "size" of each mutation).
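To make the effect of the shift concrete, here is a rough sketch. It assumes the “scalarShifter” simply adds its “value” to the incoming scalar, and that a mutation replaces the current value with one drawn from the allele list with probability equal to the mutability. Both are my reading of the descriptions above, not Genie's confirmed behavior. The function names and the allele values are hypothetical, and volatility is left out because its exact effect isn't specified here.

```python
import random


def shift_scalar(utility, shift):
    """Assumed behavior of scalarShifter: add a fixed amount to the scalar."""
    return utility + shift


def mutate_value(current, alleles, mutability, rng):
    """With probability `mutability`, swap `current` for a random allele."""
    if rng.random() < mutability:
        return rng.choice(alleles)
    return current


# A risk-averse shift of -40 applied to the two utility scores.
print(shift_scalar(100, -40))   # 60   (repaid loans count for less)
print(shift_scalar(-100, -40))  # -140 (defaults count for more)

# Hypothetical allele list that keeps the shift in a cautious range.
rng = random.Random(0)
new_shift = mutate_value(-40, [-60, -50, -40, -30], mutability=0.1, rng=rng)
```

With a shift of -40, a repaid loan is worth 60 and a default costs 140, so the genie is penalized more heavily for bad loans than it is rewarded for good ones.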
For “P3”, we’ll attach the same “scalarShifter” manipulative and give it the same values as before. We’ll also abstract the output of “P3” and connect it to the input of “P4” using the “abstractionID3” manipulative. This manipulative creates a decision tree from the predictions of “P3”, and provides it as input to “P4”. It has some editable parameters, but we’ll just use the defaults.
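The wiring described so far can be written down as a small data structure, which is a handy way to double-check the canvas. This is only an illustrative sketch. The class names and fields are made up and are not Genie's genome format. The shift of -40 is the value chosen earlier for “P2” and reused for “P3”.

```python
from dataclasses import dataclass, field


@dataclass
class Manipulative:
    kind: str
    params: dict = field(default_factory=dict)


@dataclass
class Primitive:
    name: str
    manipulatives: list = field(default_factory=list)


@dataclass
class Link:
    source: str        # primitive whose output is abstracted
    via: Manipulative  # manipulative sitting on the connection
    target: str        # primitive receiving the result as input


primitives = [
    Primitive("P1"),  # defaults only, our initial guess
    Primitive("P2", [Manipulative("scalarShifter", {"value": -40})]),
    Primitive("P3", [Manipulative("scalarShifter", {"value": -40})]),
    Primitive("P4"),  # fed by P3 through abstractionID3
]
links = [Link("P3", Manipulative("abstractionID3"), "P4")]


def describe(primitives, links):
    """Return one human-readable line per primitive and per link."""
    lines = []
    for p in primitives:
        attached = ", ".join(m.kind for m in p.manipulatives)
        lines.append(f"{p.name}: {attached or 'no manipulative attached'}")
    for link in links:
        lines.append(f"{link.source} -> {link.via.kind} -> {link.target}")
    return lines


for line in describe(primitives, links):
    print(line)
```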
The final topology of this genie should look like this:
Let’s name this genie, give it a brief description, and save it. We’re now ready to deploy!