Exclusive: KXEN CTO Erik Marcade on Startups, Predictive Analytics, Big Data, Recommendations
KDnuggets talks with KXEN CTO Erik Marcade about start-up environments, KXEN's 3 big shifts, Big Data, KXEN's approach to automation, the death of privacy, and the new KXEN Recommendations.
Gregory Piatetsky, April 13, 2013.
I met Erik Marcade over 15 years ago, when I served on KXEN's Scientific Advisory Board in the late 1990s. He has a rare combination of technical knowledge, business acumen, and French joie de vivre that made our meetings interesting and enjoyable. I interviewed him recently about KXEN's evolution, technology, and Big Data.
Erik Marcade, founder and CTO of KXEN, is responsible for software development and information technologies. He has over 25 years of experience in the artificial intelligence and machine learning industry. Prior to KXEN, Erik developed real-time software expertise at Cadence Design Systems, where he was accountable for advancing real-time software systems as well as managing "system-on-a-chip" projects. He also co-founded Mimetics, a French company that sold development environments, optical character recognition (OCR) products, and services using neural network technology.
Prior to Mimetics, Erik joined the Thomson-CSF Weapon System Division (now Thales) as a software engineer and project manager, working on applications of artificial intelligence to weapons allocation, target detection and tracking, geo-strategic assessment, and software quality control.
He contributed to the creation of Thomson Research Laboratories in Palo Alto, CA (Pacific Rim Operation - PRO) as a senior software engineer. There he collaborated with Stanford University on an automatic landing and flare system for Boeing, and with the Kestrel Institute, a non-profit computer science research organization. He returned to France to head Esprit projects on neural network development.
Erik holds an engineering degree from Ecole de l'Aeronautique et de l'Espace, specializing in process control, signal processing, computer science, and artificial intelligence.
Gregory Piatetsky: 1. How do you compare the technology and start-up environments in San Francisco and in France?
Erik Marcade: You can find bright people anywhere on the planet, and with today's communication technologies, you can be very well trained on the latest and greatest technology anywhere. That said, having worked in both Paris and the Bay Area at different points in my career, it's true that there is an atmosphere and culture in San Francisco and Silicon Valley that you do not find anywhere else: leaving work to go to bars and hearing conversations about Big Data, Hadoop, or Cloud development environments gives a certain spin and a definite energy that is difficult to replicate... in Paris, for example. The crowd effect of having so many people working on the next technological frontier is a key differentiator that attracts even more bright people.
As for the start-up environment, I'd say that when you create a start-up in France, you have to think first about how you will expand abroad, but when you create a start-up in San Francisco, you can focus first on the US market, and that gives you a head start... This is why we launched KXEN in the US from the start.
GP: 2. What are the main advantages of KXEN compared to other analytics tools, like SAS, IBM SPSS, or R? What does KXEN not do as well?
EM: Gregory, as you know, there's so much buzz around Big Data and Predictive Analytics today, so I'm happy to have the opportunity to clarify KXEN's approach to Predictive Analytics here on KDnuggets!
KXEN's focus since we first founded the company has been to be the leading Predictive Analytics solution for business users, integrating directly into a company's business processes. Our focus is to make Predictive Analytics efficient and accessible to data scientists as well as business analysts by industrializing the predictive process. We do this by automating each step of the predictive lifecycle, including data prep, modeling, and deployment.
Traditional predictive solutions typically propose expensive hardware and parallel processing (e.g. grid computing) to accelerate the modeling lifecycle - but all that does is cut CPU time, not human time! Data scientists still need to prepare the analytical data manually, select the right algorithm, properly encode the data, and test and fine-tune the model... but as you know, this takes a lot of time and can introduce human error. Our philosophy at KXEN is to automate these tasks to cut human time, and deliver repeatable results.
What we don't do is propose a huge library of algorithms for data scientists to sift through to determine which one is right every time they have a business question to solve. Of course, we cover all the major types of data mining functions, including classification, regression, attribute importance, segmentation/clustering, forecasting, association rules, recommendations, and social network analysis (SNA). We just take a much more results-oriented approach than traditional solutions.
GP: 3. You founded KXEN. How has KXEN evolved with the industry, and what were the 3 most important changes KXEN made?
EM: When we founded KXEN over ten years ago, we had the vision that Predictive Analytics would be embedded in any best-in-class business process. So we leveraged new advances in mathematics to fully automate the predictive process and created a very condensed API allowing integrators and in-house data scientists to easily embed our technology into their platforms and infrastructure. We started with data mining algorithms and data encoding; later we added automated data preparation as well as in-database deployment.
Over the years we've seen our vision become a reality. We have over 500 customers worldwide today like AAA, Allegro, Bank of America, Barclays, CBS Interactive, Overstock.com, PT XL Axiata, RealNetworks, Rhapsody, Rogers, Sears, Shutterfly, U.S. Cellular and Vodafone and have more momentum today than ever before! We're seeing Predictive Analytics become more and more pervasive, crossing the frontier from "nice to have" to "need to have".
We believe KXEN is helping to bridge this need for insight. Whereas predictive analytics has traditionally been available only to large B2C companies with dedicated teams of data scientists and large budgets, KXEN's easy-to-use automated approach is allowing companies of all sizes to take advantage of best-in-class predictive techniques.
The next big shift is the Cloud. As we strategized about how we wanted to address it, we fundamentally believed that "modeling as a service" was not the answer, because modeling is only a single step in the predictive process. You also need to automate data preparation and deployment at the point of contact to provide "analytical services" in the cloud. That's why we've taken an approach at KXEN to make cloud-based apps smarter, by building predictive apps on our own multi-tenant, cloud-native service. This leverages all the same automation capabilities used by our on-premise customer base to take care of the end-to-end predictive process, including data prep, modeling and deployment.
Our first predictive apps are delivered on Salesforce.com's AppExchange and include Predictive Offers, which is a next-best-activity solution, and Predictive Lead Scoring, which, as it sounds, scores sales leads based on their likelihood to become revenue-generating events. Customers simply install the apps, do some basic setup, and then KXEN instantly starts learning. It's as simple as that.
The third big shift, as you'd expect, is Big Data. I see a lot of people looking for parallelization of the algorithms, but to me, this is not the major issue. Big Data brings new data sources, including machine-generated data (weblogs, RFID, sensors, etc.), that often result in analytical data sets with ultra-high dimensions (thousands or even tens of thousands of columns) and for which there is no prior knowledge or ability to make assumptions about what will be predictive. Predictive engines in this environment must be able to discover models with thousands of input variables automatically, without relying on a data scientist's "a priori" knowledge. KXEN's automated approach was built for this.
I'd also note that there is a lot of talk these days about the "data scientist shortage". This is a huge problem across the industry, but one by which our customers are not really impacted. By automating many of the time-consuming, repetitive tasks in predictive modeling, a new class of users without advanced degrees in statistics can self-serve their own analytical requirements, while data scientists focus on the more complex, pressing needs of the business.
GP: 4. What is your view of Big Data - is there too much hype, or is it still underappreciated? Is consumer privacy dead, as Roger Ehrenberg says?
EM: Big Data is definitely hyped, but it's also real, although it's going to take some time before it becomes truly mainstream. I see Big Data as a reaction: a way to monetize large volumes of data cheaply. Big Data is not just a matter of volume but a cost equation, making sure that you do not spend too much if you want to use stored data. At present, most of the spending still goes to Big Data infrastructure (commodity hardware and a staff of programmers), but soon enough it will go to Big Data apps. I don't believe Big Data is only for unstructured data. Lots of big companies are analyzing structured and semi-structured data, like weblogs or machine-generated data.
This is why I don't believe in the SQL/NoSQL fight. Many possibilities exist for accessing stored data through SQL, and I'm sure VCs see a lot of startups every week aiming to provide SQL access on Big Data platforms. They are not yet efficient (I mean, not as efficient as a skilled MapReduce programmer could get), but that will come, and SQL extensions dedicated to unstructured data will be included. A good example is SQL/MM, which provides a standard package to manage data in multimedia applications - typically unstructured data. BI tools are using these extensions, and so is KXEN.
Let's face it, Big Data is perfect for bringing together data coming from different sources, especially long "transactional" tables such as Web logs, Call Detail Records, Point-of-Sale tickets, and the like. And yes, it can be hard to reconcile and merge all the data for each customer or individual from all these sources, but once it is done, you have your big 360-degree view of your customer containing all available customer attributes, and... that's predictive analytics wonderland!
Well, "wonderland" with one caveat: you need automated techniques that can cope with very high-dimensional problems. For instance, a Big Data infrastructure will allow you to create churn or propensity models based on an unlimited number of customer attributes (usually thousands). KXEN was designed to build models in such high-dimensional spaces, and it's the only solution on the market I know of that does. Some benchmarks I see, praising "Big Data Predictive Analytics Ready" solutions that build (logistic) regressions on data sets with 100 million records and 5 columns, don't reflect reality. For our customers, it's not uncommon to build predictive models on thousands of columns for increased model accuracy.
I am not sure that privacy is dead; at the origin, the data that can be aggregated is data that you, as a user of a service, allowed at one point or another to be known. What is sure is that legislation must be very clear on the fact that you are the owner of your individual data. I also hear a lot of people talking about "Open Data". I think this will not change the landscape of Predictive Analytics in the coming years. I am sure it will allow some people to produce nice reports and the like, but the lack of consensus on standard formats and processes will make actual deployment of Predictive Analytics in production difficult.
GP: 5. Tell us about the new KXEN recommendations - what makes them different from previous recommendation approaches?
EM: KXEN's InfiniteInsight® Recommendation is built to provide personalized recommendations, like products, content or targeted ads, to each unique customer. Unlike traditional recommendation engines, InfiniteInsight® Recommendation is designed with the following core capabilities:
- We're adaptive, addressing any type of business question: for example, product recommendations, personalized digital content, social recommendations (e.g. friends), and targeted ads.
- We analyze all available Big Data sources such as visitor click paths, browsed products, items placed in shopping carts, social circles, and purchase transactions.
- We're flexible to our customers' business processes, so they can create highly targeted recommendations. Our approach allows customers to build, weight, and combine multiple rule sets to create recommendations that incorporate corporate knowledge and know-how. For example, a rule linking 2 products commonly bought together may need a higher weight than a rule linking 2 products commonly viewed. With KXEN, you have the flexibility to build recommendations that make sense for the business...
- We're plug & play with our entire Predictive Analytics suite. For example, propensity scores can be integrated with recommendation rules to create even more personalized recommendations.
Here is an example that explains how KXEN recommendations work.
Figure 1: Suppose our visitor is authenticated: it's Julien, and we know his profile. Let's also assume that Julien clicked on various products, the last one being a case of beer. We want to know the best product to recommend to him. In this example, InfiniteInsight® Recommendation will analyze two distinct data sources (past clickstreams and purchases) and produce two rulesets (C and P). Of course, in a real-life scenario, customers can incorporate as many sources (and resulting rulesets) as needed.
Figure 2: A micro-segmentation (using KXEN's classification algorithm) identifies that Julien has a fairly high propensity to buy products belonging to the "Snack" category and an even higher propensity for "Baby" products. By injecting the results of these micro-segmentations into the previously built rulesets, we can "personalize" the recommendations by reweighting the rules with respect to these propensities. In our example, the "Beer -> Chips" rule for Julien now has an 80% * 60% = 48% probability of the association occurring.
Figure 3: The last step is the weighting of the rulesets. We need to "merge" the rules with the "know-how". How? The business admin (or the marketer, or the merchandiser) leverages their business knowledge to decide what's most important, e.g. give 3x more weight to rules coming from purchases vs. clickstreams. Multiply it all together and you get the final result: Julien's personalized ruleset, whose strongest (most probable) rule, given that Julien clicked on "beers", is that Julien will buy... "diapers" - the resulting recommendation.
[Not Shown] There is a second technique that helps further personalize the rules: InfiniteInsight® builds communities of users, based on the similarity in clicks and product purchases, using our clustering/segmentation algorithm. After profiling strategic segments, we might see that Julien belongs to a community mostly defined by "Junk Food"... Hence the probabilities of the previous rules could be modified again (not shown here).
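The three-step scoring described in the figures above (base rule confidence, times the visitor's propensity, times a business-defined ruleset weight) can be sketched as a small routine. This is a minimal illustration using the example's numbers; all rule data, names, and weights are hypothetical, and this is not KXEN's actual implementation or API.

```python
# Sketch of the ruleset-weighting logic from the recommendation example.
# Rules mined from two hypothetical sources: (antecedent, consequent) -> base confidence
clickstream_rules = {("beer", "chips"): 0.80}
purchase_rules = {("beer", "diapers"): 0.65}

# Micro-segmentation propensities for this visitor (e.g. Julien), per product
propensity = {"chips": 0.60, "diapers": 0.70}

# Business-defined ruleset weights: purchases count 3x more than clicks
ruleset_weight = {"clicks": 1.0, "purchases": 3.0}

def score_rules(last_click):
    """Combine base confidence, visitor propensity, and ruleset weight
    for every rule triggered by the visitor's last click."""
    scored = {}
    for name, rules in (("clicks", clickstream_rules),
                        ("purchases", purchase_rules)):
        for (antecedent, consequent), conf in rules.items():
            if antecedent == last_click:
                score = conf * propensity.get(consequent, 1.0) * ruleset_weight[name]
                # Keep the strongest score if several rules suggest the same product
                scored[consequent] = max(scored.get(consequent, 0.0), score)
    return scored

scores = score_rules("beer")
# "Beer -> Chips":   0.80 * 0.60 * 1.0 = 0.48
# "Beer -> Diapers": 0.65 * 0.70 * 3.0 = 1.365  -> strongest rule wins
best = max(scores, key=scores.get)
```

With these illustrative numbers, the purchase-derived "diapers" rule outscores the clickstream-derived "chips" rule once the 3x business weight is applied, matching the outcome in Figure 3.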
Here is the second part of the Erik Marcade (KXEN) interview.