Let’s Admit It: We’re a Long Way from Using “Real Intelligence” in AI
As AI systems and unstructured data grow, there is a need for an independent means of curating data and of evaluating and measuring output, one that does not depend on the natural language constructs of AI and that offers a point of comparison for how the data is processed.
By John Harney, CTO/Co-Founder of DataScava
For anyone worrying about machines taking over the world, I have reassuring news: The idea of artificial intelligence has been overcome by hype.
I don’t mean to belittle AI’s promise or even its existing capabilities. The technology allows organizations to put data to use in ways we could only imagine not that long ago. It’s revolutionized the way executives approach strategic planning. But very often lately—when I’m in meetings, reading research papers or listening to an expert’s presentation—I can’t shake the feeling that to many people, terms like “AI,” “machine learning” and “cognitive computing” have become answers unto themselves.
Today, solutions providers put statements like “AI-driven” or “harnessing the power of machine learning” at the core of their sales pitch. The buzzwords are certainly getting through. One colleague tells the story of a client calling “to make sure AI was included” in their data analysis project. Business people have been sold on the notion that today’s cutting-edge systems analyze data in a black box, then spit out reliable insights. How? They just do.
Of course, computer and data scientists understand things aren't that simple. AI systems "learn" how to recognize certain types of data by employing software to scan hundreds of thousands of documents on a topic, then build a model that searches similar material to identify information that matches the user's interests.
That “black box” many people refer to is, in fact, a hugely complex technology, designed, built and supported by data scientists and specialized software engineers to identify, extract, process and present the information necessary to solve a specific business problem in a fast and user-friendly way. Always remember: Information that’s out-of-date, unintelligible or inaccurate is worse than useless. It’s dangerous.
Ironically, the trouble with training on such large volumes of material recalls one of the oldest sayings in technology's book: "Garbage In, Garbage Out." Statistically, it's inevitable that a large proportion of the information used to teach AI is problematic. While it's true that AI systems learn over time, they only learn when humans review their results and instruct them on how to handle exceptions under different circumstances.
As a result, the "intelligence" of AI is extremely limited. It doesn’t "learn" so much as it takes instruction, and takes it literally. In contrast, the human brain has flexibility and judgement built into its intelligence. Our thought processes naturally put each piece of data into context.
When One Word Means Another
Context’s importance can’t be overstated, especially when you’re dealing with unstructured data. Language is built on words and phrases, not complete sentences. Consequently, systems must learn how to recognize the different meanings of, for example, “take the fifth” when it precedes words or phrases like “amendment,” “right turn,” “of vodka” or “assembly line part.” AI is only capable of telling which meaning applies after it’s been properly instructed and corrected.
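To make the point concrete, here is a minimal sketch in Python of what "being instructed" can look like for this one phrase. It is an illustration only, not how any particular AI system works: the cue table, the sense labels and the function name `sense_of` are all assumptions made up for this example. The program can only tell the meanings apart because a person has listed the contexts in advance.

```python
# Hypothetical cue table: text that may follow the phrase, mapped to a sense label.
# Both the cues and the labels are supplied by a person; the program infers nothing.
SENSE_CUES = {
    "amendment": "legal right",
    "right turn": "driving direction",
    "of vodka": "bottle of liquor",
    "assembly line part": "manufacturing",
}

PHRASE = "take the fifth"


def sense_of(text):
    """Return a sense label for the first 'take the fifth' in text.

    Returns None if the phrase is absent, and "unknown" if the phrase is
    present but is not followed by any cue the table has been given.
    """
    lowered = text.lower()
    start = lowered.find(PHRASE)
    if start == -1:
        return None
    following = lowered[start + len(PHRASE):].strip()
    for cue, sense in SENSE_CUES.items():
        if following.startswith(cue):
            return sense
    return "unknown"
```

A sentence such as "Take the fifth exit" comes back as "unknown" until someone adds a cue for it, which is the correction step described above.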
It follows, then, that in order to be truly useful, AI needs to learn from data that’s been curated. Someone, or something, has to review the data to determine whether it's accurate, relevant and given the appropriate weight. The more refined the input, the more useful the output at every iteration.
Unfortunately, this cuts against the grain of the classic AI sales pitch, which claims that the system “knows” what to do because it's taken in reams of data through intelligent algorithms that produce informed and appropriate results.
The pitch ignores the simple fact that those reams may include (in fact, probably include) a lot of information that’s simply not applicable to the analysis being run. It’s worth noting that 84 percent of CEOs worry about the quality of data they base their decisions on. That raises the question: Just how reliable can we expect the output to be? How can we be sure the AI has put all of its information into the proper context? If a system refers to “take the fifth,” how will it learn to correct itself when it means “take the fifth amendment” and not “take the fifth assembly line part”?
The Future’s Unstructured Path
Our answer lies in using a purely digital and domain-specific “white box” approach to unstructured search (the ability to sift through large volumes of raw text in order to index and mine it) that is both more precise than, and complementary to, the semantic methodology AI generally uses to parse data.
Where AI searches from the bottom up, moving from word to phrase to sentence, DataScava’s method searches from the top down, beginning with the entire corpus of material and identifying the most relevant individual files. To accomplish this, it quantifies and compares relevant data points in multiple contexts and sources simultaneously. It searches for words and phrases by topic, which users can weight based on importance and how they’re used in their particular business. This patented methodology, called “profile matching,” enables the system to compare all files and bring to the forefront those with specific attributes.
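The general idea of weighting topics and ranking a whole corpus against them can be sketched in a few lines of Python. To be clear, this is not DataScava's patented method, whose details are not described here. It is a simplified stand-in that counts phrase occurrences and multiplies them by user-assigned weights, and the profile layout, the function names and the sample files are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def count_phrase(tokens, phrase):
    """Count how many times a (possibly multi-word) phrase occurs in tokens."""
    words = phrase.lower().split()
    n = len(words)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words)


def score_file(text, profile):
    """Score one file against a profile of {topic: (weight, [phrases])}."""
    tokens = tokenize(text)
    total = 0.0
    for weight, phrases in profile.values():
        hits = sum(count_phrase(tokens, p) for p in phrases)
        total += weight * hits
    return total


def rank_files(files, profile):
    """Score every file in the corpus and return (name, score), best first."""
    scored = [(name, score_file(text, profile)) for name, text in files.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical profile: a legal team weights legal language three times as
# heavily as manufacturing language.
profile = {
    "legal": (3.0, ["fifth amendment", "testimony"]),
    "manufacturing": (1.0, ["assembly line"]),
}

# Hypothetical corpus of three short files.
files = {
    "a.txt": "The witness invoked the Fifth Amendment during testimony.",
    "b.txt": "The assembly line needs a new part.",
    "c.txt": "Quarterly results were flat.",
}

ranking = rank_files(files, profile)
```

Because the weights and phrases are plain, inspectable values, a user can see exactly why one file outranks another, which is the sense in which an approach like this is a "white box."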
The value of this approach isn’t limited to improving AI’s performance. By some estimates, around 90 percent of the world's data will be unstructured by 2022. For that reason, the ability to search and parse information is increasingly useful in its own right. For instance, when running alone, DataScava can filter real-time news feeds to keep businesses abreast of industry developments, analyze internal emails and other communications to identify security risks, mine research for medical and legal professionals, and uncover events that will impact the investment and trading strategies of financial managers.
AI is capable of performing powerful analyses and identifying trends and other insights that can result in improved decision-making and real business change. But it only performs well when it’s carefully taught and carefully tended. Undoubtedly, its capabilities will improve over time. But first we must apply technology that may not be as sexy, but generates better information for AI to learn from.
Bio: John Harney is CTO/Co-Founder of DataScava and holds three U.S. patents related to a proprietary, highly precise form of parsing, indexing, quantifying, searching and matching unstructured data (“Profile Matching of Unstructured Data”). DataScava works alone or with traditional AI systems to turn unstructured data into industry-specific information you can act on. Unlike NLP solutions, our proprietary technology uses the unique language of your business instead of the general linguistic and semantic libraries employed today. See our video at www.datascava.com for more information. In his earlier career, John worked on real-time financial trading and decision support systems development and integration in the U.S., Europe and Far East. Here is his LinkedIn Profile.