The results of recent analytics competitions show that in an increasing number of domains, machine learning and data mining experts are able to produce better results than domain experts. Can Machine Learning on Big Data eventually replace Domain Expertise?
The latest KDnuggets Poll asks:
This poll is now closed, but here are the results of
Can Machine Learning on Big Data replace Domain Expertise?
This poll was motivated by a panel at the recent Strata 2012 Conference, which debated the proposition:
In data science, domain expertise is more important than machine learning skill.
The audience at that debate split almost evenly on this question.
Domain experts currently outperform analytic solutions in most domains, and will continue to do so for some time. However, the results of recent analytics competitions, and AI advances such as IBM Watson, show that in an increasing number of domains, machine learning and data mining experts are able to produce better results than domain experts. So, unless we think that humans have a magical ability that cannot be reproduced, machines will eventually be able to come up with better decisions. Sad, but probably inevitable.
What do you think?
Gregory Piatetsky, Editor.
Watson contains a lot of explicit and implicit domain knowledge about the fine details of the game of Jeopardy. The article that describes the system is very explicit on this. Even though it picks up what it knows about the content of the questions by other means, a lot of engineering went into tuning it to the opportunities provided by the game. Is anyone able to share information about exactly which parts of Watson are common to the Jeopardy version and the spin-offs that are used for medicine and finance? That would be interesting to know.
For pure machine learning, I guess computers could beat domain experts in any domain (as long as one is speaking about predictions and not writing novels). The biggest difficulty for domain experts is to input their knowledge into the system in a meaningful way.
However, a much more difficult task is to transform business needs into a project and to transform results into automated processes for the company (industrialization).
Great question, but in practice this shouldn't be "either/or" unless either no data exists or no expertise exists. Clearly, both have strengths and weaknesses. There are many anecdotes where model x blew away the best human performance on problem y. There are also many anecdotes where a model arrived at a highly accurate and completely useless or wrong result. Great Box/Draper quote: "All models are wrong, but some are useful."
Current data analytics excels in answering very focused questions and can (usually) deal with many more interacting variables than can a domain expert. However, due to lack of domain knowledge and/or modeling expertise, models may be misused or under-constrained which can lead to questionable results. Unfortunately, real-world data is nearly always flawed -- sometimes fatally flawed from a modeling perspective.
Currently, domain knowledge is required to ask the right questions, focus analytics in the right direction, properly structure tasks for analysis, assess the quality of a result, etc. The ideal situations are where modeling not only improves on current task performance but also advances human insight and helps us ask better questions.
If the time frame is left completely open-ended then yes, it is likely that someday models will be able to outperform domain expertise on virtually every application to which they are applied.
Another take on domain experts vs data
I don't think machines could write good literature any time soon, but making predictive decisions based on data is much easier for machines than writing novels. After all, domain experts have acquired their expertise by observation and learning. If the same data can be provided to an algorithm, in principle it should also learn to make good decisions. It might take 20, 100, or 1000 years, but it is likely to happen.
You may be able to predict how successful a given novel will be if you're given enough data, but could you use Big Data to create a new best-selling novel? Absurd example, but it speaks to that 'magical ability that cannot be reproduced'.
Then again, some people believe that the singularity is going to happen, so who knows.
It seems to me that there is no clear-cut answer to this! There will be situations and circumstances where an automatic approach will produce better outputs without the need for a domain expert, and others where it will not. Big data can describe lots of things, but probably not everything. There will be cases where some chunks of information won't be captured automatically, either because they are outliers that the model does not pick up, or because they are simply not present in the data. I had an interesting experience when working on an application to support decision making in a hydroelectric power plant. I had quite a lot of data, and I learned that one of the most precise pieces of information for forecasting heavy rainfall in the reservoir area was the behavior of a specific kind of bird! The flying patterns of those birds would, very precisely, indicate a high possibility of heavy rainfall in the following days, which is one of the most important pieces of information for managing the operation of the floodgates. This information was passed to me during an informal conversation with a field engineer with many years of work in the reservoir's area.
My application would go through a series of data-collection steps covering dew point levels, rainfall, and so on. At the end, there was one small question: By the way, have you seen some big funny birds flying around the reservoir? :-)