| KDnuggets : News : 2007 : n05 : item5 | |
Features

Subject: HDC Patents covering Support Vector Machine - Too broad?

(A data miner brought this story to my attention, concerned that HDC patents covering SVM are so broad that they stifle innovation. Furthermore, he thought that there may be prior art related to these patents. What do you think? Editor)

You can reply to editor at kdnuggets dot com and, if you wish, your reply will be kept confidential.

HEALTH DISCOVERY CORPORATION OBTAINS CONSENT JUDGMENT IN PATENT LAWSUIT AGAINST EQUBITS LLC

Equbits to Cease Use of HDC’s Patented SVM Technology

SAVANNAH, GA, November 28, 2006 – Health Discovery Corporation ("HDC") (HDVY.OB) today announced that it has concluded its patent infringement lawsuit against Equbits LLC. HDC filed its initial action on June 26, 2006, alleging that Equbits had infringed three HDC patents, which relate to systems and/or methods for enhancing knowledge from data using its Support Vector Machine technology. The final consent judgment provides for a permanent injunction barring Equbits from any future use or sale of products utilizing HDC’s patented SVM technology.

http://www.healthdiscoverycorp.com/pr/nov28_06.html

(The patents assigned to HDC are below. Editor)
Enhancing Knowledge Discovery Using Multiple Support Vector Machines

SUMMARY OF THE INVENTION

The present invention meets the above described needs by providing a system and method for enhancing knowledge discovered from data using a learning machine in general and a support vector machine in particular. A training data set is pre-processed in order to allow the most advantageous application of the learning machine. Each training data point comprises a vector having one or more coordinates.

Pre-processing the training data set may comprise identifying missing or erroneous data points and taking appropriate steps to correct the flawed data or as appropriate remove the observation or the entire field from the scope of the problem. Pre-processing the training data set may also comprise adding dimensionality to each training data point by adding one or more new coordinates to the vector. The new coordinates added to the vector may be derived by applying a transformation to one or more of the original coordinates. The transformation may be based on expert knowledge, or may be computationally derived. In a situation where the training data set comprises a continuous variable, the transformation may comprise optimally categorizing the continuous variable of the training data set.

The support vector machine is trained using the pre-processed training data set. In this manner, the additional representations of the training data provided by the pre-processing may enhance the learning machine's ability to discover knowledge therefrom. In the particular context of support vector machines, the greater the dimensionality of the training set, the higher the quality of the generalizations that may be derived therefrom.
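The pre-processing steps described in the summary above can be illustrated with a minimal sketch. This is not the patented method or any HDC code: the function names, the choice of derived coordinates (a product and a ratio of the first two coordinates), and the fixed cut points are all assumptions made for illustration. In particular, the summary speaks of "optimally" categorizing a continuous variable, whereas this sketch uses hand-picked bin boundaries, and it only removes flawed observations rather than trying to correct them.

```python
import math

def clean(rows):
    """Drop observations with a missing (None) or non-finite coordinate."""
    return [r for r in rows
            if all(v is not None and math.isfinite(v) for v in r)]

def add_coordinates(row):
    """Add dimensionality: append the product and the ratio of the
    first two coordinates (an arbitrary, illustrative transformation)."""
    x0, x1 = row[0], row[1]
    ratio = x0 / x1 if x1 != 0 else 0.0
    return list(row) + [x0 * x1, ratio]

def categorize(value, cut_points):
    """Map a continuous value to a bin index given sorted cut points
    (hand-picked here, not optimally derived)."""
    return sum(value >= c for c in cut_points)

def preprocess(rows, cut_points):
    """Clean the data, then expand each vector with derived coordinates
    and a categorized copy of its first coordinate."""
    out = []
    for row in clean(rows):
        expanded = add_coordinates(row)
        expanded.append(categorize(row[0], cut_points))
        out.append(expanded)
    return out

# Hypothetical raw data: two of the four observations are flawed.
raw = [[1.0, 2.0], [None, 3.0], [4.0, 0.5], [float("nan"), 1.0]]
processed = preprocess(raw, cut_points=[2.0, 5.0])
print(processed)
```

Each surviving two-coordinate point becomes a five-coordinate point. The same `preprocess` call would be applied to the test and live data sets so that all three are transformed in the same manner.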
When the knowledge to be discovered from the data relates to a regression or density estimation or where the training output comprises a continuous variable, the training output may be post-processed by optimally categorizing the training output to derive categorizations from the continuous variable.

A test data set is pre-processed in the same manner as was the training data set. Then, the trained learning machine is tested using the pre-processed test data set. A test output of the trained learning machine may be post-processed to determine if the test output is an optimal solution. Post-processing the test output may comprise interpreting the test output into a format that may be compared with the test data set. Alternative post-processing steps may enhance the human interpretability or suitability for additional processing of the output data.

In the context of a support vector machine, the present invention also provides for the selection of a kernel prior to training the support vector machine. The selection of a kernel may be based on prior knowledge of the specific problem being addressed or analysis of the properties of any available data to be used with the learning machine and is typically dependent on the nature of the knowledge to be discovered from the data. Optionally, an iterative process comparing post-processed training outputs or test outputs can be applied to make a determination as to which configuration provides the optimal solution. If the test output is not the optimal solution, the selection of the kernel may be adjusted and the support vector machine may be retrained and retested.

When it is determined that the optimal solution has been identified, a live data set may be collected and pre-processed in the same manner as was the training data set. The pre-processed live data set is input into the learning machine for processing. The live output of the learning machine may then be post-processed by interpreting the live output into a computationally derived alphanumeric classifier.
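The select-train-test-retrain loop and the final post-processing of live output can be sketched as follows. To stay self-contained, the sketch swaps in a kernel perceptron as a simple stand-in for a support vector machine; it is not an SVM and not the patented method. The kernel candidates, the XOR-style toy data, and the "POSITIVE"/"NEGATIVE" labels are likewise assumptions chosen for illustration.

```python
import math

def linear(a, b):
    return sum(x * y for x, y in zip(a, b))

def rbf(a, b, gamma=1.0):
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def score(kernel, X, y, alpha, x):
    """Raw decision value of the kernel machine for point x."""
    return sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X))

def train(kernel, X, y, epochs=20):
    """Kernel perceptron (stand-in for an SVM): count mistakes per point."""
    alpha = [0] * len(X)
    for _ in range(epochs):
        for i, xi in enumerate(X):
            if y[i] * score(kernel, X, y, alpha, xi) <= 0:
                alpha[i] += 1
    return alpha

def accuracy(kernel, X, y, alpha, X_test, y_test):
    hits = sum((score(kernel, X, y, alpha, x) > 0) == (t > 0)
               for x, t in zip(X_test, y_test))
    return hits / len(X_test)

def select_kernel(kernels, X, y, X_test, y_test):
    """Train and test once per candidate kernel; keep the best test accuracy."""
    best = None
    for name, kernel in kernels.items():
        alpha = train(kernel, X, y)
        acc = accuracy(kernel, X, y, alpha, X_test, y_test)
        if best is None or acc > best[0]:
            best = (acc, name, kernel, alpha)
    return best

def to_label(raw_score):
    """Post-process a raw output into an alphanumeric class label."""
    return "POSITIVE" if raw_score > 0 else "NEGATIVE"

# Toy XOR-style data: not separable by a linear kernel without a bias term.
X = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
y = [-1, -1, 1, 1]
X_test = [[0.1, 0.1], [0.9, 0.9], [0.1, 0.9], [0.9, 0.1]]
y_test = [-1, -1, 1, 1]

acc, name, kernel, alpha = select_kernel(
    {"linear": linear, "rbf": rbf}, X, y, X_test, y_test)

# "Live" data, post-processed into labels with the selected configuration.
live = [[0.05, 0.95], [1.0, 1.0]]
labels = [to_label(score(kernel, X, y, alpha, x)) for x in live]
print(name, acc, labels)
```

On this toy data the RBF kernel is selected because the linear kernel cannot classify all four test points correctly. A real workflow would also apply the same pre-processing to the training, test, and live sets before this step.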
Copyright © 2007 KDnuggets.