Top New Features in Orange 3 Data Mining Platform
The main technical advantage of Orange 3 is its integration with the NumPy and SciPy libraries. Other improvements include reading data from online sources, querying SQL databases, and preprocessing.
5. Focus on interactive visualization. Load data, estimate distances, plot a hierarchical clustering, choose an interesting branch in the dendrogram, and show where these data lie in a PCA plot. Or run k-means clustering, plot its silhouette diagram, select instances with low silhouette scores, and reveal these outliers in the most informative scatter plot. Interactive visualizations, where one can select a subset of the data for further exploration, have always been a strong feature of Orange. In Orange 3, we focus on exploratory data analysis even more by making every visual display interactive.
6. NumPy-based data storage. While Orange’s front end for visual programming has not changed much, there are several major overhauls under the hood. We have replaced the custom data storage implementation (the old ExampleTable class) with a new one (Orange.data.Table) that stores the actual data in NumPy arrays. The algorithms that work on the data are also no longer part of the C library; they are implemented using a combination of Cython code and efficient NumPy operations.
7. Wrappers for scikit-learn algorithms. The NumPy-based data storage that replaced Orange’s own internal data format was only part of a bigger vision: interoperability with the now abundant external Python-based data science libraries. We love the suite of algorithms Orange offers, with its considerable focus on symbolic learning, but we are also happy to integrate external libraries that provide well-tested implementations of powerful data mining algorithms. One such library is scikit-learn, from which we politely borrow multiple algorithms. If an algorithm you need is not included yet, Orange provides a set of classes that simplify the wrapping of scikit-learn methods. Wrapping narrows the gap between scikit-learn’s “know what you are doing” philosophy and Orange’s “no matter the data and formats, it just works” philosophy.
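The idea behind points 6 and 7 can be illustrated with a minimal sketch. It is not Orange's actual implementation: the names ArrayTable, WrappedLearner, WrappedModel, and TreeLearner are hypothetical and chosen only for this example. The sketch assumes a table that keeps its features and class labels in NumPy arrays, and a small base class that turns any scikit-learn estimator into a learner that accepts such a table.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


class ArrayTable:
    """Hypothetical stand-in for a table that keeps its data in NumPy arrays."""

    def __init__(self, X, Y):
        self.X = np.asarray(X, dtype=float)  # feature matrix, one row per instance
        self.Y = np.asarray(Y)               # class labels, one per instance


class WrappedModel:
    """Hypothetical fitted model that accepts a table or plain array data."""

    def __init__(self, estimator):
        self.estimator = estimator

    def __call__(self, data):
        # Take the feature matrix from a table, or convert list/array input.
        X = data.X if isinstance(data, ArrayTable) else np.asarray(data, dtype=float)
        # A single row is promoted to a one-row matrix before prediction.
        return self.estimator.predict(np.atleast_2d(X))


class WrappedLearner:
    """Hypothetical base class: a subclass only names the scikit-learn class."""

    wraps = None  # scikit-learn estimator class to instantiate

    def __init__(self, **params):
        self.params = params  # passed unchanged to the scikit-learn constructor

    def __call__(self, table):
        estimator = self.wraps(**self.params)
        estimator.fit(table.X, table.Y)
        return WrappedModel(estimator)


class TreeLearner(WrappedLearner):
    wraps = DecisionTreeClassifier


# Toy data: two well-separated groups in two dimensions.
table = ArrayTable(
    X=[[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [5.0, 5.1], [5.2, 4.9], [4.8, 5.3]],
    Y=[0, 0, 0, 1, 1, 1],
)

model = TreeLearner(max_depth=2, random_state=0)(table)
predictions = model(table)   # predict for every row of a table
single = model([0.1, 0.1])   # or for a single plain row
```

In this sketch, wrapping another scikit-learn estimator takes one short subclass, and the caller never has to reshape or convert data by hand, which is the kind of convenience the wrapper classes are meant to provide.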
Ajda Pretnar is a Project Assistant at the Laboratory for Bioinformatics at the University of Ljubljana. She holds an MA in International Relations. She writes technical documentation for Orange, tests the software, and manages bug reports. She is the project coordinator for educational tutorials on data mining.