Dataiku Data Science Studio Now Also Runs on Apache Spark
Dataiku Data Science Studio (DSS) version 2.1 has many useful features for data scientists, including integration with Apache Spark.
(3) Federate Technoslavia
There is an ever-growing number of technological frameworks and languages that vary widely in capabilities and maturity: Python → R → SQL → Pig → Hive → Spark. Adding Apache Spark to the extensive set of technologies already supported by DSS gives users a unified interface across multiple frameworks, so they can start exploring Spark's capabilities immediately, without first learning the intricacies of a new framework, dialect, or programming language. DSS 2.1 also acts as an educational tool of sorts: it eases the ramp-up by exposing users to a wide variety of technologies in a non-intimidating environment, so they can progress at their own pace while remaining productive.
(4) Machine Learning at Scale
Another important element that DSS brings to the table is the ability to train models using both MLlib and scikit-learn. MLlib is Spark's machine learning library; its algorithms are built to scale and can be trained across distributed data. With MLlib in the mix, users can take on large-scale projects and model their data sources in their entirety instead of being constrained by sample size. Analysts can therefore leverage the full cluster and avoid the problems normally associated with a "divide and conquer" approach that may miss important segments of data.
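To make the point about sampling concrete, here is a minimal sketch in plain Python. It does not use DSS, MLlib, or scikit-learn; the dataset, the segment labels, and the helper count_segments are all hypothetical and exist only for illustration. It builds a dataset in which a rare segment makes up 20 rows out of 100,000, draws a 1% random sample, and compares how many rows of each segment are visible in the sample versus the full data.

```python
import random


def count_segments(rows):
    """Count how many rows fall into each segment label."""
    counts = {}
    for segment in rows:
        counts[segment] = counts.get(segment, 0) + 1
    return counts


# Hypothetical dataset: 100,000 rows, only 20 of which belong to a rare segment.
rows = ["common"] * 99_980 + ["rare"] * 20

# A 1% random sample, as one might draw to fit a model on a single machine.
rng = random.Random(0)  # fixed seed so the run is repeatable
sample = rng.sample(rows, 1_000)

full_counts = count_segments(rows)
sample_counts = count_segments(sample)

print("full data:", full_counts)
print("1% sample:", sample_counts)
```

On average a sample of this size contains only 0.2 rows from the rare segment, so most draws contain none at all, and a model trained on such a sample never sees that segment. Training on the full distributed dataset avoids this.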
Thanks to our users' feedback, DSS just keeps getting better. You can try the Community Edition of our software for free here.
Your feedback is essential to us. If you have any suggestions on how we can keep improving DSS to meet your needs, please let us know!