Alternative Cloud Hosted Data Science Environments
Major cloud providers such as AWS, GCP and Azure all offer hosted data science environments built on Jupyter. For a while they were the only option for data scientists who needed strong compute and storage capacity. Over the years, new alternative providers have emerged that offer a standalone data science environment hosted in the cloud, where data scientists can analyze, host and share their work.
The following two providers are good alternatives for anyone who wants to skip setting up a full cloud environment and just needs a place with large storage and access to powerful CPUs and GPUs.
MatrixDS provides a data science environment with a social-network-style interface where users can share their work and receive reviews of it. Users can easily add others to a project to collaborate with their peers. The platform also lets you fork another person's project, as on GitHub, and offers both private and public modes. You can upload files directly to the platform or pull them from GitHub, Amazon S3, Dropbox or Google Cloud.
Users can start virtual machines, each running its own language; MatrixDS currently supports R, Python and Julia for analysis. For visualization and presentation it supports tools such as Shiny, Superset and Bokeh.
To get started with a Jupyter environment in MatrixDS:
- Sign up for a free account.
- You will be taken to the Projects page. Click the green button in the top right corner to start a new project, give it a name and description, and click CREATE.
- Configure your VM; 4 GB of RAM and a 1-core CPU are enough to start with.
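Once the VM is running, a quick way to sanity-check from a notebook cell that it has the resources you configured is shown below. This is a minimal sketch using only the Python standard library; the RAM lookup relies on POSIX `sysconf` names and falls back to `None` where they are unavailable. If MatrixDS runs your VM inside a container (an assumption, not something the platform documentation above states), these calls may report the host machine's totals rather than your allocation.

```python
import os


def vm_resources():
    """Return (logical CPU count, total RAM in GB) for the current machine.

    RAM is read via POSIX sysconf, so it is None on systems without it.
    Inside a container, both values may reflect the host, not the limits.
    """
    cpus = os.cpu_count()  # may be None if the count cannot be determined
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        pages = os.sysconf("SC_PHYS_PAGES")
        ram_gb = page_size * pages / 1024**3
    except (AttributeError, ValueError, OSError):
        # AttributeError: no os.sysconf (e.g. Windows); ValueError: name unknown
        ram_gb = None
    return cpus, ram_gb


cpus, ram_gb = vm_resources()
if ram_gb is not None:
    print(f"CPUs: {cpus}, RAM: {ram_gb:.1f} GB")
else:
    print(f"CPUs: {cpus}, RAM: unknown")
```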
Saturn Cloud is a relatively new service co-founded in 2018 by Hugo Shi, who is also a co-founder of Anaconda. Saturn Cloud aims to let data scientists focus on data science while the platform takes care of the data engineering. Its goal is to bring DevOps for data science to a wide audience so that practitioners can spend as much time as possible on analysis.
Saturn Cloud uses AWS as the back end to host its Jupyter environment and lets you control and budget costs for yourself and your team. It offers version control and parallel computing with Dask, which works with familiar Python libraries such as NumPy, pandas and scikit-learn, so you do not have to switch to tools like Spark or languages like Scala for distributed computing.
To get started with Saturn Cloud:
- Sign up and create an account.
- To create your notebook instance, specify a name, the storage size, whether you need a GPU, and a requirements.txt file.
- Click CREATE; the server will start and your notebook instance will be ready.
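After the instance starts, you may want to confirm that the packages listed in your requirements.txt were actually installed. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). Its parser is deliberately simplified, not a full PEP 508 implementation: it skips comments, pip options such as `-r`, and URL/VCS lines, and keeps only the leading package name. The `requirements.txt` location in the working directory is an assumption.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def requirement_names(text):
    """Extract bare package names from requirements.txt content (simplified)."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        # Skip blanks, pip options (-r, --index-url, ...) and URL/VCS requirements
        if not line or line.startswith("-") or "://" in line:
            continue
        # The name ends at the first extras bracket, version operator or marker
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(text):
    """Return requirement names that have no installed distribution here."""
    missing = []
    for name in requirement_names(text):
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


# Usage, assuming requirements.txt sits in the notebook's working directory
req = Path("requirements.txt")
if req.exists():
    print("Missing:", missing_packages(req.read_text()))
```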
One of the biggest draws of Saturn Cloud is its parallel processing capability, which can speed up many data science workloads. Saturn Cloud has published an article on parallel processing with Dask.
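Under the hood, much of this parallelism follows a split, apply, combine pattern: partition the data, run the same function on every partition concurrently, then merge the partial results. The dependency-free sketch below does that by hand with `concurrent.futures`; it is an illustration of the idea that Dask automates and scales out to a cluster, not Dask's own scheduler. Threads are the default so it runs anywhere; for CPU-bound pure-Python functions you would pass `ProcessPoolExecutor` as `executor_cls` to sidestep the GIL, which requires the worker function to be importable (for example, defined in a module).

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(seq, n_chunks):
    """Split seq into n_chunks contiguous slices whose sizes differ by at most 1."""
    size, rem = divmod(len(seq), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < rem else 0)
        chunks.append(seq[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The per-partition work: any function of one chunk would do."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_chunks=4, executor_cls=ThreadPoolExecutor):
    # Split: partition the data, much as Dask partitions a collection
    parts = chunked(data, n_chunks)
    # Apply: run the same function on every partition concurrently
    with executor_cls(max_workers=n_chunks) as pool:
        partial_sums = list(pool.map(sum_of_squares, parts))
    # Combine: reduce the partial results into a single answer
    return sum(partial_sums)


print(parallel_sum_of_squares(list(range(1000))))
```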
These two services are increasingly being adopted by the wider data science community as alternatives to traditional cloud providers. Some projects do not need a full-stack cloud experience, and these two platforms provide just what such projects need without compromising on storage or compute capacity.