- 8 New Tools I Learned as a Data Scientist in 2020 - Jan 14, 2021.
The author shares the data science tools learned while making the move from Docker to Live Deployments.
- How to Acquire the Most Wanted Data Science Skills - Nov 13, 2020.
We recently surveyed KDnuggets readers to determine the "most wanted" data science skills. Since they seem to be those most in demand from practitioners, here is a collection of resources for getting started with this learning.
- You Don’t Have to Use Docker Anymore - Oct 29, 2020.
Docker is not the only containerization tool out there and there might just be better alternatives…
- Stop Running Jupyter Notebooks From Your Command Line - Oct 28, 2020.
Instead, run your Jupyter Notebook as a standalone web app.
- Deploying Secure and Scalable Streamlit Apps on AWS with Docker Swarm, Traefik and Keycloak - Oct 23, 2020.
If you are a data scientist who just wants to get the work done but doesn’t necessarily want to go down the DevOps rabbit hole, this tutorial offers a relatively straightforward deployment solution leveraging Docker Swarm and Traefik, with an option of adding user authentication with Keycloak.
- KDnuggets™ News 20:n39, Oct 14: A step-by-step guide for creating an authentic data science portfolio project; Strategies of Docker Images Optimization - Oct 14, 2020.
Learn how to create inspiring Data Science portfolio projects; How to optimize Docker images; How LinkedIn Uses Machine Learning in its Recruiter Recommendation Systems; Understand the Algorithms of Social Manipulation; and read the annotated Machine Learning research papers.
- Strategies of Docker Images Optimization - Oct 8, 2020.
Large Docker images lengthen the time it takes to build and share images between clusters and cloud providers. When creating applications, it’s therefore worth optimizing Docker images and Dockerfiles to help teams share smaller images, improve performance, and debug problems.
- Automating Every Aspect of Your Python Project - Sep 18, 2020.
Every Python project can benefit from automation using a Makefile, optimized Docker images, well-configured CI/CD, code quality tools and more…
- Apache Spark Cluster on Docker - Jul 22, 2020.
Build your own Apache Spark cluster in standalone mode on Docker with a JupyterLab interface.
- Building a REST API with Tensorflow Serving (Part 2) - Jul 21, 2020.
This post is the second part of the TensorFlow Serving tutorial, which shows how to productionize TensorFlow objects and build a REST API to make calls to them.
- Deploy Machine Learning Pipeline on AWS Fargate - Jul 3, 2020.
A step-by-step beginner’s guide to containerizing and deploying an ML pipeline serverlessly on AWS Fargate.
- KDnuggets™ News 20:n24, Jun 17: Easy Speech-to-Text with Python; Data Distributions Overview; Java for Data Scientists - Jun 17, 2020.
Also: Deploy a Machine Learning Pipeline to the Cloud Using a Docker Container; Five Cognitive Biases In Data Science (And how to avoid them); Understanding Machine Learning: The Free eBook; Simplified Mixed Feature Type Preprocessing in Scikit-Learn with Pipelines; A Complete guide to Google Colab for Deep Learning
- Deploy a Machine Learning Pipeline to the Cloud Using a Docker Container - Jun 12, 2020.
In this tutorial, we will use a previously-built machine learning pipeline and Flask app to demonstrate how to deploy a machine learning pipeline as a web app using the Microsoft Azure Web App Service.
- Docker: Containerization for Data Scientists - Jun 2, 2020.
This article is a simple explanation of containerization with Docker.
- Taming Complexity in MLOps - May 28, 2020.
A greatly expanded v2.0 of the open-source Orbyter toolkit helps data science teams continue to streamline machine learning delivery pipelines, with an emphasis on seamless deployment to production.
- Dockerize Jupyter with the Visual Debugger - Apr 17, 2020.
A step-by-step guide to enabling and using visual debugging in Jupyter in a Docker container.
- GitHub Python Data Science Spotlight: High Level Machine Learning & NLP, Ensembles, Command Line Viz & Docker Made Easy - Oct 16, 2018.
This post spotlights 5 data science projects, all of which are open source and hosted on GitHub, focusing on high level machine learning libraries and low level support tools.
- Datmo: the Open Source tool for tracking and reproducible Machine Learning experiments - Sep 26, 2018.
As a data scientist, managing environments and experiments is always hard and results in wasted time and effort with all the troubleshooting and lost work. With datmo, you can track your experiments using this common standard and not worry about reproduction of previous work.
- Training with Keras-MXNet on Amazon SageMaker - Sep 10, 2018.
In this post, you will learn how to train Keras-MXNet jobs on Amazon SageMaker. I’ll show you how to build custom Docker containers for CPU and GPU training, configure multi-GPU training, pass parameters to a Keras script, and save the trained models in Keras and MXNet formats.
- Docker Cheat Sheet - Aug 21, 2018.
This comprehensive cheat sheet will assist Docker users, experienced and new, in getting containers up-and-running quickly. We list commands that will allow users to install, build, ship and run Docker containers.
- KDnuggets™ News 18:n31, Aug 15: Top 10 roles in AI and data science; Github Data Science Spotlight: Python tools for Machine Learning - Aug 15, 2018.
Also: A Practitioner Guide to NLP; Reinforcement Learning: The Business Use Case; Data Scientist guide for getting started with Docker
- Data Scientist guide for getting started with Docker - Aug 14, 2018.
Docker is an increasingly popular way to create and deploy applications through virtualization, but can it be useful for data scientists? This guide should help you quickly get started.
- Setting up your AI Dev Environment in 5 Minutes - Aug 13, 2018.
Whether you're a novice data science enthusiast setting up TensorFlow for the first time, or a seasoned AI engineer working with terabytes of data, getting your libraries, packages, and frameworks installed is always a struggle. Learn how datmo, an open-source Python package, helps you get started in minutes.
- Torus for Docker-First Data Science - May 8, 2018.
To help data science teams adopt Docker and apply DevOps best practices to streamline machine learning delivery pipelines, we open-sourced a toolkit based on the popular cookiecutter project structure.
- KDnuggets™ News 18:n03, Jan 17: Top 10 TED Talks on Data Science, Machine Learning; How Docker Can Help You Become A More Effective Data Scientist - Jan 17, 2018.
Also: A Primer on Web Scraping in R; Elasticsearch for Dummies; Generative Adversarial Networks, an overview.
- How Docker Can Help You Become A More Effective Data Scientist - Jan 10, 2018.
I wrote this quick primer so you don’t have to parse all the information out there and instead can learn the things you need to know to quickly get started.
- Docker for Data Science - Jan 2, 2018.
Coming from a statistics background, I used to care very little about how to install software and would occasionally spend a few days trying to resolve system configuration issues. Enter the god-send Docker almighty.
- Introducing R-Brain: A New Data Science Platform - Oct 11, 2017.
R-Brain is a next-generation platform for data science built on top of JupyterLab with Docker. It supports not only R but also Python and SQL, and has integrated IntelliSense, debugging, packaging, and publishing capabilities.
- Top KDnuggets tweets, Mar 29 – Apr 04: Free Must-Read Books for #MachineLearning; #Apache Slug, new #BigData project - Apr 5, 2017.
Also: Self-driving talent is fleeing Google and Uber to catch the autonomous-driving; Using Docker, CoreOS For #GPU Based #DeepLearning; A Short Guide to Navigating the Jupyter Ecosystem.
- Data Science Deployments With Docker - Dec 1, 2016.
With the recent release of NVIDIA’s nvidia-docker tool, accessing GPUs from within Docker is a breeze. In this tutorial we’ll walk you through setting up nvidia-docker so you too can deploy machine learning models with ease.
- FlyElephant 2.0, Big Data High-Performance Computing Platform - Aug 1, 2016.
FlyElephant is a platform for data scientists, engineers and researchers that provides a ready-to-use computing infrastructure for high-performance computing and rendering.
- Top KDnuggets tweets, Jun 8-14: All-in-one Docker image for Deep Learning; Good Book list for Data lovers - Jun 15, 2016.
Good Book list for #Data lovers; OpenAI - a living collection of important and fun problems; All-in-one #Docker image for #DeepLearning; 10 Useful #Python #DataVisualization Libraries for Any Discipline;
- Jupyter+Spark+Mesos: An “Opinionated” Docker Image - May 31, 2016.
Check out "opinionated" Docker-based stacks for Jupyter, including one that combines Jupyter and Spark right out of the gate.
- The Post-Hadoop World: New Kid On The Block Technologies - Feb 5, 2015.
Big Data technology has evolved rapidly, and although Hadoop and Hive are still its core components, a new breed of technologies has emerged and is changing how we work with data, enabling more fluid ways to process, store, and manage it.
- Boston Docker Global Hack Day and Meetup, Oct 30 - Oct 19, 2014.
Docker is an open platform for developers and sysadmins to build, ship, and run distributed applications. Join other Boston-area developers for Docker Global Hack Day #2 at O'Reilly Media in Cambridge, MA.
- Containers: The Enabler of YARN - Jul 28, 2014.
The evolution of a data-center operating system is discussed along with the underlying challenges and approaches being followed. Containers play a big role in enabling the required abstraction and deliver additional benefits.