12 Docker Commands Every Data Scientist Should Know

Looking to add Docker to your data science toolbox? Here’s a list of essential Docker commands to help you get started.




 

Working on a data science project is always exciting. However, it is not without challenges. Each project requires you to install a (possibly) long list of libraries, often at specific versions. So wrapping your head around the project’s dependencies can be quite challenging. Here’s where Docker can help.

Docker is a popular containerization technology. With Docker, you can package your data science application, along with its code and required dependencies, into a portable artifact called an image. This makes it easier to replicate the development environment on another machine and makes local development a breeze.

Here’s a list of essential Docker commands that’ll come in handy as you’re coding your way through your next project. We’ll work with images from Docker Hub, one of the most popular platforms to find, share, and manage container images.

 

1. docker pull

 

To pull an image from Docker Hub, you can run the docker pull command as shown:

docker pull <name-of-the-image>

 

For example, to pull the Python image from Docker Hub, you can run the following command:

docker pull python

 


 

By default, this command pulls the image tagged latest. You can optionally add a tag after the image name, separated by a colon, to pull a specific version of the image.
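
As a rough sketch of pulling a tagged image, the helper below appends the tag to the image name after a colon. The name pull_tag is hypothetical, and 3.11-slim is only a tag assumed for illustration; check the image's page on Docker Hub for the tags that are currently published.

```shell
# Hypothetical helper: pull <image>:<tag>, falling back to "latest" when no tag is given.
pull_tag() {
  image="$1"
  tag="${2:-latest}"
  docker pull "${image}:${tag}"
}

# Usage (tag assumed for illustration):
#   pull_tag python 3.11-slim
```

Pinning a tag this way keeps everyone on a project working against the same Python version rather than whatever latest points to on the day they pull.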

 

Note: If you'd like to run the Docker commands as a user without superuser permissions, create the docker group and add the user to that group.
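
On a typical Linux installation, the group setup in the note above might look like the following sketch. The helper name add_to_docker_group is hypothetical, and the commands assume sudo, groupadd, and usermod are available. Keep in mind that membership in the docker group effectively grants root-level access to the host.

```shell
# Hypothetical helper: create the docker group (if missing) and add a user to it.
# The user needs to log out and back in before the new group membership applies.
add_to_docker_group() {
  user="${1:-$USER}"
  sudo groupadd -f docker            # -f: succeed even if the group already exists
  sudo usermod -aG docker "$user"    # -a: append, -G: supplementary group
}

# Usage:
#   add_to_docker_group "$USER"
```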

 

2. docker images

 

To view the list of all the downloaded images, you can run the docker images command.

docker images

 


 

3. docker run 

 

After you’ve pulled an image from the registry, you can use the docker run command to spin up a Docker container, which is a running instance of that image:

docker run <name-of-the-image>
docker run [options] <name-of-the-image> 

 

For example, the -i option keeps STDIN open and the -t option allocates a pseudo-TTY. Used together with the Python image (docker run -it python), they start the container with an interactive Python REPL.

 

An image is a portable artifact and a container is a running instance of the image. This means you can run multiple containers from a single Docker image.
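
To see several containers running from one image, a minimal sketch could look like this. The helper name start_many and the demo-N container names are hypothetical, and sleep 300 is used only to keep each container alive for a few minutes.

```shell
# Hypothetical helper: start N detached containers from the same image.
# Each container gets its own name (demo-1, demo-2, ...) and its own container ID.
start_many() {
  image="$1"
  count="$2"
  i=1
  while [ "$i" -le "$count" ]; do
    docker run -d --name "demo-${i}" "$image" sleep 300
    i=$((i + 1))
  done
}

# Usage:
#   start_many python 3
```

The -d option runs each container in the background, so the loop returns immediately and all of the containers keep running side by side.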

 


 

4. docker ps

 

You can run the docker ps command to get a list of all the running containers.

docker ps

 


 

Note that there’s a CONTAINER ID associated with each Docker container. Over the next few minutes, we’ll learn Docker commands to stop and restart containers, examine logs, and more. We’ll use the CONTAINER ID of a particular container in those commands.

Suppose you ran a container in one of the previous sessions, and the container is not running anymore. In this case, you can run the docker ps command with the -a option. This will list all the containers: those that are currently running as well as those that were stopped previously.

docker ps -a
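
If the full listing gets long, the output can be narrowed down. The sketch below, with the hypothetical helper name list_exited, combines the --filter and --format options of docker ps to show only stopped containers in a compact table.

```shell
# Hypothetical helper: list only stopped (exited) containers,
# printing a table with the container ID, image, and status columns.
list_exited() {
  docker ps -a --filter "status=exited" --format 'table {{.ID}}\t{{.Image}}\t{{.Status}}'
}

# Usage:
#   list_exited
```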

 

5. docker stop

 

You may sometimes need to stop a running container. To do so, run the docker stop command.

docker stop <CONTAINER ID>

 

6. docker start

 

You can use the docker start command to restart a previously stopped container. You can run the docker ps -a command, grab the container ID, and then use it in the docker start command to restart a container. 

docker start <CONTAINER ID>

 

7. docker rmi

 

To remove a specific image, you can run the docker rmi command.

docker rmi <name-of-the-image>

 

Running this command removes the image from your local development environment. The next time you’d like to start a container from the image, you’ll need to pull the image from Docker Hub again.

 

8. docker rm

 

To remove a container permanently from your development environment, you can run the docker rm command. Docker refuses to remove a running container unless you pass the -f option, so stop the container before attempting to remove it.

docker rm <CONTAINER ID>
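
The stop-then-remove sequence can be chained so that removal runs only when the stop succeeds. This is a small sketch, and the helper name stop_and_remove is hypothetical.

```shell
# Hypothetical helper: stop a container, then remove it only if the stop succeeded.
stop_and_remove() {
  docker stop "$1" && docker rm "$1"
}

# Usage:
#   stop_and_remove <CONTAINER ID>
```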

 

9. docker logs

 

The docker logs command prints the output (stdout and stderr) that a container has written so far, which can be especially helpful when debugging containers.

docker logs <CONTAINER ID>
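
For a long-running container, the full log can be overwhelming. As a sketch, the hypothetical helper recent_logs below limits the output with the --tail option and adds timestamps; the default of 50 lines is an arbitrary choice.

```shell
# Hypothetical helper: show the last N log lines (default 50) with timestamps.
recent_logs() {
  container="$1"
  lines="${2:-50}"
  docker logs --tail "$lines" --timestamps "$container"
}

# Usage:
#   recent_logs <CONTAINER ID> 100
# To keep streaming new output instead, use: docker logs -f <CONTAINER ID>
```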

 


 

10. docker exec

 

Using the docker exec command, you can run commands inside a running container.

docker exec <CONTAINER ID> <COMMAND> <ARGS>
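
One use that may come up in data science work is checking which library versions a running container actually has. The sketch below assumes the container was started from an image that ships pip (such as the Python image); the helper name packages_in is hypothetical.

```shell
# Hypothetical helper: list the Python packages installed inside a running container.
# Assumes pip is available inside the container.
packages_in() {
  docker exec "$1" pip list
}

# Usage:
#   packages_in <CONTAINER ID>
```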

 

Try it yourself: As a quick exercise to sum up what you've learned, pull the official Bash image from Docker Hub. Next, try starting an interactive terminal session when spinning up the container, and run a basic Bash command.

 

11. docker version

 

To check the version of Docker installed in your working environment, run the docker version command, which reports both the client and the server (engine) versions:

docker version
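
When only a single value is needed, for instance in a setup script, the --format option accepts a Go template. A minimal sketch, with the hypothetical helper name server_version:

```shell
# Hypothetical helper: print only the Docker server (engine) version string.
server_version() {
  docker version --format '{{.Server.Version}}'
}

# Usage:
#   server_version
```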

 


 

12. docker info

 

The docker info command provides more granular information on the system-wide installation of Docker.

docker info

 

Output of docker info (truncated)

 

Conclusion

 

I hope you found this tutorial on essential Docker commands helpful. Once you’re familiar with Docker, you can try dockerizing your Python and data science applications. You can then push your application’s image to Docker Hub. Other developers will then be able to pull your image and spin up containers in their own working environment, all with a single command.
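
As a rough sketch of that last step, building and pushing an image might look like the following. It assumes a Dockerfile exists in the current directory and that you have already authenticated with docker login; the helper name build_and_push and the image name in the usage line are placeholders.

```shell
# Hypothetical helper: build an image from the Dockerfile in the current directory,
# tag it as <user>/<app>:<tag>, and push it to Docker Hub if the build succeeds.
build_and_push() {
  ref="$1/$2:${3:-latest}"
  docker build -t "$ref" . && docker push "$ref"
}

# Usage (names are placeholders):
#   build_and_push <dockerhub-username> my-ds-app 0.1
```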
 
 
Bala Priya C is a technical writer who enjoys creating long-form content. Her areas of interest include math, programming, and data science. She shares her learning with the developer community by authoring tutorials, how-to guides, and more.