3 More Google Colab Environment Management Tips

This is a short collection of lessons learned using Colab as my main coding learning environment for the past few months. Some tricks are Colab specific, others as general Jupyter tips, and still more are filesystem related, but all have proven useful for me.

By Matthew Mayo, KDnuggets Managing Editor on January 2, 2019 in Google, Google Colab, Jupyter, Machine Learning, Python

comments

Google's Colab was greeted with all sorts of hype when it was first publicly released in early 2018. After originally being quite excited about it, I wrote a short post with a few tips for new users, which covered taking advantage of the free GPU runtime, installing additional third-party Python libraries, and uploading and using data files to your Colab environment.

Well, like every novelty, Colab's excitement wore off a bit after the initial euphoria. However, after dipping back into the books and needing a stable notebook environment which I could access and share seamlessly between my notebook, workstation, and Chromebook, I decided to give Colab another look. It turned out to be a good decision; I have been regularly using Colab for the past few months for all of my learning-related coding.

This post is a second entry in the short-but-hopefully-useful Google Colab environment tips series, and includes 3 more things I've learned while managing my own Colab coding environment while learning. I stress that this is what I am using for my learning, no mission-critical projects, and I am primarily using Colab as I can switch between my various machines seamlessly, while still having access to GPUs for training (and even TPUs).

Note that some of these are plain vanilla Jupyter tricks, so don't @ me.

0. Get Colab out of the browser

OK, this isn't really a Colab tip, but first get Colab out of the browser. If you're like me, your tab situation is sub-optimal. Sullying that up any further with both the Colab management interface and a bunch of notebooks won't help, so run Colab as its own standalone app. This is OS-dependent, but involves having the Colab "app" installed in your Chrome browser, and selecting both "Open as window" and "Create shortcuts..." from the app context menu, after which you need to find the shortcut and use it to open the app in its own window.

That's it; now you open Colab in its very own window, like in the post header image, from its very own icon. I know, not really on topic, but still useful.

1. Download a file to local computer

This is another simple one, but important enough to mention. A use case: you have created a Keras model and want to visualize the model. You call plot_model to create a PNG file, but since Colab virtual machines don't give you permanency in file storage, you want to download the image. The following excerpt does it:

# plot model
plot_model(model, to_file='rnn-mnist.png', show_shapes=True)

# download model image file
from google.colab import files
files.download('rnn-mnist.png')

Execute the cell and a pop up dialog prompts you for a download location. Which leads us to...

1(b). Display an image inline

Yeah, it's elementary, but it took me a few to remember that I was working with more or less a regular Jupyter environment with Colab. So, to display the above image inline, use:

# display model image file inline
from IPython.display import Image, display
Image('rnn-mnist.png')

A quick intuitive modification will allow you to inline all sorts of other files as you would expect. Lesson: remember you are using a plain old Jupyter notebook, more or less. OK, on to Colab stuff.

2. Access your Google Drive filesystem

Let's say I want to save that model image file to my Google Drive instead of my local computer. There are all sorts of ways to get files in and out of Google Drive. I find this is the most straightforward way of getting CSV data, for example, out of Google Drive.

# save model image file to Google Drive
from google.colab import drive
drive.mount('/content/gdrive')

After clicking the link and entering the authorization code, you can access your drive as follows:

!ls -la /content/gdrive/My\ Drive/Colab\ Notebooks/

Sure, it isn't permanent, but it isn't much work, and seems more straightforward than using any of the other options, in my view (and is no less permanent). If you use something like AutoKey, a desktop automation and text expansion tool, then often-used code excerpts and commands like this become even more trivial anyhow.

Back on topic: now you can save files to (or get files from) Google Drive. Easy, so long as you are comfortable in the terminal... which you should be anyways. Either work with the data file where it is or move or copy it up a few directory levels to the Colab VM root. Since it disappears from here after dropping your instance, however, it makes more sense to me to just read the data from where it sits in the CSV file in the filesystem:

# import pandas as pd
titanic_train = pd.read_csv('/content/gdrive/My Drive/Colab Notebooks/datasets/titanic/train_clean.csv')

3. Use custom libraries and modules stored in Google Drive

So, what if you have custom Python libraries or modules you want to import into Colab projects?

For example, I have a folder I called 'my_modules' in my Colab directory I store common .py files I want access to in Colab. I don't want to store them on GitHub, and they aren't clean files I want to otherwise have to make clean and share with others. Let's say they are simple a collection of helper modules I have gotten used to using myself; data loader functions, data cleaning functions, and the like.

Any files like this I store in a Dropbox folder with the same name which I sync with the Google Drive version. This way I can use the file both inside Colab and outside. I have direct OS access to Google Drive contents — natively on Chrome, and via ocamlfuse on Ubuntu — and can make use of them starting with the same Google Drive filesystem access trick in the above point.

This snippet is useful. Say I have a module called naive_sharding.py in my my_modules directory. Since it is a number of directory levels down, here is the easiest way for me to leave the file where it is and import it within Colab:

import sys
sys.path.insert(0, '/content/gdrive/My Drive/Colab Notebooks/my_modules')

import naive_sharding

That's it; the naive_sharding.py module has been imported, and is ready to use.

Modifying some of the above code snippets, you can see how easy it would be to get, say, model weights into and out of a Colab environment quite trivially. And so with the short notes above, and an outside-the-box thinking, you can accomplish quite a bit with Google Colab, despite what numerous naysayers might have you believe. Given there is zero setup, and Chromebook access is likewise trivial, I have recently found Colab to be an ideal coding tool.

Don't forget to read the first 3 tips and tricks found here.

Related:

3 More Google Colab Environment Management Tips

0. Get Colab out of the browser

1. Download a file to local computer

1(b). Display an image inline

2. Access your Google Drive filesystem

3. Use custom libraries and modules stored in Google Drive

More On This Topic

Latest Posts

Top Posts