Hybrid machine learning solution with Google Colab



Google Colab is an excellent free resource provided by Google for using GPUs or TPUs to train your own machine learning models or perform quantitative research. Personally, I use a hybrid workflow: I perform EDA and feature engineering on the dataset locally before transitioning to Google Colab for model selection and hyperparameter tuning.

Google Colab really is a game changer: anyone with a cheap laptop and an internet connection has access to some of the most powerful processors for heavy-duty number-crunching, whether that's optimising a trading model, training a machine learning model to identify images, or performing natural language processing on a ton of websites or documents. Of course, what Google is hoping is that quantitative finance professionals, data scientists, and academics will eventually opt for the Google Cloud Platform when they need even more processing power. But ultimately, all of these free resources do democratize programming and machine learning, so we're all better off for it.

With Colab, you can request a GPU (NVIDIA Tesla K80) or a TPU and use it for a maximum of 12 hours at a time.
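Once you have requested an accelerator (Runtime → Change runtime type), it's worth confirming what you were actually allocated. Here is a minimal check using the TensorFlow build that ships with Colab:

In [ ]:
import tensorflow as tf

# Report the GPU device Colab allocated; an empty string means a CPU-only runtime
gpu = tf.test.gpu_device_name()
print(gpu if gpu else "No GPU allocated")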

Hybrid cloud workflow for Python

Sharing files between ChromeOS and Crostini (Linux container)

  • Using the Files app of ChromeOS, create a folder called Colab Notebooks. This is where we store our IPython notebooks and Python scripts (e.g., .ipynb, .py files).
  • Create a subfolder within Colab Notebooks called datasets. This is where our datasets are stored.
  • Create a subfolder within Colab Notebooks called modules. This is where our personal Python modules are stored.
In [ ]:
.
└── Colab Notebooks
    ├── datasets
    └── modules
  • In the Files app, right-click the Colab Notebooks folder and select Share with Linux. You can now access this folder under the /mnt/chromeos/ directory in Crostini (see the quick check below).

[Screenshot: the Share with Linux option in the ChromeOS Files app]
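As a quick sanity check from the Crostini side, you can list the shared folder in Python. The mount point below assumes the Colab Notebooks folder lives in Google Drive, matching the nbconvert path used later:

In [ ]:
import os

# Assumed mount point: Drive folders shared with Linux appear under /mnt/chromeos/GoogleDrive/
shared = '/mnt/chromeos/GoogleDrive/MyDrive/Colab Notebooks'
print(os.listdir(shared))  # expect ['datasets', 'modules']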

  • To stop sharing folders/files with Linux, open Settings in ChromeOS and select Manage shared files and folders. From there, you can remove any folders/files shared between the two systems.

[Screenshot: the Manage shared files and folders setting]

Code: Working locally and in Colab

  • GitHub:
    • Code stored in a public GitHub repository can be pulled directly into the Colab runtime (see the sketch after the nbconvert command below).
  • Google Drive:
    • Copy your IPython notebook or Python script into the Colab Notebooks directory.
    • Run the nbconvert command to convert the notebook to a Python script in the shared folder, as shown below:
In [ ]:
!jupyter nbconvert ml_kaggle-home-loan-credit-risk-model-logit.ipynb --to script --output-dir='/mnt/chromeos/GoogleDrive/MyDrive/Colab Notebooks'
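For the GitHub route mentioned in the list above, pulling a public repository into the Colab runtime is a one-liner; the URL here is a placeholder for your own repository:

In [ ]:
# Clone a public repository into the runtime's working directory (placeholder URL)
!git clone https://github.com/your-username/your-repo.git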

Datasets: Working locally and in Colab

As the folder Colab Notebooks/datasets is in Google Drive, it can be accessed locally on your Chromebook and also from Colab.

Mount Google Drive to your Colab runtime instance

In [1]:
from google.colab import drive
drive.mount('/content/gdrive')
Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/gdrive

Click the link provided, and a new tab will open asking you to select the Google Account whose Google Drive you want to access.

[Screenshot: Google Account selection]

Upon selecting an account, you need to authorize Google Cloud to interface with Google Drive.

[Screenshot: authorization prompt]

Once you've authorized the action, you will be brought to another page containing a token. Copy this token and paste it into the prompt in Colab. At that point, you will be able to access all the files in your Google Drive from your runtime instance.

[Screenshot: authorization code to copy]
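To confirm the mount succeeded, you can list the shared folder from your runtime instance (the path assumes the folder structure created earlier):

In [ ]:
# List the mounted Drive folder; datasets and modules should appear here
!ls '/content/gdrive/My Drive/Colab Notebooks'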

Loading your datasets into Google Colab

With Google Drive successfully mounted, we go into the directory where our files are stored. In my case, the datasets are stored in Colab Notebooks/datasets/kaggle/, and I am using pickleshare to extract datasets that I stored earlier. Of course, you can use pd.read_csv instead if you have CSV files in your Google Drive, since it is now mounted on your runtime instance.

In [3]:
import pickleshare

# Location of the pickled datasets in the mounted Drive
inputDir = "/content/gdrive/My Drive/Colab Notebooks/datasets/kaggle/home-credit-default-risk"
storeDir = inputDir + '/pickleshare'

# Open the PickleShare database and list the stored dataframes
db = pickleshare.PickleShareDB(storeDir)
print(db.keys())
['df_app_test_align', 'df_app_train_align', 'df_app_train_corr_target', 'df_app_train_align_expert', 'df_app_test_align_expert', 'df_app_train_poly_align', 'df_app_test_poly_align']
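If your data lives in CSV files instead, the equivalent is a straightforward pd.read_csv against the mounted path; the filename below is illustrative:

In [ ]:
import pandas as pd

# Illustrative filename; point this at any CSV inside the mounted Drive folder
df = pd.read_csv(inputDir + '/application_train.csv')
print(df.shape)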

Loading your Python modules into your Colab runtime instance

The easiest way is to use the sys module and append the modules directory, along with the directory where all your notebooks and scripts are stored in your Google Drive:

In [ ]:
import sys
sys.path.append('/content/gdrive/My Drive/Colab Notebooks/modules')  # personal Python modules
sys.path.append('/content/gdrive/My Drive/Colab Notebooks')  # notebooks and scripts

Now you can import your own modules easily!

In [ ]:
import rand_eda as eda
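One handy tip: if you edit a module in Google Drive while the Colab session is running, you don't need to restart the runtime; the standard library's importlib can reload it in place:

In [ ]:
import importlib

# Re-import rand_eda after editing the file in Drive; changes take effect immediately
eda = importlib.reload(eda)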

Conclusion

There you go! Now you know how to set up a hybrid workflow that combines working on Python locally on your machine with working in Colab. You've learnt how to perform the following:

  • Share folders/files between Google Drive in ChromeOS and Crostini (Linux container)
  • Mount Google Drive into your Colab runtime instance
  • Append the directory that stores your own Python modules to the Python path
