
How to Load a Dataset From the Google Drive to Google Colab

Last Updated : 17 Oct, 2023

Google Colab (short for Colaboratory) is a free Google platform that lets users write and run Python code. It is a cloud-hosted Jupyter Notebook service, so you can train machine learning models in the cloud without any local setup, and it offers free (usage-limited) access to GPUs and TPUs. It needs no installation and makes it easy to share notebooks with other users in real time. However, loading a dataset from an external source (here, Google Drive) requires a few lines of code. In this article, we will walk through the steps for loading a dataset from Google Drive into Google Colab.

You can load datasets from Google Drive to Google Colab, using the following steps:

Step 1: Mount Google Drive

Run the following code in a Google Colab code cell to mount your Google Drive. This gives the notebook access to the files and folders in your Google Drive account.

from google.colab import drive
drive.mount("/content/drive")

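The drive.mount() function makes the contents of your Google Drive available under the mount point (here, /content/drive). Any code in the notebook can then read and write those files as if they were local files.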
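You may want the same notebook to run both in Colab and on a local machine. In that case, a guard like the one below can help. This is a minimal sketch under that assumption, and mount_drive_if_colab is a hypothetical helper name. It skips mounting when the google.colab package cannot be imported, which is the case outside Colab. The force_remount argument is passed through to drive.mount(), which remounts the drive even if it is already mounted.

```python
def mount_drive_if_colab(mount_point="/content/drive", force_remount=False):
    """Mount Google Drive when running in Colab; return True if mounted.

    Hypothetical helper: outside Colab the google.colab package is not
    installed, so the import fails and the function returns False.
    """
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point, force_remount=force_remount)
    return True


if __name__ == "__main__":
    if mount_drive_if_colab():
        print("Drive mounted")
    else:
        print("Not running in Colab; skipping mount")
```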

Step 2: Authorize Access

When you run the code cell above, a prompt asks for permission to let Google Colab access your Google Drive files, as shown in the image below.

[Image: authorization request to access Google Drive]

After you grant permission, you will be redirected to a page where you choose the Google account (email ID) to use, as shown in the image below. Depending on the Colab version, you may then be asked to confirm access or to copy an authentication code back into the notebook.

[Image: Google Drive authentication page]

Step 3: Google Drive Mounted

After you complete Step 2, your Google Drive is mounted and the cell prints a confirmation such as "Mounted at /content/drive", as shown in the image below.

[Image: Google Drive mounted]
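You can also confirm the mount from code instead of reading the cell output, by checking that the MyDrive folder exists under the mount point. This is a sketch that assumes Colab exposes your Drive at /content/drive/MyDrive, the layout used throughout this article. drive_is_mounted is a hypothetical helper name.

```python
import os


def drive_is_mounted(mount_point="/content/drive"):
    """Return True if the Drive root folder appears under mount_point.

    Assumes Colab's layout, where your files live in <mount_point>/MyDrive.
    """
    return os.path.isdir(os.path.join(mount_point, "MyDrive"))


print(drive_is_mounted())  # False when run outside Colab or before mounting
```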

You can now read your dataset file from Google Drive. Before doing so, check your present working directory with the following command:

!pwd

pwd stands for "print working directory". It is a command used in Unix-like operating systems, such as Linux and macOS, to display the current working directory: the location in the file system where the command-line session is currently operating. In Colab, the ! prefix tells the notebook to run the line as a shell command rather than as Python code.

When you run pwd, the absolute path of the current directory is printed (in Colab, as the cell's output). This is useful when navigating directories and working with files from the command line, because it shows where you are in the file system.

Running !pwd in a Colab cell shows that the current working directory is /content. Because the drive is mounted at /content/drive, paths to files in your Google Drive start from /content/drive.
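You can get the same information from Python without a shell command. The snippet below uses the standard os module: os.getcwd() returns the current working directory, and os.chdir() changes it, much like IPython's %cd magic. Changing into the Drive folder is only an illustration. The path assumes the mount point used in this article, and the change is skipped if that folder does not exist.

```python
import os

# Python equivalent of !pwd: get the current working directory.
cwd = os.getcwd()
print(cwd)  # in Colab this is usually /content

# Optionally move into the mounted Drive folder (assumed path); skipped
# when the folder does not exist, e.g. outside Colab or before mounting.
drive_root = "/content/drive/MyDrive"
if os.path.isdir(drive_root):
    os.chdir(drive_root)
    print(os.getcwd())
```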

Step 4: Accessing the dataset

Once Step 3 is complete, you can navigate to the folder where your dataset is stored using the ls command:

!ls

ls is a command commonly used in Unix-like operating systems, including Linux and macOS, for listing the files and directories in the current directory (or a specified directory). It provides a way to view the contents of a directory from the command line.
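To list a folder's contents from Python instead of the shell, pathlib can do roughly what ls does. This is a minimal sketch; list_directory is a hypothetical helper, and its default folder assumes the mount point from Step 1. Unlike plain ls, it also returns hidden entries (names starting with ".").

```python
from pathlib import Path


def list_directory(folder="/content/drive/MyDrive"):
    """Return the sorted names of entries in folder, similar to `ls folder`.

    Unlike plain `ls`, hidden entries (names starting with '.') are included.
    Returns an empty list if the folder does not exist (e.g. drive not mounted).
    """
    path = Path(folder)
    if not path.is_dir():
        return []
    return sorted(entry.name for entry in path.iterdir())


for name in list_directory():
    print(name)
```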

As an example, we will use a file named sales.csv to demonstrate the steps:

!ls /content/drive/MyDrive/sales.csv

[Image: dataset file in Google Drive]

Here, the sales.csv dataset is located in the MyDrive folder, which corresponds to the top level of "My Drive" in your Google Drive.
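Before loading, it can help to check that the file really exists at the path you built. That way, a typo in a folder or file name produces a clear message instead of a confusing error later. The sketch below uses pathlib. dataset_path is a hypothetical helper, and sales.csv in MyDrive is simply this article's example file.

```python
from pathlib import Path

# Assumed location of your Drive files: mount point + "MyDrive".
DRIVE_ROOT = Path("/content/drive/MyDrive")


def dataset_path(filename, root=DRIVE_ROOT):
    """Build the full path to a dataset file and check that it exists."""
    path = Path(root) / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; is the drive mounted and the name correct?"
        )
    return path


# Example (only works once the drive is mounted and the file exists):
# path = dataset_path("sales.csv")
```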

Step 5: Loading Dataset

Now, depending on the structure of your dataset, you can load it into your Colab notebook using Python libraries like Pandas for tabular data or NumPy for arrays.
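One way to make that choice in code is sketched below. The hypothetical load_dataset helper picks a loader from the file extension: pandas for .csv files and NumPy's np.load for .npy array files. This extension-to-loader mapping is an illustrative assumption. Other formats, such as Excel or Parquet, would need their own readers.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def load_dataset(path):
    """Load a dataset, choosing the library by file extension (illustrative)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)  # tabular data -> pandas DataFrame
    if suffix == ".npy":
        return np.load(path)  # saved NumPy array -> ndarray
    raise ValueError(f"Unsupported file type: {suffix!r}")
```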

import pandas as pd

# Read the CSV file from the mounted Drive into a DataFrame
df = pd.read_csv("/content/drive/MyDrive/sales.csv")

pd.read_csv is a function provided by the popular Python library called Pandas. Pandas is commonly used for data manipulation and analysis in data science and data engineering tasks. The pd.read_csv function specifically is used to read data from CSV (Comma-Separated Values) files into a Pandas DataFrame.
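To see what read_csv produces without needing a file in Drive, you can pass it an in-memory text buffer, since it accepts file-like objects as well as paths. The sales data below is made up purely for illustration. It is not the contents of the article's sales.csv.

```python
import io

import pandas as pd

# Made-up sample data standing in for a CSV file (illustrative only).
csv_text = """region,units,unit_price
North,10,2.5
South,4,3.0
East,7,2.0
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)             # (3, 3): three rows, three columns
print(df.columns.tolist())  # ['region', 'units', 'unit_price']
print(df["units"].sum())    # 21
print(df.head())            # first rows of the DataFrame
```

The header row becomes the column names, and pandas infers a numeric type for each column (integers for units, floats for unit_price).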

Finally, you can work with the dataset in Google Colab just as you would in any other Python environment.
