
How to Read File Content from S3 Bucket with Boto3?

AWS S3 (Simple Storage Service), a scalable and secure object storage service, is often the go-to solution for storing and retrieving any amount of data, at any time, from anywhere. Boto3 is the AWS Software Development Kit (SDK) for Python, which provides an object-oriented API for AWS infrastructure services. It allows Python developers to build applications on top of Amazon services.

Prerequisites

- An AWS account with permission to use S3
- Python installed, along with the boto3 and python-dotenv packages (pip install boto3 python-dotenv)

Step-By-Step Guide to Read File Content from an S3 Bucket

Steps to Create S3 Buckets and Upload Files and Folders

Step 1: Log in to the AWS console.



Step 2: After signing in, you will land on the AWS Management Console page. Search for S3 as shown below.

AWS Management Console

Step 3: From the sidebar, go to Buckets and click Create bucket.



Create Bucket Form

Step 4: Enter a bucket name; make sure it is globally unique.

Bucket name

Step 5: Click Create bucket. There is no need to change the other settings; keep them at their defaults.

Create Bucket

Step 6: Go to the Buckets page again; it lists all your buckets.

Buckets

Step 7: We will upload files to, and read them from, the 'gfg-s3-test-bucket' bucket. Open your bucket.

Open Bucket

Step 8: Click the Upload button. You can also create a folder inside the bucket. Select Add files or Add folder to add them.

Upload Files/Folders

Step 9: Verify that the files/folders were added properly, then click Upload.

Test.txt:
Test.txt is running
GFG Test

Test1.txt:
Test1.txt is running
Reading contents from file using boto3

Verify

Step 10: Once all the files are uploaded successfully, we can start reading them using Boto3.

Upload Successfully

Steps to Read Files or Folders Using Boto3

Step 1: Import all the necessary libraries. We use dotenv to load environment variables (such as the ac_key and sac_key entries) from a .env file.

import os

import boto3
from dotenv import load_dotenv

# Load environment variables from the .env file
load_dotenv()

Step 2: Create an S3 client, which provides all the necessary methods to work with the S3 bucket. Supply the access key and secret access key through os.getenv().

# Create S3 client
s3 = boto3.client(
    "s3",
    aws_access_key_id=os.getenv("ac_key"),
    aws_secret_access_key=os.getenv("sac_key"),
)
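Alternatively, if your credentials are already configured via the AWS CLI, a shared credentials file, or an IAM role, Boto3's default credential chain can find them automatically. A minimal sketch of that alternative:

# No keys passed: Boto3 falls back to its default credential chain
# (environment variables, ~/.aws/credentials, or an attached IAM role)
s3 = boto3.client("s3")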

Step 3: Store the bucket name in a variable.

# Store bucket name
bucket_name = "gfg-s3-test-bucket"

Step 4: List all the objects in the bucket using the list_objects_v2() method and extract their metadata from the 'Contents' key of the response.

# Store contents of bucket
objects_list = s3.list_objects_v2(Bucket=bucket_name).get("Contents")
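Note that list_objects_v2() returns at most 1,000 objects per call. For larger buckets, a paginator handles the continuation tokens for you; a minimal sketch:

# Page through buckets with more than 1,000 objects
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket_name):
    for obj in page.get("Contents", []):
        print(obj["Key"])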

Step 5: Iterate over the list of objects.

# Iterate over every object in bucket
for obj in objects_list:

Step 6: Store the object name using the 'Key' attribute of the object's metadata.

    # Store object name
    obj_name = obj["Key"]

Step 7: Fetch the object using get_object(), which takes the bucket name and the key (object name) and returns a dictionary.

    # Read an object from the bucket
    response = s3.get_object(Bucket=bucket_name, Key=obj_name)

Step 8: Read the object's data from the 'Body' attribute of the response and decode it.

    # Read the object's content as text
    object_content = response["Body"].read().decode("utf-8")
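Note that read() returns raw bytes, so decoding with UTF-8 only works for text objects. A minimal defensive sketch (the error handling is an assumption, not part of the original script):

    # Decode defensively: skip objects that are not valid UTF-8 text
    try:
        object_content = response["Body"].read().decode("utf-8")
    except UnicodeDecodeError:
        print(f"Skipping non-text object: {obj_name}")
        continue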

Step 9: Finally, print the contents of each file.

    # Print all the contents
    print(f"Contents of {obj_name}\n--------------")
    print(object_content, end="\n\n")

Here is the complete code to read file content from an S3 bucket with Boto3:

This Python script uses the Boto3 library to interact with AWS S3. It first loads AWS credentials from environment variables using the dotenv module, then creates an S3 client with those credentials. The script lists all objects in a specific S3 bucket, retrieves each object's content, decodes it from bytes to a readable string using UTF-8 encoding, and prints it to the console.

import os

import boto3
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Create S3 client
s3 = boto3.client(
    "s3",
    aws_access_key_id=os.getenv("ac_key"),
    aws_secret_access_key=os.getenv("sac_key"),
)

# Store bucket name
bucket_name = "gfg-s3-test-bucket"

# Store contents of bucket
objects_list = s3.list_objects_v2(Bucket=bucket_name).get("Contents")

# Iterate over every object in bucket
for obj in objects_list:
    # Store object name
    obj_name = obj["Key"]

    # Read an object from the bucket
    response = s3.get_object(Bucket=bucket_name, Key=obj_name)

    # Read the object's content as text
    object_content = response["Body"].read().decode("utf-8")

    # Print all the contents
    print(f"Contents of {obj_name}\n--------------")
    print(object_content, end="\n\n")

Output:

Final Output

Contents of Test.txt
--------------
Test.txt is running
GFG Test

Contents of Test1.txt
--------------
Test1.txt is running
Reading contents from file using boto3

Conclusion

Reading files from an AWS S3 bucket using Python and Boto3 is straightforward. With just a few lines of code, you can retrieve and work with data stored in S3, making it an invaluable tool for data scientists working with large datasets.

Read File Content From S3 Bucket With Boto3 – FAQs

How do I secure my AWS credentials in Python?

Use environment variables, AWS profiles, or IAM roles for secure credential storage, as sketched below.
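For example, a named profile from ~/.aws/credentials keeps keys out of your code entirely (a minimal sketch; the profile name "dev" is an assumption):

import boto3

# "dev" is a hypothetical profile name configured in ~/.aws/credentials
session = boto3.Session(profile_name="dev")
s3 = session.client("s3")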

How can I read a specific file from an S3 bucket?

Skip listing the bucket contents and remove the loop from the code above; instead, call get_object() directly with the file's key, as shown below.
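A minimal sketch, reusing the s3 client and bucket_name from the code above and the Test.txt file uploaded earlier:

# Fetch a single object directly by its key
response = s3.get_object(Bucket=bucket_name, Key="Test.txt")
print(response["Body"].read().decode("utf-8"))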

How can I read data into a pandas DataFrame (CSV/JSON)?

After getting the object, pass its Body to the read_csv() or read_json() function from the pandas module, as sketched below.
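A minimal sketch for a CSV file, reusing the s3 client and bucket_name from above; the key "data.csv" is a hypothetical file name:

import pandas as pd

# get_object's Body is a file-like stream that pandas can read directly;
# "data.csv" is a hypothetical key
response = s3.get_object(Bucket=bucket_name, Key="data.csv")
df = pd.read_csv(response["Body"])
print(df.head())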

Is it possible to download files from S3 to my local machine?

Yes, you can use the download_file() method on the S3 client, as shown below.
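For example, reusing the client from above (the local file name is an assumption):

# Download Test.txt from the bucket to a local file
s3.download_file(bucket_name, "Test.txt", "Test_local.txt")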

What should I do if I get an access denied error?

Go to IAM and attach the required S3 policies or permissions to your user or role.

