
Save Dataframe to csv directly to s3 Python

python
dataframe
s3fs
pandas
by Anton Shumikhin · Mar 5, 2025
TLDR

Convert a DataFrame to CSV and push it to S3 in a flash using to_csv() and boto3, like so:

# Importing necessary modules
import pandas as pd
import boto3
from io import StringIO

# DataFrame instantiation for demonstration
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Boto3 S3 client with access key ID and Secret Access Key
s3 = boto3.client('s3', aws_access_key_id='YOUR_KEY', aws_secret_access_key='YOUR_SECRET')

# Creating an in-memory string buffer ready for a vacation at S3
csv_buffer = StringIO()
df.to_csv(csv_buffer)

# Time to turn the key and open the gate to S3's storage
s3.put_object(Bucket='YOUR_BUCKET', Key='your_data.csv', Body=csv_buffer.getvalue())

Verify your AWS credentials are correct. This snippet captures the gist of moving a DataFrame to S3 with lightning speed.
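Want a quick sanity check that boto3 actually resolves valid credentials before you upload? A minimal sketch using the STS identity call (the printout is just illustrative):

import boto3

# Ask STS who these credentials belong to; fails fast if they are missing or invalid
sts = boto3.client('sts')
identity = sts.get_caller_identity()
print(identity['Arn'])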

Ease out with s3fs:

Get a sigh of relief by diverting the headache of manually dealing with StringIO and boto3 to s3fs. It enables you to engage with S3 using traditional filesystem operations, making your S3 interaction as smooth as a sea breeze.

# Import prerequisites
import pandas as pd
import s3fs

# DataFrame instantiation for demonstration
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Create an S3 filesystem object, like creating a magic wand
fs = s3fs.S3FileSystem(anon=False, key='YOUR_KEY', secret='YOUR_SECRET')

# Use 'to_csv' to save DataFrame directly to S3, like waving the wand!
with fs.open('s3://YOUR_BUCKET/YOUR_PATH/your_data.csv', 'w') as f:
    df.to_csv(f)

Growing data size? No worries! s3fs is your knight in shining armor when it comes to handling heavier datasets, as it optimizes memory by writing in small bits - and that's what we call a smart move!
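As a minimal sketch of that idea (the bucket, path, and chunk size below are assumptions), you can combine fs.open() with to_csv()'s chunksize so pandas writes the rows in batches instead of building one giant string:

import pandas as pd
import s3fs

# Assumed example data and destination
big_df = pd.DataFrame({'A': range(1_000_000), 'B': range(1_000_000)})
fs = s3fs.S3FileSystem(anon=False, key='YOUR_KEY', secret='YOUR_SECRET')

with fs.open('s3://YOUR_BUCKET/YOUR_PATH/big_data.csv', 'w') as f:
    # chunksize controls how many rows pandas writes per batch
    big_df.to_csv(f, index=False, chunksize=100_000)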

EC2 and IAM optimizations:

Running your scripts from an EC2 instance? Use IAM roles to gain S3 access without embedding credentials in your code. This keeps the bond between your EC2 instance and S3 secure and seamless.

# Simply attach an IAM role with S3 access to your EC2 instance. Voila! No need for any credentials going around in your Python code!

This is the holy grail for production environments, as it neatly sidesteps the pitfalls of hardcoded sensitive details.
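With the role attached, boto3 (and s3fs) pick up credentials from the instance profile automatically, so the TLDR snippet shrinks to something like this (the bucket and key names remain placeholders):

import pandas as pd
import boto3
from io import StringIO

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# No keys passed: boto3 resolves credentials from the attached IAM role
s3 = boto3.client('s3')

csv_buffer = StringIO()
df.to_csv(csv_buffer)
s3.put_object(Bucket='YOUR_BUCKET', Key='your_data.csv', Body=csv_buffer.getvalue())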

Pandas version compatibility:

Keep pace with pandas' updates by checking the pandas release notes. From pandas 0.24 onward, you can write directly to an S3 path inside to_csv() (s3fs must be installed):

# Send DataFrame to S3; real business, no cutting corners.
df.to_csv('s3://YOUR_BUCKET/YOUR_PATH/your_data.csv', index=False)
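Need to pass credentials explicitly instead of relying on environment variables or an IAM role? Newer pandas versions (1.2+) accept a storage_options dict that is forwarded to s3fs; the values below are placeholders:

# storage_options is handed to s3fs under the hood (pandas 1.2+)
df.to_csv(
    's3://YOUR_BUCKET/YOUR_PATH/your_data.csv',
    index=False,
    storage_options={'key': 'YOUR_KEY', 'secret': 'YOUR_SECRET'},
)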

CSV specifics:

Tune your DataFrame's CSV output before shipping the data to S3. Use index=False to exclude the DataFrame index from the final CSV, or tweak other to_csv() parameters to nail the structure your data needs at its next destination: S3.
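For example, a small sketch of the knobs you might turn (the separator and encoding choices are purely illustrative):

df.to_csv(
    's3://YOUR_BUCKET/YOUR_PATH/your_data.csv',
    index=False,        # drop the DataFrame index
    sep=';',            # illustrative: semicolon-separated output
    encoding='utf-8',   # illustrative: explicit encoding
    header=True,        # keep column names as the first row
)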

Advanced considerations:

Smart write modes: mind Python version compatibility. Python 3 wants text mode 'w' when writing a CSV through a file object, while legacy Python 2 favored binary 'wb':

import sys

# mode = 'wait, what?' We get the version confusion, Python!
mode = 'wb' if sys.version_info < (3,) else 'w'
with fs.open('s3://YOUR_BUCKET/YOUR_PATH/your_data.csv', mode) as f:
    df.to_csv(f)

Pre-upload DataFrame alterations: Need specific filters or modifications? Apply them to your DataFrame before export. It keeps the upload efficient and custom-fit to specific needs:

# Apply your data magic here
df = df[df['A'] > 1]  # Example magic: filter rows
df.to_csv('s3://your_bucket_name/your_filtered_data.csv')

Treat the DataFrame as a string: If you opt not to write with to_csv() directly to S3, you can convert the DataFrame to a CSV string first and upload it yourself:

csv_string = df.to_csv(None)  # passing None returns the CSV as a string instead of writing a file
s3.put_object(Bucket='bucket', Key='key', Body=csv_string)

Success is no accident: Strive for lean data retrieval and uploads. They should be as concise as an expert chef's knife cuts, especially when handling heftier datasets.