
Download file from web in Python 3

python
file-downloads
streaming
binary-data
by Anton Shumikhin · Oct 2, 2024
TLDR

Here's an elegant method to swiftly download a file in Python 3 using requests.get(). A great "copy, paste, peace out" solution.

```python
import requests

# Replace with the real deal
url, filename = 'http://example.com/file.ext', 'downloaded_file.ext'

# Now sit back and watch Python do its magic
with requests.get(url, stream=True) as r, open(filename, 'wb') as f:
    for chunk in r.iter_content(chunk_size=8192):
        if chunk:  # Because we love our memory!
            f.write(chunk)
```

This script handles large file downloads by streaming chunks, so buckle up for a smooth cloud-file-to-desktop transfer!
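To see why chunking keeps memory usage flat, here's a minimal, network-free sketch of the same write loop. It uses an in-memory buffer and a hand-rolled list of chunks in place of `r.iter_content()` (the `write_chunks` helper is just for illustration):

```python
from io import BytesIO

def write_chunks(chunks, dest):
    """Write an iterable of byte chunks to a file-like object, one piece at a time."""
    total = 0
    for chunk in chunks:
        if chunk:  # skip keep-alive-style empty chunks
            dest.write(chunk)
            total += len(chunk)
    return total

# Simulated "download": a few chunks instead of a network response
fake_chunks = [b'hello ', b'', b'world', b'!']
buf = BytesIO()
written = write_chunks(fake_chunks, buf)
print(written, buf.getvalue())  # → 12 b'hello world!'
```

Only one chunk lives in memory at a time, which is exactly what makes `stream=True` safe for multi-gigabyte files.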

Python's toolbox for file downloads

Python 3 is not short on tools when it comes to downloading files from the internet. Trust me, this language has got you covered! From easy-to-implement methods for small files, to handling gzip compression on the fly, to legacy support considerations, let's take a dive into this toolbox.

Downloading files that "fit in the pocket": small files

For small files, the whole response can be captured in memory:

```python
import requests

url = 'http://example.com/smallfile.ext'
r = requests.get(url)
with open('smallfile.ext', 'wb') as f:
    f.write(r.content)  # "I'm only one call away..."
```

Works like a charm for itsy-bitsy files, but handle with care for larger ones - memory isn't infinite!

The "old is gold" dilemma: urlretrieve vs urlopen

Prefer a piece of Python's heritage? urllib.request.urlretrieve is an old-timer but still gets the job done:

```python
import urllib.request

urllib.request.urlretrieve('http://example.com/legacyfile.ext', 'legacyfile.ext')  # Old but gold
```

But if you're more inclined to a modern, deprecation-proof style, choose urlopen with shutil.copyfileobj for streamed downloads:

```python
import urllib.request
import shutil

with urllib.request.urlopen('http://example.com/file.ext') as response, \
        open('file.ext', 'wb') as out_file:
    shutil.copyfileobj(response, out_file)  # Shutil - the unspoken helper
```

Catching errors with grace: validation and handling

Because nobody likes failures - at least make them look neat! Here's how to handle any hiccups by checking the response status code:

```python
r = requests.get(url)
if r.status_code == 200:
    with open('errorless_file.ext', 'wb') as f:
        f.write(r.content)
else:
    print("Download failed: status code {}".format(r.status_code))  # You shall not pass!
```

Handling compressed data on-the-go: gzip decompression

Expecting compressed data from the web? Let Python handle gzip decompression while downloading the file:

```python
import requests
import gzip
import shutil
from io import BytesIO

response = requests.get(url)
compressed_file = BytesIO(response.content)
decompressed_file = gzip.GzipFile(fileobj=compressed_file)

# As we like to say: deflate, we got your back!
with open('decompressed_file.ext', 'wb') as f:
    shutil.copyfileobj(decompressed_file, f)
```
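If you want to check the decompression logic without hitting the network, here's a tiny in-memory round trip - the payload below is just a stand-in for `response.content`:

```python
import gzip
from io import BytesIO

original = b'payload worth compressing' * 3
compressed = gzip.compress(original)  # stand-in for response.content

# Same trick as above: wrap the compressed bytes and let GzipFile inflate them
decompressed_file = gzip.GzipFile(fileobj=BytesIO(compressed))
restored = decompressed_file.read()
print(restored == original)  # → True
```

The BytesIO wrapper is what lets GzipFile treat a byte string as a seekable file.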

Store, decode, and simplify

This section presents a neat collection of more considerations, ensuring your Python downloading skills shine:

Saving to specific paths:

Because organization is key:

```python
import os

# Pretend it's a treasure hunt
target_directory = '/path/to/directory'
filename = os.path.join(target_directory, 'file.ext')
with open(filename, 'wb') as f:
    f.write(r.content)
```

Binary data and encoding edibles:

Binary data can seem tough, but with Python's decode, it's a piece of cake:

```python
binary_data = r.content
string_data = binary_data.decode('utf-8')  # Binary data walked into a bar, came out as 'utf-8'. What a story!
```
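Decoding assumes you actually know the encoding; when a stray byte sneaks in, `errors='replace'` keeps it from crashing the party. The byte string below is a made-up illustration, not real download data:

```python
binary_data = b'caf\xc3\xa9 \xff'  # valid UTF-8 for 'café', plus one rogue byte

# Strict decoding would raise UnicodeDecodeError on b'\xff';
# 'replace' swaps bad bytes for the U+FFFD replacement character instead
clean = binary_data.decode('utf-8', errors='replace')
print(clean)  # → café �
```

For text responses, requests can also do the decoding for you: `r.text` uses the encoding it detects from the response headers.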

When simplicity meets efficiency: wget

The third-party wget package (pip install wget) gets it done in one line:

```python
import wget

wget.download(url, 'simple_download.ext')  # wget - the unsung hero
```

Visualising the download process

Let's envision the process of downloading a file from the web in Python 3:

Imagine the file is a precious 🎁 hidden inside a 🌐 globe.

  1. Deploy a quest via requests.get('🌐🔗🎁'), which sends a 🚁 to fetch the treasure.
  2. The 🚁 returns with the 🎁 wrapped in r.content.
  3. Unwrap the 🎁 using open('destination_file', 'wb').write(r.content).

The destination_file is your personal shelf to display your 🎁!

Before: 🌐🔗🎁 After: 🖥️💾🎁

With a few lines of code, the internet can become your oyster!

For the curious coder: Advanced considerations

When your needs go beyond the basics, these pointers have you covered:

Surfing through proxies:

Sometimes, your requests may have to ride on a proxy due to privacy concerns or network regulations:

```python
proxies = {
    'http': 'http://10.10.10.10:8000',
    'https': 'https://10.10.10.10:8000',
}
response = requests.get(url, proxies=proxies)
```

Conquering large downloads:

For very large files, downloading in chunks and also resuming partially downloaded files can save you time and bandwidth:

```python
import os
import requests

chunk_size = 1024  # Bite-sized pieces
offset = os.path.getsize(filename) if os.path.exists(filename) else 0  # Gone today? Here tomorrow!
headers = {'Range': f'bytes={offset}-'}
response = requests.get(url, headers=headers, stream=True)  # Let's pick up where we left off

with open(filename, 'ab') as f:  # 'ab' - append in binary, so no seek needed
    for chunk in response.iter_content(chunk_size=chunk_size):
        if chunk:
            f.write(chunk)
```
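The resume logic boils down to one number: how many bytes you already have on disk. Here's a network-free sketch of building the Range header from a partial file; the `range_header_for` helper and the temp file are just for illustration:

```python
import os
import tempfile

def range_header_for(path):
    """Build an HTTP Range header that resumes after whatever is already on disk."""
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    return {'Range': f'bytes={offset}-'}

# Simulate a partially downloaded file with 5 bytes already fetched
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'12345')

print(range_header_for(tmp.name))        # → {'Range': 'bytes=5-'}
print(range_header_for('no_such_file'))  # → {'Range': 'bytes=0-'}
os.remove(tmp.name)
```

One caveat: a server that honors Range requests replies with 206 Partial Content, so it's worth checking `response.status_code == 206` before appending - a plain 200 means the server sent the whole file from the start.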

Closing files: clean and respectful

Practicing good housekeeping by gracefully closing your streams and files is Pythonic. Prevent leaks and keep everything tidy!

```python
import requests
from contextlib import closing

with closing(requests.get(url, stream=True)) as r, open(filename, 'wb') as f:
    for chunk in r.iter_content(chunk_size=chunk_size):
        if chunk:  # because we care about cleanliness
            f.write(chunk)
```