Calculating a directory's size using Python?

by Alex Kataev · Jan 25, 2025
TLDR

If you're looking for a straightforward way to calculate a directory's size in Python, the os.walk() and os.path.getsize() combo is the way to go:

    import os

    def get_dir_size(path):
        # Doing some digital cardio here (walking around your directory)
        return sum(
            os.path.getsize(os.path.join(dp, f))
            for dp, dn, fn in os.walk(path)
            for f in fn
            if os.path.isfile(os.path.join(dp, f))
        )

    # Example usage
    print(f"Directory size: {get_dir_size('/your/directory')} bytes")  # When size matters

Leveraging Python's modern tools

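You can get a performance boost from os.scandir() and entry.stat().st_size. Each entry caches the file-type information gathered during the directory scan, which saves the extra stat calls that os.path.isfile() and os.path.getsize() would make. It's, hands down, the best power couple since peanut butter met jelly: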

    import os

    def get_dir_size_fast(path):
        total_size = 0
        with os.scandir(path) as it:
            for entry in it:
                if entry.is_file(follow_symlinks=False):
                    # Alright, show off your size!
                    total_size += entry.stat(follow_symlinks=False).st_size
                elif entry.is_dir(follow_symlinks=False):
                    # Recurse so files in subdirectories are counted too
                    total_size += get_dir_size_fast(entry.path)
        return total_size

    # Example usage
    print(f"Directory size: {get_dir_size_fast('/your/directory')} bytes")  # Faster than a stolen Ferrari
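As a quick sanity check, the sketch below builds a throwaway tree with tempfile and measures it both ways. The names walk_size, scandir_size and demo are made up for this illustration, and the file sizes are arbitrary; treat it as a minimal sketch rather than a benchmark.

```python
import os
import tempfile


def walk_size(path):
    """Total size in bytes of regular files under path, via os.walk."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):  # skip symlinks, like the scandir version
                total += os.path.getsize(full)
    return total


def scandir_size(path):
    """Same total, via a recursive os.scandir walk that skips symlinks."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
            elif entry.is_dir(follow_symlinks=False):
                total += scandir_size(entry.path)
    return total


def demo():
    """Create one top-level file and one nested file, then measure both ways."""
    with tempfile.TemporaryDirectory() as root:
        nested = os.path.join(root, "a", "b")
        os.makedirs(nested)
        with open(os.path.join(root, "top.bin"), "wb") as fh:
            fh.write(b"x" * 100)
        with open(os.path.join(nested, "deep.bin"), "wb") as fh:
            fh.write(b"y" * 250)
        return walk_size(root), scandir_size(root)


print(demo())  # (350, 350): both approaches agree on the nested tree
```

If the two numbers ever disagree on your own data, symlinks or files changing mid-scan are the usual suspects.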

Be wary of symbolic links: they're like that friend who makes lame copies of your jokes. They can cause files to be counted twice or, if you follow links to directories, even trigger infinite recursion:

    import os

    def calculate_directory_size_no_links(path):
        total_size = 0
        # followlinks=False (the default) keeps os.walk out of symlinked directories
        for dirpath, dirnames, filenames in os.walk(path, followlinks=False):
            for f in filenames:
                fp = os.path.join(dirpath, f)
                if not os.path.islink(fp):
                    # Sorry, we don't do photocopies
                    total_size += os.path.getsize(fp)
        return total_size
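To see the double counting in action, here is a small sketch that puts one 100-byte file and one symlink to it in a temporary directory. The function names are hypothetical, and it assumes a platform where os.symlink works without extra privileges (on Windows that may require developer mode or admin rights).

```python
import os
import tempfile


def size_following_links(path):
    """Naive total: os.path.getsize follows symlinks, so the target is counted again."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def size_skipping_links(path):
    """Total that ignores symlinked files entirely."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path, followlinks=False):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def demo():
    """One real 100-byte file plus a symlink pointing at it."""
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "data.bin")
        with open(target, "wb") as fh:
            fh.write(b"x" * 100)
        os.symlink(target, os.path.join(root, "alias.bin"))
        return size_following_links(root), size_skipping_links(root)


print(demo())  # (200, 100): the naive walk counts the same 100 bytes twice
```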

Friendly size format

Counts in bytes can cause an information overload. Let's make file sizes more readable:

    def human_readable_size(size):
        # Keep dividing by 1024 until the number fits the current unit
        for unit in ['bytes', 'KB', 'MB', 'GB', 'TB']:
            if size < 1024:
                return f"{size:.2f} {unit}"
            size /= 1024
        # Still here after five divisions? Then the value is in petabytes
        return f"{size:.2f} PB"  # "Petabytes" sounds cute, but you don't wanna meet them in a dark alley.

    # Example usage for human-readable format
    size_in_bytes = get_dir_size('/your/directory')
    print(f"Directory size: {human_readable_size(size_in_bytes)}")  # Now, in baby language
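To see how the repeated division by 1024 plays out, here is a worked sketch with a few sample values. format_size is a hypothetical stand-in name so the snippet runs on its own, and the sample numbers are arbitrary.

```python
def format_size(num_bytes):
    """Render a byte count using 1024-based units (hypothetical helper name)."""
    size = float(num_bytes)
    for unit in ("bytes", "KB", "MB", "GB", "TB"):
        if size < 1024:
            return f"{size:.2f} {unit}"
        size /= 1024
    # Five divisions have happened by now, so the remainder is in petabytes
    return f"{size:.2f} PB"


for sample in (512, 1536, 5 * 1024**2, 1024**5):
    print(sample, "->", format_size(sample))
# 512 -> 512.00 bytes
# 1536 -> 1.50 KB
# 5242880 -> 5.00 MB
# 1125899906842624 -> 1.00 PB
```

Note that these labels follow the 1024-based convention used throughout this post; tools that use 1000-based units will show slightly different figures.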

Python's best-kept secrets

Embracing the 'pathlib' module

The pathlib module makes directory size calculation a walk in the park:

    from pathlib import Path

    def get_dir_size_pathlib(path):
        # pathlib a day keeps the terminal away
        return sum(f.stat().st_size for f in Path(path).rglob('*') if f.is_file())

    # Example usage
    print(f"Directory size: {get_dir_size_pathlib('/your/directory')} bytes")  # It's Py-magic
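If you want to try the pathlib approach without pointing it at a real directory, a sketch like the following builds a tiny nested tree first. pathlib_size, demo and the file names are invented for the example.

```python
import tempfile
from pathlib import Path


def pathlib_size(path):
    """Sum st_size over every regular file found by a recursive glob."""
    return sum(p.stat().st_size for p in Path(path).rglob("*") if p.is_file())


def demo():
    """Three files at three depths: 10 + 20 + 30 bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "docs" / "drafts").mkdir(parents=True)
        (root / "readme.txt").write_bytes(b"a" * 10)
        (root / "docs" / "guide.txt").write_bytes(b"b" * 20)
        (root / "docs" / "drafts" / "notes.txt").write_bytes(b"c" * 30)
        return pathlib_size(root)


print(demo())  # 60
```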

The outfit change for output

Some situations call for different units of measurement. Here's how you can easily alter your function's output:

    class DirectorySizer:
        # Directory Sizer: in the end, Size does matter
        def __init__(self, path):
            self._bytes = get_dir_size_pathlib(path)

        @property
        def kilobytes(self):
            # Megabytes are overrated
            return self._bytes / 1024

        @property
        def megabytes(self):
            # Who's the big boy now?
            return self._bytes / 1024**2

        # ...include more units as deemed fit

    # Example usage
    sizer = DirectorySizer('/your/directory')
    print(f"Directory size: {sizer.megabytes} MB")  # Megabytes, the absolute unit

Avoiding os-specific commands

It's tempting to reach for a quick fix by calling du -sh through the subprocess module, but resist the urge:

    import subprocess

    def get_size_with_du(path):
        # du prints "<size><TAB><path>"; keep only the size column
        result = subprocess.check_output(['du', '-sh', path]).split()[0].decode('utf-8')
        return result

    # Example usage
    print(f"Directory size: {get_size_with_du('/your/directory')}")  # Quick, but it needs a Unix-like system with du

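Remember to aim for cross-platform compatibility. This method falls short because it relies on a Unix command that a stock Windows installation doesn't provide. Also note that du reports disk usage (allocated blocks), so its figure can differ from the sum of file sizes computed by the Python functions above.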
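If you still want du's number where it happens to be available, one possible compromise is to treat it as optional. The sketch below always computes the pure-Python total and adds du's figure only when shutil.which finds the executable. The function names and the returned dictionary layout are assumptions made for this illustration, not an established recipe.

```python
import os
import shutil
import subprocess


def python_size(path):
    """Portable total: sum of regular file sizes in bytes, symlinks skipped."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def du_kilobytes(path, executable="du"):
    """Disk usage in 1024-byte units as reported by du, or None if du is missing."""
    found = shutil.which(executable)
    if found is None:
        return None
    # -s prints a single summary line, -k reports 1024-byte units
    output = subprocess.check_output([found, "-sk", path])
    return int(output.split()[0])


def describe_size(path, executable="du"):
    """Always report the Python byte total; include du's figure when available."""
    return {"bytes": python_size(path), "du_kib": du_kilobytes(path, executable)}
```

Calling describe_size('/your/directory') on Windows would simply return None for the du_kib entry, while the bytes entry works everywhere.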