If you're looking for an efficient method to calculate directory size in Python, the os.walk() and os.path.getsize() combo is the way to go:
import os

def get_dir_size(path):
    # Doing some digital cardio here (walking around your directory)
    return sum(
        os.path.getsize(os.path.join(dp, f))
        for dp, dn, fn in os.walk(path)
        for f in fn
        if os.path.isfile(os.path.join(dp, f))
    )

# Example usage
print(f"Directory size: {get_dir_size('/your/directory')} bytes")  # When size matters
Leveraging Python's modern tools
A major performance boost can be achieved using os.scandir() and entry.stat().st_size, which is, hands down, the best power couple since peanut butter met jelly:
import os

def get_dir_size_fast(path):
    total_size = 0
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_file(follow_symlinks=False):
                # Alright, show off your size!
                total_size += entry.stat(follow_symlinks=False).st_size
            elif entry.is_dir(follow_symlinks=False):
                # os.scandir() doesn't recurse on its own, so descend manually
                total_size += get_dir_size_fast(entry.path)
    return total_size

# Example usage
print(f"Directory size: {get_dir_size_fast('/your/directory')} bytes")  # Faster than a stolen Ferrari
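To see the gap on your own machine, here's a self-contained timing sketch. It redefines both walkers locally (names like get_dir_size_scandir and the file counts are my own choices, not anything standard) and builds a throwaway tree, so no real directory is needed:

```python
import os
import tempfile
import timeit

def get_dir_size_scandir(path):
    # Recursive os.scandir() walk; symlinks skipped for safety
    total = 0
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
            elif entry.is_dir(follow_symlinks=False):
                total += get_dir_size_scandir(entry.path)
    return total

def get_dir_size_walk(path):
    # os.walk() + getsize() baseline for comparison
    return sum(
        os.path.getsize(os.path.join(dp, f))
        for dp, _, fn in os.walk(path)
        for f in fn
    )

# Build a small throwaway tree so the benchmark is self-contained
with tempfile.TemporaryDirectory() as root:
    for i in range(100):
        with open(os.path.join(root, f"file_{i}.bin"), "wb") as fh:
            fh.write(b"x" * 1024)
    t_walk = timeit.timeit(lambda: get_dir_size_walk(root), number=50)
    t_scan = timeit.timeit(lambda: get_dir_size_scandir(root), number=50)
    print(f"os.walk:    {t_walk:.4f}s")
    print(f"os.scandir: {t_scan:.4f}s")
```

The win comes from scandir returning cached stat information from the directory listing itself, instead of issuing a separate stat call per file; exact numbers will vary by OS and filesystem.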
Dealing with symbolic links
Be wary of symbolic links: they're like that friend who makes lame copies of your jokes. They can lead to duplicated file counting or even infinite recursion:
def calculate_directory_size_no_links(path):
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(path, followlinks=False):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            if not os.path.islink(fp):  # Sorry, we don't do photocopies
                total_size += os.path.getsize(fp)
    return total_size
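Symlinks aside, hard links can also inflate the total, since the same data shows up under several names. One way to handle that, sketched below, is to count each (device, inode) pair only once; the function name is my own, and note that inode numbers may be unreliable on some Windows filesystems:

```python
import os

def get_dir_size_dedup(path):
    # Count every (device, inode) pair only once, so hard-linked
    # files don't inflate the total; symlinks are skipped entirely.
    seen = set()
    total = 0
    for dirpath, _, filenames in os.walk(path, followlinks=False):
        for name in filenames:
            fp = os.path.join(dirpath, name)
            if os.path.islink(fp):
                continue
            st = os.stat(fp)
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                total += st.st_size
    return total
```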
Friendly size format
Raw byte counts can cause information overload. Let's make file sizes more readable:
def human_readable_size(size):
    # It's not rude, it's unit conversion!
    for unit in ['bytes', 'KB', 'MB', 'GB', 'TB']:
        if size < 1024:
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} PB"  # "Petabytes" sounds cute, but you don't wanna meet them in a dark alley.

# Example usage for human-readable format
size_in_bytes = get_dir_size('/your/directory')
print(f"Directory size: {human_readable_size(size_in_bytes)}")  # Now, in baby language
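A couple of spot checks make the rounding behaviour concrete (the function is restated here so the snippet runs standalone):

```python
def human_readable_size(size):
    # Divide by 1024 until the value fits under the next unit
    for unit in ['bytes', 'KB', 'MB', 'GB', 'TB']:
        if size < 1024:
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} PB"

print(human_readable_size(512))          # 512.00 bytes
print(human_readable_size(1536))         # 1.50 KB
print(human_readable_size(5 * 1024**3))  # 5.00 GB
```

Note these are binary units (divisor 1024); if you prefer SI-style KB = 1000 bytes, swap the divisor.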
Python's best-kept secrets
Embracing the 'pathlib' module
The pathlib module makes directory size calculation a walk in the park:
from pathlib import Path

def get_dir_size_pathlib(path):
    # pathlib a day keeps the terminal away
    return sum(f.stat().st_size for f in Path(path).rglob('*') if f.is_file())

# Example usage
print(f"Directory size: {get_dir_size_pathlib('/your/directory')} bytes")  # It's Py-magic
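On a live filesystem, files can vanish or deny access between being listed and being stat'ed, which makes the one-liner raise. A fault-tolerant variant (the name get_dir_size_tolerant is my own) simply skips those entries:

```python
from pathlib import Path

def get_dir_size_tolerant(path):
    # Like the rglob one-liner, but entries that vanish or deny
    # access mid-scan are skipped instead of raising.
    total = 0
    for p in Path(path).rglob('*'):
        try:
            if p.is_file() and not p.is_symlink():
                total += p.stat().st_size
        except OSError:
            continue
    return total
```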
The outfit change for output
Some situations call for different units of measurement. Here's how you can easily alter your function's output:
class DirectorySizer:
    # Directory Sizer: in the end, size does matter
    def __init__(self, path):
        self._bytes = get_dir_size_pathlib(path)

    @property
    def kilobytes(self):
        # Megabytes are overrated
        return self._bytes / 1024

    @property
    def megabytes(self):
        # Who's the big boy now?
        return self._bytes / 1024**2

    # ...include more units as deemed fit

# Example usage
sizer = DirectorySizer('/your/directory')
print(f"Directory size: {sizer.megabytes} MB")  # Megabytes, the absolute unit
Avoiding os-specific commands
It's tempting to reach for a quick fix with du -sh via the subprocess module:
import subprocess
def get_size_with_du(path):
    # Grab the first field of du's output, e.g. "4.0K"
    result = subprocess.check_output(['du', '-sh', path]).split()[0].decode('utf-8')
    return result

# Example usage
print(f"Directory size: {get_size_with_du('/your/directory')}")  # Quick, but it requires a Unix-like OS
Remember, it's important to pursue cross-platform compatibility. This method falls short as it relies on Unix commands and might not function on Windows.
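If you still want du's speed where it's available, one compromise, sketched below with names of my own choosing, is to detect the platform and fall back to a pure-Python walk elsewhere. Note that du reports disk usage in blocks, so its numbers can differ from a sum of apparent file sizes:

```python
import os
import shutil
import subprocess
import sys

def get_dir_size_portable(path):
    # Prefer `du` on Unix-like systems; fall back to a pure-Python
    # walk elsewhere (e.g. Windows). Returns a size in bytes either way.
    if sys.platform != 'win32' and shutil.which('du'):
        # -s: summary only, -k: kilobyte blocks for an easily parsed number
        out = subprocess.check_output(['du', '-sk', path])
        return int(out.split()[0]) * 1024
    return sum(
        os.path.getsize(os.path.join(dp, f))
        for dp, _, fn in os.walk(path)
        for f in fn
    )
```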