Get human readable version of file size?

by Anton Shumikhin · Oct 17, 2024
TLDR

Python recipe for converting file size to a human-readable format:

def human_size(size):
    # Define units like the Barenaked Ladies... it's been one week, folks!
    units = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']
    for unit in units[:-1]:
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    # Anything this large stays in the biggest unit instead of returning None
    return f"{size:.1f} {units[-1]}"

print(human_size(1536))  # "1.5 KB"

This function chews through units, dividing the size until it's smaller than 1024, then pretty-prints it with one decimal place.
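To make the "chewing" visible, here is a small sketch that records every intermediate value the loop passes through. The helper name `trace_units` is hypothetical and exists only for illustration.

```python
def trace_units(size):
    """Return each intermediate value the unit loop visits (illustrative helper)."""
    steps = []
    for unit in ['B', 'KB', 'MB', 'GB', 'TB', 'PB']:
        steps.append(f"{size:.1f} {unit}")
        if size < 1024:
            break  # small enough: this is the value that would be returned
        size /= 1024
    return steps


print(trace_units(5_000_000))
# ['5000000.0 B', '4882.8 KB', '4.8 MB']
```

The last entry in the list is what the recipe would return.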

Deep dive: Binary vs. Decimal, big file sizes, and negative numbers

The first version works like a charm for casual conversions, but it's time to delve deeper. We need to take care of negative numbers (which might hint at over-committed storage spaces or errors) and extremely large sizes. Also, we should address the difference between binary and decimal prefixes for file size units.

Here's an amped-up function for such intricacies:

import math

def human_size(size, decimal_places=1):
    # Defining unit prefixes in the binary system (power of 1024)
    units = ['B', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']
    if size < 0:
        sign = "-"
        size = -size
    else:
        sign = ""
    if size < 1:
        return f"{sign}0 B"
    # Logarithm magic to find the appropriate unit,
    # capped at the last entry so huge sizes cannot run past the list
    i = min(int(math.log(size, 1024)), len(units) - 1)
    # Scaling size down
    p = math.pow(1024, i)
    s = size / p
    # Format with a fixed number of decimals (round() would drop trailing zeros)
    return f"{sign}{s:.{decimal_places}f} {units[i]}"

print(human_size(1536))  # "1.5 KiB"
print(human_size(5294967296))  # "4.9 GiB"
print(human_size(-1024))  # "-1.0 KiB"
print(human_size(1234567890000000000000))  # "1.0 ZiB" – Take that, Universe!

This version has got you covered from single bytes to yobibytes – that's, like, a lot of cat videos!
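The function above uses binary prefixes only. As a minimal sketch of the decimal side, the same loop can switch between powers of 1000 and powers of 1024. The helper name `human_size_base` and its `binary` flag are assumptions made for this example, not part of the original recipe.

```python
def human_size_base(size, binary=True, decimal_places=1):
    """Format a non-negative byte count with binary (KiB) or decimal (kB) units.

    Hypothetical helper: the name and the `binary` flag are illustrative.
    """
    if binary:
        base, units = 1024, ['B', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB']
    else:
        base, units = 1000, ['B', 'kB', 'MB', 'GB', 'TB', 'PB']
    value = float(size)
    # Stop before the last unit so very large sizes still get a label.
    for unit in units[:-1]:
        if value < base:
            return f"{value:.{decimal_places}f} {unit}"
        value /= base
    return f"{value:.{decimal_places}f} {units[-1]}"


print(human_size_base(1_000_000, binary=False))  # "1.0 MB"
print(human_size_base(1_000_000, binary=True))   # "976.6 KiB"
```

The same million bytes reads as 1.0 MB in decimal units and 976.6 KiB in binary units, which is why the label should say which system is in play.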

Exception handling: Zero and single bytes

A size of zero or of exactly one byte gets special treatment, so they don't feel left out.

def human_size(size, decimal_places=1):
    # ... [abbreviated for brevity]
    if size == 0:
        return "0 B"
    elif size == 1:
        return "1 Byte"  # Because grammar matters… Yes, it does…
    # ... [rest of the function]
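For a version that runs on its own, the two special cases can sit in front of the basic unit loop. This is only a sketch: `human_size_words` is a hypothetical name, and it assumes non-negative input.

```python
def human_size_words(size, decimal_places=1):
    """Hypothetical variant: special-case 0 and 1, then fall back to the loop."""
    if size == 0:
        return "0 B"
    if size == 1:
        return "1 Byte"  # singular form for exactly one byte
    units = ['B', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB']
    value = float(size)
    for unit in units[:-1]:
        if value < 1024:
            return f"{value:.{decimal_places}f} {unit}"
        value /= 1024
    return f"{value:.{decimal_places}f} {units[-1]}"


print(human_size_words(0))     # "0 B"
print(human_size_words(1))     # "1 Byte"
print(human_size_words(2048))  # "2.0 KiB"
```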

Smarter work: Leveraging libraries

In some cases, using an existing library, like humanize, can make your life a whole lot easier:

from humanize import naturalsize

print(naturalsize(1536))  # "1.5 kB"
print(naturalsize(1048576, binary=True))  # "1.0 MiB" – Doesn't get easier!

It handles both decimal and binary units without requiring you to write any extra code.

Trick-or-treat: Bitwise operations and Recursion

Super-efficient bitwise operations

When the size is an integer, bit-level operations can pick the unit without a floating-point logarithm, which keeps the result exact even for very large values:

def human_size(size, decimal_places=1):
    # ... [abbreviated for brevity]
    # Ghost in the machine! Don't be scared, it's just binary logic.
    # Requires a positive int: floats have no bit_length().
    i = (size.bit_length() - 1) // 10
    p = 1 << (i * 10)
    # ... [rest of the function]

This version utilizes bit_length() and bit shifting, taking advantage of the binary nature of the file sizes.
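Filled in as a complete function, the bitwise approach might look like the sketch below. The name `human_size_bits` is hypothetical. The sign handling and the cap on the unit index are assumptions added so the example runs on its own.

```python
def human_size_bits(size, decimal_places=1):
    """Bitwise variant (illustrative); expects an integer byte count."""
    units = ['B', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']
    sign = "-" if size < 0 else ""
    size = abs(int(size))
    if size == 0:
        return "0 B"
    # Each unit step is 2**10, so the position of the highest set bit picks the unit.
    i = min((size.bit_length() - 1) // 10, len(units) - 1)
    p = 1 << (i * 10)  # same value as 1024 ** i, computed with a shift
    return f"{sign}{size / p:.{decimal_places}f} {units[i]}"


print(human_size_bits(1536))       # "1.5 KiB"
print(human_size_bits(1024 ** 3))  # "1.0 GiB"
print(human_size_bits(1023))       # "1023.0 B"
```

Because `bit_length()` works on exact integers, the unit boundary at each power of 1024 cannot be nudged by floating-point rounding.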

Elegance of recursions

A recursive design simplifies the iterations:

def human_size_recursive(size, decimal_places=1, units=None, step=0):
    if units is None:
        # Like a recurring dream, but it's really handy here.
        units = ['B', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB']
    # Stop when the value fits the current unit or no larger unit is left
    if size < 1024 or step == len(units) - 1:
        return f"{size:.{decimal_places}f} {units[step]}"
    return human_size_recursive(size / 1024, decimal_places, units, step + 1)

Recursion can make small byte-sized pieces out of the beastliest file sizes – much easier to read, too!

Practical applications

User interface: File size labels

A file size function can be paired with real files to give users meaningful information:

import os

def file_size_label(file_path):
    size = os.path.getsize(file_path)  # Getting file size
    # Applying conversion
    return f"The file size is: {human_size(size)}"

print(file_size_label("/path/to/your/file.txt"))  # e.g. "The file size is: 5.2 MB"
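To try the label without hunting for a real file, one option is a scratch file of known size. The sketch below bundles a compact stand-in for `human_size` so that it runs on its own. The 2048-byte payload and the name `scratch_path` are arbitrary choices for the demo.

```python
import os
import tempfile


def human_size(size):
    # Compact stand-in for the converter defined earlier (binary units).
    for unit in ['B', 'KiB', 'MiB', 'GiB', 'TiB']:
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} PiB"


def file_size_label(file_path):
    return f"The file size is: {human_size(os.path.getsize(file_path))}"


# Write a 2048-byte scratch file so the result is predictable.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * 2048)
    scratch_path = tmp.name

try:
    label = file_size_label(scratch_path)
    print(label)  # "The file size is: 2.0 KiB"
finally:
    os.remove(scratch_path)  # clean up the scratch file
```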

Tailored formatting

You can offer customizable options for unit abbreviations or decimal places to align with design or user requirements:

def human_size(size, decimal_places=1, abbreviate=True):
    # ... [abbreviated for brevity]
    unit = units[i]
    if abbreviate:
        unit = unit[0] if unit != "B" else unit  # 'K' for 'KiB'
    # Serving the size, just the way you like it.
    # Explicit format spec keeps trailing zeros, e.g. "1.00"
    return f"{sign}{s:.{decimal_places}f} {unit}"

print(human_size(1048576, decimal_places=2, abbreviate=False))  # "1.00 MiB"
print(human_size(1048576, abbreviate=True))  # "1.0 M"