Explain Codes LogoExplain Codes Logo

Turn a string into a valid filename?

python
filename-sanitization
cross-platform
python-libraries
Anton ShumikhinbyAnton Shumikhin·Oct 2, 2024
TLDR

Use Python's built-in str.translate and str.maketrans functions to remove invalid characters from a filename. Here's a brief example that includes typical filename-safe characters:

import string filename = "example?*file/name.txt" safe_chars = f"-_.() {string.ascii_letters}{string.digits}" clean_filename = filename.translate(str.maketrans('', '', ''.join(set(string.punctuation) - set(safe_chars))))

The clean_filename variable will now hold 'examplefilename.txt', from which all problematic characters have been removed.

Creating valid filenames cross-platform

Leveraging libraries

Use the python-slugify library to sanitize your string efficiently and create a valid filename that works across different operating systems. It requires no great effort and is very effective.

Ensuring alphanumeric characters

Here's how to ensure only alphabets and numbers are present in your filename. Remember the fun mnemonic, "alpha-numeric: a-okay, the rest, no way!"

valid_filename = ''.join(c for c in filename if c.isalnum() or c in safe_chars)

Catering to Windows quirks

Windows filenames cannot contain specific characters and cannot end with a dot (.) or space. So, let's be like Windows users and avoid things that "just don't compute!".

Secure and unique filenames

Using Base64 encoding

If you want unique and system-friendly filenames, base64 encoding might be the best shot for you.

import base64 encoded_name = base64.urlsafe_b64encode(filename.encode()).decode()

Though the resulting name is less human-readable, it is universally compatible and nearly guarantees no file name collisions.

Using Timestamps or UUIDs

Appending a timestamp or a UUID (Universally Unique Identifier) to your filename can prevent any chances of overwriting an existing file. Here's a smooth move to maintain your groove, and not overwrite.

import uuid unique_filename = f"{valid_filename}_{uuid.uuid4()}"

Human-readable filenames

Using whitelist

Here, a whitelist of safe characters is used to maintain a balance between human readability and system compatibility. This whitelist is like our trusted circle of filename friends. Not everyone makes the cut!

whitelist = "-_.() abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

Handling spaces

Spaces in a filename will be encoded as %20 when used in a URL. So, better replace them with underscores, hyphens, or remove them altogether.

Moulding filename with precision and context

Trimming edges

Ensure to strip the starting and trailing whitespace of a filename.

trimmed_filename = safe_filename.strip()

Dealing with different file types

For mp3 and other media files, consider how the filename might be displayed across different media players. While sanitizing, preserve critical information and only replace the rest.