Check if string ends with one of the strings from a list

by Alex Kataev · Aug 22, 2024

Use str.endswith(), which accepts a tuple of suffixes, to check whether a string ends with any one of several substrings:

choices = ('.txt', '.doc', '.pdf')
file_name = 'report.doc'
if file_name.endswith(choices):
    print('Alive and kicking, document format!')

A single call compares file_name against every suffix in the tuple. Here 'report.doc' ends with '.doc', so the message is printed.
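Since the question is about a list: str.endswith() takes a single string or a tuple of strings, and passing a list raises TypeError. A minimal sketch of the conversion, where suffix_list, list_accepted and matches are hypothetical names chosen for illustration:

```python
# suffix_list is a hypothetical list of endings, e.g. loaded from configuration.
suffix_list = ['.txt', '.doc', '.pdf']
file_name = 'report.doc'

try:
    file_name.endswith(suffix_list)  # a list is not an accepted argument
    list_accepted = True
except TypeError:
    list_accepted = False

# Converting the list to a tuple makes the call valid.
matches = file_name.endswith(tuple(suffix_list))
print(list_accepted, matches)
```

Converting once and reusing the tuple avoids rebuilding it on every check.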

Expanded answer

Case-insensitive check

For a case-insensitive check:

choices = ('.txt', '.doc', '.pdf')
file_name = 'report.DOC'
if file_name.lower().endswith(choices):
    print('Have no fear, the document format is here!')

Calling lower() on file_name makes the match case-insensitive, provided every entry in choices is already lowercase.
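Suffixes that come from user input or configuration may not be lowercase. One way to handle that is to normalize both sides. This is a sketch only, and ends_with_any is a hypothetical helper name, not something from the original answer:

```python
def ends_with_any(text, suffixes):
    """Return True if text ends with any of the suffixes, ignoring case."""
    # Lowercase the suffixes as well, so '.Doc' and '.DOC' are treated alike.
    lowered_suffixes = tuple(s.lower() for s in suffixes)
    return text.lower().endswith(lowered_suffixes)

print(ends_with_any('report.DOC', ['.Txt', '.Doc', '.PDF']))
print(ends_with_any('photo.JPG', ['.Txt', '.Doc', '.PDF']))
```

The helper accepts any iterable of strings, because it builds the tuple itself.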

Regular expressions for intricate patterns

If you've got some fancy footwork in your patterns, regex is your go-to dance partner:

import re

# Raw strings keep the backslashes intact and avoid invalid-escape warnings.
choices = [r'\.txt$', r'\.doc$', r'\.pdf$']
file_name = 'archive.pdf'

if any(re.search(pattern, file_name, re.IGNORECASE) for pattern in choices):
    print('Cut the check, this is a valid document!')

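The re.IGNORECASE option makes each search case-insensitive, and the $ anchor ties each pattern to the end of the string. Regex earns its keep when the endings are genuine patterns rather than fixed strings; for fixed suffixes, endswith() is simpler.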
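If the endings are plain strings but regex is still wanted, they can be folded into one compiled pattern instead of searching once per ending. A sketch under that assumption; build_suffix_pattern is a hypothetical helper, and it uses \Z (strict end of string) where the example above uses $:

```python
import re

def build_suffix_pattern(suffixes):
    """Compile one case-insensitive pattern matching any suffix at the end."""
    # re.escape treats characters such as '.' literally instead of as wildcards.
    alternatives = '|'.join(re.escape(s) for s in suffixes)
    return re.compile(rf'(?:{alternatives})\Z', re.IGNORECASE)

pattern = build_suffix_pattern(['.txt', '.doc', '.pdf'])
print(bool(pattern.search('archive.PDF')))
print(bool(pattern.search('archive.pdf.bak')))
```

Compiling once keeps the per-string cost to a single search when many names are checked.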

Code performance with `timeit`

Is your code more Usain Bolt or more tortoise? Use timeit to measure performance:

import timeit

# Testing the speed of our methods, no steroids involved!
endswith_time = timeit.timeit("file_name.lower().endswith(choices)", setup="file_name = 'example.DOC'; choices = ('.txt', '.doc', '.pdf')", number=10000)
# The setup string is raw so the regex backslashes reach the timed code unchanged.
regex_time = timeit.timeit("any(re.search(pattern, file_name, re.IGNORECASE) for pattern in choices)", setup=r"import re; file_name = 'example.DOC'; choices = [r'\.txt$', r'\.doc$', r'\.pdf$']", number=10000)
print(endswith_time, regex_time)

Each call returns the total time in seconds for all 10,000 runs. Pick the faster method unless the gap is too small to justify less readable code.

Optimization for multiple checks

If you've got a bunch of strings to check - no, we're not at a puppet show - you can optimize:

def is_valid_format(file_name, extensions):
    return file_name.lower().endswith(tuple(extensions))

file_list = ['lord_of_the_rings.doc', 'harry_potter.jpg', 'game_of_thrones.pdf']
valid_formats = ('.txt', '.doc', '.pdf')

valid_documents = filter(lambda f: is_valid_format(f, valid_formats), file_list)
print(list(valid_documents))

The filter function paired with a lambda lazily yields the file names with a valid ending, and list() collects them (here ['lord_of_the_rings.doc', 'game_of_thrones.pdf']) faster than you can say "Expecto Patronum!".

Beyond the basics

Splitting file name and extension

Use os.path.splitext to separate the file name from its extension:

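import os

choices = ('.txt', '.doc', '.pdf')
file_name = 'example.doc'
root, ext = os.path.splitext(file_name)

# ext keeps its leading dot ('.doc'); lowercase it so '.DOC' also matches.
if ext.lower() in choices:
    print('Goal! Valid document format.')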

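os.path.splitext splits at the last dot, so ext is the exact extension including the dot, and a plain membership test is enough. Note that only the final extension is returned: 'archive.tar.gz' yields '.gz'.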
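The same check can also be written with pathlib from the standard library, which is a swapped-in alternative rather than the approach shown above. In this sketch, valid_suffixes and has_valid_suffix are hypothetical names:

```python
from pathlib import Path

# A set gives constant-time membership tests; entries are assumed lowercase.
valid_suffixes = {'.txt', '.doc', '.pdf'}

def has_valid_suffix(file_name):
    """Return True if the final extension of file_name is an allowed one."""
    return Path(file_name).suffix.lower() in valid_suffixes

print(has_valid_suffix('Report.DOC'))
# .suffix holds only the last extension; .suffixes lists all of them.
print(Path('backup.tar.gz').suffix, Path('backup.tar.gz').suffixes)
```

Like os.path.splitext, Path.suffix reports only the last extension, so multi-part endings such as '.tar.gz' need .suffixes or endswith().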

Pythonic code applications

Pythonic code is all about clean and concise expressions. Swap the old for loop with:

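# Generator expression. Because nobody likes the guy who brings an essay to a bullet-point fight!
is_valid = any(file_name.lower().endswith(ext) for ext in choices)

Replacing the for loop with a generator expression inside any() keeps the syntax neat and tidy, and your code reviewers happy. any() stops at the first match, and no intermediate list is built. When choices is already a tuple, file_name.lower().endswith(choices) gives the same result in a single call.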
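Iterating over the suffixes one by one is most useful when you need to know which ending matched, not just whether one did. A minimal sketch, assuming a hypothetical helper named first_matching_suffix:

```python
def first_matching_suffix(text, suffixes):
    """Return the first suffix that text ends with (ignoring case), or None."""
    lowered = text.lower()
    # next() with a default stops at the first hit and returns None if none match.
    return next((s for s in suffixes if lowered.endswith(s.lower())), None)

choices = ('.txt', '.doc', '.pdf')
print(first_matching_suffix('report.DOC', choices))
print(first_matching_suffix('photo.jpg', choices))
```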
