How to replace text in a string column of a Pandas dataframe?

Tags: python, pandas, dataframe, regex

by Nikita Barsukov · Jan 10, 2025

Cut to the chase. Use str.replace to replace text in a Pandas DataFrame column:

df['column'] = df['column'].str.replace('old', 'new')

This changes every occurrence of 'old' to 'new' in 'column'. str.replace returns a new Series and has no inplace parameter, so always assign the result back to the dataframe and save yourself from data nightmares.
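A minimal sketch of why the assignment matters, using a small made-up dataframe (the sample values are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"column": ["old car", "old house", "brand new"]})

# The result is a new Series; discarding it leaves df untouched.
df["column"].str.replace("old", "new", regex=False)
before = df["column"].tolist()

# Assigning the result back is what actually updates the column.
df["column"] = df["column"].str.replace("old", "new", regex=False)
after = df["column"].tolist()

print(before)  # ['old car', 'old house', 'brand new']
print(after)   # ['new car', 'new house', 'brand new']
```

Passing regex=False explicitly keeps the behaviour the same across Pandas versions.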

Understanding String Replacement Tools

Pick the tool that matches the complexity of the job:

For straightforward, literal replacement: .str.replace('find', 'replace') (literal by default since Pandas 2.0; pass regex=False on older versions)
When regular expressions come into play: .str.replace(r'regex-pattern', 'replace', regex=True)
For dynamic or conditional replacements: .apply(lambda x: ...)
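The three options above can be sketched side by side on one small Series. The price strings and the currency rule are hypothetical, chosen only to make the differences visible:

```python
import pandas as pd

# Hypothetical sample data for illustration.
s = pd.Series(["$5.00", "€7.50"])

# 1. Literal replacement: "." is a plain character here, not a wildcard.
literal = s.str.replace(".", ",", regex=False)

# 2. Regex replacement: every run of digits becomes "#".
with_regex = s.str.replace(r"\d+", "#", regex=True)

# 3. Conditional replacement: only strings starting with "$" are changed.
conditional = s.apply(lambda x: x.replace("$", "USD ") if x.startswith("$") else x)

print(literal.tolist())      # ['$5,00', '€7,50']
print(with_regex.tolist())   # ['$#.#', '€#.#']
print(conditional.tolist())  # ['USD 5.00', '€7.50']
```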

Getting Hands-on with Regular Expressions

For complex string patterns, see the regex parameter in action:

import re
pattern = r'\b' + re.escape('old_string') + r'\b'   # Whole word matching: boundary on both sides
df['column'] = df['column'].str.replace(pattern, 'new_string', regex=True)

re.escape escapes regex metacharacters in the search text, so they are matched literally instead of crashing the regex party. Always test your pattern on a small sample to fend off surprises.
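To see what re.escape protects against, here is a small sketch with made-up version strings. Without escaping, the dot in '1.5' acts as a wildcard and also matches '125':

```python
import re

import pandas as pd

# Hypothetical sample data for illustration.
s = pd.Series(["v1.5", "v125"])

# Unescaped: "." matches any character, so "125" is replaced too.
unescaped = s.str.replace("1.5", "X", regex=True)

# Escaped: the dot is matched literally, so only "1.5" is replaced.
escaped = s.str.replace(re.escape("1.5"), "X", regex=True)

print(unescaped.tolist())  # ['vX', 'vX']
print(escaped.tolist())    # ['vX', 'v125']
```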

The Face-off: Vectorized operations vs. row-wise applications

Vectorized str methods process the whole column in one call and pass missing values through untouched:

df['column'].str.replace('find', 'replace')  # You are not slow, it's just that vectorization is faster 🚀

For more control, use apply with a lambda function for row-wise surgery (condition is a placeholder for any boolean test on the row):

df['column'] = df.apply(lambda row: row['column'].replace('find', 'replace') if condition else row['column'], axis=1)
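As a concrete sketch of a conditional replacement, assume a hypothetical 'status' column that decides which rows are touched. The second variant reaches the same result with a boolean mask instead of a row-wise apply, which is often the simpler choice when the condition is itself a column expression:

```python
import pandas as pd

# Hypothetical sample data; the "status" column is an assumption for illustration.
df = pd.DataFrame({
    "column": ["find me", "find you"],
    "status": ["active", "archived"],
})

# Row-wise: replace only where the row's status is "active".
row_wise = df.apply(
    lambda row: row["column"].replace("find", "replace")
    if row["status"] == "active"
    else row["column"],
    axis=1,
)

# Mask-based alternative: select the matching rows, then use the str method.
mask = df["status"] == "active"
masked = df["column"].copy()
masked.loc[mask] = masked.loc[mask].str.replace("find", "replace", regex=False)

print(row_wise.tolist())  # ['replace me', 'find you']
print(masked.tolist())    # ['replace me', 'find you']
```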

Steer Clear of Partial Match Accidents

Establish exact boundaries in regex patterns to thwart partial replacement mishaps:

df['column'].str.replace(r'\bold\b', 'new', regex=True)

This ensures only the whole word "old" is replaced, so words such as "bold" or "older" stay untouched.
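A short sketch of the difference, on made-up strings where "old" also appears inside longer words:

```python
import pandas as pd

# Hypothetical sample data for illustration.
s = pd.Series(["old gold", "bold old"])

# Without boundaries, "old" inside "gold" and "bold" is replaced as well.
partial = s.str.replace("old", "new", regex=False)

# With \b on both sides, only the standalone word is replaced.
whole_word = s.str.replace(r"\bold\b", "new", regex=True)

print(partial.tolist())     # ['new gnew', 'bnew new']
print(whole_word.tolist())  # ['new gold', 'bold new']
```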

The Fine Art of Crafting Regex Patterns

Craft and combine regex patterns for flexible replacements:

df['column'].str.replace('|'.join([re.escape(w) for w in word_list]), 'replace', regex=True)

Escaping each word in word_list and joining the results with | builds one pattern that matches any of them, so broad-spectrum replacements become a piece of cake.
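Here is a minimal sketch with a hypothetical word_list. One thing to watch: regex alternation tries the alternatives from left to right, so if one word is a prefix of another, putting the longer word first avoids half-replaced leftovers. Sorting by length is an extra step assumed here, not part of the one-liner above:

```python
import re

import pandas as pd

# Hypothetical sample data and word list for illustration.
s = pd.Series(["catalog of cat C++ books"])
word_list = ["cat", "catalog", "C++"]

# Naive join: "cat" is tried first, so "catalog" is only partly replaced.
naive_pattern = "|".join(re.escape(w) for w in word_list)
naive = s.str.replace(naive_pattern, "X", regex=True)

# Longest-first join: "catalog" is tried before "cat".
ordered = sorted(word_list, key=len, reverse=True)
ordered_pattern = "|".join(re.escape(w) for w in ordered)
longest_first = s.str.replace(ordered_pattern, "X", regex=True)

print(naive.tolist())          # ['Xalog of X X books']
print(longest_first.tolist())  # ['X of X X books']
```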

Rejuvenate Your Data!

Transform and clean your data with the regex support built into Pandas:

df['column'] = df['column'].str.replace(r'[^\w\s]', '', regex=True) # Cleanses data of pesky punctuation

One swift move, and your data is punctuation-free! Note that underscores survive, because \w counts them as word characters.
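A quick sketch on made-up strings shows what the pattern removes and what it keeps. The second pattern, which also strips underscores, is an optional variation and not part of the original snippet:

```python
import pandas as pd

# Hypothetical sample data for illustration.
s = pd.Series(["Hello, world!", "snake_case: ok?"])

# Removes everything that is neither a word character nor whitespace.
no_punct = s.str.replace(r"[^\w\s]", "", regex=True)

# Variation: also remove underscores, which \w treats as word characters.
no_punct_or_underscore = s.str.replace(r"[^\w\s]|_", "", regex=True)

print(no_punct.tolist())                # ['Hello world', 'snake_case ok']
print(no_punct_or_underscore.tolist())  # ['Hello world', 'snakecase ok']
```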

Dive Deeper with Further Learning

Master string operations with the official Pandas documentation and the trusty Python manuals. Remember, the keys to wisdom lie in exploration.
