Explain Codes LogoExplain Codes Logo

Combine two columns of text in pandas dataframe

python
pandas
dataframe
string-formatting
Anton ShumikhinbyAnton Shumikhin·Aug 31, 2024
TLDR

For seamlessly combining two text columns in a pandas DataFrame, utilize the + operator after validating that both are strings:

# Play nice you two, it's time to become one df['combined'] = df['column1'].astype(str) + df['column2'].astype(str)

To sandwich a separator such as a space between the columns:

# Don't forget your personal space! df['combined'] = df['column1'].astype(str) + ' ' + df['column2'].astype(str)

Boost performance on substantial datasets using a lambda function:

# Are you ready to join the lambda function party? df['combined'] = df.apply(lambda row: str(row['column1']) + str(row['column2']), axis=1)

These strategies offer quick, easy-to-implement solutions to blend columns with or without separators for varying performance requirements.

Diving deeper

More than just appending: str.cat()

When you're not merely gluing series or columns together, str.cat() steps in with extra functionality, even addressing null values. To concatenate with a custom separator and handle missing data:

# str.cat() at your service, tackling null like they're nothing df['combined'] = df['column1'].str.cat(df['column2'].astype(str), sep=' - ', na_rep='Missing')

Multi-column join with agg

End up with more than two columns to merge? agg got your back:

# agg: Because why stop at two? df['combined'] = df[['column1', 'column2', 'column3']].agg(' - '.join, axis=1)

Flexibility in formation with apply

The apply method delivers with robust concatenation. It's particularly valuable when you need to construct your combined string conditionally:

# Keep calm and apply(lambda x) on df['combined'] = df.apply(lambda x: f"{x['column1']} ({x['column2']})", axis=1)

Blow speed barriers with list comprehension

A list comprehension can often be the fastest way to concatenate columns, especially when you're dealing with large DataFrames:

# Too fast? Nope, just fast enough! df['combined'] = [str(a) + ' ' + str(b) for a, b in zip(df['column1'], df['column2'])]

Pitfalls to avoid

Integers in disguise

All columns you're merging should be strings. If they're numeric or of other types, cast them using .astype(str) before handshaking.

Consider your hardware's feelings

Apply apply with care. Its row-wise operation can squeak a little with larger DataFrames, where str.cat() or list comprehensions can offer compier journey.

Lost in translation

Be attentive of null values or data type mismatch when marrying columns — make sure all values are converted to strings and nulls are handled gracefully.

Stay trendy with your pandas and numpy versions to take full advantage of any performance improvements in recent adaptations.

Practical application

Match method to your dataset

Performance requirements differ, ensure to adapt the method to the dimension and nature of your dataset. Remember, not every hammer sees every problem as a nail.

Playing well with others

Leverage the 'others' parameter of .str.cat() to concatenate a series with every component in another series, DataFrame, or list:

df['combined'] = df['column1'].str.cat(['suffix1', 'suffix2', 'suffix3'], sep='-')

Customize your blends with formatting

Come up with personalized combined text with string formatting:

df['combined'] = df.apply(lambda x: f"{x['name'].title()}: {x['description'].lower()}", axis=1)

Wise slicing: diving deeper into the DataFrame

Remember the power of column slicing, df.iloc[:, [0, 2]] gets you concatenation of the first and third columns only:

df['combined'] = df.iloc[:, [0, 2]].astype(str).agg(' '.join, axis=1)