
Apply pandas function to column to create multiple new columns?

Tags: python, pandas, dataframe, vectorized-solutions
by Anton Shumikhin · Dec 15, 2024
TLDR

Creating multiple new columns from one is a piece of cake: use .apply() with a function that returns a pd.Series. Each element of the returned Series becomes one new column.

```python
import pandas as pd
df[['B', 'C']] = df['A'].apply(lambda x: pd.Series([x * 2, x * 3]))
```

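The code above takes each value in 'A', doubles and triples it, and writes the results to the new columns 'B' and 'C'. The assignment is positional: the first element of each returned Series goes to 'B', the second to 'C'.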

Understand and apply the .apply() method

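The DataFrame.apply() method calls a function on each column (the default, axis=0) or on each row (axis=1). When the function returns several values per row, pass result_type='expand' together with axis=1: list-like results are then spread across separate columns instead of being stored as one tuple per row.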

```python
import pandas as pd

# Here's some magic: a function that returns two values for each input value
def multiply(value):
    return value * 2, value * 3

# Abracadabra: result_type='expand' spreads each returned tuple across the new columns
df[['B', 'C']] = df.apply(lambda row: multiply(row['A']), axis=1, result_type='expand')
```
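To see what result_type='expand' actually changes, here is a small side-by-side sketch on a throwaway DataFrame (the sample values are made up). Without it, apply(axis=1) returns one Series that holds a tuple per row; with it, each tuple position lands in its own column, labeled 0 and 1.

```python
import pandas as pd

def multiply(value):
    # Return two derived values for one input value
    return value * 2, value * 3

df = pd.DataFrame({'A': [1, 2, 3]})

# Default result_type: a single Series holding one tuple per row
as_tuples = df.apply(lambda row: multiply(row['A']), axis=1)

# result_type='expand': a DataFrame with one column per tuple position (0 and 1)
expanded = df.apply(lambda row: multiply(row['A']), axis=1, result_type='expand')

print(type(as_tuples).__name__, as_tuples.tolist())
print(type(expanded).__name__, list(expanded.columns))
```

Because the expanded columns are labeled by position, assigning them to df[['B', 'C']] maps column 0 to 'B' and column 1 to 'C'.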

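Note: apply() with axis=1 is usually faster than looping over df.iterrows() and assigning values one by one, although it still calls a Python function for every row. Remember, no one likes slow code. Not even your computer 🐌.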

Map and zip for data transformation

For large data, speed and memory matter. Combining map() and zip() avoids building a pd.Series for every value, which typically makes it much faster than returning a Series from apply().

```python
# Quick and efficient: map each value to a tuple, then zip(*...) regroups the tuples into columns
df['A_double'], df['A_triple'] = zip(*df['A'].map(lambda x: (x * 2, x * 3)))
```
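One caveat with the zip(*...) trick: on an empty DataFrame there is nothing to unpack, so the two-name assignment raises a ValueError. The sketch below wraps the pattern in a hypothetical helper, add_tuple_columns (not a pandas API), that accepts any number of output names and adds empty columns when the frame has no rows.

```python
import pandas as pd

def add_tuple_columns(df, source, names, func):
    """Hypothetical helper: `func` maps one value of `source` to a tuple with
    one item per entry in `names`; each tuple position becomes a new column."""
    out = df.copy()
    tuples = out[source].map(func)
    if len(tuples) == 0:
        # zip(*...) over an empty Series yields nothing to unpack, so add empty columns
        columns = [[] for _ in names]
    else:
        # zip(*...) transposes [(b0, c0), (b1, c1), ...] into [(b0, b1, ...), (c0, c1, ...)]
        columns = [list(col) for col in zip(*tuples)]
    for name, values in zip(names, columns):
        out[name] = values
    return out

df = pd.DataFrame({'A': [1, 2, 3]})
df = add_tuple_columns(df, 'A', ['A_double', 'A_triple'], lambda x: (x * 2, x * 3))
```

Returning a copy keeps the caller's DataFrame untouched, which makes the helper safe to chain.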

Building functions with lambdas and dictionaries

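Streamline your Python code using lambda functions! Have apply() call a lambda that returns a dictionary, then build a DataFrame from those dictionaries: the keys become the names of the new columns, so you can name them dynamically. Note that apply() on its own leaves you with a single Series of dicts, so the dictionaries still have to be turned into a DataFrame before joining.

```python
import pandas as pd
# Turning lambda and dictionary into a dynamic duo: the dict keys become the column names
df = df.join(pd.DataFrame(df['A'].apply(lambda x: {'B': x * 2, 'C': x * 3}).tolist(), index=df.index))
```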
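Because a DataFrame built from a list of dictionaries takes the union of all keys, the function does not even need to return the same keys for every row: missing keys simply become NaN. The function summarize below is a made-up example of that, assuming the same numeric column 'A'.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

def summarize(x):
    # Hypothetical function: only even values get an 'is_even' key,
    # so different rows return differently shaped dicts
    out = {'B': x * 2}
    if x % 2 == 0:
        out['is_even'] = True
    return out

# pd.DataFrame(list_of_dicts) collects every key seen; rows lacking a key get NaN there
new_cols = pd.DataFrame(df['A'].apply(summarize).tolist(), index=df.index)
df = df.join(new_cols)
```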

Memory usage and function application

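Keep an eye on performance and memory usage. df.apply() calls a Python function once per row or element, so it gets slow on large DataFrames, and returning a pd.Series from that function creates a new object for every row. Whenever a vectorized solution exists (column arithmetic, NumPy functions, or built-in pandas methods), prefer it.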
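For the running doubling and tripling example, a vectorized version needs no per-row function calls at all. The sketch below uses plain column arithmetic with assign(), then shows a NumPy broadcasting variant for when there are many multipliers; the column names B2 and C2 and the factors array are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

# Column arithmetic runs over the whole column at once, with no Python call per row
df = df.assign(B=df['A'] * 2, C=df['A'] * 3)

# For many outputs, broadcasting an (n_rows, 1) array against (n_factors,)
# produces an (n_rows, n_factors) array: one new column per factor
factors = np.array([2, 3])
products = pd.DataFrame(df['A'].to_numpy()[:, None] * factors, index=df.index, columns=['B2', 'C2'])
df = df.join(products)
```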

Steer clear of .map(lambda ...) calls if there are efficient pandas methods available. And yes, staying updated with pandas enhancements will do you good!
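Splitting text is a common case where a built-in method beats map(lambda ...) and also produces several columns at once: Series.str.split() with expand=True returns a DataFrame directly. The people DataFrame and its full_name column below are invented for the example.

```python
import pandas as pd

people = pd.DataFrame({'full_name': ['Ada Lovelace', 'Alan Turing', 'Grace Hopper']})

# Instead of people['full_name'].map(lambda s: s.split(' ', 1)) plus manual unpacking,
# the string accessor splits on the first space and expands into two columns in one call
people[['first', 'last']] = people['full_name'].str.split(' ', n=1, expand=True)
```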

Merging like a pro

Merge, don't mangle! When you compute a set of new columns separately, consider df.merge() to attach them to the original DataFrame. Passing left_index=True and right_index=True matches the rows by index, keeping your data tight and tidy.

```python
import pandas as pd

# Compute the additional columns; the keys of each returned Series become column names
add_data = df['A'].apply(lambda x: pd.Series({'B': x * 2, 'C': x * 3}))
# Merge the original DataFrame with add_data, matching rows by index
df = df.merge(add_data, left_index=True, right_index=True)
```
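When rows are matched purely on the index, as here, df.join() and pd.concat(axis=1) are shorter alternatives that give the same result for identically indexed frames. The sketch below builds all three on a small sample frame so they can be compared.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})
add_data = df['A'].apply(lambda x: pd.Series({'B': x * 2, 'C': x * 3}))

# Three ways to attach add_data by index
merged = df.merge(add_data, left_index=True, right_index=True)
joined = df.join(add_data)                         # join matches on the index by default
concatenated = pd.concat([df, add_data], axis=1)   # side by side, aligned on the index
```

merge() earns its extra parameters when you need to match on columns rather than on the index, or want fine control over how unmatched rows are handled.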