How to apply a function to two columns of a Pandas dataframe

by Anton Shumikhin · Dec 10, 2024

To apply a function to two Pandas dataframe columns 'A' and 'B', pass each row to it with apply:

df['result'] = df.apply(lambda row: your_function(row['A'], row['B']), axis=1)

This passes the values of 'A' and 'B' from each row to your_function and stores its return value in a new column, 'result'.
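
As a minimal, self-contained sketch of the pattern, the snippet below builds a small sample dataframe and applies a placeholder function to it. The data and the body of your_function are assumptions made up for illustration; only the apply call mirrors the line above.

```python
import pandas as pd

# Hypothetical sample data; 'A' and 'B' match the column names used above.
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

def your_function(a, b):
    # Placeholder logic for illustration only: a weighted sum of the two values.
    return a + 2 * b

# axis=1 hands each row to the lambda as a Series indexed by column name.
df["result"] = df.apply(lambda row: your_function(row["A"], row["B"]), axis=1)

print(df["result"].tolist())  # [21, 42, 63]
```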

Efficient implementation of function application

A few practices help keep function application across multiple dataframe columns both safe and efficient.

Empower Lambda for custom functions

A lambda function is a convenient way to hand several column values to a custom function. With axis=1 set in the apply method, Pandas calls the lambda once per row and passes that row as a Series indexed by column name.

Correct column access

To confidently access columns within apply and lambda, especially those with spaces or special characters in their names or colliding with DataFrame attribute names, adhere to the square bracket notation with string column names:

df.apply(lambda row: row['Column A'] + row['Column B'], axis=1)  # As easy as A + B
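
The attribute-name collision is easy to overlook, so here is a small sketch of it. The dataframe, its index labels, and the 'name' column are made up for illustration. Inside apply with axis=1, row.name is the row's index label, so only bracket access reaches a column that happens to be called 'name'.

```python
import pandas as pd

# Hypothetical data: one column collides with the Series attribute `name`,
# and two column names contain spaces.
df = pd.DataFrame(
    {"name": ["ada", "bob"], "Column A": [1, 2], "Column B": [3, 4]},
    index=["r1", "r2"],
)

# Attribute access returns the row's index label, not the 'name' column.
labels = df.apply(lambda row: row.name, axis=1)

# Bracket access returns the value stored in the 'name' column.
names = df.apply(lambda row: row["name"], axis=1)

# Names with spaces cannot be written as attributes at all; brackets work.
totals = df.apply(lambda row: row["Column A"] + row["Column B"], axis=1)

print(labels.tolist())  # ['r1', 'r2']
print(names.tolist())   # ['ada', 'bob']
print(totals.tolist())  # [4, 6]
```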

Exception handling and data type compatibility

Make sure your function handles different data types gracefully. To keep one bad row from aborting the whole apply call, wrap the operation in a try/except and return NaN on failure. Define the wrapper before calling apply:

import numpy as np

def try_your_function(val1, val2):
    try:
        # Your operation here; the addition is only a placeholder
        return val1 + val2
    except Exception as e:
        # Process or log the exception
        return np.nan  # When real life gives you an exception, make NaN-aide

df['result'] = df.apply(lambda row: try_your_function(row['A'], row['B']), axis=1)  # Try and try again until you...catch an exception
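
To make the pattern concrete, here is a sketch with data that actually triggers the except branch. The sample values and the helper name safe_product are hypothetical; it also catches only the exceptions it expects rather than a blanket Exception, which is one possible design choice and not the only reasonable one.

```python
import numpy as np
import pandas as pd

# Hypothetical data: numbers stored as strings, some of them unparseable.
df = pd.DataFrame({"A": ["1.5", "2", "oops"], "B": ["2", "x", "3"]})

def safe_product(val1, val2):
    """Multiply two values after converting them to float; NaN on failure."""
    try:
        return float(val1) * float(val2)
    except (ValueError, TypeError):
        # ValueError: text that is not a number; TypeError: values such as None.
        return np.nan

df["result"] = df.apply(lambda row: safe_product(row["A"], row["B"]), axis=1)

print(df["result"].isna().tolist())  # [False, True, True]
```

Only the first row parses cleanly; the other two fall back to NaN, and the apply call still completes.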

Finding an ally in Series.combine

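For element-wise operations on two columns, Series.combine is an alternative to apply: it aligns the two series by index and calls your function on each pair of scalar values. If your function returns values that do not fit the dtype of the input series, converting both series with astype(object) first may be needed: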

df['new_column'] = df['A'].combine(df['B'], func=your_function)  # Two series, one function, and a whole new column
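
A short sketch of Series.combine on sample data follows. The dataframe and the helper larger_of are assumptions for illustration; the function receives one scalar from each series per call.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"A": [1, 5, 3], "B": [4, 2, 3]})

def larger_of(a, b):
    # Called once per aligned pair of scalars (a from 'A', b from 'B').
    return a if a >= b else b

df["new_column"] = df["A"].combine(df["B"], func=larger_of)

print(df["new_column"].tolist())  # [4, 5, 3]
```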

Optimize through list comprehension

On larger datasets, a list comprehension over the zipped columns is usually faster than apply with axis=1, because it iterates over plain values instead of building a Series object for every row:

df['new_column'] = [your_function(a, b) for a, b in zip(df['A'], df['B'])]  # Zip and unzip the power of list comprehension
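
The sketch below checks, on made-up data with a placeholder your_function, that the list comprehension produces the same values as the row-wise apply. It does not measure speed; any timing would depend on your data and environment.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

def your_function(a, b):
    # Placeholder logic for illustration only.
    return a * b

# Row-wise apply, as in the earlier sections.
via_apply = df.apply(lambda row: your_function(row["A"], row["B"]), axis=1)

# The same computation without creating a Series per row.
df["new_column"] = [your_function(a, b) for a, b in zip(df["A"], df["B"])]

same = df["new_column"].tolist() == via_apply.tolist()
print(same)  # True
```

When the function is plain arithmetic like this placeholder, a vectorized expression such as df["A"] * df["B"] avoids the Python-level loop entirely.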

Trust but verify your function

Before applying your function to the entire dataframe, a sanity check using sample data should be on your checklist:

sample_result = your_function(df['A'].iloc[0], df['B'].iloc[0])
print(sample_result)  # Veni, vidi, vici, err… verify!

Building robust and maintainable code

A good solution goes beyond code that works today; it should also be robust and maintainable.

Avoiding column name conflicts

Refer to columns by their string names rather than by numeric position. Names stay valid when the data layout changes, whereas positional access can silently pick up the wrong column.
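
To illustrate the risk, this sketch applies the same subtraction by position and by name to a dataframe and to a reordered copy of it. The data and the helper names diff_by_position and diff_by_name are hypothetical.

```python
import pandas as pd

# Hypothetical data, plus a copy with the columns in a different order.
df = pd.DataFrame({"A": [1, 2], "B": [10, 20]})
reordered = df[["B", "A"]]

def diff_by_position(row):
    # Depends on column order: first column minus second column.
    return row.iloc[0] - row.iloc[1]

def diff_by_name(row):
    # Independent of column order.
    return row["A"] - row["B"]

pos_original = df.apply(diff_by_position, axis=1).tolist()
pos_reordered = reordered.apply(diff_by_position, axis=1).tolist()
name_original = df.apply(diff_by_name, axis=1).tolist()
name_reordered = reordered.apply(diff_by_name, axis=1).tolist()

print(pos_original, pos_reordered)    # [-9, -18] [9, 18]
print(name_original, name_reordered)  # [-9, -18] [-9, -18]
```

The positional version flips sign after the reorder without raising any error, while the name-based version is unaffected.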

Consistent return type

Keep the return type of your applied function consistent with the intended type of your new column. This avoids data-type surprises downstream, such as a numeric column that quietly ends up with the generic object dtype.
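
As a sketch of what an inconsistent return type does, the two hypothetical functions below differ only in what they return when the divisor is zero: one returns a string, the other NaN. The data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one zero divisor.
df = pd.DataFrame({"A": [4, 9, 6], "B": [2, 0, 3]})

def ratio_mixed(a, b):
    # Returns a string in one branch and a float in the other.
    return "n/a" if b == 0 else a / b

def ratio_consistent(a, b):
    # Always returns a float; NaN marks the undefined case.
    return np.nan if b == 0 else a / b

mixed = df.apply(lambda row: ratio_mixed(row["A"], row["B"]), axis=1)
consistent = df.apply(lambda row: ratio_consistent(row["A"], row["B"]), axis=1)

print(mixed.dtype)       # object
print(consistent.dtype)  # float64
```

The float64 column supports numeric operations such as mean() directly, while the object column would need cleaning first.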

Rigorous testing

Test the waters (or rather your code) by running your function on a subset of your data and comparing the results with values you already know to be correct before applying it to the full dataframe.
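
One way to do this is sketched below, under the assumption that you can state what the correct output should be. The data is constructed so that the placeholder your_function must return 100 for every row, and a fixed random_state keeps the sample reproducible.

```python
import pandas as pd

# Hypothetical data built so the expected result is known: B - A is always 100.
df = pd.DataFrame({"A": range(100), "B": range(100, 200)})

def your_function(a, b):
    # Placeholder logic for illustration only.
    return b - a

# Draw a small reproducible sample instead of processing every row.
sample = df.sample(n=10, random_state=0)
sample_result = sample.apply(
    lambda row: your_function(row["A"], row["B"]), axis=1
)

# Check the values and confirm that results stay aligned with the sampled rows.
assert (sample_result == 100).all()
assert sample_result.index.equals(sample.index)
```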

Clever use of examples

Remember the adage, "A picture is worth a thousand words"? The effect of clear, relatable examples is no different in illustrating your answer.
