
How to use multiprocessing pool.map with multiple arguments

python
multiprocessing
pool
map
by Alex Kataev · Feb 22, 2025
TLDR

Apply functools.partial when the second argument stays constant:

from multiprocessing import Pool
from functools import partial

# Your maths teacher would be proud, we're adding!
def add(x, y):
    return x + y

pool = Pool()
fixed_y = 4  # Because, why not '4'?

# functools.partial: a swiss army knife for fixed arguments
sum_with_fixed_y = partial(add, y=fixed_y)
results = pool.map(sum_with_fixed_y, [1, 2, 3])

For flexible pairs, use pool.starmap():

from multiprocessing import Pool

def add(x, y):
    return x + y  # Still don't need a calculator!

pool = Pool()
results = pool.starmap(add, [(1, 4), (2, 5), (3, 6)])  # Each tuple unpacks into (x, y)

Both examples invoke the add function concurrently across the input arguments and return a list of sums.

Taking the dive: practical insights

multiprocessing.Pool.map accepts only a single iterable of parameters, so each worker call receives exactly one argument. To sneak in multiple arguments, a few tricks can transform your code.
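A minimal sketch of the failure mode, reusing the add function from the TLDR:

from multiprocessing import Pool

def add(x, y):
    return x + y

if __name__ == "__main__":
    with Pool() as pool:
        # TypeError: add() missing 1 required positional argument: 'y'
        pool.map(add, [1, 2, 3])

pool.map calls add(1), add(2), add(3), one argument apiece, so y never arrives. The tricks below work around exactly this.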

Introducing functools.partial to the game

For moments when several of the inputs remain constant, functools.partial leads the charge:

from functools import partial
# ... previous setup

# No change in `y`, hence it's our functools.partial candidate
sum_with_fixed_y = partial(add, y=fixed_y)
results = pool.map(sum_with_fixed_y, range(10))  # Calculating, but with style!

The art of using a wrapper function

If constant values are not your style, a wrapper function unpacks a tuple into individual parameters. Keep it a named, module-level function: multiprocessing can't pickle lambdas.

# Working smart, not hard!
def wrapper(args_tuple):
    return add(*args_tuple)

results = pool.map(wrapper, [(1, 4), (2, 5), (3, 6)])

Python version busts and bugs

Older Python versions, notably Python 2.6, had a bug that prevented functools.partial objects from being pickled, which multiprocessing requires. In these cases, fall back to the wrapper function magic!

Joining itertools.repeat() to the fun

For repeating a single argument across several executions, itertools.repeat() is your lucky charm:

from itertools import repeat

# Automated repetition, yet no deja vu here
async_results = [pool.apply_async(add, (x, fixed_y))
                 for fixed_y, x in zip(repeat(4), [1, 2, 3])]
results = [r.get() for r in async_results]  # AsyncResult.get() collects each sum
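If you'd rather stay with a single pool call, the same repetition can be expressed with zip, repeat, and starmap; a sketch reusing the add function from earlier:

from itertools import repeat
from multiprocessing import Pool

def add(x, y):
    return x + y

if __name__ == "__main__":
    with Pool() as pool:
        # zip pairs each x with the endlessly repeated 4: (1, 4), (2, 4), (3, 4)
        results = pool.starmap(add, zip([1, 2, 3], repeat(4)))
        print(results)  # [5, 6, 7]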

Resource management etiquette

To ensure your pool is properly closed post-usage, always enclose Pool resources within a context manager:

with Pool() as pool:
    # ... your operations; feel free to add your own magic here
    results = pool.starmap(add, [(1, 4), (2, 5), (3, 6)])
# The pool is closed and joined automatically when the block exits

Embracing ThreadPool for I/O operations

The multiprocessing.pool.ThreadPool class mirrors the Pool API but runs tasks in threads instead of processes, making it preferable for I/O-bound tasks (starmap and context-manager support both require Python 3.3+):

from multiprocessing.pool import ThreadPool

with ThreadPool() as pool:
    # starmap unpacks each (arg1, arg2) tuple for the task
    results = pool.starmap(some_io_bound_task, [(arg1, arg2), ...])

Safety precautions!

Ensure the safety of your multiprocessing code by defining target functions at module level, outside the if __name__ == "__main__" guard, and by creating the pool inside it. The guard prevents recursive spawning of subprocesses when child processes import your module:

from multiprocessing import Pool

# Hide & seek: the function lives outside the `if` clause
def target_function(arg1, arg2):
    return arg1 + arg2  # ... functionality

if __name__ == "__main__":
    arg_list = [(1, 4), (2, 5), (3, 6)]
    with Pool() as pool:
        pool.starmap(target_function, arg_list)

Seizing the results

We won't forget about the results, will we? pool.starmap() returns a list of outputs. Capture and utilize away:

results = pool.starmap(target_function, arg_list)
for output in results:
    # Time to unravel the results!
    print(output)

Get your hands dirty: practical use-cases

Here are a few classic scenarios for dealing with multiple arguments in multiprocessing:

Batch processing

When applying a complex function to a large dataset with several constant configuration parameters, functools.partial can save the day:

with Pool() as pool:
    # functools.partial to the rescue for static configurations
    results = pool.map(partial(complex_function, config1, config2), data_batch)
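Note that partial binds config1 and config2 as the leading positional arguments, so each batch item lands in the last parameter slot. A sketch with a hypothetical complex_function showing the required signature order:

from functools import partial
from multiprocessing import Pool

# Hypothetical: configuration first, data item last
def complex_function(config1, config2, item):
    return item * config1 + config2

if __name__ == "__main__":
    data_batch = [1, 2, 3]
    with Pool() as pool:
        results = pool.map(partial(complex_function, 10, 5), data_batch)
        print(results)  # [15, 25, 35]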

Parallel I/O operations

In scenarios involving I/O-bound tasks such as downloading files simultaneously, a ThreadPool acts as the saviour. Pass URLs and destination paths as arguments:

with ThreadPool() as pool:
    # Say goodbye to sequential downloads
    pool.starmap(download_file, [(url1, dest1), (url2, dest2)])
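download_file above is a placeholder; here's one possible sketch, assuming urllib.request suffices for the fetch (the example.com URLs are made up):

from multiprocessing.pool import ThreadPool
from urllib.request import urlretrieve

# Hypothetical helper: fetch one URL and save it to a local path
def download_file(url, dest):
    urlretrieve(url, dest)  # blocks on network I/O, so threads overlap nicely
    return dest

if __name__ == "__main__":
    jobs = [("https://example.com/a.txt", "a.txt"),
            ("https://example.com/b.txt", "b.txt")]
    with ThreadPool() as pool:
        pool.starmap(download_file, jobs)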

Data-reliant executions

Suppose you have a list of records and a data enrichment function that requires extra context:

with Pool() as pool:
    records = [(record, enrichment_data) for record in original_records]
    # Making records smarter, one at a time
    enriched_records = pool.starmap(enrich_record, records)
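enrich_record is likewise a stand-in; a sketch under the assumption that enrichment means merging shared context into each record dict:

from multiprocessing import Pool

# Hypothetical: merge the shared context into a single record
def enrich_record(record, enrichment_data):
    return {**record, **enrichment_data}

if __name__ == "__main__":
    original_records = [{"id": 1}, {"id": 2}]
    enrichment_data = {"source": "crm"}
    with Pool() as pool:
        records = [(record, enrichment_data) for record in original_records]
        enriched_records = pool.starmap(enrich_record, records)
        print(enriched_records)  # [{'id': 1, 'source': 'crm'}, {'id': 2, 'source': 'crm'}]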