
How do I use threading in Python?

by Anton Shumikhin · Nov 6, 2024
TLDR

Use Python's built-in threading module: subclass Thread and override its run() method, which holds the code the thread executes concurrently with the rest of the program. Call start() to launch the thread. For example:

    import threading

    class MyThread(threading.Thread):
        def run(self):
            print('Parallel execution underway')

    thread = MyThread()
    thread.start()
    thread.join()  # Wait for the thread to finish before the main program exits

This code snippet creates a basic thread to print "Parallel execution underway". join() ensures the main program waits for the thread to finish.

An overview of threads and multiprocessing.dummy

Threads in Python are best suited to I/O-bound tasks, where the program spends most of its time waiting on external resources such as the network or disk. multiprocessing.dummy.Pool exposes the multiprocessing.Pool API backed by threads rather than processes, giving you a thread pool for running function calls concurrently:

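    from multiprocessing.dummy import Pool
    from urllib.request import urlopen

    def fetch_url(url):
        # Pretend that reading HTML is hard work and takes a lot of time
        # Assumed body for illustration: download and return the raw page
        with urlopen(url) as response:
            return response.read()

    list_of_urls = ['http://example.com/']  # Placeholder list; use your own URLs

    pool = Pool(5)  # Setting up a thread pool with 5 threads
    results = pool.map(fetch_url, list_of_urls)
    pool.close()  # No new tasks after this point
    pool.join()   # Wait for the worker threads to exit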

For CPU-bound tasks, which are limited by processing power rather than by waiting, use the multiprocessing module to take advantage of process-based parallelism.

Starting and maintaining threads

Instead of subclassing, you can create a thread by instantiating threading.Thread directly and passing the function to run as its target:

    import threading

    def cook_pasta():
        print('Processing carbs...')

    chef = threading.Thread(target=cook_pasta)
    chef.start()
    chef.join()  # Wait until our chef finishes cooking

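By default, the program does not exit until every thread has finished. To keep a thread from holding up exit, make it a daemon thread by setting t.daemon = True before calling t.start(). Daemon threads are stopped abruptly when the main program ends, so avoid them for work that must finish or clean up after itself.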
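As a minimal sketch of a daemon thread: the heartbeat function and its interval are hypothetical stand-ins for any background task that never returns on its own.

```python
import threading
import time

def heartbeat(interval):
    """Hypothetical background task that would loop forever if left alone."""
    while True:
        time.sleep(interval)

# The daemon flag must be set before start(); here it is passed to the constructor
monitor = threading.Thread(target=heartbeat, args=(0.1,), daemon=True)
monitor.start()

print(monitor.daemon)      # True
print(monitor.is_alive())  # True: still running in the background

# No join(): when the main thread ends, the interpreter exits
# and the daemon thread is stopped without any cleanup.
```

Without daemon=True, this script would never exit, because the interpreter waits for non-daemon threads to finish.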

Synchronizing threads

Python provides the thread-safe queue.Queue class (Queue.Queue in Python 2) for sharing data and coordinating work among threads. Passing items through a queue, rather than mutating shared state directly, helps prevent race conditions:

    from queue import Queue
    from threading import Thread

    def worker(queue):
        while True:
            item = queue.get()
            # Process the item ... or you could just admire it for a while
            queue.task_done()

    queue = Queue()
    for _ in range(4):  # Creating 4 workers
        t = Thread(target=worker, args=(queue,))
        t.daemon = True  # Workers loop forever, so don't let them block exit
        t.start()

    items_to_process = range(10)  # Placeholder work items

    # Queue your heart out
    for item in items_to_process:
        queue.put(item)

    queue.join()  # Wait until all items have been processed

Working around Python's GIL

CPython's GIL (Global Interpreter Lock) can be a problem for multi-threaded CPU-bound programs because it allows only one thread to execute Python bytecode at a time. This is why understanding the difference between multithreading and multiprocessing is crucial. For CPU-heavy tasks, choose multiprocessing: each process has its own interpreter and its own GIL, so the work can run in parallel on multiple cores.
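To see the GIL's effect, you can time a pure-Python loop run twice in a row against the same work split across two threads. This is a rough sketch: count_down and the workload size N are arbitrary choices for illustration, and the timings depend on your machine and interpreter build.

```python
import threading
import time

def count_down(n):
    """Pure-Python CPU-bound loop; it needs the GIL for every iteration."""
    while n > 0:
        n -= 1

N = 2_000_000  # Arbitrary workload size

# Run the work twice, one call after the other
start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

# Run the same two calls in separate threads
start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f'sequential: {sequential:.3f}s, two threads: {threaded:.3f}s')
```

On a standard CPython build the two timings are usually close, since the threads take turns holding the GIL instead of running side by side.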

When to use threads

Use threads for I/O-bound or network-bound operations. An example is downloading a file over HTTP with urllib.request.urlretrieve():

    import threading
    import urllib.request

    def download_image(url):
        print('Commencing download', url)
        urllib.request.urlretrieve(url, 'photo.jpg')

    thread = threading.Thread(target=download_image, args=('http://example.com/photo.jpg',))
    thread.start()
    thread.join()  # Wait for download to complete before moving on

Threads with multiple arguments

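From Python 3.3 onwards, Pool.starmap() unpacks each tuple of arguments into a separate call, which makes it easy to run functions that take several parameters. It is available on the thread-backed multiprocessing.dummy.Pool as well as on multiprocessing.Pool: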

    from multiprocessing.dummy import Pool as ThreadPool

    def prepare_dish(ingredient, dish):
        print(f'Using {ingredient} to make {dish}')

    pool = ThreadPool(3)  # Instantiating a thread pool with 3 threads
    ingredients = ['🍅', '🥬', '🌽']
    dishes = ['🍝', '🥗', '🍲']
    pool.starmap(prepare_dish, zip(ingredients, dishes))
    pool.close()
    pool.join()

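To pass the same constant argument to every call, pair the varying values with itertools.repeat(). A closed pool cannot accept new tasks, so this snippet opens a fresh one:

    from itertools import repeat

    # The earlier pool was closed, so create a new one for this batch
    with ThreadPool(3) as pool:
        pool.starmap(prepare_dish, zip(repeat('🧂'), dishes))  # We all like some salt, right?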

Thread synchronization methods

The threading module also provides lower-level synchronization primitives, including locks, semaphores, and events. They let you coordinate complex interactions between threads and keep threads from modifying shared data at the same time.
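Here is a minimal sketch of two of these primitives working together: a Lock protects a shared counter, and an Event holds the workers back until the main thread releases them. The names add_many, counter_lock, and ready are made up for this example.

```python
import threading

counter = 0
counter_lock = threading.Lock()  # Guards the shared counter
ready = threading.Event()        # Signals the workers to begin

def add_many(times):
    """Increment the shared counter `times` times, one locked step at a time."""
    global counter
    ready.wait()  # Block until the main thread calls ready.set()
    for _ in range(times):
        with counter_lock:  # Only one thread may update the counter at a time
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()

ready.set()  # Release all waiting workers
for t in threads:
    t.join()

print(counter)  # 40000
```

Without the lock, the read-modify-write in counter += 1 could interleave between threads and lose updates.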

Considerations in multithreading

In CPU-bound tasks, Python threads can run slower than single-threaded code, because the threads contend for the GIL and add context-switching overhead. For work that is not I/O-bound, processes usually perform better.

Safety precautions

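When using multiprocessing, wrap your program's entry point in if __name__ == '__main__':. With the spawn start method (the default on Windows and macOS), each child process imports the main module, so unguarded code that creates processes would run again in every child.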
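A minimal sketch of the guarded layout follows. The square function is a hypothetical stand-in for real CPU-bound work, and the pool size of 2 is arbitrary.

```python
from multiprocessing import Pool

def square(n):
    """Stand-in for CPU-bound work; defined at module level so workers can import it."""
    return n * n

def main():
    # The pool is created only when main() is called, never on import
    with Pool(processes=2) as pool:
        return pool.map(square, range(5))

if __name__ == '__main__':
    print(main())  # [0, 1, 4, 9, 16]
```

Keeping the worker function at module level and the pool creation behind the guard means a child process can import the file without starting a pool of its own.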