flashcards, multiprocessing in python

The following is a strong simplification. Multiprocessing is complex and frankly boring.

Multithreading vs multiprocessing

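Multiprocessing runs multiple processes in parallel. Each process has its own memory and its own Python interpreter, so several CPU cores can be used at once. Multithreading runs multiple threads concurrently inside a single process, and the threads share that process's memory. In standard CPython, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so multithreading does not speed up CPU-bound tasks. It works well for I/O-bound tasks, where threads spend most of their time waiting.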
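A minimal sketch of the I/O-bound case, assuming `time.sleep` is an acceptable stand-in for waiting on a network or disk. The names `fake_download` and `timed_threaded_run` are made up for illustration, and `ThreadPoolExecutor` from `concurrent.futures` is just one convenient way to start threads:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an I/O call: sleeping releases the GIL,
# much like waiting on a socket or a file would.
def fake_download(seconds):
    time.sleep(seconds)
    return seconds

def timed_threaded_run(delays):
    # Run one thread per delay and measure the total wall-clock time
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as executor:
        results = list(executor.map(fake_download, delays))
    return results, time.perf_counter() - start

if __name__ == "__main__":
    results, elapsed = timed_threaded_run([0.2, 0.2, 0.2, 0.2])
    print("Results:", results)
    print(f"Elapsed: {elapsed:.2f}s")
```

The four waits overlap, so the total time should be close to one wait rather than the sum of all four. Replacing the sleep with a pure-Python computation would typically remove that benefit, which is where processes come in.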

Embarrassingly parallel functions

If your tasks do not need to communicate with each other (embarrassingly parallel), the easiest option is to use Pool.

{python}from multiprocessing import Pool

Implementation

Parallelised code:

If the full result can be assembled from partial results, each computed independently on part of the input, the problem is called embarrassingly parallel.
Notice the {python}if __name__ == "__main__": statement at the end of the code. Without it, each child process that re-imports the file (as happens with the spawn start method, the default on Windows and macOS) would run the pool-creating code again and try to spawn processes of its own. Yes, multiprocessing is weird.
from multiprocessing import Pool

# Function to compute the square of a number
def compute_square(num):
    return num * num

# List of numbers to compute squares for
numbers = [1, 2, 3, 4, 5]

def run(numbers):
    # Create a Pool object with 4 processes
    with Pool(processes=4) as pool:
        # Map the function to the list of numbers
        results = pool.map(compute_square, numbers)

    print("Input Numbers: ", numbers)
    print("Squared Numbers: ", results)

if __name__ == "__main__":
    run(numbers)

In most cases, you then only need to combine the results somehow.
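As a rough sketch of such a combining step, assume the goal is the sum of squares of a list. The helpers `partial_sum`, `split_into_chunks` and `sum_of_squares` are hypothetical names, and the chunk size is an arbitrary choice:

```python
from multiprocessing import Pool

# Hypothetical worker: computes a partial result for one chunk of the input
def partial_sum(chunk):
    return sum(x * x for x in chunk)

# Hypothetical helper: splits the input into consecutive chunks
def split_into_chunks(items, chunk_size):
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

def sum_of_squares(numbers, processes=2, chunk_size=3):
    chunks = split_into_chunks(numbers, chunk_size)
    with Pool(processes=processes) as pool:
        # Each chunk is handled independently by a worker process
        partials = pool.map(partial_sum, chunks)
    # Combining step: the full result is just the sum of the partial results
    return sum(partials)

if __name__ == "__main__":
    print(sum_of_squares([1, 2, 3, 4, 5]))  # 55
```

Here the combining step is a plain `sum`. For other problems it might be a merge, a concatenation, or a maximum.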

Variables created within processes are not shared. Each process has its own memory, so anything a worker creates or modifies is gone once that process has finished executing. This is why we need special objects such as Pool or Queue from the multiprocessing library to pass data between processes.
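A small sketch of this behaviour, with made-up function names `record_locally` and `send_result`. The first worker appends to a module-level list, which only changes its own copy. The second sends its result back through a `Queue`:

```python
from multiprocessing import Process, Queue

results = []  # module-level list; every process works on its own copy

# Hypothetical worker: modifies the list in the child process only
def record_locally(num):
    results.append(num * num)

# Hypothetical worker: sends the result back to the parent through a queue
def send_result(num, queue):
    queue.put(num * num)

if __name__ == "__main__":
    worker = Process(target=record_locally, args=(3,))
    worker.start()
    worker.join()
    print("Parent list after worker:", results)  # still empty in the parent

    queue = Queue()
    worker = Process(target=send_result, args=(3, queue))
    worker.start()
    print("Received through queue:", queue.get())
    worker.join()
```

The parent reads from the queue before joining the worker, which avoids waiting on a process that is still trying to hand over its data.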

How many processes to spawn?

  1. Simple problems with limited impact: Just hardcode some number, don't overthink it.
  2. CPU bound: Use as many processes as there are CPU cores:
import os

num_cores = os.cpu_count()
  3. I/O bound: You can go above the number of cores. If the work is heavily I/O bound, consider using two to three times the number of CPU cores.
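The rules of thumb above could be wrapped in a small helper. This is only a sketch: `pick_worker_count` and its `io_multiplier` parameter are hypothetical, and the fallback to 1 covers the case where `os.cpu_count()` returns `None`:

```python
import os

# Hypothetical helper that turns the rules of thumb into a number
def pick_worker_count(workload, io_multiplier=2):
    # os.cpu_count() can return None if the count cannot be determined
    cores = os.cpu_count() or 1
    if workload == "cpu":
        # CPU bound: one process per core
        return cores
    if workload == "io":
        # I/O bound: go above the core count, e.g. 2 or 3 times
        return cores * io_multiplier
    raise ValueError(f"unknown workload type: {workload!r}")

if __name__ == "__main__":
    print("CPU-bound workers:", pick_worker_count("cpu"))
    print("I/O-bound workers:", pick_worker_count("io", io_multiplier=3))
```

Note that `os.cpu_count()` reports logical cores, so the best value for a given machine may still need some experimentation.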