flashcards, multiprocessing in python
The following is a strong simplification. Multiprocessing is complex and frankly boring.
Multithreading vs multiprocessing
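Multiprocessing means running several processes in parallel, each with its own Python interpreter and memory space, so the work can be spread over several CPU cores. Multithreading means running several threads concurrently inside a single process, where they share memory. In standard CPython builds, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so multithreading does not speed up CPU-bound tasks. It works well for I/O-bound tasks, where threads spend most of their time waiting.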
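A minimal sketch of the difference, assuming a made-up CPU-bound helper called count_primes (not from these notes): the same work is sent through a thread pool and a process pool, which share the same map interface. Timings depend on the machine, so none are shown here; on a standard CPython build the process pool would be expected to finish this kind of work sooner.

```python
import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def timed_map(pool_factory, limits):
    """Run count_primes over `limits` with the given pool type and time it."""
    start = time.perf_counter()
    with pool_factory(processes=4) as pool:
        results = pool.map(count_primes, limits)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    limits = [50_000] * 8  # illustrative workload, chosen arbitrarily
    for label, pool_factory in (("threads", ThreadPool), ("processes", Pool)):
        results, seconds = timed_map(pool_factory, limits)
        print(f"{label}: {seconds:.2f}s for {len(results)} tasks")
```

For I/O-bound work (downloads, file reads) the picture would likely be reversed or even, since threads release the GIL while waiting.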
Embarrassingly parallel functions
If your tasks do not need to communicate with each other (embarrassingly parallel), the easiest option is Pool.
{python}from multiprocessing import Pool
Implementation
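Note the {python}if __name__ == "__main__": statement at the end of the code. With the spawn start method (the default on Windows and macOS), each child process loads the file again. Without the guard, every child would try to create its own pool and spawn processes in turn, without end; current Python versions stop this with a RuntimeError. Yes, multiprocessing is weird.
Parallelised code:
from multiprocessing import Pool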
# Function to compute the square of a number
def compute_square(num):
    return num * num

# List of numbers to compute squares for
numbers = [1, 2, 3, 4, 5]

def run(numbers):
    # Create a Pool object with 4 processes
    with Pool(processes=4) as pool:
        # Map the function to the list of numbers
        results = pool.map(compute_square, numbers)
    print("Input Numbers: ", numbers)
    print("Squared Numbers: ", results)

if __name__ == "__main__":
    run(numbers)
In most cases, you then only need to combine the results somehow.
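As a sketch of what combining can look like, assume a hypothetical task of summing squares over a large range: the range is split into chunks, each worker returns a partial sum, and the parent adds the partial sums together. The names chunk_ranges and sum_of_squares are made up for illustration.

```python
from multiprocessing import Pool


def chunk_ranges(total, n_chunks):
    """Split range(total) into n_chunks contiguous (start, stop) pairs."""
    step, extra = divmod(total, n_chunks)
    ranges, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional element each
        stop = start + step + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def sum_of_squares(bounds):
    """Worker: sum n*n for n in [start, stop)."""
    start, stop = bounds
    return sum(n * n for n in range(start, stop))


if __name__ == "__main__":
    with Pool(processes=4) as pool:
        partial_sums = pool.map(sum_of_squares, chunk_ranges(1_000_000, 8))
    # Combine step: the workers' partial results are reduced in the parent
    print(sum(partial_sums))
```

The combine step here is a plain sum, but it could just as well be a list concatenation or a dictionary merge, depending on what the workers return.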
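Variables created within a process are not shared with other processes. Each process has its own memory, and its variables are gone once it has finished executing. This is why results come back through the pool's return values, and why sharing state requires special objects from the multiprocessing library, such as Queue, Value, or Manager.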
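A small sketch of this behaviour, with illustrative names (counter, bump_plain, bump_shared) that are not from these notes: a child process increments a plain global and the parent does not see the change, while a multiprocessing.Value is visible to both.

```python
from multiprocessing import Process, Value

counter = 0  # ordinary module-level variable


def bump_plain():
    """Increment the plain global; only this process's copy changes."""
    global counter
    counter += 1


def bump_shared(shared):
    """Increment a multiprocessing.Value, which lives in shared memory."""
    with shared.get_lock():
        shared.value += 1


if __name__ == "__main__":
    child = Process(target=bump_plain)
    child.start()
    child.join()
    print("plain counter in parent:", counter)  # still 0

    shared = Value("i", 0)  # "i" = C signed int
    child = Process(target=bump_shared, args=(shared,))
    child.start()
    child.join()
    print("shared counter in parent:", shared.value)  # 1
```

For embarrassingly parallel work, returning values through pool.map is usually simpler than sharing state like this.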
How many processes to spawn?
- Simple problems, with limited impact: Just hardcode some number, don't overthink it
- CPU bound: Use as many as there are CPU cores (note that os.cpu_count() counts logical cores and can return None):
import os
num_cores = os.cpu_count()
- I/O bound: You can go above the number of cores. If it is heavily I/O bound, consider going to 2 or 3 times the number of CPU cores.
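The rules of thumb above can be sketched as a small helper. The function name pick_worker_count and the io_factor parameter are assumptions made for illustration; the multiplier simply mirrors the "2 or 3 times" suggestion and is not a tuned value.

```python
import os


def pick_worker_count(workload, io_factor=2):
    """Suggest a process count for a 'cpu' or 'io' bound workload.

    `io_factor` is an illustrative multiplier for I/O-bound work.
    """
    cores = os.cpu_count() or 1  # cpu_count() may return None
    if workload == "cpu":
        return cores
    if workload == "io":
        return cores * io_factor
    raise ValueError(f"unknown workload type: {workload!r}")


if __name__ == "__main__":
    print("CPU bound:", pick_worker_count("cpu"))
    print("I/O bound:", pick_worker_count("io", io_factor=3))
```

The result can be passed straight to Pool(processes=...). Measuring with a couple of different values is still the more reliable way to settle on a number.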