pythonspeed.com (Dec 2023)
When you’re writing Python, though, you want to share Python objects between processes. To enable this, Python’s multiprocessing library does the following when you pass Python objects between processes:
- On the sender side, the arguments get serialized to bytes with the pickle module.
- On the receiver side, the bytes are deserialized using pickle.
This serialization and deserialization process involves computation, which can be slow.
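As a rough illustration of that cost, here is a minimal sketch (the payload shape and size are arbitrary assumptions) that times the pickle round trip multiprocessing performs implicitly for every argument and return value:

```python
import pickle
import time

# A moderately large nested object, similar to what you might pass
# to a worker process (the exact shape is an arbitrary example).
payload = {f"key_{i}": list(range(100)) for i in range(10_000)}

start = time.perf_counter()
data = pickle.dumps(payload)   # what happens on the sender side
result = pickle.loads(data)    # what happens on the receiver side
elapsed = time.perf_counter() - start

print(f"round trip: {elapsed:.3f}s for {len(data):,} bytes")
```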
tonybaloney.github.io
Inter-Worker communication
Whether using sub interpreters or multiprocessing, you cannot simply send existing Python objects to worker processes.
Multiprocessing uses pickle by default. When you start a process or use a process pool, you can use pipes, queues, and shared memory as mechanisms for sending data to and from the workers and the main process. These mechanisms revolve around pickling. Pickle is Python’s built-in serialization library, which can convert most Python objects into a byte string and back into a Python object.

Pickle is very flexible. You can serialize many different types of Python objects (but not all), and Python objects can even define a method for how they should be serialized. It also handles nested objects and properties. However, with that flexibility comes a performance hit: pickle is slow. So if you have a worker model that relies on continuous inter-worker communication of complex pickled data, you’ll likely see a bottleneck.
Sub interpreters can accept pickled data. They also have a second mechanism called shared data. Shared data is a high-speed shared memory space that interpreters can write to in order to share data with other interpreters. It supports only immutable types; those are:
- Strings
- Byte Strings
- Integers and Floats
- Boolean and None
- Tuples (and tuples of tuples)
To share data with an interpreter, you can either set it as initialization data or you can send it through a channel.
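As a sketch of the initialization-data path: in CPython 3.12 this functionality is only exposed through the private, unstable `_xxsubinterpreters` module (PEP 554/734 propose a public `interpreters` module with different names), so the module and function names below are version-specific assumptions rather than a stable API:

```python
# Python 3.12; this module is private and its API may change.
import _xxsubinterpreters as interpreters

interp_id = interpreters.create()

# Only shareable (immutable) values are allowed in `shared`:
# str, bytes, int, float, bool, None, and tuples of those.
interpreters.run_string(
    interp_id,
    "print(f'worker {worker_id} got {config!r}')",
    shared={"worker_id": 1, "config": ("fast", True)},
)

interpreters.destroy(interp_id)
```

Here the `shared` mapping is bound into the sub interpreter’s namespace before the script runs, which is why only the immutable types listed above are accepted.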