pythonspeed.com (Dec 2023)
When you’re writing Python, though, you want to share Python objects between processes. To enable this, Python’s multiprocessing library does the following when you pass Python objects between processes:
- On the sender side, the arguments get serialized to bytes with the pickle module.
- On the receiver side, the bytes are deserialized using pickle.
This serialization and deserialization process involves computation, which can be slow.
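As a rough illustration of that cost, here is a minimal sketch (the payload shape and size are arbitrary assumptions) that times the pickle round trip multiprocessing performs implicitly for every argument and return value:

```python
import pickle
import time

# A moderately large nested object, similar to what you might pass
# to a worker process (the exact shape is an arbitrary example).
payload = {f"key_{i}": list(range(100)) for i in range(10_000)}

start = time.perf_counter()
data = pickle.dumps(payload)   # what happens on the sender side
result = pickle.loads(data)    # what happens on the receiver side
elapsed = time.perf_counter() - start

print(f"round trip: {elapsed:.3f}s for {len(data):,} bytes")
```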
tonybaloney.github.io
Inter-Worker communication
Whether using sub interpreters or multiprocessing, you cannot simply send existing Python objects to worker processes.
Multiprocessing uses pickle by default. When you start a process or use a process pool, you can use pipes, queues, and shared memory as mechanisms for sending data to and from the workers and the main process. These mechanisms revolve around pickling. Pickle is Python’s built-in serialization library, which can convert most Python objects into a byte string and back into a Python object.

Pickle is very flexible. You can serialize many different types of Python objects (but not all), and Python objects can even define a method for how they should be serialized. It also handles nested objects and properties. However, with that flexibility comes a performance hit: pickle is slow. So if you have a worker model that relies on continuous inter-worker communication of complex pickled data, you’ll likely see a bottleneck.
Sub interpreters can accept pickled data. They also have a second mechanism called shared data. Shared data is a high-speed shared memory space that interpreters can write to in order to share data with other interpreters. It supports only immutable types; those are:
- Strings
- Byte Strings
- Integers and Floats
- Boolean and None
- Tuples (and tuples of tuples)
To share data with an interpreter, you can either set it as initialization data or you can send it through a channel.
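As a sketch of the initialization-data path: in CPython 3.12 this functionality is only exposed through the private, unstable `_xxsubinterpreters` module (PEP 554/734 propose a public `interpreters` module with different names), so the module and function names below are version-specific assumptions rather than a stable API:

```python
# Python 3.12; this module is private and its API may change.
import _xxsubinterpreters as interpreters

interp_id = interpreters.create()

# Only shareable (immutable) values are allowed in `shared`:
# str, bytes, int, float, bool, None, and tuples of those.
interpreters.run_string(
    interp_id,
    "print(f'worker {worker_id} got {config!r}')",
    shared={"worker_id": 1, "config": ("fast", True)},
)

interpreters.destroy(interp_id)
```

Here the `shared` mapping is bound into the sub interpreter’s namespace before the script runs, which is why only the immutable types listed above are accepted.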