Garbage Collection Of Shared Data In Multiprocessing Via Fork
Solution 1:
This depends strongly on A) your data and B) your multiprocessing start method.
TLDR:
- spawn: objects are cloned and each clone is finalised in its own process
- fork/forkserver: objects are shared and finalised in the main process
- Some objects respond badly to being finalised in the main process while still used in child processes.
- The docs on args are wrong, as the content of args is not kept alive by itself (3.7.0)
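Since the outcome hinges on the start method, it can help to pin it explicitly instead of relying on the platform default. A minimal sketch, assuming a hypothetical helper named `pick_context` (not part of the original code); `multiprocessing.get_all_start_methods()` and `multiprocessing.get_context()` are standard-library calls:

```python
import multiprocessing

def pick_context(preferred='spawn'):
    """Return a context for `preferred`, or for the platform default if unavailable.

    `pick_context` is a hypothetical helper for illustration only.
    """
    available = multiprocessing.get_all_start_methods()  # default method comes first
    method = preferred if preferred in available else available[0]
    return multiprocessing.get_context(method)

if __name__ == '__main__':
    ctx = pick_context('fork')
    print('using start method:', ctx.get_start_method())
```

Creating processes and shared objects from the returned context (`ctx.Process`, `ctx.Value`) keeps a test tied to one start method without changing the global default.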
Note: Full code available as gist. All output from CPython 3.7.0 on macOS 10.13.
We start with a simple object that reports where and when it is finalised:
import multiprocessing
import os

def print_pid(*args, **kwargs):  # Process aware print helper
    print('[%s]' % os.getpid(), *args, **kwargs)

class Finalisable:
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return '<Finalisable object %s at 0x%x>' % (getattr(self, 'name', 'unknown'), id(self))
    def __del__(self):
        print_pid('finalising', self)
Early collection from args
To test how args works for GC, we can build a process and immediately release its argument reference:
def drop_early():
    payload = Finalisable('early')
    child = multiprocessing.Process(target=print_pid, args=('child sees', payload,))
    print('drop')
    del payload  # remove sole local reference for `args` content
    print('start')
    child.start()
    child.join()
With the spawn method, the original is collected but the child has its own copy to finalise:
### test drop_early in 15333 method: spawn
drop
start
[15333] finalising <Finalisable object early at 0x102347390>
[15336] child sees <Finalisable object early at 0x109bd8128>
[15336] finalising <Finalisable object early at 0x109bd8128>
### done
With the fork method, the original is finalised and the child receives this finalised object:
### test drop_early in 15329 method: fork
drop
start
[15329] finalising <Finalisable object early at 0x108b453c8>
[15331] child sees <Finalisable object early at 0x108b453c8>
### done
This shows that the payload of the main process is finalised before the child process runs and completes! Bottom line, args is not a guard against early collection!
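The same effect can be probed without relying on `__del__` output by watching the argument through a weak reference. This is a sketch under assumptions: `Payload`, `child_task` and `probe_args_lifetime` are hypothetical names, the fork start method must be available (Unix), and whether the object is already gone right after `start()` may depend on the CPython version.

```python
import multiprocessing
import weakref

class Payload:
    """Plain marker object (hypothetical); instances support weak references."""

def child_task(payload):
    # The child only needs to receive the argument.
    pass

def probe_args_lifetime(method='fork'):
    ctx = multiprocessing.get_context(method)
    payload = Payload()
    probe = weakref.ref(payload)  # a weak reference does not keep payload alive
    child = ctx.Process(target=child_task, args=(payload,))
    del payload  # from here on, only the Process object can hold a strong reference
    alive_before_start = probe() is not None
    child.start()
    alive_after_start = probe() is not None
    child.join()
    return alive_before_start, alive_after_start, child.exitcode

if __name__ == '__main__':
    print(probe_args_lifetime())
```

If the second value comes back `False`, the parent released the argument while the child was still running, which matches the ordering seen in the output above.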
Early collection of shared objects
Python has some types meant for safe sharing between processes. We can wrap one of these, a multiprocessing.Value, in our marker as well:
def drop_early_shared():
    payload = Finalisable(multiprocessing.Value('i', 65))
    child = multiprocessing.Process(target=print_pid, args=('child sees', payload,))
    print('drop')
    del payload
    print('start')
    child.start()
    child.join()
With the fork method, the Value is collected early but still functional:
### test drop_early_shared in 15516 method: fork
drop
start
[15516] finalising <Finalisable object <Synchronized wrapper for c_int(65)> at 0x1071a3e10>
[15519] child sees <Finalisable object <Synchronized wrapper for c_int(65)> at 0x1071a3e10>
### done
With the spawn method, the Value is collected early and entirely broken for the child:
### test drop_early_shared in 15520 method: spawn
drop
start
[15520] finalising <Finalisable object <Synchronized wrapper for c_int(65)> at 0x103a16c18>
[15524] finalising <Finalisable object unknown at 0x101aa0128>
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/synchronize.py", line 111, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory
### done
This shows that finalisation behaviour depends on your object and your environment. Bottom line, do not assume that your object is well-behaved!
While it is good practice to pass data via args, this does not free the main process from handling it! Objects might respond badly to early finalisation when the main process drops references.
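One way to act on this is to hold a reference in the main process until the child has been joined. A minimal sketch, assuming the fork start method is available; `report` and `run_with_live_payload` are hypothetical names, not part of the original code:

```python
import multiprocessing
import os

def report(label, shared):
    # Runs in the child: read the shared value and print it with the child's PID.
    print('[%s]' % os.getpid(), label, shared.value)

def run_with_live_payload(method='fork'):
    ctx = multiprocessing.get_context(method)
    payload = ctx.Value('i', 65)
    child = ctx.Process(target=report, args=('child sees', payload))
    child.start()
    child.join()   # the local name `payload` still references the Value here
    del payload    # drop the reference only after the child has finished
    return child.exitcode

if __name__ == '__main__':
    print('exit code:', run_with_live_payload())
```

The only difference from `drop_early_shared` is the position of `del payload`: the main process stays responsible for the object for as long as the child might use it.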
As CPython uses fast-acting reference counting, you will see ill effects practically immediately. However, other implementations, e.g. PyPy, may hide such side-effects for an arbitrary time.
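The timing difference can be observed in a single process, without multiprocessing. The sketch below uses `weakref.finalize` from the standard library; `Payload` and `finalisation_timing` are hypothetical names. The first value is expected to be `True` on CPython, while an implementation without reference counting may report `False` until a collection runs.

```python
import gc
import weakref

class Payload:
    """Plain marker object (hypothetical) used only to observe finalisation."""

def finalisation_timing():
    events = []
    obj = Payload()
    # Register a callback that records when obj is garbage collected.
    weakref.finalize(obj, events.append, 'finalised')
    del obj
    immediate = bool(events)      # already finalised right after `del`?
    gc.collect()                  # explicitly request a collection
    after_collect = bool(events)  # finalised once a collection has run?
    return immediate, after_collect

if __name__ == '__main__':
    print(finalisation_timing())
```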