Garbage Collection Of Shared Data In Multiprocessing Via Fork
Solution 1:
This depends strongly on A) your data and B) your multiprocessing start method.
TLDR:
- spawn: objects are cloned and each is finalised in each process
- fork/forkserver: objects are shared and finalised in the main process
  - Some objects respond badly to being finalised in the main process while still used in child processes.
- The docs on args are wrong as content of args is not kept alive by itself (3.7.0)
Note: Full code available as gist. All output from CPython 3.7.0 on macOS 10.13.
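The outputs in this answer are labelled by start method. The harness that produced them is in the gist and is not reproduced here. As a minimal sketch of how a start method can be selected per test, assuming a hypothetical helper named contexts_to_test:

```python
import multiprocessing


def contexts_to_test(wanted=('spawn', 'fork', 'forkserver')):
    """Return {method: context} for each wanted start method this platform offers.

    Hypothetical helper for illustration; not taken from the gist.
    """
    available = multiprocessing.get_all_start_methods()
    return {
        method: multiprocessing.get_context(method)
        for method in wanted
        if method in available
    }


if __name__ == '__main__':
    for method, ctx in contexts_to_test().items():
        # ctx.Process(...) creates a process that uses this start method
        print('### method:', method, '->', ctx.get_start_method())
```

Using a context object instead of multiprocessing.set_start_method() lets several methods be exercised in the same interpreter session.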
We start with a simple object that reports where and when it is finalised:
import multiprocessing
import os

def print_pid(*args, **kwargs):  # Process aware print helper
    print('[%s]' % os.getpid(), *args, **kwargs)

class Finalisable:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return '<Finalisable object %s at 0x%x>' % (getattr(self, 'name', 'unknown'), id(self))

    def __del__(self):
        print_pid('finalising', self)
Early collection from args
To test how args works for GC, we can build a process and immediately release its argument reference:
def drop_early():
    payload = Finalisable('early')
    child = multiprocessing.Process(target=print_pid, args=('child sees', payload,))
    print('drop')
    del payload  # remove sole local reference for `args` content
    print('start')
    child.start()
    child.join()
With the spawn method, the original is collected but the child has its own copy to finalise:
### test drop_early in 15333 method: spawn
drop
start
[15333] finalising <Finalisable object early at 0x102347390>
[15336] child sees <Finalisable object early at 0x109bd8128>
[15336] finalising <Finalisable object early at 0x109bd8128>
### done
With the fork method, the original is finalised and the child receives this finalised object:
### test drop_early in 15329 method: fork
drop
start
[15329] finalising <Finalisable object early at 0x108b453c8>
[15331] child sees <Finalisable object early at 0x108b453c8>
### done
This shows that the payload of the main process is finalised before the child process runs and completes! Bottom line, args is not a guard against early collection!
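One way to observe this directly is with a weak reference. The following is a minimal sketch, not part of the original gist: Payload, noop and probe_args_lifetime are hypothetical names, and it assumes the fork start method is available (POSIX only). It checks whether the object passed via args is still alive before and after start().

```python
import multiprocessing
import weakref


class Payload:
    """Plain marker object; instances support weak references."""


def noop(obj):
    """Child target: receives the payload and does nothing with it."""


def probe_args_lifetime():
    # Assumption: the 'fork' start method exists on this platform.
    ctx = multiprocessing.get_context('fork')
    payload = Payload()
    alive = weakref.ref(payload)  # a weak reference does not keep the payload alive
    child = ctx.Process(target=noop, args=(payload,))
    del payload  # drop the only local reference
    before_start = alive() is not None
    child.start()
    after_start = alive() is not None
    child.join()
    return before_start, after_start


if __name__ == '__main__':
    print('alive before start, alive after start:', probe_args_lifetime())
```

On CPython 3.7 and later this is expected to report the payload as alive before start() and gone afterwards, which matches the output above: the parent finalises the object after 'start' is printed, not at the del statement.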
Early collection of shared objects
Python has some types meant for safe sharing between processes. We can use this as our marker as well:
def drop_early_shared():
    payload = Finalisable(multiprocessing.Value('i', 65))
    child = multiprocessing.Process(target=print_pid, args=('child sees', payload,))
    print('drop')
    del payload
    print('start')
    child.start()
    child.join()
With the fork method, the Value is collected early but still functional:
### test drop_early_shared in 15516 method: fork
drop
start
[15516] finalising <Finalisable object <Synchronized wrapper for c_int(65)> at 0x1071a3e10>
[15519] child sees <Finalisable object <Synchronized wrapper for c_int(65)> at 0x1071a3e10>
### done
With the spawn method, the Value is collected early and entirely broken for the child:
### test drop_early_shared in 15520 method: spawn
drop
start
[15520] finalising <Finalisable object <Synchronized wrapper for c_int(65)> at 0x103a16c18>
[15524] finalising <Finalisable object unknown at 0x101aa0128>
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/synchronize.py", line 111, in __setstate__
self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory
### done
This shows that finalisation behaviour depends on your object and your environment. Bottom line, do not assume that your object is well-behaved!
While it is good practice to pass data via args, this does not free the main process from handling it! Objects might respond badly to early finalisation when the main process drops references.
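A defensive pattern is to hold a reference in the main process until the child has been joined. The following is a minimal sketch, assuming the fork start method is available (POSIX only). LoggedFinalisable, child_task and run_holding_payload are hypothetical names, and a log list replaces printing so the order of events can be inspected.

```python
import multiprocessing
import os


class LoggedFinalisable:
    """Hypothetical variant of Finalisable that records events in a list."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __del__(self):
        self.log.append(('finalised', os.getpid()))


def child_task(obj):
    """Child target: stands in for real work on the payload."""


def run_holding_payload():
    # Assumption: the 'fork' start method exists on this platform.
    ctx = multiprocessing.get_context('fork')
    log = []
    payload = LoggedFinalisable('held', log)
    child = ctx.Process(target=child_task, args=(payload,))
    child.start()
    log.append(('started', os.getpid()))
    child.join()
    log.append(('joined', os.getpid()))
    del payload  # release the parent-side reference only after the child is done
    return [event for event, _pid in log]


if __name__ == '__main__':
    print(run_holding_payload())
```

On CPython the returned order should be started, joined, finalised, so the parent-side object outlives the child. The same idea applies to shared objects such as Value: keep them referenced in the parent for as long as any child may use them.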
As CPython uses fast-acting reference counting, you will see ill effects practically immediately. However, other implementations, e.g. PyPy, may hide such side-effects for an arbitrary time.
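The difference between implementations can be probed without any child process. This is a minimal sketch with hypothetical names (LoggedFinalisable, drop_and_collect). It assumes that gc.collect() runs pending finalisers, which on some implementations may take more than one call.

```python
import gc


class LoggedFinalisable:
    """Hypothetical variant of Finalisable that records events in a list."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __del__(self):
        self.log.append('finalising %s' % self.name)


def drop_and_collect():
    log = []
    payload = LoggedFinalisable('early', log)
    del payload  # with reference counting, finalisation happens right here
    seen_immediately = list(log)
    gc.collect()  # without reference counting, finalisation may only happen here
    return seen_immediately, log


if __name__ == '__main__':
    immediate, final = drop_and_collect()
    print('right after del:', immediate)
    print('after gc.collect():', final)
```

On CPython both lists should contain the finalisation entry. On an implementation without reference counting, the first list may be empty.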