
Garbage Collection Of Shared Data In Multiprocessing Via Fork

I am doing some multiprocessing in Linux, and I am using shared memory that is currently not explicitly passed to the child processes (i.e. not via an argument). In the official Python …

Solution 1:

This depends strongly on A) your data and B) your multiprocessing start method.

TLDR:

  • spawn: objects are cloned, and each copy is finalised in its own process
  • fork/forkserver: objects are shared and finalised in the main process

  • Some objects respond badly to being finalised in the main process while still in use in child processes.

  • The docs' advice on args is wrong: the content of args is not kept alive just by being passed there (3.7.0)
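
Which of these cases applies depends on the start method in use. A minimal sketch for checking it with standard `multiprocessing` calls (the helper name `describe_start_methods` is hypothetical):

```python
import multiprocessing


def describe_start_methods():
    """Return the start methods this platform supports and the current default."""
    available = multiprocessing.get_all_start_methods()
    # Note: querying the default also fixes it for the global context.
    default = multiprocessing.get_start_method()
    return available, default


if __name__ == '__main__':
    available, default = describe_start_methods()
    print('available:', available, 'default:', default)
```

To pin the method for a single experiment without touching the global default, create a context with `multiprocessing.get_context('fork')` (or `'spawn'`) and build processes via its `Process` attribute.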

Note: Full code available as gist. All output from CPython 3.7.0 on macOS 10.13.

We start with a simple object that reports where and when it is finalised:

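```python
import os
import multiprocessing


def print_pid(*args, **kwargs):
    """Process-aware print helper: prefixes output with the current PID."""
    print('[%s]' % os.getpid(), *args, **kwargs)


class Finalisable:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        # getattr() covers objects whose state was never restored (e.g. a failed unpickle)
        return '<Finalisable object %s at 0x%x>' % (getattr(self, 'name', 'unknown'), id(self))

    def __del__(self):
        print_pid('finalising', self)
```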

Early collection from args

To test how args interacts with garbage collection, we can create a process and immediately drop our own reference to its argument:

```python
def drop_early():
    payload = Finalisable('early')
    child = multiprocessing.Process(target=print_pid, args=('child sees', payload))
    print('drop')
    del payload  # remove the sole local reference to the `args` content
    print('start')
    child.start()
    child.join()
```

With the spawn method, the original is collected in the main process, while the child finalises its own copy:

### test drop_early in 15333 method: spawn
drop
start
[15333] finalising <Finalisable object early at 0x102347390>
[15336] child sees <Finalisable object early at 0x109bd8128>
[15336] finalising <Finalisable object early at 0x109bd8128>
### done
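
Under spawn, the `Process` object, including `args`, is pickled in the parent and rebuilt in the child, which is why the child reports a different address. The effect can be mimicked in a single process with a pickle round trip. This is a simplified sketch (a hypothetical `finalised` list replaces `print_pid` so the order of finalisation can be checked), not what `multiprocessing` literally runs:

```python
import pickle

finalised = []  # names of finalised objects, in order


class Finalisable:
    """Simplified variant of the class above: records finalisation in a list."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        finalised.append(self.name)


def clone_like_spawn():
    finalised.clear()
    original = Finalisable('early')
    # spawn pickles `args` in the parent and unpickles them in the child;
    # here both halves happen in one process
    clone = pickle.loads(pickle.dumps(original))
    is_same_object = clone is original
    del original                    # finalises only the original
    after_original = list(finalised)
    del clone                       # the clone is finalised independently
    return is_same_object, after_original, list(finalised)
```

On CPython this returns `(False, ['early'], ['early', 'early'])`: two distinct objects, each finalised on its own, matching the two "finalising" lines in the spawn output.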

With the fork method, the original is finalised in the main process, while the child keeps using its forked copy at the same address and never finalises it:

### test drop_early in 15329 method: fork
drop
start
[15329] finalising <Finalisable object early at 0x108b453c8>
[15331] child sees <Finalisable object early at 0x108b453c8>
### done

This shows that the main process finalises its payload before the child process has finished running! Bottom line: args is not a guard against early collection!
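
The early collection comes from `Process.start()` itself: since Python 3.7 it deletes its own references to `target`, `args` and `kwargs` once the child has been launched (bpo-30775, to avoid reference cycles). A minimal sketch that observes this with a weak reference, assuming a POSIX platform where the fork start method is available; `Payload`, `child_task` and `payload_alive_around_start` are placeholder names:

```python
import multiprocessing
import weakref


class Payload:
    """Placeholder object; plain classes support weak references."""


def child_task(obj):
    # Runs in the child and only touches the child's forked copy of `obj`.
    pass


def payload_alive_around_start():
    ctx = multiprocessing.get_context('fork')  # assumption: fork is available (POSIX)
    payload = Payload()
    alive = weakref.ref(payload)
    child = ctx.Process(target=child_task, args=(payload,))
    del payload                          # drop the only local reference
    before_start = alive() is not None   # still held by the Process object's args
    child.start()                        # start() discards target/args/kwargs
    after_start = alive() is not None    # nothing in the parent holds it any more
    child.join()
    return before_start, after_start
```

On CPython 3.7 and later this returns `(True, False)`: the parent's object is already gone while the child may still be running.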

Early collection of shared objects

The multiprocessing module provides some types meant for safe sharing between processes, such as multiprocessing.Value. We can wrap one in our marker as well:

```python
def drop_early_shared():
    payload = Finalisable(multiprocessing.Value('i', 65))
    child = multiprocessing.Process(target=print_pid, args=('child sees', payload,))
    print('drop')
    del payload
    print('start')
    child.start()
    child.join()
```

With the fork method, the Value is collected early but still functional:

### test drop_early_shared in 15516 method: fork
drop
start
[15516] finalising <Finalisable object <Synchronized wrapper for c_int(65)> at 0x1071a3e10>
[15519] child sees <Finalisable object <Synchronized wrapper for c_int(65)> at 0x1071a3e10>
### done

With the spawn method, the Value is collected early and entirely broken for the child:

### test drop_early_shared in 15520 method: spawn
drop
start
[15520] finalising <Finalisable object <Synchronized wrapper for c_int(65)> at 0x103a16c18>
[15524] finalising <Finalisable object unknown at 0x101aa0128>
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/synchronize.py", line 111, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory
### done

This shows that finalisation behaviour depends on both your object and your environment. Bottom line: do not assume that your object is well-behaved!


While it is good practice to pass data via args, this does not relieve the main process of keeping it alive! Objects might respond badly to early finalisation when the main process drops its references.
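
One way to act on this is to keep the parent's reference alive until the child has been joined. A minimal sketch, assuming the fork start method; `increment` and `run_with_reference_held` are illustrative names:

```python
import multiprocessing


def increment(counter):
    # Runs in the child: update the shared integer under its lock.
    with counter.get_lock():
        counter.value += 1


def run_with_reference_held():
    ctx = multiprocessing.get_context('fork')  # assumption: fork is available (POSIX)
    counter = ctx.Value('i', 65)
    child = ctx.Process(target=increment, args=(counter,))
    child.start()
    child.join()
    # `counter` is still referenced here, so it cannot be finalised before join()
    return counter.value
```

This returns `66`. Because the parent holds `counter` until after `join()`, the wrapper and its lock stay alive for the child's whole lifetime; the spawn traceback above shows what can happen when the parent's copy is finalised first.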

As CPython uses reference counting, which finalises an object as soon as its last reference is dropped, you will see ill effects practically immediately. However, other implementations, e.g. PyPy, may hide such side effects for an arbitrary time.
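
A small way to observe the reference-counting point: with the cyclic garbage collector disabled, CPython still frees an object the moment its last strong reference disappears. A minimal sketch (`Marker` and `freed_on_last_reference` are placeholder names); on an implementation without reference counting, such as PyPy, the same check may return `False` until a later collection:

```python
import gc
import weakref


class Marker:
    """Placeholder object to track with a weak reference."""


def freed_on_last_reference():
    was_enabled = gc.isenabled()
    gc.disable()                 # rule out the cyclic collector
    try:
        obj = Marker()
        ref = weakref.ref(obj)
        del obj                  # drop the last strong reference
        return ref() is None     # True on CPython: reference counting freed it at once
    finally:
        if was_enabled:
            gc.enable()
```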
