
Multiple Processes Writing to the Same CSV File: How to Avoid Conflicts?

In our system, nine processes write to the same CSV output file simultaneously, and the output rate is high: about 10 million new rows per day. We use Python's csv module to write the file.

Solution 1:

There is no direct way that I know of.

One common workaround is to split the responsibility between the "producers" and a single "outputter".

Add one extra process that is responsible for writing the CSV, consuming rows from a multiprocessing queue, and have all the "producer" processes push to that queue.

I'd advise looking at Python's multiprocessing module, and especially the part about queues. If you get stuck while trying it, ask a new question here, as this can become tricky.
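Here is a minimal sketch of that design, assuming rows are plain tuples; the row contents, the row counts, and the path "out.csv" are placeholders:

```python
import csv
import multiprocessing as mp

def producer(queue, n_rows):
    # Each producer pushes finished rows onto the shared queue.
    for i in range(n_rows):
        queue.put((i, f"row-{i}"))
    queue.put(None)  # sentinel: this producer is done

def writer(queue, n_producers, path):
    # A single writer owns the file, so no two processes
    # ever write to it at the same time.
    finished = 0
    with open(path, "w", newline="") as f:
        out = csv.writer(f)
        while finished < n_producers:
            row = queue.get()
            if row is None:
                finished += 1
            else:
                out.writerow(row)

if __name__ == "__main__":
    n_producers = 9
    queue = mp.Queue()
    w = mp.Process(target=writer, args=(queue, n_producers, "out.csv"))
    w.start()
    producers = [mp.Process(target=producer, args=(queue, 1000))
                 for _ in range(n_producers)]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    w.join()
```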

An alternative is to use a "giant lock" that forces each process to wait for the resource to become available (using a system-wide mutex, for example). This makes the code simpler but less scalable.
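A minimal sketch of that lock-based alternative, using a multiprocessing.Lock shared by all workers; the file path and the sample rows are illustrative:

```python
import csv
import multiprocessing as mp

def worker(lock, path, rows):
    for row in rows:
        # Hold the lock only for the duration of the write; open in
        # append mode so every process adds to the same file.
        with lock:
            with open(path, "a", newline="") as f:
                csv.writer(f).writerow(row)

if __name__ == "__main__":
    lock = mp.Lock()
    procs = [mp.Process(target=worker,
                        args=(lock, "out.csv", [(i, j) for j in range(100)]))
             for i in range(9)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```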

Solution 2:

The only proven solution is, as Bruce explained, to have a single process accept output from the "producer" processes and write it to the file. That could be a queue/messaging system, or just a plain old SQL database (from which it is easy to export CSV files).
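A minimal sketch of the database variant, using the standard-library sqlite3 module, which serializes concurrent writers with its own file lock; the table name and two-column schema are assumptions:

```python
import csv
import sqlite3

def init_db(db_path):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS output (a TEXT, b TEXT)")
    con.commit()
    con.close()

def insert_rows(db_path, rows):
    # Each producer calls this; the timeout tells sqlite3 how long to
    # wait for the write lock instead of failing immediately.
    con = sqlite3.connect(db_path, timeout=30.0)
    with con:  # commits on success, rolls back on error
        con.executemany("INSERT INTO output (a, b) VALUES (?, ?)", rows)
    con.close()

def export_csv(db_path, csv_path):
    # Dump the accumulated rows to CSV in one pass.
    con = sqlite3.connect(db_path)
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows(con.execute("SELECT a, b FROM output"))
    con.close()
```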

Solution 3:

As a first and easiest attempt, I would try to always flush() the output; this should force the I/O to reach the file before the next data is accepted.
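A minimal sketch of that attempt; "out.csv" is a placeholder path, each process opens the file in append mode, and the os.fsync call is an extra assumption to push the OS buffer to disk as well:

```python
import csv
import os

def append_row(path, row):
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(row)
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # push the OS buffer to disk
```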
