
PyMongo’s Bulk Write Operation Features With Multiprocessing And Generators

PyMongo supports generators for batch processing with sDB.insert(iter_something(converted)), using the bulk write operation features that execute write operations in batches in order to reduce the number of network round trips.

Solution 1:

In this case you are not taking advantage of batch inserts. Each call to "self.sDB.insert(doc)" immediately sends the document to MongoDB and waits for the reply from the server. You could try this:

from pymongo.errors import InvalidOperation  # raised on an empty bulk insert

def run(self):
    def gen():
        while True:
            doc = self.task_queue.get()
            if doc is None:  # None means shutdown
                self.task_queue.task_done()
                break
            else:
                yield doc

    try:
        self.sDB.insert(gen())
    except InvalidOperation as e:
        # Perhaps "Empty bulk write": this process received no documents.
        print(e)
Use mongosniff to verify that you're sending large batches to the server instead of inserting one document at a time. Depending on the number of documents and the number of processes, some processes might get no documents at all. PyMongo raises InvalidOperation if you try to insert from an empty iterator, which is why the call to insert is wrapped in a try / except.
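The legacy insert method used above was removed in PyMongo 4, but insert_many also accepts an iterable of documents, so the same generator trick carries over. Below is a minimal, self-contained sketch of the whole pattern under modern PyMongo; the Worker class, the queue wiring, and the mydb.docs collection are assumptions for illustration, not part of the original question:

import multiprocessing

from pymongo import MongoClient
from pymongo.errors import InvalidOperation


class Worker(multiprocessing.Process):
    def __init__(self, task_queue):
        super().__init__()
        self.task_queue = task_queue

    def run(self):
        # Each process must create its own MongoClient after the fork.
        collection = MongoClient().mydb.docs

        def gen():
            while True:
                doc = self.task_queue.get()
                if doc is None:  # None means shutdown
                    self.task_queue.task_done()
                    break
                yield doc
                self.task_queue.task_done()

        try:
            # insert_many consumes the generator and sends the documents
            # to the server in large batches.
            collection.insert_many(gen())
        except InvalidOperation as e:
            # Raised when the generator yields nothing, i.e. this process
            # saw only the shutdown sentinel.
            print(e)


if __name__ == "__main__":
    queue = multiprocessing.JoinableQueue()
    workers = [Worker(queue) for _ in range(4)]
    for w in workers:
        w.start()
    for i in range(1000):
        queue.put({"n": i})
    for _ in workers:
        queue.put(None)  # one shutdown sentinel per worker
    queue.join()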

By the way, you don't need to call createCollection with MongoDB: the first insert into a collection creates it automatically. createCollection is only necessary if you want special options, like a capped collection.
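In PyMongo the corresponding call is create_collection. A short sketch of creating a capped collection (the database and collection names are hypothetical):

from pymongo import MongoClient

db = MongoClient().mydb
# Explicit creation is only needed for special options, such as a
# capped collection: fixed size, oldest documents are aged out first.
db.create_collection("log", capped=True, size=1024 * 1024)  # 1 MB cap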
