Leaf

Yesterday afternoon Bernie Hackett and I shipped a release candidate for PyMongo 2.7, with substantial contributions from Amalia Hawkins and Kyle Erf. This version supports new features in the upcoming MongoDB 2.6, and includes major internal improvements in the driver code. We rarely make RCs before releases, but given the scope of changes it seems wise.

Install the RC like:

pip install \
  https://github.com/mongodb/mongo-python-driver/archive/2.7rc0.tar.gz

Please tell us if you find bugs.

MongoDB 2.6 support

For the first time in years, the MongoDB wire protocol is changing. Bernie Hackett updated PyMongo to support the new protocol, while maintaining backwards compatibility with old servers. He also added support for MongoDB's new parallelCollectionScan command, which scans a whole collection with multiple cursors in parallel.

Amalia Hawkins wrote a feature for setting a server-side timeout for long-running operations with the max_time_ms method:

try:
    for doc in collection.find().max_time_ms(1000):
        pass
except ExecutionTimeout:
    print "Aborted after one second."

She also added support for the new aggregation operator, $out, which creates a collection directly from an aggregation pipeline. While she was at it, she made PyMongo log a warning whenever your read preference is "secondary" but a command has to run on the primary:

>>> client = MongoReplicaSetClient(
...     'localhost',
...     replicaSet='repl0',
...     read_preference=ReadPreference.SECONDARY)
>>> client.db.command({'reIndex': 'collection'})
UserWarning: reindex does not support SECONDARY read preference
and will be routed to the primary instead.
{'ok': 1}

Bulk write API

Bernie added a bulk write API. It's now possible to specify a series of inserts, updates, upserts, replaces, and removes, then execute them all at once:

bulk = db.collection.initialize_ordered_bulk_op()
bulk.insert({'_id': 1})
bulk.insert({'_id': 2})
bulk.find({'_id': 1}).update({'$set': {'foo': 'bar'}})
bulk.find({'_id': 3}).remove()
result = bulk.execute()

PyMongo collects the operations into a minimal set of messages to the server. Compared to the old style, bulk operations have lower network costs. You can use PyMongo's bulk API with any version of MongoDB, but you only get the network advantage when talking to MongoDB 2.6.

Improved C code

After great effort, I understand why our C extensions didn't like running in mod_wsgi. I wrote an explanation that's more detailed than you want to read. But even better, Bernie fixed our C code so mod_wsgi no longer slows it down or makes it log weird warnings. Finally, I put clear configuration instructions in the PyMongo docs.

Bernie fixed all remaining platform-specific C code. Now you can run PyMongo with its C extensions on ARM, for example if you talk to MongoDB from a Raspberry Pi.

Thundering herd

I overhauled MongoClient so its concurrency control is closer to what I did for MongoReplicaSetClient in the last release. With the new MongoClient, a heavily multithreaded Python application will be much more robust in the face of network hiccups or downed MongoDB servers. You can read details in the bug report.

GridFS cursor

We had several feature requests for querying GridFS with PyMongo, so Kyle Erf implemented a GridFS cursor:

>>> fs = gridfs.GridFS(client.db)
>>> # Find large files:
...
>>> fs.find({'length': {'$gt': 1024}}).count()
42
>>> # Find files whose names start with "Kyle":
...
>>> pattern = bson.Regex('kyle.*', 'i')
>>> cursor = fs.find({'filename': pattern})
>>> for grid_out_file in cursor:
...     print grid_out_file.filename
...
Kyle
Kyle1
Kyle Erf

You can browse all 53 new features and fixes in our tracker.

Enjoy!