Morelia spilota variegata

By Jebulon, via Wikimedia Commons

We've just tagged a release candidate of PyMongo, the standard MongoDB driver for Python. You can install it with pip:

pip install git+git://github.com/mongodb/mongo-python-driver.git@2.8rc0

Most of the changes between PyMongo 2.8 and the previous release, 2.7.2, are for compatibility with the upcoming MongoDB 2.8 release. (By coincidence, PyMongo and MongoDB are at the same version number right now.)


Compatibility

SCRAM-SHA-1 authentication

MongoDB 2.8 adds support for SCRAM-SHA-1 authentication and makes it the new default, replacing our inferior old protocol, MONGODB-CR ("MongoDB Challenge-Response"). PyMongo's maintainer Bernie Hackett added support for the new protocol. PyMongo and MongoDB work together to make this change seamless: you can upgrade PyMongo first, then your MongoDB servers, and authentication will keep working with your existing passwords. When you choose to, you can upgrade how your passwords are hashed within the database itself; we'll document how to do that when we release MongoDB 2.8.

SCRAM-SHA-1 is more secure than MONGODB-CR, but it's also slower: the new protocol requires the client to do 10,000 iterations of SHA-1 by default, instead of one iteration of MD5. This has two implications for you.
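To put rough numbers on that difference, here is a standard-library sketch of the client-side key derivation each protocol performs per authentication. It assumes MongoDB's flavor of SCRAM-SHA-1 (RFC 5802), in which the password fed into PBKDF2 is the same hex MD5 digest of "user:mongo:password" that MONGODB-CR uses. The function names are hypothetical and this is not PyMongo's actual auth code; in a real handshake the server supplies the salt and iteration count, whereas here the salt is random and the count is the 10,000 default.

```python
import hashlib
import hmac
import os
import timeit


def mongodb_cr_digest(username, password):
    """MONGODB-CR style digest: a single MD5 pass over "user:mongo:password"."""
    data = "%s:mongo:%s" % (username, password)
    return hashlib.md5(data.encode("utf-8")).hexdigest()


def scram_sha1_client_key(username, password, salt, iterations=10000):
    """Sketch of the SCRAM-SHA-1 ClientKey derivation.

    The MONGODB-CR digest is stretched with PBKDF2-HMAC-SHA1 for
    `iterations` rounds, then HMAC'd with the literal b"Client Key".
    """
    salted_password = hashlib.pbkdf2_hmac(
        "sha1",
        mongodb_cr_digest(username, password).encode("utf-8"),
        salt,
        iterations,
    )
    return hmac.new(salted_password, b"Client Key", hashlib.sha1).digest()


if __name__ == "__main__":
    salt = os.urandom(16)  # assumption: a real server sends its own salt
    cr = timeit.timeit(lambda: mongodb_cr_digest("user", "password"), number=10)
    scram = timeit.timeit(
        lambda: scram_sha1_client_key("user", "password", salt), number=10)
    print("MONGODB-CR:  %.5fs for 10 derivations" % cr)
    print("SCRAM-SHA-1: %.5fs for 10 derivations" % scram)
```

Absolute timings depend on your machine, but the SCRAM derivation should be far slower than the single MD5 pass, and a client that reconnects on every request repeats it every time.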

First, you must create one MongoClient or MongoReplicaSetClient instance when your application starts up, and keep using it for your application's lifetime. For example, consider this little Flask app:

from pymongo import MongoClient
from flask import Flask

# This is the right thing to do: create one client at startup and share it.
db = MongoClient('mongodb://user:password@host').test
app = Flask(__name__)

@app.route('/')
def home():
    doc = db.collection.find_one()
    return repr(doc)

if __name__ == '__main__':
    app.run()

That's the right way to build your app, because it lets PyMongo reuse connections to MongoDB and maintain a connection pool.

But time and again I see people write request handlers like this:

@app.route('/')
def home():
    # Wrong!!
    db = MongoClient('mongodb://user:password@host').test
    doc = db.collection.find_one()
    return repr(doc)

When you create a new MongoClient for each request like this, PyMongo must open a new TCP connection to MongoDB for every request to your application and close it afterward. That alone hurts your performance.

But if you use authentication and upgrade to PyMongo 2.8 and MongoDB 2.8, every request will also pay for 10,000 iterations of SHA-1. So if you aren't yet following my recommendation to reuse one client throughout your application, fix your code now.

Second, you should install backports.pbkdf2 ("pip install backports.pbkdf2"). It speeds up the hash computation, especially on Python 2 older than 2.7.8 or Python 3 older than 3.4, which lack hashlib.pbkdf2_hmac.

I've updated PyMongo's copy_database so you can use SCRAM-SHA-1 authentication to copy between servers. More information about SCRAM-SHA-1 is in PyMongo's latest auth documentation.

count with hint

Starting in MongoDB 2.6 the "count" command can take a hint that tells it which index to use, by name. In PyMongo 2.8 Bernie added support for count with hint:

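from pymongo import MongoClient, ASCENDING

collection = MongoClient().test.collection  # assumes a local mongod
collection.create_index([('field', ASCENDING)], name='my_index')

# Count matching documents, telling the server to use 'my_index'.
collection.find({
    'field': {'$gt': 10}
}).hint('my_index').count()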

This works with MongoDB 2.6. In MongoDB 2.8, count also supports hints by index spec, not just by index name:

collection.find({
    'field': {'$gt': 10}
}).hint([('field', ASCENDING)]).count()
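Under the hood, a hinted count is a "count" command document with a "hint" field. The sketch below builds that document in plain Python. `build_count_command` is a hypothetical helper, not a PyMongo API, and it assumes the server accepts either an index name (MongoDB 2.6) or a key-pattern document (MongoDB 2.8, as described above). Because the command name must be the first key, PyMongo uses its ordered SON type for commands; `collections.OrderedDict` stands in for SON here.

```python
from collections import OrderedDict


def build_count_command(collection_name, query, hint=None):
    """Build a count command document, optionally with an index hint.

    `hint` may be an index name such as 'my_index', or a PyMongo-style
    index spec: a list of (field, direction) pairs like [('field', 1)].
    """
    cmd = OrderedDict()
    cmd["count"] = collection_name  # the command name must be the first key
    cmd["query"] = query
    if isinstance(hint, str):
        cmd["hint"] = hint  # hint by index name
    elif hint is not None:
        cmd["hint"] = OrderedDict(hint)  # hint by key pattern, order preserved
    return cmd


# pymongo.ASCENDING is 1, so this mirrors the index-spec example above.
cmd = build_count_command("collection", {"field": {"$gt": 10}},
                          hint=[("field", 1)])
```

Keeping the key pattern ordered matters for compound indexes: {a: 1, b: -1} and {b: -1, a: 1} are different indexes.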

PyMongo improvements

SON performance

Don Mitchell from EdX generously offered us a patch that improves the performance of SON, PyMongo's implementation of an ordered dict. His patch avoids unnecessary copies of field names in many of SON's methods.

socketKeepAlive

In some network setups, users need to set the SO_KEEPALIVE flag on PyMongo's TCP connections to MongoDB, so Bernie added a socketKeepAlive option to MongoClient and MongoReplicaSetClient.
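The option is a simple on/off switch, presumably passed like other client options (for example, MongoClient('mongodb://host', socketKeepAlive=True)). At the socket level, it amounts to roughly what this standard-library sketch does by hand; `make_keepalive_socket` is a hypothetical helper, not PyMongo's internal code.

```python
import socket


def make_keepalive_socket():
    """Create a TCP socket with SO_KEEPALIVE enabled.

    With keepalive on, the OS sends periodic probes on a connection that has
    been idle for a system-configured time, so a vanished peer is eventually
    detected and middleboxes see traffic on otherwise idle connections.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


sock = make_keepalive_socket()
try:
    enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
    print("SO_KEEPALIVE enabled:", enabled)
finally:
    sock.close()
```

The probe timing itself is an operating-system setting, not something this flag controls.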

Deprecation warnings

Soon we'll release a PyMongo 3.0 that removes many obsolete features from PyMongo and gives you a cleaner, safer, faster new API. But we want to make the upgrade as smooth as possible for you. First, I documented our compatibility policy and explained how to test your code to make sure it uses no deprecated PyMongo features.
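One common way to catch deprecated calls is to turn DeprecationWarning into an exception while your tests run, for instance with "python -W error::DeprecationWarning -m unittest" on the command line. The sketch below does the same thing programmatically with the standard warnings module; `old_api` is a hypothetical stand-in for a deprecated driver feature.

```python
import warnings


def old_api():
    """Hypothetical stand-in for a deprecated driver feature."""
    warnings.warn("old_api is deprecated and will be removed in 3.0",
                  DeprecationWarning, stacklevel=2)
    return "still works"


def uses_deprecated_feature(func):
    """Return True if calling func() emits a DeprecationWarning."""
    with warnings.catch_warnings():
        # Inside this block, any DeprecationWarning is raised as an exception.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            func()
        except DeprecationWarning:
            return True
    return False


print(uses_deprecated_feature(old_api))
print(uses_deprecated_feature(lambda: "new api"))
```

Running your whole suite this way makes every remaining use of a deprecated feature fail loudly, so you can fix them before upgrading.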

Second, I deprecated some features that will be removed in PyMongo 3.0:

start_request is deprecated and will be removed in PyMongo 3.0, because it's not the right way to ensure consistency, and it doesn't work with sharding in MongoDB 2.8. Further justifications can be found here.

MasterSlaveConnection is deprecated and will be removed, since master-slave setups are themselves obsolete. Replica sets are superior to master-slave, especially now that replica sets can have more than 12 members. Anyway, even if you still have a master-slave setup, PyMongo's MasterSlaveConnection wasn't very useful.

And finally, copy_database is deprecated. We asked customers whether they used it, and the answer was no: people copy databases with the mongo shell, not with PyMongo. For the sake of backward compatibility I upgraded PyMongo's copy_database to support SCRAM-SHA-1 anyway, but we plan to remove it in PyMongo 3.0. Let me know in the comments if you think this is the wrong decision.

Bugs

The only notable bugfix in PyMongo 2.8 is the delightfully silly mod_wsgi error I wrote about last month. But if you find any new bugs, please let us know by opening an issue in Jira; I promise we'll handle it promptly.