A. Jesse Jiryu Davis

Joel Forrester Quintet With Christina Clare


November 28th, 2014. The release party for "In New York", an album of jazz songs. My long-time friend Christina Clare has joined the Joel Forrester Quintet to sing new songs with a retro vibe and edgy modern lyrics; it's excellent stuff. See them perform here and here. The album goes on sale in January.



It Seemed Like A Good Idea At The Time: PyMongo's "start_request"



The road to hell is paved with good intentions.

I'll tell you the story of four regrettable decisions we made when we designed PyMongo, the standard Python driver for MongoDB. Each of these decisions led to years of pain for PyMongo's maintainers, Bernie Hackett and me, and years of confusion for our users. This winter I'm ripping out these regrettable designs in preparation for PyMongo 3.0. As I delete them, I give each a bitter little eulogy.

Today I'll tell the story of the first regrettable decision: "requests".


The Beginning

It all began when MongoDB, Inc. was a tiny startup called 10gen. Back in the beginning, Eliot Horowitz and Dwight Merriman were making a hosted application platform, a bit like Google App Engine, but with Javascript as the programming language and a JSON-like database for storage. Customers wouldn't use the database directly. It would be exposed through a clean API.

Under the hood, it had a funny way of reporting errors. First you told the database to modify some data, then you asked it whether the modification had succeeded or not. In the Javascript shell, this looked something like:

> db.collection.insert({_id: 1})
> db.runCommand({getlasterror: 1})  // It worked.
{
    "ok" : 1,
    "err" : null
}
> db.collection.insert({_id: 1})
> db.runCommand({getlasterror: 1})
{
    "ok" : 1,
    "err" : "E11000 duplicate key error"
}

The raw protocol was neatly packaged behind an API that handled error reporting for you. (Eliot describes the history of the protocol in more detail here.)

As 10gen grew, we realized the application platform wasn't going to take off. The real product was the database layer, MongoDB. 10gen decided to toss the application platform and focus on the database. We started writing drivers in several languages, including Python. That was the birth of PyMongo. Mike Dirolf began writing it in January of 2009.

At the time we thought our database's funky protocol was a feature: if you wanted minimum-latency writes, you could write to the database blind, without stopping to ask about errors. In Python, this looked like:

>>> # Obsolete code, don't use this!
>>> from pymongo import Connection
>>> c = Connection()
>>> collection = c.db.collection
>>> collection.insert({'_id': 1})
>>> collection.insert({'_id': 1})

Unacknowledged writes didn't care about network latency, so they could saturate the network's throughput:

[Diagram: unacknowledged writes]

On the other hand, if you wanted acknowledged writes, you could ask after each operation whether it succeeded:

>>> # Also obsolete code. "safe" means "acknowledged".
>>> collection.insert({'_id': 1}, safe=True)
>>> collection.insert({'_id': 1}, safe=True)

But you'd pay for the latency:

[Diagram: get last error]

We thought this design was great! You, the user, get to choose whether to await acknowledgment, or "fire and forget." We made our first regrettable decision: we set the default to "fire and forget."

The Invention of start_request

There are a number of problems with the default, unacknowledged setting. The obvious one is, you don't know whether your writes succeeded. But there's a subtler problem, a problem with consistency. After an unacknowledged write, you can't always immediately read what you wrote. Say you had two Python threads executing two functions, doing unacknowledged writes:

import threading

from pymongo import Connection

c = Connection()
collection_one = c.db.collection_one
collection_two = c.db.collection_two

def function_one():
    for i in range(100):
        collection_one.insert({'fieldname': i})

    print collection_one.count()

def function_two():
    for i in range(100):
        collection_two.insert({'fieldname': i})

    print collection_two.count()

threading.Thread(target=function_one).start()
threading.Thread(target=function_two).start()

Since there are two threads doing concurrent operations, PyMongo opens two sockets. Sometimes, one thread finishes sending documents on a socket, checks the socket into the connection pool, and checks the other socket out of the pool to execute the "count". If that happens, the server might not finish reading the final inserts from the first socket before it responds to the "count" request on the other socket. Thus the count is less than 100:

[Diagram: unacknowledged inserts on two sockets]

If the driver did acknowledged writes by default, it would await the server's acknowledgment of the inserts before it ran the "count", so there would be no consistency problem.

But the default was unacknowledged, so users would get results that surprised them. In January of 2009, PyMongo's original author Mike Dirolf fixed this problem. He wrote a connection pool that simply allocated a socket per thread. As long as a thread always uses the same socket, it doesn't matter if its writes are acknowledged or not:

[Diagram: unacknowledged inserts on a single socket]

The server doesn't read the "count" request from the socket until it's processed all the inserts, so the count is always correct. (Assuming the inserts succeeded.) Problem solved!

Whenever a new thread started talking to MongoDB, PyMongo opened a new socket for it. When the thread died, its socket was closed. Mike's solution was simple and did what users expected. And thus began PyMongo's five-year trudge down the road to hell.


I don't want you to misunderstand me: What Mike did seemed like a good idea at the time. The company had decided that unacknowledged was the default setting for all MongoDB drivers, but Mike still wanted to guarantee read-your-writes consistency if possible. Plus, the Java driver already associated sockets with threads, so Mike wanted the Python driver to act similarly.

I can picture Mike sitting at one of the desks in 10gen's original office. There were only a half-dozen people working for 10gen then, or fewer. This was long before my time. They had a corner of an office shared by Gilt, ShopWiki and Panther Express, in an old gray stone building on 20th Street in Manhattan, next to a library. It would've been very cold that day, maybe snowy. I see Mike sitting next to Eliot, Dwight, and their tiny company. He was banging out a Python driver for MongoDB, making one quick decision after another. Did he know he was setting a course that could not be corrected for five years? Probably not.


So Mike had decided that PyMongo would reserve a socket for each thread. But what if a thread talks to MongoDB, and then goes and does something else for a long time? PyMongo keeps a socket reserved for it that no one else can use. So in February, Mike added the "end_request" method to let a thread relinquish its socket. He also added an "auto_start_request" option. It was turned on by default, but you could turn it off if you didn't need it. If you only did acknowledged writes, or if you didn't immediately read your own writes, you could turn off "auto_start_request" and you'd have a more efficient connection pool.
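For illustration, a thread of that era might have used the API like this (an obsolete, hedged sketch, not code to copy):

# Obsolete sketch of the request API described above.
from pymongo import Connection

c = Connection()  # "auto_start_request" is on by default
c.db.collection.insert({'n': 1})  # this thread now has a socket reserved
# ... go do something else for a long time ...
c.end_request()  # judiciously return the reserved socket to the pool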

The next year, in January 2010, Mike simplified the pool. In his new code, "auto_start_request" could no longer be turned off. His commit message claimed he made PyMongo "~2x faster for simple benchmarks." He wrote,

Calling Connection.end_request allows the socket to be returned to the pool, and to be used by other threads instead of creating a new socket. Judicious use of this method is important for applications with many threads or with long running threads that make few calls to PyMongo.

Bernie Hackett took over PyMongo the year after that, and since "auto_start_request" didn't do anything any more, Bernie removed it entirely in April 2011.

The "judicious use of end_request" tip had been in PyMongo's documentation since the year before, but Bernie suspected that users didn't follow the directions. Just as most people don't recycle their plastic bottles, most developers didn't call "end_request", so good sockets were wasted. Even worse, since threads kept their sockets open and reserved for as long as each thread lived, it was common to see a Python application deployment with thousands and thousands of open connections to MongoDB, even though only a few connections were doing any work.

Therefore, when I came to work for Bernie that November, he directed me to improve PyMongo's connection pool in two ways. First, PyMongo should once again allow you to turn off "auto_start_request". Second, if a thread died without calling "end_request", PyMongo should somehow detect that the thread had died and reclaim its socket for the pool, instead of closing it.

Making "auto_start_request" optional again was easy. If you turned it off, each thread just checked its socket back into the pool whenever it wasn't using it. When the thread next needed a socket, it checked one out, probably a different one. We recommended that PyMongo users do "safe" writes (that is, acknowledged writes), and turn off "auto_start_request". This led to better error reporting and much more efficient connection pooling: sane choices but not, alas, the defaults. We couldn't change the defaults because we had to be backwards-compatible with the regrettable decisions made years earlier.

So restoring "auto_start_request" was a cinch. Detecting that a thread had died, however, was hell.

The Road to Hell

I wanted to fix PyMongo so it could "reclaim" sockets. If a thread had a socket reserved for it, and it forgot to call "end_request" before it died, PyMongo shouldn't just close the socket. It should check the socket back into the connection pool for some future thread to use. My first solution was to wrap each socket in an object with a __del__ method:

class SocketInfo(object):
    """Wraps a pooled socket; returns it to the pool when destroyed."""
    def __init__(self, pool, sock):
        self.pool = pool
        self.sock = sock

    def __del__(self):
        # Runs when no thread holds a reference to this SocketInfo any
        # more -- for example, because its thread has died.
        self.pool.return_socket(self.sock)

Piece of cake. We released this code in May 2012, and it was much more efficient. Whereas the previous version of PyMongo's pool tended to close and open sockets frequently:

[Diagram: PyMongo 2.1 pool]

PyMongo 2.2 reclaimed dead threads' sockets for new threads that wanted them:

[Diagram: PyMongo 2.2 pool]

I was proud of my achievement. Then all hell broke loose.

The worst bug

Right after we released my "socket reclamation" code in PyMongo, a user reported that with Python 2.6, mod_wsgi 2.8, and "auto_start_request" turned on (the default), his application leaked a connection once every two requests! Once he'd leaked a few thousand connections he ran out of file descriptors and crashed. It took me 18 days of desperate debugging, with Dan Crosta by my side, before I got to the bottom of it. It turned out there were no fewer than three bugs in Python's threadlocal implementation, all fixed when Antoine Pitrou rewrote threadlocals for Python 2.7.1. One of them had been reported; the other two never were.

The unreported bug I'd found was in the C function in the Python interpreter that manages threadlocals. By accessing a threadlocal from a __del__ method, I'd caused the function to be called recursively, which it wasn't designed for. This caused a refleak every second time it happened, leaving open sockets that could never be garbage-collected.

This bug in an obsolete version of Python was, in turn, interacting with an obsolete version of mod_wsgi, which cleared each Python thread's state after each HTTP request. So anyone on Python 2.7 or mod_wsgi 3.x, or both, wouldn't hit the bug. But ancient versions of Python and mod_wsgi are widely used.

I wrote up my diagnosis of the bug, reimplemented my socket reclamation code to avoid the recursive call, and released the fix. I wrote a frustrated article about how weird Python's threadlocals are, and early the next year I wrote a description of my workaround.

To this day, the bug is my worst. It's among the worst for impact, it was certainly the hardest to diagnose, and it remains the most complicated to explain.

That last point—the bug is hard to explain—has real costs. It makes it very hard for anyone but me to maintain PyMongo's connection pool. Anyone else who touches it risks recreating the bug. Of course, we test for the bug after every commit: we loadtest PyMongo with Apache and mod_wsgi in our Jenkins server to guard against a regression of this bug. But no outside contributor is likely to go to such effort, nor to understand why it is necessary.

A bug factory

A full year later, in April 2013, I discovered another connection leak. Unlike the 2012 bug, this leak was rare and hard to reproduce. I don't think anyone was hurt by it. I was a much better diagnostician by now, and I knew the relevant part of CPython all too well. It took me less than a day to determine that in Python 2.6, assigning to a threadlocal is not thread safe. I added a lock around the assignment and released yet another bugfix for "start_request" in PyMongo.
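The workaround amounts to something like this (a minimal sketch with a hypothetical function name, not PyMongo's actual code):

import threading

_local = threading.local()
_lock = threading.Lock()

def set_request_socket(sock):
    # In Python 2.6, assignment to a threadlocal attribute is not
    # thread safe, so serialize assignments with a lock.
    with _lock:
        _local.sock = sock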

For my whole career at MongoDB, I've regularly found and fixed bugs in "start_request". In 2012 I found that if one thread calls "start_request", other threads can sometimes think, wrongly, that they're in a request, too. And when a replica set primary steps down, threads in requests all throw an exception before reconnecting.

In 2013, a contributor, Justin Patrin, tried to add a feature to our connection pool, but what should have been a straightforward patch got fouled by the barbed wire in "start_request". In his code, if a thread in a request got a network error, it leaked a semaphore. And just last month I had to fix a little bug in the connection pool related to "start_request" and mod_wsgi.

An attractive nuisance

There's another thing about "start_request" that's almost as bad as its complexity: its name. It's an attractive nuisance. It sounds like, "I have to call this before I start a request to MongoDB." I frequently see developers who are new to PyMongo write code like this:

# Don't do this.
c.start_request()
doc = c.db.collection.find_one()
c.end_request()

This is completely pointless, a waste of the programmer's effort and the machine's. But the name is so vague, and the explanation is so complex, you'd be forgiven for thinking this is how you're supposed to use PyMongo.

Now, I ask you, which decision was the most regrettable? Was socket reclamation a bad feature—should we have let PyMongo continue closing threads' sockets when threads died, instead of building a Rube Goldberg device to check those sockets back into the pool? Or maybe a worse idea came years before, when Mike turned on "auto_start_request" by default—maybe everything would have been okay if he'd required users to call "start_request" explicitly, instead. Maybe he shouldn't have implemented "start_request" at all. Most likely, the root cause was the decision we made before Mike even started writing PyMongo: the decision to make unacknowledged writes the default.

Redemption

MongoClient

Late in 2012, while I was in the midst of all these "start_request" bugs, Eliot had an idea that turned us around and showed us the way back from hell. He figured out a way to redeem our original sin, the sin of making unacknowledged writes the default. See, we had long recommended that users override PyMongo's defaults, like so:

>>> # Obsolete.
>>> c = Connection(safe=True, auto_start_request=False)

...but we couldn't make this the new default because it would break backwards compatibility. Eliot decided that all the drivers should introduce a new class with the proper defaults. Scott Hernandez came up with a good name for the class, one that no driver used yet: "MongoClient".

>>> # Modern code.
>>> c = MongoClient()

While we were at it, we deprecated the old "safe / unsafe" terms and introduced a new terminology, "write concern". Users could opt into the new class, but we wouldn't break any existing code. Orpheus took the first step of his walk home from Hades.
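In the new terminology, acknowledgment is controlled by the "w" write concern option, and MongoClient acknowledges writes by default:

>>> # Acknowledged writes are the default.
>>> c = MongoClient()
>>> # Fire-and-forget is still available, but now it's opt-in.
>>> c = MongoClient(w=0)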

Write commands

In MongoDB 2.6, released this spring, we began to undo an even older decision: the old protocol that sends a modification to MongoDB, then calls "getLastError" to find out whether it succeeded. The new protocol, write commands, always awaits a response from the server. Furthermore, it lets us batch hundreds of modifications in a single command and get a batch of responses back. The change was transparent to users, but it transcended the tradeoff in our original protocol: you no longer have to choose between low-latency unacknowledged writes and acknowledged writes that pay for the latency. Now you can batch up your operations, do acknowledged writes, and get the best of both worlds.
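For example, with PyMongo's bulk API (a sketch, assuming a "collection" object), you can send a thousand inserts in a handful of batched commands and still get acknowledgment:

# A sketch of batched, acknowledged writes using write commands.
bulk = collection.initialize_ordered_bulk_op()
for i in range(1000):
    bulk.insert({'_id': i})
result = bulk.execute()  # acknowledged, without a round trip per insert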

Sharding

The final nail was a change in MongoDB's sharding. It used to be that, as long as a thread used the same connection to mongos for secondary reads, mongos would keep using the same secondary on each shard's replica set. This was meant to prevent "time travel": if one secondary in a shard is lagging and another is not, we didn't want your client thread to read once from the caught-up secondary, and then once from the laggy secondary, getting an earlier view of your data.

But this design made mongos's connection pooling much less efficient. And we couldn't guarantee perfect monotonicity when you read from a secondary anyway. In MongoDB 2.6 we changed this behavior so that mongos balances each client connection's reads among all the secondaries. Thus, the last good reason for a client thread to always use the same connection is obsolete. It's time for "start_request" to go.

Removing start_request

This morning I removed "start_request" from PyMongo's code, on the branch that will become PyMongo 3.0. The change deletes about 300 lines. The hairiest, riskiest Python I've ever written is gone. The connection pool code looks sane again. Once again, a contributor could send patches for it without opening a can of worms. Coders who start out with PyMongo won't be lured by the attractive nuisance of "start_request". And my time won't be taken up by occasional, urgent bugs in the PyMongo connection pool. Destroying my own work has never before been so satisfying, so liberating.

Post-mortem

The onramps of the road to hell are not well-marked. How can we recognize them next time?

One principle is: Don't try to give users what they can't have. You can't combine read-your-writes consistency with unacknowledged writes. Our efforts to give you both things at once were heroic, but foolish. We thought we were being generous to you by maintaining very complex code, but as the Zen of Python says,

Simple is better than complex.
If the implementation is hard to explain, it's a bad idea.

It's fun to write complex, hard-to-explain code. It's certainly more fun to write gnarly code now, than to think hard about the future, and wait until you've thought of a simple design that will stand the test of time. But in the case of "start_request", a better design was out there.

Here again, the Zen of Python is instructive. It advises us to wait until we have a pretty good answer, before we start coding:

Now is better than never.
Although never is often better than right now.

But even though we made a regrettable decision, we eventually righted ourselves. The new protocol—write commands—gives us high throughput and acknowledged writes, without breaking backwards compatibility. And now that we have the new protocol we can remove "start_request" in PyMongo 3.0. The walk home from hell is over.


The next installment in "It Seemed Like A Good Idea At The Time" is PyMongo's "use_greenlets".

Announcing PyMongo 2.8 Release Candidate


[Photo: Morelia spilota variegata, by Jebulon, via Wikimedia Commons]

We've just tagged a release candidate of PyMongo, the standard MongoDB driver for Python. You can install it like:

pip install git+git://github.com/mongodb/mongo-python-driver.git@2.8rc0

Most of the changes between PyMongo 2.8 and the previous release, 2.7.2, are for compatibility with the upcoming MongoDB 2.8 release. (By coincidence, PyMongo and MongoDB are at the same version number right now.)


Compatibility

SCRAM-SHA-1 authentication

MongoDB 2.8 adds support for SCRAM-SHA-1 authentication and makes it the new default, replacing our inferior old protocol MONGODB-CR ("MongoDB Challenge-Response"). PyMongo's maintainer Bernie Hackett added support for the new protocol. PyMongo and MongoDB work together to make this change seamless: you can upgrade PyMongo first, then your MongoDB servers, and authentication will keep working with your existing passwords. When you choose to, you can upgrade how your passwords are hashed within the database itself—we'll document how to do that when we release MongoDB 2.8.

SCRAM-SHA-1 is more secure than MONGODB-CR, but it's also slower: the new protocol requires the client to do 10,000 iterations of SHA-1 by default, instead of one iteration of MD5. This has two implications for you.

First, you must create one MongoClient or MongoReplicaSetClient instance when your application starts up, and keep using it for your application's lifetime. For example, consider this little Flask app:

from pymongo import MongoClient
from flask import Flask

# This is the right thing to do:
db = MongoClient('mongodb://user:password@host').test
app = Flask(__name__)

@app.route('/')
def home():
    doc = db.collection.find_one()
    return repr(doc)

app.run()

That's the right way to build your app, because it lets PyMongo reuse connections to MongoDB and maintain a connection pool.

But time and again I see people write request handlers like this:

@app.route('/')
def home():
    # Wrong!!
    db = MongoClient('mongodb://user:password@host').test
    doc = db.collection.find_one()
    return repr(doc)

When you create a new MongoClient for each request like this, it requires PyMongo to set up a new TCP connection to MongoDB for every request to your application, and then shut it down after each request. This already hurts your performance.

But if you're using authentication and you upgrade to PyMongo 2.8 and MongoDB 2.8, you'll also pay for SHA-1 hashing with every request. So if you aren't yet following my recommendation and reusing one client throughout your application, fix your code now.

Second, you should install backports.pbkdf2—it speeds up the hash computation, especially on Python older than 2.7.8, or on Python 3 before Python 3.4.
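With pip, that's simply:

pip install backports.pbkdf2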

I've updated PyMongo's copy_database so you can use SCRAM-SHA-1 authentication to copy between servers. More information about SCRAM-SHA-1 is in PyMongo's latest auth documentation.

count with hint

Starting in MongoDB 2.6 the "count" command can take a hint that tells it which index to use, by name. In PyMongo 2.8 Bernie added support for count with hint:

from pymongo import ASCENDING

collection.create_index([('field', ASCENDING)], name='my_index')

collection.find({
    'field': {'$gt': 10}
}).hint('my_index').count()

This will work with MongoDB 2.6, and in MongoDB 2.8 "count" supports hints given as index specs, not just index names:

collection.find({
    'field': {'$gt': 10}
}).hint([('field', ASCENDING)]).count()

PyMongo improvements

SON performance

Don Mitchell from EdX generously offered us a patch that improves the performance of SON, PyMongo's implementation of an ordered dict. His patch avoids unnecessary copies of field names in many of SON's methods.

socketKeepAlive

In some network setups, users need to set the SO_KEEPALIVE flag on PyMongo's TCP connections to MongoDB, so Bernie added a socketKeepAlive option to MongoClient and MongoReplicaSetClient.
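For example (with a placeholder host):

from pymongo import MongoClient

# Ask the OS to send TCP keepalive probes on PyMongo's connections.
client = MongoClient('mongodb://host', socketKeepAlive=True)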

Deprecation warnings

Soon we'll release a PyMongo 3.0 that removes many obsolete features from PyMongo and gives you a cleaner, safer, faster new API. But we want to make the upgrade as smooth as possible for you. To begin with, I documented our compatibility policy. I explained how to test your code to make sure you use no deprecated features of PyMongo.
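One standard way to surface them, shown here as a sketch rather than the documentation's exact commands, is to run your test suite with DeprecationWarnings escalated to errors:

python -W error::DeprecationWarning -m unittest discover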

Second, I deprecated some features that will be removed in PyMongo 3.0:

start_request is deprecated and will be removed in PyMongo 3.0, because it's not the right way to ensure consistency, and it doesn't work with sharding in MongoDB 2.8. Further justifications can be found here.

MasterSlaveConnection is deprecated and will be removed, since master-slave setups are themselves obsolete. Replica sets are superior to master-slave, especially now that replica sets can have more than 12 members. Anyway, even if you still have a master-slave setup, PyMongo's MasterSlaveConnection wasn't very useful.

And finally, copy_database is deprecated. We asked customers whether they used it, and the answer was no: people use the mongo shell for copying databases, not PyMongo. For the sake of backwards compatibility I upgraded PyMongo's copy_database to support SCRAM-SHA-1 anyway, but in PyMongo 3.0 we plan to remove it. Let me know in the comments if you think this is the wrong decision.

Bugs

The only notable bugfix in PyMongo 2.8 is the delightfully silly mod_wsgi error I wrote about last month. But if you find any new bugs, please let us know by opening an issue in Jira; I promise we'll handle it promptly.

Zen Chaplain Ordination


July 5th, 2014. Ann Geido Grossman was ordained as a chaplain by abbot Enkyo Roshi, at the Village Zendo in New York.

My zendo was born in the middle of the AIDS epidemic in Greenwich Village in the 80s. We have a tradition of hospice chaplaincy. Geido is continuing this tradition, working at a hospice in Trenton.

[Photos: Geido in gassho; Enkyo anoints Geido's forehead; Geido thanks the sangha; Enkyo Roshi and Geido]

Motor 0.3.4 Released



Today I released version 0.3.4 of Motor, the asynchronous MongoDB driver for Python and Tornado. This release is compatible with MongoDB 2.2, 2.4, and 2.6. It requires PyMongo 2.7.1.

This release fixes a leak in the connection pool. MotorPool.get_socket() proactively checks a socket for errors if it hasn't been used in more than a second: it calls select() on the socket's file descriptor to see whether the socket has been shut down at the OS level. If the check fails, Motor discards the socket. But it neglected to decrement its socket counter, so the closed socket was forever counted against max_pool_size. This is the equivalent of a semaphore leak in a normal multi-threaded connection pool.
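The liveness check itself looks roughly like this (a sketch with a hypothetical helper name, not Motor's actual code):

import select

def sock_closed(sock):
    # A socket that has been shut down at the OS level becomes readable;
    # select() with a zero timeout detects this without blocking.
    readable, _, _ = select.select([sock], [], [], 0)
    return bool(readable)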

The bug has been present since Motor 0.2. I discovered it while testing Motor's handling of network errors with exhaust cursors, but the leak is not particular to exhaust cursors.

Get the latest version with pip install --upgrade motor. The documentation is on ReadTheDocs. View the changelog here. If you encounter any issues, please file them in Jira.

Toro 0.7 Released



I've just released version 0.7 of Toro. Toro provides semaphores, locks, events, conditions, and queues for Tornado coroutines. It enables advanced coordination among coroutines, similar to what you do in a multithreaded application. Get the latest version with "pip install --upgrade toro". Toro's documentation, with plenty of examples, is on ReadTheDocs.

There is one bugfix in this release. Semaphore.wait() is supposed to wait until the semaphore can be acquired again:

import toro
from tornado import gen

@gen.coroutine
def coro():
    sem = toro.Semaphore(1)
    assert not sem.locked()

    # A semaphore with an initial value of 1 can be acquired once,
    # then it's locked.
    sem.acquire()
    assert sem.locked()

    # Wait for another coroutine to release the semaphore.
    yield sem.wait()

... however, there was a bug and the semaphore didn't mark itself "locked" when it was acquired, so "wait" always returned immediately. I'm grateful to "abing" on GitHub for noticing the bug and contributing a fix.

A Normal Accident In Python and mod_wsgi


[Photo: Three Mile Island nuclear power plant]

I fixed a glitch in PyMongo last week, the result of a slapstick series of mishaps. It reminds me of the Three Mile Island nuclear accident, which inspired the "Normal Accidents" theory of failure in complex systems: one surprise leads to the next, to the next, to an outcome no one anticipated.

It started a couple months ago, when we got a minor bug report about PyMongo. The reporter was using PyMongo with Python 3.2, mod_wsgi, and Apache. Whenever he restarted his application, he saw this error message in his log:

Exception TypeError:
  "'NoneType' object is not callable"
  in <bound method Pool.__del__> ignored

The exception was ignored because it was raised from a "__del__" method, so it didn't affect his application. Still, I needed to understand what was going on. So I made a test environment, and I used Apache to run a Python script like the one in the bug report:

import pymongo

class C:
    pass

C.client = pymongo.MongoClient()

I could reproduce the bug: Whenever I restarted Apache, the PyMongo connection pool's destructor logged "TypeError: NoneType object is not callable."

The pool's destructor makes two method calls, and no function calls:

def __del__(self):
    # _thread_id_to_sock is a dict of sockets.
    for sock in self._thread_id_to_sock.values():
        sock.close()

During interpreter shutdown, None is somehow being called as a function. I'm no expert on Python's shutdown sequence, but I've never heard of a method being set to None. And yet, the only calls in this code are the "values" method and the "close" method. What gives?

I put a "return" statement at the beginning of "__del__" and restarted Apache: the error disappeared. So I moved the "return" statement down a line, before "sock.close()". The next time I restarted Apache, I saw the error again.

While I was hacking directly on the installed PyMongo package, I noticed something funny. The installed code looked like:

def __del__(self):
    # _thread_id_to_sock is a dict of sockets.
    for sock in list(self._thread_id_to_sock.values()):
        sock.close()

Notice the call to "list"? When I installed PyMongo with Python 3.2, the installer ran 2to3 on PyMongo's code, which automatically translates Python 2 syntax to Python 3.

Why did 2to3 decide to wrap the "values" call in "list"? Well, in Python 2, "values" returns a copy, but in Python 3 it returns a dictionary view that's tied to the dict's underlying data. 2to3 worries that I might rely on the old, copying behavior, so in Python 3 it makes a copy of the values by calling "list".
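A hypothetical couple of lines show the difference:

d = {'a': 1}
values = d.values()
d['b'] = 2
# Python 2: "values" is a snapshot, still [1].
# Python 3: "values" is a live view, now containing 1 and 2.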

So it must have been the call to "list" that raised the TypeError. Sure enough, when I deleted the "list" call from the installed PyMongo code, the exception disappeared. Fantastic!

Why don't we see this error all the time, though? Perhaps it has to do with the shutdown sequence. Normally, a pool is referred to by other objects, but not by a class. I hypothesized that the reporter saw the error because he'd made a reference from a class to the MongoClient to the pool, which delayed the pool's destruction until after the "list" builtin had been set to None:

[Diagram: class refers to pool]

To test this theory, I replaced this line:

C.client = pymongo.MongoClient()

...with this:

client = pymongo.MongoClient()

Now the pool is no longer referred to by a class; it's only referred to by a global variable in the module named "mod":

[Diagram: variable refers to pool]

Sure enough, the error disappeared.

So far, I understood this much: the connection pool's destructor ran too late, because a reference from a class kept the pool alive, and it raised a TypeError because 2to3 had added a call to the "list" builtin, which was gone by the time the destructor ran. Now, did this only happen with mod_wsgi? I wrote the simplest Python example I could and tried to reproduce the TypeError:

# mod.py
class C(object):
    pass

class Pool(object):
    def __del__(self):
        print('del')
        list()

C.pool = Pool()

I could import this module into the Python shell, then quit, and I got no TypeError. Actually, I didn't see it print "del" either—the pool's destructor never ran at all. Why not?

A class definition like "C" creates a reference cycle. It refers to itself as the first element in its method resolution order. You can see how "C" refers to itself by printing its method resolution order in the Python shell:

>>> import mod
>>> mod.C.__mro__
(<class 'mod.C'>, <type 'object'>)

When the interpreter shuts down it runs the C function "Py_Finalize", which first does a round of garbage collection to destroy reference cycles, then destroys all modules:

void Py_Finalize(void) {
    /* Collect garbage.  This may call finalizers; it's nice to call these
     * before all modules are destroyed.
     */
    PyGC_Collect();

    /* Destroy all modules */
    PyImport_Cleanup();
}

When "PyGC_Collect" runs, the "mod" module still refers to class C, so the class isn't destroyed and neither is the Pool it refers to:

[Diagram: class reference cycle]

Next, "PyImport_Cleanup" sets all modules' global variables to None. Now class C is garbage: it's in a reference cycle and nothing else refers to it:

[Diagram: cyclic garbage]

But the interpreter is dying and it will never call "PyGC_Collect" again, so class C is never destroyed and neither is the pool.

Great, I understand everything up to this point. But, if the pool is never destroyed when a regular Python interpreter shuts down, why is it destroyed when a mod_wsgi application restarts? I dove into mod_wsgi's source code to see how it manages Python interpreters. (This isn't my first rodeo: I examined mod_wsgi closely for my "Python C Extensions And mod_wsgi" article last year.) I wrote a little C program that runs Python in a sub interpreter, the same as mod_wsgi does:

int main()
{
    Py_Initialize();
    PyThreadState *tstate_enter = PyThreadState_Get();
    PyThreadState *tstate = Py_NewInterpreter();

    PyRun_SimpleString("import mod\n");
    if (PyErr_Occurred()) {
        PyErr_Print();
    }
    Py_EndInterpreter(tstate);
    PyThreadState_Swap(tstate_enter);
    printf("about to finalize\n");
    Py_Finalize();
    printf("done\n");

    return 0;
}

Just like mod_wsgi, my program creates a new Python sub interpreter and tells it to import my module, then it swaps out the sub interpreter and shuts it down with "Py_EndInterpreter". Its last act is "Py_Finalize". And behold! The script quoth:

about to finalize

Exception TypeError:
  "'NoneType' object is not callable"
  in <bound method Pool.__del__> ignored

done

My little C program acts just like the application in the bug report! What is it about this code that makes it throw the TypeError during shutdown, when a regular Python interpreter does not?

I stepped through my program in the debugger and solved the final mystery. What makes this code special is, it calls "Py_EndInterpreter". "Py_EndInterpreter" calls "PyImport_Cleanup", which sets all modules' global variables to None, thus turning class C into cyclic garbage:

[Diagram: cyclic garbage]

"PyImport_Cleanup" even clears the "builtins" module, which includes functions like "list". Any code that tries to call "list" afterward is actually calling None.

Now "Py_Finalize" calls "PyGC_Collect". (It will then run "PyImport_Cleanup" for the second time, but that's not relevant now.) This is the difference between the regular interpreter's shutdown sequence and mod_wsgi's: In the mod_wsgi case, modules have been cleared before the final garbage collection, so class C is destroyed along with the pool. However, since the pool's destructor runs after "PyImport_Cleanup", its reference to "list" is now None, and it throws "TypeError: 'NoneType' object is not callable".

Success! I had traced the cause of the bug from start to finish. To recap: in the bug-reporter's code, he had made a reference from a class to a pool, which made the pool's destructor run very late. And he ran the code in mod_wsgi, which clears modules before the final garbage collection, otherwise the pool's destructor wouldn't have run at all. He was using Python 3, so 2to3 had inserted a call to "list" in the pool's destructor, and since the destructor ran after all modules were cleared, the call to "list" failed.

Luckily, this cascade of failures leads merely to an occasional log message, not to a Three Mile Island meltdown. My boss Bernie came up with an incredibly simple fix. I replaced the call to "values":

def __del__(self):
    for sock in self._thread_id_to_sock.values():
        sock.close()

... with a call to "itervalues":

def __del__(self):
    for sock in self._thread_id_to_sock.itervalues():
        sock.close()

(You can view the whole commit here.)

Now that I'm using "itervalues", 2to3 replaces it with "values" in Python 3 instead of "list(values)". Since I no longer rely on the "list" builtin being available in the destructor, no TypeError is raised.

reStructured Text With Chrome And LiveReload


I've found a useful set of tools for writing RST, when I must. I'll show you how to configure LiveReload and Chrome to make the experience of writing RST's tortured syntax somewhat bearable.

(This article is an improvement over the method I wrote about last year.)

LiveReload

I bought LiveReload from the Mac App Store for $10, and opened it. Under "Monitored Folders" I added my project's home directory: I was updating Motor's documentation so I added the "motor/doc" directory.

[Screenshot: LiveReload]

Next to "Monitoring 44 file extensions" I hit "Options" and added "rst" as a 45th.

[Screenshot: LiveReload file extension options]

Then I checked "Run custom command after processing changes" and hit "Options". In the popup dialog I added the command for building Motor's documentation. It's a typical Sphinx project, so the build command is:

/Users/emptysquare/.virtualenvs/motor/bin/sphinx-build \
  -b html -d _build/doctrees . _build/html

Note that I specified the full path to the virtualenv'ed sphinx script.

That's all there is to configuring LiveReload. Hit the green box on the lower right of its main window to see the build command's output. Now whenever you change an RST file you should see some Sphinx output scroll by:

[Screenshot: LiveReload Sphinx output]

Chrome

Next, follow LiveReload's instructions for installing the Chrome extension. Pay attention to LiveReload's tip: "If you want to use it with local files, be sure to enable 'Allow access to file URLs' checkbox in Tools > Extensions > LiveReload after installation."

Now open one of the HTML files Sphinx made, and click the LiveReload icon on your browser to enable it. The difference between "enabled" and "disabled" is damn subtle. This is disabled:

[Screenshot: the LiveReload icon, disabled]

This is enabled:

[Screenshot: the LiveReload icon, enabled]

The icon plays it close to the chest, but if you hover your mouse over it, it'll admit whether it's enabled or not.

Back at the LiveReload application, you'll now see "1 browser connected."

Try it out! Now you can make changes to your RST and see it live in your browser. I don't think I'll ever learn to type RST's syntax reliably, but at least now, I can see at once whether I've typed it right or not.

Motor 0.3.3 Released



Today I released version 0.3.3 of Motor, the asynchronous MongoDB driver for Python and Tornado. This release is compatible with MongoDB 2.2, 2.4, and 2.6. It requires PyMongo 2.7.1.

This release fixes an occasional infinite loop and memory leak. The bug was triggered when you passed a callback to MotorCursor.each, and Motor had to open a new socket in the process of executing your callback, and your callback raised an exception:

from tornado.ioloop import IOLoop
import motor

loop = IOLoop.instance()

def each(result, error):
    raise Exception()

collection = motor.MotorClient().test.test
cursor = collection.find()
cursor.each(callback=each)
loop.start()

The bug has been present since Motor 0.2. I am indebted to Eugene Protozanov for an excellent bug report.

Get the latest version with pip install --upgrade motor. The documentation is on ReadTheDocs. View the changelog here. If you encounter any issues, please file them in Jira.