PyMongo 3.1 Works Around A Funny Performance Flaw In Python 2
Bernie Hackett, Anna Herlihy, Luke Lovett, and I are pleased to announce the release of PyMongo 3.1. It adds features that conform to two new cross-language driver specs: it implements the Command Monitoring Spec to help you measure performance, and it adds a GridFSBucket class to match our new GridFS Spec.
A few of our users reported that PyMongo 3 used five or ten percent of their CPU while idle, and recorded a couple hundred context switches per second. I investigated and found a slapstick performance flaw in Python 2's condition variable that was interacting badly with my concurrency design in PyMongo 3.
A Reasonable Tradeoff?
PyMongo 3 has new server discovery and monitoring logic which requires one background thread to monitor each server the driver is connected to. These monitors wake every 10 seconds or, when PyMongo is actively searching for a server, every half-second. This architecture has big performance advantages over PyMongo 2's—it's faster at discovering servers, and more performant and responsive if you have a large replica set, or if your replica set's topology changes, or if some members are down or slow to respond. (More info here.)
So, I expected PyMongo 3 to cost a bit of idle CPU, because its threads wake every 10 seconds to check the servers; this is intended to cost a tiny bit of memory and load in exchange for big wins in performance and reliability. But our users reported, and I confirmed, that the cost was much more than I'd guessed.
It is a requirement of our Server Discovery And Monitoring Spec that a sleeping monitor can be awakened early if the driver detects a server failure. My monitors implement this using the Python standard library's Condition.wait with a timeout.
Aside from infrequent wakeups to do their appointed chores, and occasional interruptions, monitors also wake frequently to check if they should terminate. The reason for this odd design is to avoid a deadlock in the garbage collector: a PyMongo client's destructor can't take a lock, so it can't signal the monitor's condition variable. (See What To Expect When You're Expiring, or PYTHON-863.) Therefore, the only way for a dying client to terminate its background threads is to set their "stopped" flags, and let the threads see the flag the next time they wake. I erred on the side of prompt cleanup and set this frequent check interval at 100ms.
I figured that checking a flag and going back to sleep 10 times a second was cheap on modern machines. I was incorrect. Where did I go wrong?
Idling Hot
Starting in Python 3.2, the builtin C implementation of lock.acquire takes a timeout, so condition variables wait simply by calling lock.acquire; they're implemented as efficiently as I expected. In Python 3 on my system, an idle PyMongo client takes only 0.15% CPU.
But in Python 2, lock.acquire has no timeout. To wait with a timeout in Python 2, a condition variable sleeps a millisecond, tries to acquire the lock, sleeps twice as long, and tries again. This exponential backoff reaches a maximum sleep time of 50ms.
The author of this algorithm, Tim Peters, commented:
Balancing act: We can't afford a pure busy loop, so we have to sleep; but if we sleep the whole timeout time, we'll be unresponsive. The scheme here sleeps very little at first, longer as time goes on, but never longer than 20 times per second.
If the whole timeout is long, this is completely reasonable. But PyMongo calls the condition variable's "wait" method in a loop with a timeout of only 100ms, so the exponential backoff is restarted 10 times a second. Each time the exponential backoff restarts, it sets its wait time back to one millisecond. Overall, the condition variable is not waking 10 times a second, but many hundreds of times.
In Python 2.7.10 on my system, one idle PyMongo client takes a couple percent CPU to monitor one MongoDB server. On a production server with many Python processes, each monitoring a large replica set of MongoDB servers, the overhead could be significant. It would leave less headroom for traffic spikes or require bigger hardware.
The Simplest Solution The Could Possibly Work
I surprised myself with how simple the solution was: I ditched the condition variable. In the new code, Monitor threads simply sleep half a second between checks; every half second they wake, look to see if they should ping the MongoDB server, or if they should terminate, then go back to sleep. The early wake-up feature is gone now, but since the Server Discovery And Monitoring Spec prohibits monitors from checking servers more often than every half-second anyway, this is no real loss.
Even better, I deleted 100 lines of Python and added only 20.
The original bug-reporter Daniel Brandt wrote "results are looking very good." Nicola Iarocci, a MongoDB Master, chimed in: "Hello just wanted to confirm that I was also witnessing huge performance issues when running the Eve test suite under Python 2. With PyMongo 3.1rc0 however, everything is back to normal. Cheers!"
Links to more info about the PyMongo 3.1 release: