A. Jesse Jiryu Davis

Talk Python To Me Podcast: Python and MongoDB


I was honored to be Michael Kennedy's guest for the second episode of his new podcast. We talked about my career as a Python programmer and how I came to work for MongoDB. We discussed PyMongo, Motor, and Monary.

Talk Python To Me, Episode 2

You should subscribe to Michael's podcast. His conversation with Nicola Iarocci in the previous episode was great, too, and the upcoming interviews promise to be informative.

PyPy, Garbage Collection, And A Deadlock


Ouroboros

I fixed a deadlock in PyMongo 3 and PyPy which, rarely, could happen in PyMongo 2 as well. Diagnosing the deadlock was educational and teaches us a rule about writing __del__ methods—yet another tip about what to expect when you're expiring.

A Toy Example

This deadlocks in CPython:

import threading

lock = threading.Lock()

class C(object):
    def __del__(self):
        print('getting lock')
        with lock:
            print('releasing lock')
            pass

c = C()
with lock:
    del c

The statement del c removes the variable c from the namespace. The object that c had referred to has no more references, so CPython immediately calls its __del__ method, which tries to get the lock. The lock is held, so the process deadlocks. It prints "getting lock" and hangs forever.

What if we reorder the final statements, deleting c before taking the lock?

del c
with lock:
    pass

This is fine. The __del__ method completes and releases the lock before the next statement acquires it.

But consider PyPy. It doesn't use reference counts: unreferenced objects live until the garbage collector frees them. The moment when objects are freed is unpredictable. If the GC happens to kick in while the lock is held, it will deadlock. We can force this situation:

import gc

del c
with lock:
    gc.collect()

Just like the first example, this prints "getting lock" and deadlocks.
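CPython isn't immune either: objects caught in a reference cycle never reach a zero reference count, so only CPython's cyclic garbage collector frees them, at a moment you don't pick. Here is a minimal sketch, assuming CPython 3.4 or later (where __del__ runs for objects in collected cycles), that reproduces the PyPy scenario. To keep the demo from hanging, the finalizer tries the lock without blocking and records what it found; the names are illustrative.

```python
import gc
import threading

lock = threading.Lock()
events = []  # what each finalizer observed

class C(object):
    def __del__(self):
        # Try the lock without blocking, so the demo reports the would-be
        # deadlock instead of hanging forever.
        if lock.acquire(False):
            events.append('got lock')
            lock.release()
        else:
            events.append('lock already held: would deadlock')

def make_garbage():
    c = C()
    c.me = c  # a reference cycle: the refcount never drops to zero

gc.disable()      # stop automatic collections so only our explicit one runs
make_garbage()    # the cycle is now unreachable, but not yet freed
with lock:
    gc.collect()  # the collector runs C.__del__ while this thread holds the lock
gc.enable()

print(events)  # ['lock already held: would deadlock']
```

With a plain blocking `with lock:` in `__del__`, this program would hang exactly like the PyPy example.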

The PyMongo Bug

A few weeks ago, I found a deadlock like this in my code for the upcoming PyMongo 3.0 release. From there, I discovered a far rarer deadlock in the current release as well.

I'll give you a little context so you can see how the bug arose. With PyMongo you stream results from the MongoDB server like:

for document in collection.find():
    print(document)

The find method actually returns an instance of the Cursor class, so you could write this:

cursor = collection.find()
for document in cursor:
    print(document)

As you iterate the cursor, it returns documents from its client-side buffer until the buffer is empty, then it fetches another big batch of documents from the server. After it returns the final document of the final batch, it raises StopIteration.

But what if your code throws an exception before then?

for document in cursor:
    1 / 0  # Oops.

The client-side cursor goes out of scope, but the server keeps a small amount of cursor state in memory for 10 minutes. PyMongo wants to clean this up promptly, by telling the server to close the cursor as soon as the client doesn't need it. The Cursor class's destructor is in charge of telling the server:

class Cursor(object):
    def __del__(self):
        if self.alive:
            self._mongo_client.close_cursor(self.cursor_id)

In order to send the message to the server, PyMongo 3.0 has to do some work: it gets a lock on the internal Topology class so it can retrieve the connection pool, then it locks the pool so it can check out a socket. In PyPy, we do this work at a wholly unpredictable moment: it's whenever garbage collection is triggered. If any thread is holding either lock at this moment, the process deadlocks.

(Some details: By default, objects with a __del__ method are only freed by PyPy's garbage collector during a full GC, which is triggered when memory has grown 82% since the last full GC. So if you let an open cursor go out of scope, it won't be freed for some time.)

Diagnosis

I first found this deadlock in the unreleased code for PyMongo 3.0. Our test suite was occasionally hanging under PyPy in Jenkins. When I signaled the hanging test with Control-C it printed:

Exception KeyboardInterrupt in method __del__
of <pymongo.cursor.Cursor object> ignored

The exception is "ignored" and printed to stderr, as all exceptions in __del__ are. Once it printed the error, the test suite resumed and completed. So I added two bits of debugging info. First, whenever a cursor was created it stored a stack trace so it could remember where it came from. And second, if it caught an exception in __del__, it printed the stored traceback and the current traceback:

import traceback

class Cursor(object):
    def __init__(self):
        # Remember where this cursor was created.
        self.tb = ''.join(traceback.format_stack())

    def __del__(self):
        try:
            self._mongo_client.close_cursor(self.cursor_id)
        except:  # Bare except: also catches KeyboardInterrupt.
            print('''
I came from:%s.
I caught:%s.
''' % (self.tb, ''.join(traceback.format_stack())))

The next time the test hung, I hit Control-C and it printed something like:

I came from:
Traceback (most recent call last):
  File "test/test_cursor.py", line 431, in test_limit_and_batch_size
    curs = db.test.find().limit(0).batch_size(10)
  File "pymongo/collection.py", line 828, in find
    return Cursor(self, *args, **kwargs)
  File "pymongo/cursor.py", line 93, in __init__
    self.tb = ''.join(traceback.format_stack())

I caught:
Traceback (most recent call last):
  File "pymongo/cursor.py", line 211, in __del__
    self._mongo_client.close_cursor(self.cursor_id)
  File "pymongo/mongo_client.py", line 908, in close_cursor
    self._topology.open()
  File "pymongo/topology.py", line 58, in open
    with self._lock:

Great, so a test had left a cursor open, and about 30 tests later that cursor's destructor hung waiting for a lock. It only hung in PyPy, so I guessed it had something to do with the differences between CPython's and PyPy's garbage collection systems.
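The remember-where-you-came-from trick works for any object with a finalizer. Below is a self-contained sketch; the Traced class and its reports list are hypothetical, and unlike the debugging code in the Cursor class it records the exception's own traceback with traceback.format_exc() rather than the current stack.

```python
import traceback

class Traced(object):
    """Hypothetical object that remembers where it was created, so a failing
    finalizer can report both its origin and the error it hit."""

    reports = []  # collected diagnostics, for illustration

    def __init__(self, cleanup):
        self._cleanup = cleanup
        # Capture the creation stack now; it is gone by the time __del__ runs.
        self.tb = ''.join(traceback.format_stack())

    def __del__(self):
        try:
            self._cleanup()
        except BaseException:  # includes KeyboardInterrupt
            Traced.reports.append('I came from:\n%s\nI caught:\n%s'
                                  % (self.tb, traceback.format_exc()))

def failing_cleanup():
    raise RuntimeError('cleanup failed')

def leak_one():
    Traced(failing_cleanup)  # created and immediately dropped

leak_one()  # in CPython, __del__ runs as soon as the object is unreferenced
print(Traced.reports[0])
```

The report names `leak_one` as the creator, which is the clue you need when the finalizer runs far from where the object was abandoned.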

I was doing the dishes that night when my mind's background processing completed a diagnosis. As soon as I thought of it I knew I had the answer, and I wrote a test that proved it the next morning.

The Fix

PyMongo 2's concurrency design is unsophisticated and the fix was easy. I followed the code path that leads from the cursor's destructor and saw two places it could take a lock. First, if it finds that the MongoClient was recently disconnected from the server, it briefly locks it to initiate a reconnect. I updated that code path to give up immediately if the client is disconnected—better to leave the cursor open on the server for 10 minutes than to risk a deadlock.

Second, if the client is not disconnected, the cursor destructor locks the connection pool to check out a socket. Here, there's no easy way to avoid the lock, so I came at the problem from the other side: how do I prevent a GC while the pool is locked? If the pool is never locked at the beginning of a GC, then the cursor destructor can safely lock it. The fix is here, in Pool.reset:

class Pool:
    def reset(self):
        sockets = None
        with self.lock:
            sockets = self.sockets
            self.sockets = set()

        for s in sockets:
            s.close()

This is the one place we allocate data while the pool is locked. Allocating the new set while holding the lock could trigger a garbage collection, which could destroy a cursor, which could attempt to lock the pool again, and deadlock. So I moved the allocation outside the lock:

    def reset(self):
        sockets = None
        new_sockets = set()
        with self.lock:
            sockets = self.sockets
            self.sockets = new_sockets

        for s in sockets:
            s.close()

Now, the two lines of reset that run while holding the lock can't trigger a garbage collection, so the cursor destructor knows it isn't called by a GC that interrupted this section of code.

And what about PyMongo 3? The new PyMongo's concurrency design is far superior, but it spends much more time holding a lock than PyMongo 2 does. It locks its internal Topology class whenever it reads or updates information about your MongoDB servers. This makes the deadlock trickier to fix.

I borrowed a technique from the MongoDB Java Driver: I deferred the job of closing cursors to a background thread. Now, when an open cursor is garbage collected, it doesn't immediately tell the server. Instead, it safely adds its ID to a list. Each MongoClient has a thread that runs once a second checking the list for new cursor IDs. If there are any, the thread safely takes the locks it needs to send the message to the server—unlike the garbage collector, the cursor-cleanup thread cooperates normally with your application's threads when it needs a lock.
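Here is a minimal sketch of that deferred-cleanup pattern. It is not PyMongo's implementation: CursorReaper, defer_close, close_cursors, and FakeCursor are hypothetical names, and close_cursors stands in for whatever actually sends the message to the server. The destructor only appends to a collections.deque, whose appends and pops are documented as thread-safe, so it never waits on a threading.Lock; the background thread does the lock-taking work on its own schedule.

```python
import collections
import threading

class CursorReaper(object):
    """Hypothetical sketch: finalizers enqueue cursor IDs, and a background
    thread closes them at a time when taking locks is safe."""

    def __init__(self, close_cursors, interval=1.0):
        self._pending = collections.deque()  # append/popleft need no threading.Lock
        self._close_cursors = close_cursors  # may take locks: runs on our thread
        self._interval = interval
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run)
        self._thread.daemon = True
        self._thread.start()

    def defer_close(self, cursor_id):
        # Safe to call from __del__: it takes no lock and does no I/O.
        self._pending.append(cursor_id)

    def _drain(self):
        ids = []
        while True:
            try:
                ids.append(self._pending.popleft())
            except IndexError:
                break
        if ids:
            self._close_cursors(ids)

    def _run(self):
        # Wake every `interval` seconds; Event.wait returns True once stop() is called.
        while not self._stopped.wait(self._interval):
            self._drain()
        self._drain()  # final flush

    def stop(self):
        self._stopped.set()
        self._thread.join()


class FakeCursor(object):
    """Hypothetical cursor whose destructor only records its ID."""

    def __init__(self, reaper, cursor_id):
        self._reaper = reaper
        self.cursor_id = cursor_id

    def __del__(self):
        self._reaper.defer_close(self.cursor_id)


app_lock = threading.Lock()
closed = []

def close_cursors(ids):
    with app_lock:  # the cleanup thread waits its turn like any other thread
        closed.extend(ids)

reaper = CursorReaper(close_cursors, interval=0.05)
with app_lock:
    c = FakeCursor(reaper, 42)
    del c  # __del__ runs while the lock is held, yet nothing deadlocks
reaper.stop()
print(closed)  # [42]
```

The cost of this design is latency: a cursor may stay open on the server for up to one interval longer than it would with immediate cleanup.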

What To Expect When You're Expiring

I already knew that a __del__ method:

Now, add a third rule:

  • It must not take a lock.

Weakref callbacks must follow these three rules, too.

The Moral Of The Story Is....

Don't use __del__ if you can possibly avoid it. Don't design APIs that rely on it. If you maintain a library like PyMongo that has already committed to such an API, you must follow the rules above impeccably.
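One common alternative, offered as a sketch rather than as PyMongo's API: give the object an explicit close() and let callers scope it with a with statement, so cleanup runs at a predictable moment on the caller's own thread. contextlib.closing from the standard library adapts any object with a close() method; the Resource class here is hypothetical.

```python
import contextlib

class Resource(object):
    """Hypothetical resource with an explicit close() instead of a __del__."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Called by the caller's own thread at a known point, so taking locks
        # or doing I/O here is as safe as anywhere else in the program.
        self.closed = True

with contextlib.closing(Resource()) as r:
    pass  # use r here

print(r.closed)  # True

# Cleanup still happens if the body raises.
try:
    with contextlib.closing(Resource()) as r2:
        raise ValueError('oops')
except ValueError:
    pass
print(r2.closed)  # True
```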


Image: Ouroboros, Michael Maier (1568–1622).

Response to "Asynchronous Python and Databases"


Tulips

In his excellent article a few weeks ago, "Asynchronous Python and Databases", SQLAlchemy's author Mike Bayer writes:

Asynchronous programming is just one potential approach to have on the shelf, and is by no means the one we should be using all the time or even most of the time, unless we are writing HTTP or chat servers or other applications that specifically need to concurrently maintain large numbers of arbitrarily slow or idle TCP connections (where by "arbitrarily" we mean, we don't care if individual connections are slow, fast, or idle, throughput can be maintained regardless).

This is nicely put. If you are serving very slow or sleepy connections, which must be held open indefinitely awaiting events, async usually scales better than starting a thread per socket. In contrast, if your server application's typical workload is quick requests and responses, async may not be right for it. On the third hand, if it listens on the public Internet, a slowloris attack will force it to handle the kind of workload that async is best at anyway. So you at least need a non-blocking frontend like Nginx to handle slow requests from such an attacker.
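A minimal asyncio sketch of the sleepy-connection workload: one thread serves many connections that sit idle before sending anything, and each idle connection costs a suspended coroutine rather than a thread. The handler and client functions are hypothetical, and the client count is kept small (50) so the demo stays under typical default file-descriptor limits.

```python
import asyncio

async def handle(reader, writer):
    # While the client idles, this coroutine is suspended in readline().
    data = await reader.readline()
    writer.write(data.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def sleepy_client(port, delay, msg):
    reader, writer = await asyncio.open_connection('127.0.0.1', port)
    await asyncio.sleep(delay)  # hold the connection open without sending
    writer.write(msg + b'\n')
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply

async def main(n_clients=50):
    server = await asyncio.start_server(handle, '127.0.0.1', 0)  # port 0: any free port
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await asyncio.gather(
            *(sleepy_client(port, 0.2, b'hi %d' % i) for i in range(n_clients)))

replies = asyncio.run(main())
print(replies[0])  # b'HI 0\n'
```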

And async isn't just for servers. Clients that open a very large number of connections, and await events indefinitely, will scale up better if they are async. This is less commonly required on the client side. But for hugely I/O-bound programs like web crawlers you may start to see an advantage with async.

The general principle is: if you do not control both sides of the socket, one side may be arbitrarily slow. Perhaps maliciously slow. Your side had better be able to handle slow connections efficiently.

But what about your application's connection to your database? Here, you control both sides, and you are responsible for ensuring all database requests are quick. As Mike's tests showed, your application may not spend much time at all waiting for database responses. He tested with Postgres, but a well-configured MongoDB instance is similarly responsive. With a low-latency database your program's raw speed, not its scalability, is your priority. In this case async is not the right answer, at least not in Python: a small thread pool serving low-latency connections is typically faster than an async framework.

I agree with Mike's article, based on my own tests and my discussions with Tornado's author Ben Darnell. As I said at PyCon last year, async minimizes resources per idle connection, while you are waiting for some event to occur in the indefinite future. Its big win is not that it is faster. In many cases it is not.

The strategy Mike seems to advocate is to separate the async API for a database driver from an async implementation for it. In asyncio, for example, it is important that you can read from a database with code like:

@asyncio.coroutine
def my_query_method():
    # "yield from" unblocks the event loop while
    # waiting for the database.
    result = yield from my_db.query("query")

But it is not necessary to reimplement the driver itself using non-blocking sockets and asyncio's event loop. If my_db.query defers your operation to a thread pool, and injects the result into the event loop on the main thread when it is ready, it might be faster, and it scales perfectly well for the small number of database connections you need.
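A minimal sketch of that approach, written with modern async/await syntax rather than the yield-from style above. ThreadPoolDB and slow_blocking_query are hypothetical stand-ins for a blocking driver. loop.run_in_executor hands the blocking call to a concurrent.futures thread pool and resumes the coroutine on the event loop's thread when the result is ready, so the driver itself never touches non-blocking sockets.

```python
import asyncio
import concurrent.futures
import time

class ThreadPoolDB(object):
    """Hypothetical async facade over a blocking query function."""

    def __init__(self, blocking_query, max_workers=4):
        self._blocking_query = blocking_query
        self._executor = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)

    async def query(self, q):
        loop = asyncio.get_running_loop()
        # Run the blocking call on a worker thread; the event loop stays free
        # and resumes this coroutine when the future completes.
        return await loop.run_in_executor(self._executor, self._blocking_query, q)

    def close(self):
        self._executor.shutdown(wait=True)

def slow_blocking_query(q):
    time.sleep(0.1)  # stands in for a blocking round trip to the database
    return 'result of %s' % q

async def main():
    db = ThreadPoolDB(slow_blocking_query)
    try:
        return await asyncio.gather(*(db.query('q%d' % i) for i in range(4)))
    finally:
        db.close()

results = asyncio.run(main())
print(results)  # ['result of q0', 'result of q1', 'result of q2', 'result of q3']
```

With four workers the four queries overlap, so the whole batch takes roughly one query's latency rather than four.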

So what about Motor, my asynchronous driver for MongoDB and Tornado? With some effort, I wrote Motor to provide an async API to MongoDB for Tornado applications, and to use non-blocking connections to MongoDB with Tornado's event loop. (Motor uses greenlets internally to ease the latter task, but greenlets are beside the point for this discussion.) If Mike Bayer's article is right, and I believe it is, was Motor a waste?

With Motor, I achieved two goals. One was necessary, but I am reconsidering the other. The necessary goal was to provide an async API for Tornado applications that want to use MongoDB; Motor succeeds at this. But I wonder whether Motor would have marginally better throughput if it used a thread pool and blocking sockets, instead of Tornado's event loop, to talk to MongoDB. If I began again, particularly now that the concurrent.futures thread pool is more mainstream, I might use threads instead. It may be possible to gain ten or twenty percent on some benchmarks, and streamline future development too. Later this year I hope to make the time to experiment with the performance and maintainability of that approach for some future version of Motor.

Yangshan Plants His Hoe


Jiryu shuso hossen

An audio recording of this talk is available here.

Book of Serenity, Case 15: Yangshan Plants His Hoe.

Guishan asked Yangshan, "Where are you coming from?"

Yangshan said, "From the fields."

Guishan said, "How many people are there in the fields?"

Yangshan planted his hoe in the ground, clasped his hands and stood there.

Guishan said, "On South Mountain there are a lot of people cutting thatch."

Yangshan took up his hoe and went.

I chose this koan as the subject of my first dharma talk yesterday, at the Village Zendo. The talk capped a week of practice where we examined the triple injustices of homelessness, incarceration, and racism.

So you might ask, why talk about a koan at the end of a week like that? Why study a story about two Chinese monks trading riddles a thousand years ago?

I believe this koan is crucial. It is about the purpose of Zen in a world where there is a lot of work to do. But to see why, we have to unpack its meaning.

The koan reminds me of Qingyuan's famous little autobiography. He wrote,

Before I studied Zen, I saw mountains as mountains, and rivers as rivers. When I arrived at a more intimate knowledge, I came to the point where I saw that mountains are not mountains, and rivers are not rivers. But now that I have got its very substance I am at rest. For it's just that I see mountains once again as mountains, and rivers once again as rivers.

That echoes the arc of this koan. Qingyuan took decades to evolve from the conventional, to the absolute, to their synthesis. But in the koan, Guishan and Yangshan leap from one perspective to the next, in just a few sentences.

To begin, Guishan asks an ordinary question. Yangshan gives an ordinary answer. "Where are you coming from?" "From the fields."

Why does Guishan ask, anyway? Yangshan's feet are covered in manure, he is sweaty, he is wearing a muddy, manure-splattered samue. He's not coming from the library. Dogen comments, "His disciple is carrying a hoe. Can it be that he doesn't know where he's coming from?" No, Guishan asks in order to test his student Yangshan. He is finding out how Yangshan practices working. Anyone can till a field with a hoe, but what is Yangshan's Zen of working in the field?

In my opinion, Yangshan's ordinary answer is perfectly acceptable from a Zen man. Being a monk does not mean everything has to be mystical and crazy. Better to just give the facts. If I ask you what time it is, please just tell me. Don't take off your watch and stand in silence. Don't treat everything like a riddle. It would be insufferable to act Zen all the time.

But if acting ordinary is perfectly acceptable, what is the purpose of practice? Practice should change you. Do not misinterpret Nansen's "Ordinary mind is the way"—it is true, but it does not mean practice should not attain an expanded view. It's not that you forget how to give conventional answers, but years of Zen practice should make us ever less limited by the conventional. Ever more liberated. Ever more free.

Guishan tests for this expanded view—does the monk Yangshan just hoe the field the same as he always did, or have his years of practice expanded him? Guishan's test comes as another ordinary question: "How many people are there in the fields?" Are there thirty people in the fields? One person? None? And in response, Yangshan shows he cannot be trapped in the conventional. He leaps up to heaven in an instant and shows Guishan the whole universe. He plants his hoe in the dirt, clasps his hands and stands silently.

This is Yangshan's answer. This is the "no eye ear nose tongue body mind" of the Heart Sutra. It is a completely austere, beautiful emptiness. No hoe, no fields, no people, no Guishan, no Yangshan. It is not that nothing exists in this silence, but we are liberated from categorizing and separating and counting. From alienation. To really be intimate with work, skin to skin with it, you have to sort of go unconscious at times, where the work is doing you. You are one with the work.

Please forgive the hideous clichés. They are clichés because they are fact: our best work is done when we are in the zone, just talking, just hoeing, just thinking, just programming, just writing. It is the central teaching of Zen that the view from heaven that encompasses the whole universe, and the view from the muddy, manurey field where you dig up one turnip at a time, these are the same. You do not have to abandon one of them in order to attain the other. Actually you cannot—turning your eyes from the manure to gaze on heaven is delusion. It is when you are squatting down, up to your wrists and ankles in shit with no thought of yourself at all, totally absorbed, that is heaven.

You do not know it, which is sad news for the knowing part of us which wants to own heaven. That one must be silent for a moment. We do not know when it is happening, and as soon as we know, it isn't. It's annoying. All the same, heaven is available to us when we are absorbed in our work.

But does that mean we cannot ever answer an ordinary question again? Zen students walk the Bodhisattva path, which means we are committed to being effective, and that means we have to act, make distinctions, plan ahead, handle details. Austere, oceanic silence doesn't cut it. Silence is always the same, it is not an effective response to the changing world. It is like a VCR blinking "12:00".

So Guishan checks Yangshan once more. Yangshan has shown that he is master of the obvious when he said, "I came from the fields," and then he showed that he is the master of the absolute, too, when he released his hoe and stood in silence. Is he stuck in the absolute, or is he free? Guishan checks him: "On South Mountain there are a lot of people cutting thatch." So Yangshan picks up his hoe and goes.

Guishan is just answering his own question again. "How many people are in the fields?" Work is happening, is the answer. If you are not attached to your work versus someone else's work, the work now versus the work later, the work here versus the work over there, if work is just work is just work, well, the thatch being cut on South Mountain is just the work. How many people are there in the fields? Workers of the world unite!

And how does Yangshan respond to this? He picks up his hoe and goes off.

It's fun, momentarily, to think about a couple of interpretations of what just happened. One is, no one is working. When Yangshan is really doing it, with the sweat dripping off his head, then there is no Yangshan hoeing. What he shows is not being there at all. So it might be another riposte in his dharma combat with Guishan. And a very stylish one, too: Yangshan's final move is to delete himself from the dialog.

But I think he goes off to cut thatch. Thomas Cleary's translation says "he went", but another translation I found says "he left immediately." There is a duty to join in at once. When there is thatch to be cut on South Mountain, why is Yangshan still standing around doing dharma combat? That is not what a Zen monk actually does—a monk goes to work.

Hongzhi, the compiler of the Book of Serenity, writes this verse:

The old enlightened one's feelings are many, he thinks of his descendants.
Now he repents of setting up a household.
We should remember the saying about South Mountain,
Engraved on the bones, inscribed on the skin, together requiting the blessing.

It is such a grave responsibility, the Bodhisattva path. There is so much work to do in the world. We spent last week tasting, a little more, the bitterness of all the injustice and suffering in NYC. We heard from Genro Roshi about homelessness, and we saw the huge need of the community that the Bowery Mission serves, all the effort that is required, and how it still is not a fraction of the need. We went to the NYC Criminal Court, we saw that system's mouth ingesting people, chewing them up, one after another after another, we saw people having a really bad day. Many of them have a lot of bad years to come. We confronted the racism in our society, in ourselves, how stuck our nation is in the sin of our founders, how we have never healed the wounds we made.

Your little self is not up to the challenge. That is why we must touch Yangshan's huge silence. Yangshan's silence is big enough. It is the whole shebang. Or, put less loftily, when you lose yourself in your work, your judgment and self-doubt fade and you are free to act boldly. That is the point of training. Sitting still and staring at a wall is activism. It is not enough on its own—that is the point of this koan—but it is excellent training. That is why Bodhisattvas practice zazen. But then, when someone says there is work to be done, we pick up our tools and go.

The old enlightened one's feelings are many, he thinks of his descendants.
Now he repents of setting up a household.

The old enlightened one could be Guishan, but it sounds to me like Buddha.

When Buddha was enlightened, he did not want to teach. He just hung out for a week, blissed out, enjoying his enlightenment. Hanging out in silence. But the god Indra convinced him to teach—that is, to start a religion. This is the household he set up.

I often talk with non-Buddhists who say, "I know it's not really a religion, it's more of a lifestyle or a philosophy." But that does not describe the Zen I practice.

Several of our adored sangha members were sick this month, and our sangha is rallying to visit them and share news about them. That does not sound like a lifestyle to me. Crate & Barrel is a lifestyle: it has a catalog, but it isn't there for you when you're sick. A religious community is. A philosophy has books and theories, but it does not form a sangha. Existentialism doesn't check in on you when you're in the hospital. But a Zendo does. That is the household we set up for ourselves. It protects us.

And what about the end of Hongzhi's verse?

We should remember the saying about South Mountain,
Engraved on the bones, inscribed on the skin, together requiting the blessing.

Don't settle for a superficial Zen. It is not robes and bells, or a pretty Japanese-style Zendo, or memorizing all the chants. I love all this stuff, but it is just the box that the dharma comes in. It needs a box, but the box is not it. It has to be indelible, incarnate. Then the dharma is visiting a sick friend in the hospital. It is healing injustice, housing the poor, confronting hatred, our ancient, evil karma.

This practice has been handed down for over a thousand years. The way that we practice is the way that ancient Chinese monks like Guishan and Yangshan did. They blessed us with this religion, this warm household where we can practice together, expand ourselves to take on the work that needs to be done to heal the world's wounds. Let us continue it together and hand it down to our descendants. Together requiting the blessing.


Yangshan Plants His Hoe: Book of Serenity by Thomas Cleary, Shambhala Press 2005.

Qingyuan's saying: Essays in Zen Buddhism by D. T. Suzuki, Grove Press 1961.

Dogen's comment: The True Dharma Eye by Kazuaki Tanahashi and John Daido Loori, Shambhala Press 2005.

Zazenkai


Zazenkai

Today at the Village Zendo was a simple one-day meditation retreat. It's by far the largest we've had: 45 people sitting in our little zendo. Somehow we arranged the room to accommodate everyone, and lunch was served very efficiently.

The one-day retreat caps the week that includes Urban Sesshin and ends with my first dharma talk, concluding this winter's intensive practice period.

Our poor abbot Enkyo Roshi has the flu so her dharma talk was canceled. Instead we had a rare treat: mondo, spontaneous Zen dialogs between our teacher Joshin and the congregation. Some schools within Zen have public question-and-answer sessions between teachers and students quite often. For example, at the Austin Zen Center where I began practice, we had mondo once a month with the abbot, and it's clear from books like Mind of Clover, Dropping Ashes on the Buddha, and Cave of Tigers that regular mondo is common in Zen sanghas. I wish the Village Zendo did it more often.

In any case, we'll have a mondo tomorrow: I give my first dharma talk, and the sangha will test me in a form of mondo called "dharma combat."

Urban Sesshin Day 3: Diversity And Racism


Merle Kodo Boyd

This was the final day of the three-day Urban Sesshin I'm leading for the Village Zendo. The first day we had a dharma talk by Genro Roshi, and ate lunch at the Bowery Mission. Yesterday we had a talk by Ryotan Sensei, the leader of our meditation program at Sing Sing, and we observed arraignments at the NYC Criminal Court. Today, we heard a dharma talk by Merle Kodo Boyd Sensei and a workshop on diversity and racism with Tiffany Taylor Smith.

The workshop, "People Talking Culture", was a three-hour whirlwind of topics covering prejudice, oppression, diversity, and inclusion. We shared our cultural stories with each other—it was particularly informative how difficult it was for the white members of our sangha to identify the moment they discovered they were white. The Jews among us could often remember the moment when we discovered we were Jewish, as opposed to members of the mainstream religious culture, but determining the day we knew we were white was much more difficult.

We also discussed micro-aggressions, a topic which makes me frankly paranoid. If, practically by definition, micro-aggressions are unconscious acts, how do I know I'm not constantly committing them? The helpful answer is to be awake. With attention and luck, I can see the effect my words and actions have on others, and I'll know by those signs when I've hurt someone.

Kodo Sensei's talk was a 90-minute tour de force. She covered three koans about intimacy, and described how intimacy was a prerequisite for hatred—there is no relationship more intimate than between slave and master—and is a prerequisite for reconciliation. She described the racial context of her upbringing in small-town Texas, and talked about practicing with her anger about racism and America's awful legacy.

Tomorrow we do a one-day silent meditation retreat with a talk by Roshi, and Sunday morning is my big show.


Photo (c) James Salzano.

Urban Sesshin Day 2: NYC Criminal Court


NYC Criminal Courts Building

This was the second day of the three-day Urban Sesshin I'm leading for the Village Zendo. Yesterday we had a dharma talk by Genro Roshi, and ate lunch at the Bowery Mission. Today we had a long and detailed dharma talk from Ryotan Sensei, who leads our meditation program at Sing Sing. He described the explosion of the United States's prison population, which has only recently begun to wane from its peak. He noted that, even though forces as powerful and diverse as the Center for American Progress and the Koch Brothers support reduced sentencing, mass incarceration is still popular, and the American voter does not support radical reform.

Ryotan has asked our members in Sing Sing what they want the New York sangha to know about imprisonment, and they have given us three messages: that there are innocent people in prison—although most of our own members do not claim innocence—that the parole boards are capricious and unjust, and that prisoners are at the mercy of a handful of violent guards, whose abuses the other guards do little to restrain.

We visited the New York City Criminal Courts and watched the arraignments there. Some sangha members were struck by the suffering of the accused, who were mostly young black people in handcuffs, and the indifference of the cops, judges, prosecutors, even the public defenders to the suffering and humanity of the people they were processing. But one of my friends noticed how intensely the public defenders met the gaze of each client, how carefully they listened to their stories, in the minutes before they presented their cases before the judge.

Me, I thought the process was surprisingly professional and efficient. And whenever defendants pled guilty, the judge was very careful to determine that they understood the consequences. It seemed like we were witnessing, not the actual site of injustice, but just a workaday way-station in the journey of New York's poor and minorities from their impoverished communities to their future of oppressive incarceration.

Tomorrow we have a dharma talk from Merle Kodo Boyd Sensei, and a workshop on racism and diversity from Tiffany Taylor Smith.

Urban Sesshin Day 1: Bowery Mission


Bowery Mission

Today was the first day of the three-day Urban Sesshin I'm leading for the Village Zendo this year. We sat a few hours of meditation this morning, then Genro Roshi came to give us a dharma talk about homelessness. The statistics in New York are apocalyptic: there are now about 60,000 people in shelters on a given night, doubled in less than ten years. And no matter how comfortable we may feel, we who have apartments and houses, we are all nevertheless homeless. All our security and possessions are temporary. If we suppress this knowledge it cuts us off from the beggars we see (or refuse to see) daily, but if we acknowledge that the difference between the comfortable person and the beggar is temporary, we can meet each person with respect.

We ate lunch at the Bowery Mission. I was delighted at how inconvenient and messy it was for us. We didn't receive the usual Baptist sermon that precedes a meal at the Mission. Instead, today, the pre-meal event was a couple of Christian comedians. The first was shy and apologetic, and not very funny. The second was combative, sparring with the audience, mocking the homeless guy falling asleep on the bench in front of him. He wasn't much funnier.

Lunch was delayed, so we sat in the pews listening to the comedians for well over an hour. After lunch we were given a tour by a resident of the Mission. He was sweet and genuine, very passionate about what the Mission's program had done for him, but he didn't have much time for us: there was a gas leak somewhere in the building that set alarms going off every few minutes, and he said again and again how "hectic" things were today.

If we'd had an easy, well-run visit to the Mission, it wouldn't have been real. What we had instead was a little loss of control, a sense of what it's like to depend on charitable institutions for one's needs.

Tomorrow, we go to the NYC Criminal Court and watch arraignments.

PyMongo And Key Order In Subdocuments


Or, "Why does my query work in the shell but not PyMongo?"

Variations on this question account for a large portion of the Stack Overflow questions I see about PyMongo, so let me explain once and for all.

MongoDB stores documents in a binary format called BSON. Key-value pairs in a BSON document can have any order (except that _id is always first). The mongo shell preserves key order when reading and writing data. Observe that "b" comes before "a" when we create the document and when it is displayed:

> // mongo shell.
> db.collection.insert( {
...     "_id" : 1,
...     "subdocument" : { "b" : 1, "a" : 1 }
... } )
WriteResult({ "nInserted" : 1 })
> db.collection.find()
{ "_id" : 1, "subdocument" : { "b" : 1, "a" : 1 } }
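Why does order matter at all? BSON is just a sequence of bytes, and each key is written at the position where it appears in the document. Here's a minimal sketch in Python 3, assuming only two BSON types from the spec, doubles (0x01) and embedded documents (0x03), with a document represented as a list of (key, value) pairs. `encode_document` is a hypothetical helper for illustration, not PyMongo's encoder:

```python
import struct


def encode_document(pairs):
    """Encode an ordered list of (key, value) pairs as a BSON document.

    Sketch only: supports floats (type 0x01, double) and nested lists
    of pairs (type 0x03, embedded document), nothing else.
    """
    body = b""
    for key, value in pairs:
        name = key.encode("utf-8") + b"\x00"  # BSON cstring: UTF-8 plus NUL
        if isinstance(value, float):
            # Type byte, key name, then an 8-byte little-endian double.
            body += b"\x01" + name + struct.pack("<d", value)
        elif isinstance(value, list):
            # Type byte, key name, then the nested document's own bytes.
            body += b"\x03" + name + encode_document(value)
        else:
            raise TypeError("unsupported value type: %r" % type(value))
    # int32 total length (counting itself and the trailing NUL),
    # then the elements in the order given, then a terminating NUL.
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


b_first = encode_document([("b", 1.0), ("a", 1.0)])
a_first = encode_document([("a", 1.0), ("b", 1.0)])
print(len(b_first))        # 27
print(b_first == a_first)  # False
```

The two encodings are the same length and contain the same bytes, just in a different sequence, so anything that compares the documents in stored order sees them as different.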

PyMongo represents BSON documents as Python dicts by default, and in the Python 2 interpreter used in these examples, the order of keys in a dict is not defined. That is, a dict declared with the "a" key first is the same, to Python, as one with "b" first:

>>> print {'a': 1.0, 'b': 1.0}
{'a': 1.0, 'b': 1.0}
>>> print {'b': 1.0, 'a': 1.0}
{'a': 1.0, 'b': 1.0}

Therefore, Python dicts are not guaranteed to show keys in the order they are stored in BSON. Here, "a" is shown before "b":

>>> print collection.find_one()
{u'_id': 1.0, u'subdocument': {u'a': 1.0, u'b': 1.0}}

To preserve order when reading BSON, use the SON class, which is a dict that remembers its key order. First, get a handle to the collection configured to use SON instead of dict. In PyMongo 3.0, do it like this:

>>> from bson import CodecOptions, SON
>>> opts = CodecOptions(document_class=SON)
>>> opts
CodecOptions(document_class=<class 'bson.son.SON'>,
             tz_aware=False,
             uuid_representation=PYTHON_LEGACY)
>>> collection_son = collection.with_options(codec_options=opts)

Now, documents and subdocuments in query results are represented with SON objects:

>>> print collection_son.find_one()
SON([(u'_id', 1.0), (u'subdocument', SON([(u'b', 1.0), (u'a', 1.0)]))])

The subdocument's actual storage layout is now visible: "b" is before "a".
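If this pattern is new to you, Python's standard json module has a similar knob: `object_pairs_hook` lets you choose what each decoded object becomes, and passing `collections.OrderedDict` keeps keys in the order they appear in the input. This is only an analogy to decoding BSON into SON, a Python 3 sketch that uses JSON text in place of BSON:

```python
import json
from collections import OrderedDict

raw = '{"_id": 1.0, "subdocument": {"b": 1.0, "a": 1.0}}'

# object_pairs_hook receives each object's (key, value) pairs in the
# order they appear in the text, so OrderedDict keeps that order, much
# as decoding into SON keeps the order stored in BSON.
doc = json.loads(raw, object_pairs_hook=OrderedDict)
print(list(doc["subdocument"]))  # ['b', 'a']
```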

Because a dict's key order is not defined, you cannot predict how it will be serialized to BSON. But MongoDB considers subdocuments equal only if their keys have the same order. So if you use a dict to query on a subdocument it may not match:

>>> collection.find_one({'subdocument': {'a': 1.0, 'b': 1.0}}) is None
True

Swapping the key order in your query makes no difference:

>>> collection.find_one({'subdocument': {'b': 1.0, 'a': 1.0}}) is None
True

... because, as we saw above, Python considers the two dicts the same.
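Here's that difference in a few lines of Python. I'm using `collections.OrderedDict` as a stand-in for SON because both remember key order; this illustrates the idea and isn't a description of how SON is implemented:

```python
from collections import OrderedDict

# Plain dicts compare equal regardless of the order keys were written.
a_first = {"a": 1.0, "b": 1.0}
b_first = {"b": 1.0, "a": 1.0}
print(a_first == b_first)  # True

# Two OrderedDicts compare order-sensitively, which is closer to how
# MongoDB compares subdocuments.
od_a = OrderedDict([("a", 1.0), ("b", 1.0)])
od_b = OrderedDict([("b", 1.0), ("a", 1.0)])
print(od_a == od_b)        # False
print(list(od_b))          # ['b', 'a']
```

Comparing an OrderedDict with a plain dict ignores order, though, since the dict has no order to compare against.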

There are two solutions. First, you can match the subdocument field-by-field:

>>> collection.find_one({'subdocument.a': 1.0,
...                      'subdocument.b': 1.0})
{u'_id': 1.0, u'subdocument': {u'a': 1.0, u'b': 1.0}}

The query matches any subdocument with an "a" of 1.0 and a "b" of 1.0, regardless of the order you specify them in Python or the order they are stored in BSON. It also matches subdocuments that have other keys besides "a" and "b", whereas the whole-subdocument query required an exact match.
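If you already have a subdocument in hand and want to query it field by field, you can generate the dotted keys instead of typing them. `dotted_query` is a hypothetical helper, sketched under the assumption that no key starts with "$" (query operators would need different handling). It recurses into nested non-empty dicts, which loosens those levels to field-by-field matching as well:

```python
def dotted_query(field, subdocument):
    """Flatten a subdocument into a field-by-field query (hypothetical helper).

    {'a': 1.0, 'b': 1.0} under 'subdocument' becomes
    {'subdocument.a': 1.0, 'subdocument.b': 1.0}.
    """
    query = {}
    for key, value in subdocument.items():
        if key.startswith("$"):
            raise ValueError("operator keys are not supported: %r" % key)
        path = field + "." + key
        if isinstance(value, dict) and value:
            # Nested subdocument: match its fields individually too.
            query.update(dotted_query(path, value))
        else:
            # Scalar or empty subdocument: match the value as-is.
            query[path] = value
    return query
```

With this, `collection.find_one(dotted_query('subdocument', {'a': 1.0, 'b': 1.0}))` sends the same filter as the hand-written dotted query above.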

The second solution is to use a SON to specify the key order:

>>> query = {'subdocument': SON([('b', 1.0), ('a', 1.0)])}
>>> collection.find_one(query)
{u'_id': 1.0, u'subdocument': {u'a': 1.0, u'b': 1.0}}

The key order you use when you create a SON is preserved when it is serialized to BSON and used as a query. Thus you can create a subdocument that exactly matches the subdocument in the collection.
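To put the two solutions side by side, here is a toy model of the matching rules described in this post, in Python 3. It represents a stored subdocument as a list of (key, value) pairs in storage order. `exact_match` and `dotted_match` are hypothetical names, and the model only covers flat subdocuments with simple values; it is not MongoDB's actual comparison code:

```python
def exact_match(stored, query):
    """Whole-subdocument match: same keys, same values, same order."""
    return list(stored) == list(query)


def dotted_match(stored, query):
    """Field-by-field match: each queried key must be present with an
    equal value; order and extra stored keys are ignored."""
    stored_dict = dict(stored)
    return all(key in stored_dict and stored_dict[key] == value
               for key, value in query)


stored = [("b", 1.0), ("a", 1.0)]          # layout in the collection
stored_with_extra = stored + [("c", 2.0)]  # same, plus an extra key

print(exact_match(stored, [("a", 1.0), ("b", 1.0)]))              # False
print(exact_match(stored, [("b", 1.0), ("a", 1.0)]))              # True
print(dotted_match(stored, [("a", 1.0), ("b", 1.0)]))             # True
print(dotted_match(stored_with_extra, [("a", 1.0), ("b", 1.0)]))  # True
print(exact_match(stored_with_extra, [("b", 1.0), ("a", 1.0)]))   # False
```

The ordered query succeeds only when you reproduce the stored layout exactly, while the dotted query trades that strictness for tolerance of key order and extra fields.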

For more info, see the MongoDB Manual entry on subdocument matching.