A. Jesse Jiryu Davis

Refactoring Tornado Coroutines

Sometimes writing callback-style asynchronous code with Tornado is a pain. But the real hurt comes when you want to refactor your async code into reusable subroutines. Tornado's coroutines make refactoring easy. I'll explain the rules.

(This article updates my old "Refactoring Tornado Code With gen.engine". The updated code here demonstrates the current syntax for Tornado 3 and Motor 0.3.)

For Example

I'll use this blog to illustrate. I built it with Motor-Blog, a trivial blog platform on top of Motor, my asynchronous MongoDB driver for Tornado.

When you came here, Motor-Blog did three or four MongoDB queries to render this page.

1: Find the blog post at this URL and show you this content.

2 and 3: Find the next and previous posts to render the navigation links at the bottom.

Maybe 4: If the list of categories on the left has changed since it was last cached, fetch the list.

Let's go through each query and see how Tornado coroutines make life easier.

Fetching One Post

In Tornado, fetching one post takes a little more work than with blocking-style code:

import motor
import tornado.web

db = motor.MotorClient().my_blog_db

class PostHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self, slug):
        # Motor calls self._found_post(result, error) when the query completes.
        db.posts.find_one({'slug': slug}, callback=self._found_post)

    def _found_post(self, post, error):
        if error:
            raise tornado.web.HTTPError(500, str(error))
        elif not post:
            raise tornado.web.HTTPError(404)
        else:
            self.render('post.html', post=post)

Not so bad. But is it better with a coroutine?

from tornado import gen

class PostHandler(tornado.web.RequestHandler):
    @gen.coroutine
    def get(self, slug):
        post = yield db.posts.find_one({'slug': slug})
        if not post:
            raise tornado.web.HTTPError(404)

        self.render('post.html', post=post)

Much better. If you don't pass a callback to find_one, then it returns a Future instance. A Future is nothing special: it's just a little object that represents an unresolved value. Some time hence, Motor will resolve the Future with a value or an exception. To wait for the Future to be resolved, yield it.

The yield statement makes this function a generator. gen.coroutine is a brilliant invention that runs the generator until it's complete. Each time the generator yields a Future, gen.coroutine schedules the generator to be resumed when the Future is resolved. Read the source code of the Runner class for details; it's exhilarating. Or just enjoy the glow of putting all your logic in a single function again, without defining any callbacks.

Even better, you get normal exception handling: if find_one fails with a network error or some other problem, the yield expression raises an exception inside get. Tornado knows how to turn an exception into an HTTP 500, so we no longer need special code for errors.
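To make the Runner idea concrete, here is a toy version built on the standard library's concurrent.futures.Future rather than Tornado's own Future class. It is a hypothetical sketch, not Tornado's Runner: there is no IOLoop, no support for yielding lists, and callbacks run in whichever thread resolves a Future. It does show the two behaviors described above: the generator is resumed with a Future's result, and a failed Future is re-raised inside the generator at its yield. The names toy_coroutine, fake_find_one, and get_title are invented for illustration.

```python
from concurrent.futures import Future
import functools


def toy_coroutine(func):
    """Hypothetical, much-simplified stand-in for tornado.gen.coroutine."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result_future = Future()  # what the caller gets back
        gen = func(*args, **kwargs)

        def step(value=None, exc=None):
            try:
                if exc is not None:
                    # Re-raise the failure inside the generator, at its yield.
                    yielded = gen.throw(exc)
                else:
                    yielded = gen.send(value)
            except StopIteration as stop:
                # The generator finished: its return value resolves our Future.
                result_future.set_result(stop.value)
                return
            except Exception as error:
                # An uncaught exception in the generator fails our Future.
                result_future.set_exception(error)
                return

            def resume(done):
                # Called when the yielded Future resolves.
                if done.exception() is not None:
                    step(exc=done.exception())
                else:
                    step(value=done.result())

            yielded.add_done_callback(resume)

        step()  # run up to the first yield right away
        return result_future

    return wrapper


pending = []  # unresolved queries, standing in for an event loop


def fake_find_one(query):
    # Hypothetical async query: returns an unresolved Future immediately.
    future = Future()
    pending.append((future, query))
    return future


@toy_coroutine
def get_title(slug):
    post = yield fake_find_one({'slug': slug})
    if post is None:
        raise LookupError(slug)
    return post['title']  # Python 3 only; Tornado on Python 2 uses gen.Return


title = get_title('refactoring')
assert not title.done()  # suspended at its first yield
future, query = pending.pop()
future.set_result({'title': 'Refactoring Tornado Coroutines'})
print(title.result())  # Refactoring Tornado Coroutines
```

Resolving a Future later, or failing it with an exception, drives the generator forward the same way; no callback methods appear in get_title itself.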

This coroutine is much more readable than a callback, but it doesn't look any nicer than multithreaded code. It will start to shine when you need to parallelize some tasks.

Fetching Next And Previous

Once Motor-Blog finds the current post, it gets the next and previous posts so it can display their titles. Since the two queries are independent, we can save a few milliseconds by doing them in parallel. How does this look with callbacks?

@tornado.web.asynchronous
def get(self, slug):
    db.posts.find_one({'slug': slug}, callback=self._found_post)

def _found_post(self, post, error):
    if error:
        raise tornado.web.HTTPError(500, str(error))
    elif not post:
        raise tornado.web.HTTPError(404)
    else:
        self.post = post
        # Count outstanding queries: either one may finish first, and
        # either may legitimately find no post at all (None).
        self.pending = 2

        # Two queries in parallel.
        # Find the previously published post.
        db.posts.find_one(
            {'pub_date': {'$lt': post['pub_date']}},
            sort=[('pub_date', -1)],
            callback=self._found_prev)

        # Find the subsequently published post.
        db.posts.find_one(
            {'pub_date': {'$gt': post['pub_date']}},
            sort=[('pub_date', 1)],
            callback=self._found_next)

def _found_prev(self, prev_post, error):
    if error:
        raise tornado.web.HTTPError(500, str(error))
    else:
        self.prev_post = prev_post
        self.pending -= 1
        if self.pending == 0:
            # Done: the other query already finished.
            self._render()

def _found_next(self, next_post, error):
    if error:
        raise tornado.web.HTTPError(500, str(error))
    else:
        self.next_post = next_post
        self.pending -= 1
        if self.pending == 0:
            # Done: the other query already finished.
            self._render()

def _render(self):
    self.render(
        'post.html',
        post=self.post,
        prev_post=self.prev_post,
        next_post=self.next_post)

This is completely disgusting and it makes me want to give up on async. We need special logic in each callback to determine if the other callback has already run or not. All that boilerplate can't be factored out. Will a coroutine help?

@gen.coroutine
def get(self, slug):
    post = yield db.posts.find_one({'slug': slug})
    if not post:
        raise tornado.web.HTTPError(404)
    else:
        # Start both queries before waiting for either.
        future_0 = db.posts.find_one(
            {'pub_date': {'$lt': post['pub_date']}},
            sort=[('pub_date', -1)])

        future_1 = db.posts.find_one(
            {'pub_date': {'$gt': post['pub_date']}},
            sort=[('pub_date', 1)])

        # Wait for both; the results come back in list order.
        prev_post, next_post = yield [future_0, future_1]
        self.render(
            'post.html',
            post=post,
            prev_post=prev_post,
            next_post=next_post)

Yielding a list of Futures tells the coroutine to wait until they are all resolved.
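For comparison, Python 3's asyncio expresses the same fan-out and fan-in with asyncio.gather, which, like yielding a list in a Tornado coroutine, returns results in the order the awaitables were passed rather than the order they finished. This is a sketch in asyncio, not Tornado; fake_find_one is a hypothetical stand-in for a Motor query, simulated with asyncio.sleep.

```python
import asyncio


async def fake_find_one(query, delay):
    # Hypothetical stand-in for a network query that takes `delay` seconds.
    await asyncio.sleep(delay)
    return {'query': query}


async def get_neighbors():
    # Start both queries before waiting on either, so they overlap in time.
    prev_task = asyncio.ensure_future(fake_find_one('prev', 0.05))
    next_task = asyncio.ensure_future(fake_find_one('next', 0.01))
    # gather() waits for all of them and returns results in argument order,
    # even though the 'next' query finishes first.
    prev_post, next_post = await asyncio.gather(prev_task, next_task)
    return prev_post, next_post


prev_post, next_post = asyncio.run(get_neighbors())
print(prev_post, next_post)  # {'query': 'prev'} {'query': 'next'}
```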

Now our single get function is just as nice as it would be with blocking code. In fact, the parallel fetch is far easier than if you were multithreading instead of using Tornado. But what about factoring out a common subroutine that request handlers can share?

Fetching Categories

Every page on my blog needs to show the category list on the left side. Each request handler could just include this in its get method:

categories = yield db.categories.find().sort('name').to_list(10)

But that's terrible engineering. Here's how to factor it into a coroutine:

@gen.coroutine
def get_categories(db):
    categories = yield db.categories.find().sort('name').to_list(10)
    raise gen.Return(categories)

This coroutine does not have to be part of a request handler—it stands on its own at the module scope.

The raise gen.Return() statement is the weirdest syntax in this example. It's an artifact of Python 2, in which generators aren't allowed to return values. To hack around this limitation, Tornado coroutines raise a special kind of exception called a Return. gen.coroutine catches this exception and treats it like a returned value. In Python 3.3 and later, a simple return categories accomplishes the same result.
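The difference between the two styles is easy to see with plain generators. In this sketch, Return is a hypothetical class modeled on the idea behind gen.Return, and drive is a toy loop that, like the coroutine machinery, recovers a result either from that special exception or from the StopIteration that Python 3.3+ raises when a generator executes return with a value.

```python
class Return(Exception):
    """Hypothetical stand-in for gen.Return: carries a value out of a generator."""
    def __init__(self, value=None):
        super().__init__(value)
        self.value = value


def py2_style():
    yield 'some future'
    # Python 2 generators cannot `return value`, so the value rides an exception.
    raise Return(['news', 'python'])


def py3_style():
    yield 'some future'
    # Python 3.3+ allows this; the value is attached to StopIteration.
    return ['news', 'python']


def drive(gen):
    """Run a generator to completion and recover its result either way."""
    try:
        while True:
            next(gen)
    except Return as r:
        return r.value
    except StopIteration as stop:
        return stop.value


print(drive(py2_style()))  # ['news', 'python']
print(drive(py3_style()))  # ['news', 'python']
```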

To call my new coroutine from a request handler, I do:

class PostHandler(tornado.web.RequestHandler):
    @gen.coroutine
    def get(self, slug):
        categories = yield get_categories(db)
        # ... get the current, previous, and
        # next posts as usual, then ...
        self.render(
            'post.html',
            post=post,
            prev_post=prev_post,
            next_post=next_post,
            categories=categories)

Since get_categories is a coroutine now, calling it returns a Future. To wait for get_categories to complete, the caller can yield the Future. Once get_categories completes, the Future it returned is resolved, so the caller resumes. It's almost like a regular function call!
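The same factoring carries over to Python 3's async/await. In this sketch, FakeCategories is a hypothetical in-memory stand-in for db.categories, and render_post_page plays the role of the request handler. One caution: calling an async def function returns a coroutine object that does nothing until it is awaited or scheduled, whereas calling a gen.coroutine function in Tornado starts running it right away and returns a Future.

```python
import asyncio


class FakeCategories:
    """Hypothetical in-memory stand-in for db.categories."""
    def __init__(self, names):
        self.names = names

    async def sorted_names(self, limit):
        await asyncio.sleep(0)  # pretend to wait on the network
        return sorted(self.names)[:limit]


async def get_categories(categories_collection):
    # A module-level subroutine: no request handler required.
    return await categories_collection.sorted_names(10)


async def render_post_page(categories_collection, slug):
    # The "handler" waits on the subroutine just like any other async call.
    categories = await get_categories(categories_collection)
    return {'slug': slug, 'categories': categories}


collection = FakeCategories(['tornado', 'motor', 'mongodb'])
page = asyncio.run(render_post_page(collection, 'refactoring'))
print(page)  # {'slug': 'refactoring', 'categories': ['mongodb', 'motor', 'tornado']}
```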

Now that I've factored out get_categories, it's easy to add more logic to it. This is nice because I want to cache the categories between page views. get_categories can be updated very simply to use a cache:

categories = None

@gen.coroutine
def get_categories(db):
    global categories
    if not categories:
        categories = yield db.categories.find().sort('name').to_list(10)

    raise gen.Return(categories)

(Note for nerds: I invalidate the cache whenever a post with a new category is added. The "new category" event is saved to a capped collection in MongoDB, which all the Tornado servers are always tailing. This is a simple way to use MongoDB as an event queue, which the multiple Tornado processes use to communicate with each other.)

Conclusion

Tornado's excellent documentation shows briefly how a method that makes a few async calls can be simplified using gen.coroutine, but the power really comes when you need to factor out a common subroutine. There are only three steps:

  1. Decorate the subroutine with @gen.coroutine.
  2. In Python 2, the subroutine returns its result with raise gen.Return(result).
  3. Call the subroutine from another coroutine like result = yield subroutine().

That's all there is to it. Tornado's coroutines make asynchronous code efficient, clean—even beautiful.

Motor 0.3 Released

Today I released Motor 0.3. This version has no new features compared to Motor 0.2.1. Here's what I changed:

  • I updated the PyMongo dependency from 2.7 to 2.7.1, therefore inheriting PyMongo 2.7.1’s bug fixes.
  • Motor continues to support Python 2.6, 2.7, 3.3, and 3.4, but now from a single source tree. 2to3 no longer runs during installation with Python 3.
  • nosetests is no longer required for regular Motor tests.
  • I fixed a mistake in the docs for aggregate().

Rewriting Motor to support Python 2 and 3 in the same source code makes life sane for me, and it reflects the current consensus about the best way to write portable Python. It wasn't terribly difficult either.
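Single-source code typically branches once on the interpreter version and then uses neutral names everywhere else. These are a few common idioms, shown as a hypothetical sketch rather than Motor's actual compatibility module; string_type, integer_types, and describe are illustrative names.

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    string_type = str
    integer_types = (int,)
else:  # Python 2.6 / 2.7: these names exist only there
    string_type = basestring  # noqa: F821
    integer_types = (int, long)  # noqa: F821


def describe(value):
    # One code path that behaves the same on both interpreters.
    if isinstance(value, string_type):
        return 'string'
    if isinstance(value, integer_types):
        return 'integer'
    return 'other'


print(describe('motor'), describe(3), describe(1.5))  # string integer other
```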

Now that I've simplified Motor's Python 3 support, I'm ready to tackle the next big challenge: I want to see if Motor can support Twisted and asyncio, in addition to Tornado. Wish me luck.

The Aura of the Live Demo

A live demo is too difficult. Too risky. On speaking.io, Zach Holman tells you that "live demos are like Global Thermonuclear War, the only way to win is to not do a live demo." So why bother doing one? Showing a video is reliable and easy, and just as good. Right?

When you show a video, you lose something vital. There's a reason people still do live demos, even though we all know better. The reason is that a live demo is live.

This liveness is particularly effective if your audience is programmers like me. I have the traits of a scientist and an engineer: like a scientist, I'm skeptical, and like an engineer, I love to make things go.

Because I'm skeptical, I want proof. If you tell me what your code does, I want to see your code actually do it. It's not that I think you're lying; I just want your experiment reproduced in front of me, so I can verify it with the evidence of my senses. Until then, the scientist in me doesn't think I've done my job.

A few years ago I gave my first big talk, an introduction to MongoDB replica sets. It was at a conference in Atlanta, with an audience of a hundred. I was very nervous, but I was determined to do a demo. I must have practiced it fifty times before I did it live: I spun up a three-node replica set, I killed the primary node, and the surviving nodes elected a new primary. Abracadabra! At the end of the talk, someone asked, "I read somewhere that three nodes isn't enough to provide fault tolerance?" To this day I have no idea where he read that. But I was happy I could say, in front of the audience, "A three-node replica set can survive the loss of one node. You don't have to take my word for it—I've shown you."

I want proof, like most programmers, and I also want to make things go. I'm Doctor Frankenstein: I'm obsessed with creating something that is alive. The first time I made a turtle draw on the screen, the first time I made the computer go "beep", I fell in love. So, when I see you make the machine go, I'm entranced. You press a button and the machine is doing something, it is acting in the world. It's alive! A video of something the machine did in the past is no substitute for its activity in the room now.

In "The Work of Art in the Age of Mechanical Reproduction", Walter Benjamin distinguishes between original art and copies:

Even the most perfect reproduction of a work of art is lacking in one element: its presence in time and space, its unique existence at the place where it happens to be.

Benjamin calls this element, this thing that's lost when art is copied, its "aura." He imagines that the first use of art was in ritual. Back then, art was valuable because it was magic. The animals that Stone Age people painted in caves were instruments of magic, he thinks. A copy of a work of art has no magic power. It is separated from its ritual use, and so its only remaining value is aesthetic. "This permits the audience to take the position of a critic."

So, too, when you show me a video of your demo. I can appreciate your video aesthetically, if it's beautiful. But you don't want me to have critical distance: you want to be a magician. You want to perform the ritual in front of me and entrance me. You press the button, and the magic happens.

This is why people like Bill Gates and Steve Jobs have shown live demos instead of canned ones. They want to be magicians. The risk is great: Windows 98 blue-screened when Bill Gates demonstrated it, and Steve Jobs couldn't get his iPhone 4 online. If you're going to do a live demo you need a better backup plan than they had. And you need to practice like crazy. But the experience of a live demo cannot be matched. The magic only happens when the machine is doing something now, in the room. Don't you want to be a magician?

Motor 0.2.1 Released

Version 0.2.1 of Motor, the asynchronous MongoDB driver for Python and Tornado, has been released. It fixes two bugs:

  • MOTOR-32: The documentation claimed that MotorCursor.close immediately halted execution of MotorCursor.each, but it didn't. MotorCursor.each() is now halted correctly.
  • MOTOR-33: An incompletely iterated cursor's __del__ method sometimes got stuck and cost 100% CPU forever, even though the application was still responsive.

The manual is on ReadTheDocs. If you find a bug or want a feature, I exhort you to report it.

PyCon APAC 2014 recap

Thanks to the miracle of satellite Internet, I'm posting from a plane over the Pacific. My cramped schedule prohibited me from visiting Taipei as long as I'd have liked: this trip comprised three days in the city and two days on planes. But the exuberant city, and the sincerity of the conference organizers' efforts, made it worthwhile.

I delivered a half-length version of my PyCon talk on async in the morning, to an audience slightly overflowing the room. When I was deciding how to cut the talk, I made the painful choice to cut the code and keep the analogies. And once again the analogies were real winners: lots of laughter when I started talking about sandwiches and pizza.

PyCon APAC was held at Academia Sinica, a research institute. Being in an academic setting gave me two big boosts as a speaker: lecture rooms and young people.

The lecture rooms are actually designed to help the speaker and audience stay connected. In contrast, the giant rooms in convention centers are designed to be usable for anything but good for nothing. (The room I last spoke in, at PyCon in Montréal, would serve best for assembling aircraft.) But the Academia Sinica rooms are purpose-built. As Scott Berkun writes, "the ideal room for a lecture is a theater. It's crazy, I know, but we solved most lecture-room problems about 2,000 years ago. The Greek amphitheater gets it all just about right, provided it doesn't rain." What a friendly feeling, to be surrounded by the audience and to see everyone's faces.

The audience was generally university students or professionals early in their careers. They came ready to learn. Plus, they're excited when Western open source programmers make the trip to meet them: there are fewer open source leaders in Asia (except perhaps Japan) and the area isn't saturated with conferences.

In the afternoon I gave my second talk, a new one I wrote this week. I guess people liked my async talk and came back for seconds: we overflowed the room so badly that latecomers could not wedge themselves through the door. I told a story about how a blogger complained that PyMongo was slow, and what tools I used to prove the blogger wrong. Huge laughs, the most fun I've had speaking.

My colleague Amalia Hawkins delivered her "Narrowing the Gender Gap at Hackathons" talk to general acclaim. Hackathons aren't yet the phenomenon in Asia that they are in the US, so there's a chance to start things right. Amalia's thesis is that focusing on the experience of all hackathon newcomers benefits everyone, and narrows the gender gap as a side effect.

Fernando Perez and Wes McKinney gave inspiring keynotes about their numerical Python tools, IPython and Pandas respectively. I'm severely ignorant about numerical Python, so I appreciated learning from experts. Jessica McKellar's keynote invited us to expand Python's reach among groups who don't feel welcome in the open source community.

Besides the conference, all I remember about Taipei is a continuous blur of food. The university cafeteria served weird, delicious Chinese vegetables, sour eggplant, seitan, and pieces of seaweed tied into bows. All for a dollar.

Amalia and I ate strange things for dinner at night markets. Grilled cuttlefish. Enoki mushrooms wrapped in bacon. One of the food stands made hotdogs, except the bun was replaced with a big sausage, so it was like a meta-hotdog. Amalia had two scoops of taro-root ice cream that had peanut brittle shaved onto them with a carpenter's lathe, and wrapped up into a crêpe like an ice-cream burrito.

We found a food cart in a grimy back alley that made ramen for 50 cents, with the freshest, chewiest noodles I've ever tasted.

PyCon in Taipei!

I got back from Street Retreat on Sunday, and tomorrow I fly to Taipei. Why in the world do I overschedule myself like this?

Nevertheless I'm excited to visit Taiwan for the first time and to speak at PyCon APAC. I'll give a shorter version of the talk I gave at PyCon in Montreal: "What Is Async, How Does It Work, And When Should I Use It?"

I'll also give a new talk on "Python Profiling: The Guts and The Glory." This isn't your regular old Python profiling talk. The regular old talk shows you cProfile, admits that its output is unreadable, and wishes you the best of luck. My talk will tell a story of drama and intrigue, introduce you to a powerful Python profiler called Yappi, show you how to visualize its output with KCacheGrind, and even delve into how CPython profilers actually work.
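As a taste of how deterministic profilers work: the interpreter reports every function call and return to a hook installed with sys.setprofile, and the profiler does its bookkeeping there. cProfile does this in C and also records timings; the hypothetical count_calls below only counts calls, in pure Python.

```python
import sys
from collections import Counter


def count_calls(func, *args):
    """Run func(*args) under a tiny deterministic profiler; count Python calls."""
    calls = Counter()

    def profiler(frame, event, arg):
        # The interpreter invokes this hook on every call and return.
        if event == 'call':
            calls[frame.f_code.co_name] += 1

    previous = sys.getprofile()
    sys.setprofile(profiler)
    try:
        func(*args)
    finally:
        sys.setprofile(previous)  # restore whatever hook was installed before
    return calls


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(count_calls(fib, 5)['fib'])  # 15
```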

Motor 0.2 Released

Version 0.2 of Motor, the asynchronous MongoDB driver for Python and Tornado, has been released.

This release is compatible with MongoDB 2.6 and PyMongo 2.7. It dramatically improves interoperability with Tornado coroutines, includes support for non-blocking DNS, and adds numerous smaller features.

If you encounter any issues, please file them in our bug tracker.

That's it! With the Motor release behind me, I'm looking forward to enjoying PyCon, and talking about async at 3:15pm tomorrow.

Announcing Motor 0.2 release candidate

I'm excited to offer you Motor 0.2, release candidate zero. Motor is my non-blocking driver for MongoDB and Tornado.

The changes from Motor 0.1 to 0.2 are epochal. They were motivated primarily by three events:

  • Motor wraps PyMongo, and PyMongo has improved substantially.
  • MongoDB 2.6 is nearly done, and Motor has added features to support it.
  • Tornado's support for coroutines and for non-blocking DNS has improved, and Motor 0.2 takes advantage of this.

Please read the changelog before upgrading. There are backwards-breaking API changes; you must update your code. I tried to make the instructions clear and the immediate effort small. A summary of the changes is in my post, "the road to 0.2".

Once you're done reading, upgrade:

pip install pymongo==2.7
pip install https://github.com/mongodb/motor/archive/0.2rc0.zip

The owner's manual is on ReadTheDocs. At the time of this writing, Motor 0.2's docs are in the "latest" branch:

http://motor.readthedocs.org/en/latest/

...and Motor 0.1's docs are in "stable":

http://motor.readthedocs.org/en/stable/

Enjoy! If you find a bug or want a feature, report it. If I don't hear of any bugs in the next week I'll make the release official.

In any case, tweet me if you're building something nifty with Motor. I want to hear from you.

PyMongo 2.7 Has Shipped

I announce with satisfaction that we've released PyMongo 2.7, the successor to PyMongo 2.6.3. The bulk of the driver's changes are to support MongoDB 2.6, which is currently a release candidate. The newest MongoDB has an enhanced wire protocol and some big new features, so PyMongo 2.7 is focused on supporting it. However, the driver still supports server versions as old as 1.8.

Read my prior post for a full list of the features and improvements in PyMongo. Since I wrote that, we've fixed some compatibility issues with MongoDB 2.6, dealt with recent changes to the nose and setuptools packages, and made a couple of memory optimizations.

Motor 0.2 is about to ship, as well. I'll give the details in my next post.

What's next for PyMongo? We now embark on a partial rewrite, which will become PyMongo 3.0. The next-generation driver will delete many deprecated APIs: safe will disappear, since it was deprecated years ago in favor of w=1. Connection will walk off into the sunset, giving way to MongoClient. We'll make a faster and more thread-safe core for PyMongo, and we'll expose a clean API so Motor and ODMs can wrap PyMongo more neatly.

We'll discard PyMongo's current C extension for BSON-handling. We'll replace it with libbson, a common codec that our C team is building. If you're handling BSON in PyPy, we aim to give you a much faster pure-Python codec there, too.