A. Jesse Jiryu Davis

The Aura of the Live Demo

A live demo is too difficult. Too risky. On speaking.io, Zach Holman tells you that "live demos are like Global Thermonuclear War, the only way to win is to not do a live demo." So why bother doing one? Showing a video is reliable and easy, and just as good. Right?

When you show a video, you lose something vital. There's a reason people still do live demos, even though we all know better. The reason is that a live demo is live.

This liveness is particularly effective if your audience is programmers like me. I have the traits of a scientist and an engineer: Like a scientist, I'm skeptical, and like an engineer I love to make things go.

Because I'm skeptical, I want proof. If you tell me what your code does, I want to see your code actually do it. It's not that I think you're lying; I just want your experiment reproduced in front of me, so I can verify it with the evidence of my senses. Until then, the scientist in me doesn't think I've done my job.

A few years ago I gave my first big talk, an introduction to MongoDB replica sets. It was at a conference in Atlanta, with an audience of a hundred. I was very nervous, but I was determined to do a demo. I must have practiced it fifty times before I did it live: I spun up a three-node replica set, I killed the primary node, and the surviving nodes elected a new primary. Abracadabra! At the end of the talk, someone asked, "I read somewhere that three nodes isn't enough to provide fault tolerance?" To this day I have no idea where he read that. But I was happy I could say, in front of the audience, "A three-node replica set can survive the loss of one node. You don't have to take my word for it—I've shown you."

I want proof, like most programmers, and I also want to make things go. I'm Doctor Frankenstein: I'm obsessed with creating something that is alive. The first time I made a turtle draw on the screen, the first time I made the computer go "beep", I fell in love. So, when I see you make the machine go, I'm entranced. You press a button and the machine is doing something, it is acting in the world. It's alive! A video of something the machine did in the past is no substitute for its activity in the room now.

In "The Work of Art in the Age of Mechanical Reproduction", Walter Benjamin distinguishes between original art and copies:

Even the most perfect reproduction of a work of art is lacking in one element: its presence in time and space, its unique existence at the place where it happens to be.

Benjamin calls this element, this thing that's lost when art is copied, its "aura." He imagines that the first use of art was in ritual. Back then, art was valuable because it was magic. The animals that Stone Age people painted in caves were instruments of magic, he thinks. A copy of a work of art has no magic power. It is separated from its ritual use, and so its only remaining value is aesthetic. "This permits the audience to take the position of a critic."

So, too, when you show me a video of your demo. I can appreciate your video aesthetically, if it's beautiful. But you don't want me to have critical distance: you want to be a magician. You want to perform the ritual in front of me and entrance me. You press the button, and the magic happens.

This is why people like Bill Gates and Steve Jobs have shown live demos instead of canned ones. They want to be magicians. The risk is great: Windows 98 blue-screened when Bill Gates demonstrated it, and Steve Jobs couldn't get his iPhone 4 online. If you're going to do a live demo you need a better backup plan than they had. And you need to practice like crazy. But the experience of a live demo cannot be matched. The magic only happens when the machine is doing something now, in the room. Don't you want to be a magician?

Motor 0.2.1 Released

Version 0.2.1 of Motor, the asynchronous MongoDB driver for Python and Tornado, has been released. It fixes two bugs:

  • MOTOR-32: The documentation claimed that MotorCursor.close immediately halted execution of MotorCursor.each, but it didn't. MotorCursor.each() is now halted correctly.
  • MOTOR-33: An incompletely iterated cursor's __del__ method sometimes got stuck and consumed 100% CPU forever, even though the application was still responsive.

The manual is on ReadTheDocs. If you find a bug or want a feature, I exhort you to report it.

PyCon APAC 2014 recap

Thanks to the miracle of satellite Internet, I'm posting from a plane over the Pacific. My cramped schedule prohibited me from visiting Taipei as long as I'd like: this trip comprised three days in the city and two days on planes. But the exuberant city, and the sincerity of the conference organizers' efforts, made it worthwhile.

I delivered a half-length version of my PyCon talk on async in the morning, to an audience slightly overflowing the room. When I was deciding how to cut the talk, I made the painful choice to cut the code and keep the analogies. And once again the analogies were real winners: lots of laughter when I started talking about sandwiches and pizza.

PyCon APAC was held at Academia Sinica, a research institute. Being in an academic setting gave me two big boosts as a speaker: lecture rooms and young people.

The lecture rooms are actually designed to help the speaker and audience stay connected. In contrast, the giant rooms in convention centers are designed to be usable for anything but good for nothing. (The room I last spoke in, at PyCon in Montréal, would serve best for assembling aircraft.) But the Academia Sinica rooms are purpose-built. As Scott Berkun writes, "the ideal room for a lecture is a theater. It's crazy, I know, but we solved most lecture-room problems about 2,000 years ago. The Greek amphitheater gets it all just about right, provided it doesn't rain." What a friendly feeling, to be surrounded by the audience and to see everyone's faces.

The audience was generally university students or professionals early in their careers. They came ready to learn. Plus, they're excited when Western open source programmers make the trip to meet them: there are fewer open source leaders in Asia (except perhaps Japan) and the area isn't saturated with conferences.

In the afternoon I gave my second talk, a new one I wrote this week. I guess people liked my async talk and came back for seconds: we overflowed the room so badly that latecomers could not wedge themselves through the door. I told a story about how a blogger complained that PyMongo was slow, and what tools I used to prove the blogger wrong. Huge laughs, the most fun I've had speaking.

My colleague Amalia Hawkins delivered her "Narrowing the Gender Gap at Hackathons" talk to general acclaim. Hackathons aren't yet the phenomenon in Asia that they are in the US, so there's a chance to start things right. Amalia's thesis is that focusing on the experience of all hackathon newcomers benefits everyone, and narrows the gender gap as a side effect.

Fernando Perez and Wes McKinney gave inspiring keynotes about their numerical Python tools, IPython and Pandas respectively. I'm severely ignorant about numerical Python, so I appreciated learning from experts. Jessica McKellar's keynote invited us to expand Python's reach among groups who don't feel welcome in the open source community.

Besides the conference, all I remember about Taipei is a continuous blur of food. The university cafeteria served weird delicious Chinese vegetables, sour eggplant, seitan, pieces of seaweed tied into bows. All for a dollar.

Amalia and I ate strange things for dinner at night markets. Grilled cuttlefish. Enoki mushrooms wrapped in bacon. One of the food stands made hotdogs, except the bun was replaced with a big sausage, so it was like a meta-hotdog. Amalia had two scoops of taro-root ice cream that had peanut brittle shaved onto them with a carpenter's lathe, and wrapped up into a crêpe like an ice-cream burrito.

We found a food cart in a grimy back alley that made ramen for 50 cents, with the freshest, chewiest noodles I've ever tasted.

PyCon in Taipei!

I got back from Street Retreat on Sunday, and tomorrow I fly to Taipei. Why in the world do I overschedule myself like this?

Nevertheless I'm excited to visit Taiwan for the first time and to speak at PyCon APAC. I'll give a shorter version of the talk I gave at PyCon in Montréal: "What Is Async, How Does It Work, And When Should I Use It?"

I'll also give a new talk on "Python Profiling: The Guts and The Glory." This isn't your regular old Python profiling talk. The regular old talk shows you cProfile, admits that its output is unreadable, and wishes you the best of luck. My talk will tell a story of drama and intrigue, introduce you to a powerful Python profiler called Yappi, show you how to visualize its output with KCacheGrind, and even delve into how CPython profilers actually work.

Motor 0.2 Released

Version 0.2 of Motor, the asynchronous MongoDB driver for Python and Tornado, has been released.

This release is compatible with MongoDB 2.6 and PyMongo 2.7. It dramatically improves interoperability with Tornado coroutines, includes support for non-blocking DNS, and adds numerous smaller features.

If you encounter any issues, please file them in our bug tracker.

That's it! With the Motor release behind me, I'm looking forward to enjoying PyCon, and talking about async at 3:15pm tomorrow.

Announcing Motor 0.2 release candidate

I'm excited to offer you Motor 0.2, release candidate zero. Motor is my non-blocking driver for MongoDB and Tornado.

The changes from Motor 0.1 to 0.2 are epochal. They were motivated primarily by three events:

  • Motor wraps PyMongo, and PyMongo has improved substantially.
  • MongoDB 2.6 is nearly done, and Motor has added features to support it.
  • Tornado's support for coroutines and for non-blocking DNS has improved, and Motor 0.2 takes advantage of this.

Please read the changelog before upgrading. There are backwards-breaking API changes; you must update your code. I tried to make the instructions clear and the immediate effort small. A summary of the changes is in my post, "the road to 0.2".

Once you're done reading, upgrade:

pip install pymongo==2.7
pip install https://github.com/mongodb/motor/archive/0.2rc0.zip

The owner's manual is on ReadTheDocs. At the time of this writing, Motor 0.2's docs are in the "latest" branch:

http://motor.readthedocs.org/en/latest/

...and Motor 0.1's docs are in "stable":

http://motor.readthedocs.org/en/stable/

Enjoy! If you find a bug or want a feature, report it. If I don't hear of any bugs in the next week I'll make the release official.

In any case, tweet me if you're building something nifty with Motor. I want to hear from you.

PyMongo 2.7 Has Shipped

Amethystine scrub python

Source: inrideo on Flickr

I announce with satisfaction that we've released PyMongo 2.7, the successor to PyMongo 2.6.3. The bulk of the driver's changes are to support MongoDB 2.6, which is currently a release candidate. The newest MongoDB has an enhanced wire protocol and some big new features, so PyMongo 2.7 is focused on supporting it. However, the driver still supports server versions as old as 1.8.

Read my prior post for a full list of the features and improvements in PyMongo. Since I wrote that, we've fixed some compatibility issues with MongoDB 2.6, dealt with recent changes to the nose and setuptools packages, and made a couple of memory optimizations.

Motor 0.2 is about to ship, as well. I'll give the details in my next post.

What's next for PyMongo? We now embark on a partial rewrite, which will become PyMongo 3.0. The next-generation driver will delete many deprecated APIs: safe will disappear, since it was deprecated in favor of w=1 years ago. Connection will walk off into the sunset, giving way to MongoClient. We'll make a faster and more thread-safe core for PyMongo, and we'll expose a clean API so Motor and ODMs can wrap PyMongo more neatly.

We'll discard PyMongo's current C extension for BSON-handling. We'll replace it with libbson, a common codec that our C team is building. If you're handling BSON in PyPy, we aim to give you a much faster pure-Python codec there, too.

An Enlightening Failure

This year I plan to rewrite PyMongo's BSON decoder. The decoder is written in C, and it's pretty fast, but I had a radical idea for how to make it faster. That idea turned out to be wrong, although it took me a long time to discover that.

Discovering I'm wrong is the best way to learn. The second-best way is by writing. So I'll multiply the two by writing a story about my wrong idea.

The Story

Currently, when PyMongo decodes a buffer of BSON documents, it creates a Python dict (hashtable) for each BSON document. It returns the dicts in a list.

My radical idea was to make a maximally lazy decoder. I wouldn't decode all the documents at once; I would decode each document just in time as you iterate. Even more radically, I wouldn't convert each document into a dict. Instead, each document would only know its offset in the BSON buffer. When you access a field in the document, like this:

document["fieldname"]

...I wouldn't do a hashtable lookup anymore. I'd do a linear-search through the BSON. I thought this approach might be faster, since the linear search would usually be fast, and I'd avoid the overhead of creating the hashtable. If a document was frequently accessed or had many fields, I'd eventually "inflate" it into a dict.
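To make the idea concrete, here is a rough Python sketch of that linear search. It is not the C prototype: the names encode_document and LazyDocument are invented for illustration, and it assumes documents that contain only int32 and string fields. The tiny encoder exists only to produce a buffer to search.

```python
import struct

INT32 = 0x10   # BSON type byte for a 32-bit integer
STRING = 0x02  # BSON type byte for a UTF-8 string


def encode_document(pairs):
    """Encode (name, value) pairs as BSON; only int32 and str values are handled."""
    body = b""
    for name, value in pairs:
        key = name.encode("utf-8") + b"\x00"
        if isinstance(value, int):
            body += bytes([INT32]) + key + struct.pack("<i", value)
        else:
            data = value.encode("utf-8") + b"\x00"
            body += bytes([STRING]) + key + struct.pack("<i", len(data)) + data
    # The length prefix counts itself (4 bytes) and the trailing NUL (1 byte).
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


class LazyDocument:
    """Hypothetical lazy view: remembers only the buffer and its offset."""

    def __init__(self, buf, offset=0):
        self.buf = buf
        self.offset = offset

    def __getitem__(self, name):
        wanted = name.encode("utf-8")
        (length,) = struct.unpack_from("<i", self.buf, self.offset)
        pos = self.offset + 4
        end = self.offset + length - 1  # index of the document's trailing NUL
        while pos < end:
            type_byte = self.buf[pos]
            name_end = self.buf.index(b"\x00", pos + 1)
            key = self.buf[pos + 1:name_end]
            pos = name_end + 1  # now at the start of the value
            if type_byte == INT32:
                size = 4
            elif type_byte == STRING:
                (strlen,) = struct.unpack_from("<i", self.buf, pos)
                size = 4 + strlen
            else:
                raise NotImplementedError("type 0x%02x" % type_byte)
            if key == wanted:
                if type_byte == INT32:
                    return struct.unpack_from("<i", self.buf, pos)[0]
                # Skip the length prefix and the trailing NUL, then pay for UTF-8.
                return self.buf[pos + 4:pos + size - 1].decode("utf-8")
            pos += size  # not our field: skip the value without decoding it
        raise KeyError(name)


# Two documents back to back in one buffer, as in a server reply.
first = encode_document([("x", 1)])
second = encode_document([("name", "caf\u00e9"), ("n", 42)])
buf = first + second
doc = LazyDocument(buf, offset=len(first))
print(doc["n"], doc["name"])
```

Note that a string is UTF-8 decoded only when you actually ask for that field, and no dict is ever built.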

I coded up a prototype in C, benchmarked it, and it was eight times faster than the current code. I rejoiced, and began to develop it into a full-featured decoder.

At some point I applied our unicode tests to my decoder, and I realized I was using PyString_FromString to decode strings, when I should be using PyUnicode_DecodeUTF8. (I was targeting only Python 2 at this point.) I added the call to PyUnicode_DecodeUTF8, and my decoder started passing our unicode tests. I continued adding features.

The next day I benchmarked again, and my code was no longer any faster than the current decoder. I didn't know which change had caused the slowdown, so I learned how to use callgrind and tried all sorts of things and went a little crazy. Eventually I used git bisect, and I was enlightened: my prototype had only been fast as long as it didn't decode UTF-8 properly. Once I had fixed that, I had the same speed as the current PyMongo.

Lessons Learned

  1. The cost of PyMongo's BSON decoding is typically dominated by UTF-8 decoding. There's no way to avoid it, and it's already optimized like crazy.
  2. Python's dict is really fast for PyMongo's kind of workload. It's not worth trying to beat it.
  3. When I care about speed, I need to run my benchmarks on each commit. I should use git bisect as the first resort, not the last.
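One way to act on the third lesson is a small script that times the code and reports pass or fail through its exit status, so it can run on every commit. The sketch below is an assumption about how that might look, not what I actually used: decode_sample is a stand-in workload and the time budget is arbitrary.

```python
import timeit


def best_time(func, number=1000, repeat=5):
    """Return the fastest of `repeat` timings, each running func `number` times."""
    return min(timeit.repeat(func, number=number, repeat=repeat))


def exit_code(func, budget_seconds, **kwargs):
    """Return 0 if func fits the time budget, 1 otherwise."""
    return 0 if best_time(func, **kwargs) <= budget_seconds else 1


def decode_sample():
    # Stand-in workload; a real check would call the decoder under test.
    return (b"caf\xc3\xa9" * 1000).decode("utf-8")


# In a script, finish with: sys.exit(exit_code(decode_sample, budget_seconds=...))
status = exit_code(decode_sample, budget_seconds=5.0, number=100, repeat=3)
```

Saved as, say, check_speed.py (a hypothetical name), this could be handed to "git bisect run python check_speed.py". Git treats exit status 0 as a good commit and 1 as a bad one, so it finds the slow commit without my going a little crazy first.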

This is disappointing, but I've learned a ton about the Python C API, BSON, and callgrind. On my next attempt to rewrite the decoder, I won't forget my hard-won lessons.

Testing Network Errors With MongoDB

Someone asked on Twitter today for a way to trigger a connection failure between MongoDB and the client. This would be terribly useful when you're testing your application's handling of network hiccups.

You have options: you could use mongobridge to proxy between the client and the server, and at just the right moment, kill mongobridge.

Or you could use packet-filtering tools to accomplish the same: iptables on Linux and ipfw or pfctl on Mac and BSD. You could use one of these tools to block MongoDB's port at the proper moment, and unblock it afterward.

There's yet another option, not widely known, that you might find simpler: use a MongoDB "failpoint" to break your connection.

Failpoints are our internal mechanism for triggering faults in MongoDB so we can test their consequences. Read about them on Kristina's blog. They're not meant for public consumption, so you didn't hear about it from me.

The first step is to start MongoDB with the special command-line argument:

mongod --setParameter enableTestCommands=1

Next, log in with the mongo shell and tell the server to abort the next two network operations:

> db.adminCommand({
...   configureFailPoint: 'throwSockExcep',
...   mode: {times: 2}
... })
2014-03-20T20:31:42.162-0400 trying reconnect to 127.0.0.1:27017 (127.0.0.1) failed

The server obeys you instantly, before it even replies, so the command itself appears to fail. But fear not: you've simply seen the first of the two network errors you asked for. You can trigger the next error with any operation:

> db.collection.count()
2014-03-20T20:31:48.485-0400 trying reconnect to 127.0.0.1:27017 (127.0.0.1) failed

The third operation succeeds:

> db.collection.count()
2014-03-20T21:07:38.742-0400 trying reconnect to 127.0.0.1:27017 (127.0.0.1) failed
2014-03-20T21:07:38.742-0400 reconnect 127.0.0.1:27017 (127.0.0.1) ok
1

There's a final "failed" message that I don't understand, but the shell reconnects and the command returns the answer, "1".

You could use this failpoint when testing a driver or an application. If you don't know exactly how many operations you need to break, you could set times to 50 and, at the end of your test, continue attempting to reconnect until you succeed.
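The "keep trying until you succeed" step might look like the sketch below. It is deliberately driver-agnostic: retry_until_success and FlakyServer are hypothetical names, and FlakyServer only imitates a server whose failpoint breaks the next few operations.

```python
import time


def retry_until_success(operation, retry_on=(ConnectionError,), max_attempts=100, delay=0.0):
    """Call operation() until it stops raising retry_on; return (result, attempts)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(), attempt
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the last error
            time.sleep(delay)


class FlakyServer:
    """Stand-in for a server with a failpoint set to break the next `times` operations."""

    def __init__(self, times):
        self.remaining = times

    def count(self):
        if self.remaining > 0:
            self.remaining -= 1
            raise ConnectionError("simulated socket exception")
        return 1


server = FlakyServer(times=2)
result, attempts = retry_until_success(server.count)
```

With PyMongo you would presumably pass the real operation and set retry_on to the driver's network exception, such as pymongo.errors.AutoReconnect. I haven't tested that combination here.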

Ugly, perhaps, but if you want a simple way to cause a network error this could be a reasonable approach.