If you use Gevent, you know it’s great for concurrency, but alas, none of the Python performance profilers work on Gevent applications. So I’m taking matters into my own hands. I’ll show you how both cProfile and Yappi stumble on programs that use greenlets, and I’ll demonstrate GreenletProfiler, my solution.
cProfile Gets Confused by Greenlets
I’ll write a script that spawns two greenlets, then I’ll profile the script to look for the functions that cost the most. In my script, the
foo greenlet spins 20 million times. Every million iterations, it yields to Gevent’s scheduler (the “hub”). The
bar greenlet does the same, but it spins only half as many times.
import cProfile import gevent import lsprofcalltree MILLION = 1000 1000 def foo(): for i in range(20 MILLION): if not i % MILLION: # Yield to the Gevent hub. gevent.sleep(0) def bar(): for i in range(10 * MILLION): if not i % MILLION: gevent.sleep(0) profile = cProfile.Profile() profile.enable() foo_greenlet = gevent.spawn(foo) bar_greenlet = gevent.spawn(bar) foo_greenlet.join() bar_greenlet.join() profile.disable() stats = lsprofcalltree.KCacheGrind(profile) stats.output(open('cProfile.callgrind', 'w'))
Let’s pretend I’m a total idiot and I don’t know why this program is slow. I profile it with cProfile, and convert its output with lsprofcalltree so I can view the profile in KCacheGrind. cProfile is evidently confused: it thinks
bar took twice as long as
foo, although the opposite is true:
cProfile also fails to count the calls to
sleep. I’m not sure why cProfile’s befuddlement manifests this particular way. If you understand it, please explain it to me in the comments. But it’s not surprising that cProfile doesn’t understand my script: cProfile is built to trace a single thread, so it assumes that if one function is called, and then a second function is called, that the first must have called the second. Greenlets defeat this assumption because the call stack can change entirely between one function call and the next.
Yappi Stumbles Over Greenlets
Next let’s try Yappi, the excellent profiling package by Sumer Cip. Yappi has two big advantages over cProfile: it’s built to trace multithreaded programs, and it can measure CPU time instead of wall-clock time. So maybe Yappi will do better than cProfile on my script? I run Yappi like so:
yappi.set_clock_type('cpu') yappi.start(builtins=True) foo_greenlet = gevent.spawn(foo) bar_greenlet = gevent.spawn(bar) foo_greenlet.join() bar_greenlet.join() yappi.stop() stats = yappi.get_func_stats() stats.save('yappi.callgrind', type='callgrind')
Yappi thinks that when
gevent.sleep, they indirectly call
Greenlet.run, and eventually call themselves:
This is true in some philosophical sense. When my greenlets sleep, they indirectly cause each other to be scheduled by the Gevent hub. But it’s wrong to say they actually call themselves recursively, and it confuses Yappi’s cost measurements: Yappi attributes most of the CPU cost of the program to Gevent’s internal
Waiter.get function. Yappi also, for some reason, thinks that
sleep is called only once each by
bar, though it knows it was called 30 times in total.
GreenletProfiler Groks Greenlets
Since Yappi is so great for multithreaded programs, I used it as my starting point for GreenletProfiler. Yappi’s core tracing code is in C, for speed. The C code has a notion of a “context” which is associated with each thread. I added a hook to Yappi that lets me associate contexts with greenlets instead of threads. And voilà, the profiler understands my script!
bar are correctly measured as two-thirds and one-third of the script’s total cost:
Unlike Yappi, GreenletProfiler also knows that
sleep 20 times and
sleep 10 times:
Finally, I know which functions to optimize because I have an accurate view of how my script executes.
I can’t take much credit for GreenletProfiler, because I stand on the shoulders of giants. Specifically I am standing on the shoulders of Sumer Cip, Yappi’s author. But I hope it’s useful to you. Install it with
pip install GreenletProfiler, profile your greenletted program, and let me know how GreenletProfiler works for you.