Python's += Is Weird
Image: William Warby on Flickr
Here's a Python gotcha I've hit often enough to merit a blog post: x += 1 is weird in Python. It's compiled roughly like x = x + 1, with two surprising consequences. One is this familiar pitfall:
    >>> x = 0
    >>> def f():
    ...     x += 1
    ...
    >>> f()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 2, in f
    UnboundLocalError: local variable 'x' referenced before assignment
The compiler thinks of x += 1 similarly to x = x + 1, so it considers x to be a local variable bound in the scope of f. But x is referenced before it's assigned to. Let's look at the bytecode:
    >>> import dis
    >>> dis.dis(f)
      2           0 LOAD_FAST                0 (x)
                  3 LOAD_CONST               1 (1)
                  6 INPLACE_ADD
                  7 STORE_FAST               0 (x)
                 10 LOAD_CONST               0 (None)
                 13 RETURN_VALUE
The first opcode, LOAD_FAST, fails because x is treated as a local variable of f and nothing has been assigned to it yet. To make f update the module-level x, we need to declare global x:
    >>> def f():
    ...     global x
    ...     x += 1
    ...
    >>> dis.dis(f)
      3           0 LOAD_GLOBAL              0 (x)
                  3 LOAD_CONST               1 (1)
                  6 INPLACE_ADD
                  7 STORE_GLOBAL             0 (x)
                 10 LOAD_CONST               0 (None)
                 13 RETURN_VALUE
Now LOAD_FAST is replaced with LOAD_GLOBAL, which correctly locates x, and STORE_FAST is replaced with STORE_GLOBAL, which writes the result back to the global x.
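The scoping decision described above can also be checked without reading bytecode. The following is a minimal sketch, not part of the original session: the function names broken and fixed are invented for illustration, and it relies on the code object attributes co_varnames (local names) and co_names (global and attribute names).

```python
x = 0

def broken():
    x += 1  # the assignment makes x local to broken(), so the read fails

def fixed():
    global x  # read and write the module-level x instead
    x += 1

# The compiler records its scoping decision on each code object.
broken_locals = broken.__code__.co_varnames  # names treated as locals
fixed_globals = fixed.__code__.co_names      # names looked up globally

try:
    broken()
    error = None
except UnboundLocalError as exc:
    error = exc  # raised because the local x was read before assignment

fixed()  # increments the module-level x from 0 to 1
```

Here x shows up among the locals of broken but among the global names of fixed, which matches the LOAD_FAST versus LOAD_GLOBAL difference in the disassembly.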
The second pitfall of += is lost updates. If we run f ten thousand times in parallel, sometimes x is incremented fewer than ten thousand times:
    >>> import threading
    >>> def go():
    ...     global x
    ...     x = 0
    ...
    ...     def f():
    ...         global x
    ...         x += 1
    ...
    ...     ts = [threading.Thread(target=f)
    ...           for _ in range(10000)]
    ...
    ...     for t in ts:
    ...         t.start()
    ...
    ...     for t in ts:
    ...         t.join()
    ...
    ...     print x
    ...
    >>> go()
    10000
    >>> go()
    10000
    >>> go()
    9998
Again, the problem is clear if we look at the bytecode. f is compiled as a series of opcodes that load the global integer referenced by x onto the stack, add 1 to it, and store the new integer back into x:
    >>> dis.dis(f)
      3           0 LOAD_GLOBAL              0 (x)
                  3 LOAD_CONST               1 (1)
                  6 INPLACE_ADD
                  7 STORE_GLOBAL             0 (x)
                 10 LOAD_CONST               0 (None)
                 13 RETURN_VALUE
The interpreter can switch threads anywhere between LOAD_GLOBAL, which loads the global value of x onto this thread's stack frame, and STORE_GLOBAL, which saves it back to the global x.
Say x is 17 and two threads execute f. Thread A loads the integer 17 onto its stack, adds one to it, and gets interrupted before it can store the result. Now Thread B also loads 17 onto its stack and adds one. No matter which order the threads finish in, both store 18, so the final value of x will be 18, although we expect 19.
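To make that interleaving concrete, here is a hand-written replay of the scenario in a single thread. It is only a sketch: no real threads are involved, and the names a_value and b_value are invented to stand in for each thread's private stack.

```python
x = 17

# Thread A: LOAD_GLOBAL, LOAD_CONST, INPLACE_ADD, then it is interrupted.
a_value = x
a_value = a_value + 1  # A now holds 18 on its own stack

# Thread B runs the same three opcodes and still sees the old x.
b_value = x
b_value = b_value + 1  # B also holds 18

# Both threads eventually run STORE_GLOBAL, in either order.
x = b_value
x = a_value

lost_updates = 19 - x  # one of the two increments has vanished
```

Swapping the two final stores gives the same result, since both threads computed their new value from the same stale 17.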
The solution is to protect += statements with a Lock.
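As a minimal sketch of that fix, assuming a single module-level lock is acceptable: the name x_lock and the parameter n are invented here, go returns the total instead of printing it, and the default thread count is reduced from the ten thousand used earlier to keep the run light.

```python
import threading

x = 0
x_lock = threading.Lock()  # hypothetical name; guards every update to x

def f():
    global x
    # Holding the lock makes the load, add, and store one critical section,
    # so no other thread can read a stale x in between.
    with x_lock:
        x += 1

def go(n=1000):
    global x
    x = 0
    ts = [threading.Thread(target=f) for _ in range(n)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return x
```

With the lock in place, go() should return exactly n on every run. Threads now take turns on the increment, which costs some speed but removes the lost updates.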