Python's += Is Weird
Image: William Warby on Flickr
Here's a Python gotcha I've hit often enough to merit a blog post: x += 1
is weird in Python. It's compiled roughly like x = x + 1
, with two surprising consequences. One is this familiar pitfall:
>>> x = 0
>>> def f():
... x += 1
...
>>> f()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in f
UnboundLocalError: local variable 'x' referenced before assignment
The compiler thinks of x += 1
similarly to x = x + 1
, so it considers x
to be a local variable bound in the scope of f
. But x
is referenced before it's assigned to. Let's look at the bytecode:
>>> dis.dis(f)
2 0 LOAD_FAST 0 (x)
3 LOAD_CONST 1 (1)
6 INPLACE_ADD
7 STORE_FAST 0 (x)
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
The first opcode, LOAD_FAST
, fails to load x
because it's not in scope. Obviously, we need to declare global x
:
>>> def f():
... global x
... x += 1
...
>>> dis.dis(f)
3 0 LOAD_GLOBAL 0 (x)
3 LOAD_CONST 1 (1)
6 INPLACE_ADD
7 STORE_GLOBAL 0 (x)
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
Now LOAD_FAST
is replaced with LOAD_GLOBAL
, which correctly locates x
.
The second pitfall of +=
is lost updates. If we run f
ten thousand times in parallel, sometimes x
is incremented less than ten thousand times:
>>> def go():
... global x
... x = 0
...
... def f():
... global x
... x += 1
...
... ts = [threading.Thread(target=f)
... for _ in range(10000)]
...
... for t in ts:
... t.start()
...
... for t in ts:
... t.join()
...
... print x
...
>>> go()
10000
>>> go()
10000
>>> go()
9998
Again, the problem is clear if we look at the bytecode. f
is compiled as a series of opcodes that load the global integer referenced by x
onto the stack, add 1 to it, and store the new integer back into x
:
>>> dis.dis(f)
3 0 LOAD_GLOBAL 0 (x)
3 LOAD_CONST 1 (1)
6 INPLACE_ADD
7 STORE_GLOBAL 0 (x)
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
The interpreter can switch threads anywhere between LOAD_GLOBAL
, which loads the global value of x
onto this thread's stack frame, and STORE_GLOBAL
, which saves it back to the global x
.
Say x
is 17 and two threads execute f
. Thread A loads the integer 17 onto its stack, adds one to it, and gets interrupted. Now Thread B also loads 17 onto its stack and adds one. No matter the order the threads now complete, the final value of x
will be 18, although we expect 19.
The solution is to protect +=
statements with a Lock
.