Python's += Is Weird

Image: William Warby on Flickr
Here's a Python gotcha I've hit often enough to merit a blog post: x += 1 is weird in Python. It's compiled roughly like x = x + 1, with two surprising consequences. One is this familiar pitfall:
>>> x = 0
>>> def f():
...     x += 1
...
>>> f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in f
UnboundLocalError: local variable 'x' referenced before assignment

The compiler thinks of x += 1 similarly to x = x + 1, so it considers x to be a local variable bound in the scope of f. But x is referenced before it's assigned to. Let's look at the bytecode:
>>> import dis
>>> dis.dis(f)
  2           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               1 (1)
              6 INPLACE_ADD
              7 STORE_FAST               0 (x)
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE

The first opcode, LOAD_FAST, looks for x among f's local variables. The local x hasn't been assigned yet, so the load fails with UnboundLocalError. Obviously, we need to declare global x:
>>> def f():
...     global x
...     x += 1
...
>>> dis.dis(f)
  3           0 LOAD_GLOBAL              0 (x)
              3 LOAD_CONST               1 (1)
              6 INPLACE_ADD
              7 STORE_GLOBAL             0 (x)
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE

Now LOAD_FAST is replaced with LOAD_GLOBAL, which correctly locates x, and STORE_FAST is replaced with STORE_GLOBAL, which writes the result back to the global x.
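To confirm that the scope decision happens when the function is compiled, not when it runs, here is a small sketch for Python 3. It inspects each function's `__code__.co_varnames` (the names the compiler made local) and lists opcode names with `dis.get_instructions`. The names `f_local` and `f_global` are my own. Opcode names vary between CPython versions, so the exact list printed will differ from the Python 2 output above.

```python
import dis

x = 0

def f_local():
    x += 1  # no declaration: the compiler makes x local to f_local

def f_global():
    global x
    x += 1  # declared global: x is loaded and stored as a global

# Scope is fixed at compile time, before either function has run.
print("x" in f_local.__code__.co_varnames)   # True
print("x" in f_global.__code__.co_varnames)  # False

# Exact names depend on the CPython version (3.11 replaced INPLACE_ADD
# with BINARY_OP), but LOAD_GLOBAL and STORE_GLOBAL are still separate steps.
print([ins.opname for ins in dis.get_instructions(f_global)])

try:
    f_local()
except UnboundLocalError:
    print("f_local raised UnboundLocalError")

f_global()
print(x)  # 1
```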
The second pitfall of += is lost updates. If we run f in ten thousand concurrent threads, x is sometimes incremented fewer than ten thousand times:
>>> import threading
>>> def go():
...     global x
...     x = 0
...     def f():
...         global x
...         x += 1
...     ts = [threading.Thread(target=f)
...           for _ in range(10000)]
...     for t in ts:
...         t.start()
...     for t in ts:
...         t.join()
...     print(x)
...
>>> go()
10000
>>> go()
10000
>>> go()
9998

Again, the problem is clear if we look at the bytecode. f is compiled as a series of opcodes that load the global integer referenced by x onto the stack, add 1 to it, and store the new integer back into x:
>>> dis.dis(f)
  3           0 LOAD_GLOBAL              0 (x)
              3 LOAD_CONST               1 (1)
              6 INPLACE_ADD
              7 STORE_GLOBAL             0 (x)
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE

The interpreter can switch threads at any point between LOAD_GLOBAL, which pushes the current value of x onto the value stack of this thread's frame, and STORE_GLOBAL, which saves the result back to the global x.
Say x is 17 and two threads execute f. Thread A loads the integer 17 onto its stack, adds one to it, and gets interrupted. Now Thread B also loads 17 onto its stack and adds one. No matter which order the threads finish in, the final value of x will be 18, although we expect 19.
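That interleaving can be replayed deterministically. The sketch below is a model, not how CPython actually schedules threads: it turns f's three steps into a generator that yields after each "opcode", and a hypothetical `run` function steps two fake threads, A and B, in whatever order the schedule string dictates.

```python
def increment(shared):
    """Model of f: yields after each step so a scheduler can switch 'threads'."""
    value = shared["x"]   # like LOAD_GLOBAL x: copy the value onto "our stack"
    yield
    value += 1            # like INPLACE_ADD: only our private copy changes
    yield
    shared["x"] = value   # like STORE_GLOBAL x: overwrite whatever is there now
    yield

def run(schedule, start=17):
    """Advance thread A or B one step per character in the schedule."""
    shared = {"x": start}
    threads = {"A": increment(shared), "B": increment(shared)}
    for name in schedule:
        next(threads[name])
    return shared["x"]

print(run("AAABBB"))  # 19: A finishes before B starts, no update lost
print(run("AABBBA"))  # 18: A is interrupted after adding, B's update is lost
```

Any schedule in which both threads load before either one stores, such as "ABABAB", also ends at 18.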
The solution is to protect += statements with a Lock.