The global interpreter lock (GIL)
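Today I found a recently published book and started with its section on the global interpreter lock (GIL). It largely answered the questions I had, and I thought it was very good, so I am excerpting it here: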
Finally, as we'll learn in more detail later in this section, Python's implementation of threads means that only one thread is ever running in the Python virtual machine at any point in time. Python threads are true operating system threads, but all threads must acquire a single shared lock when they are ready to run, and each thread may be swapped out after running for a set number of virtual machine instructions.
Because of this structure, Python threads cannot today be distributed across multiple CPUs on a multi-CPU computer. To leverage more than one CPU, you'll simply need to use process forking, not threads (the amount and complexity of code required for both are roughly the same). Moreover, long-running tasks implemented as C extensions can run truly independently if they release the GIL to allow Python threads to run while their task is in progress. Python code, however, cannot truly overlap in time.
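To make the "use processes, not threads" advice concrete, here is a minimal sketch using the standard multiprocessing module. It assumes a Unix-like system, because it asks for the "fork" start method explicitly to match the process forking the excerpt mentions. The function names count_primes and count_primes_in_processes are my own, purely for illustration.

```python
import multiprocessing


def count_primes(limit):
    """CPU-bound work: count the primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_in_processes(limits, workers=2):
    """Run count_primes for each limit in a pool of worker processes."""
    # Assumption: a Unix-like OS. "fork" is requested explicitly to mirror
    # the process forking described in the excerpt.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=workers) as pool:
        # Each worker is a separate process with its own interpreter and GIL,
        # so the calls can run on different CPUs at the same time.
        return pool.map(count_primes, limits)
```

Calling count_primes_in_processes([10, 20]) splits the two computations across worker processes. The same function called from plain threads would still take turns holding the GIL.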
The advantage of Python's implementation of threads is performance: when it was attempted, making the virtual machine truly thread safe reportedly slowed all programs by a factor of two on Windows and by an even larger factor on Linux. Even nonthreaded programs ran at half speed.
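The book also makes the following points worth noting:
1. In the GIL implementation (as of Python 2.5), switching between threads happens at bytecode boundaries, so some operations that do not look thread-safe actually are: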
# L, L1, and L2 are lists; D, D1, and D2 are dictionaries; x and y are objects; and i and j are integers
L.append(x)
L1.extend(L2)
x = L
x = L.pop()
L1[i:j] = L2
L.sort()
x = y
x.field = y
D[x] = y
D1.update(D2)
D.keys()
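As a rough illustration of the list above, the sketch below has several threads call append on one shared list without any lock, and then shows a counter protected by a lock. The counter is there because a read-modify-write such as total += 1 is not among the operations listed, so I would not rely on it being atomic. The function names are hypothetical, and the code assumes CPython 3.

```python
import threading


def append_from_threads(num_threads=4, per_thread=10000):
    """Append to one shared list from several threads, with no lock."""
    shared = []

    def worker(tag):
        for i in range(per_thread):
            shared.append((tag, i))  # a single list operation from the list above

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared


def locked_count(num_threads=4, per_thread=10000):
    """Increment a shared counter; the lock guards the read-modify-write."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(per_thread):
            with lock:
                total += 1  # not in the list above, so it is protected explicitly

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

No appended item is lost in the first function, and the lock in the second makes the final count independent of when the interpreter switches threads.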
2. C++ extension modules can get around the GIL restriction, though I suspect that is troublesome to implement.
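Writing such an extension is beyond this note, but the effect can be observed from pure Python with a built-in call that is implemented in C and releases the GIL while it waits. The sketch below uses time.sleep as a stand-in for a long-running extension task. That substitution is my own assumption for illustration, and run_sleepers is a hypothetical name.

```python
import threading
import time


def run_sleepers(num_threads=4, seconds=0.2):
    """Start several threads that each block in time.sleep; return elapsed time."""
    start = time.perf_counter()
    threads = [
        threading.Thread(target=time.sleep, args=(seconds,))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # time.sleep releases the GIL while waiting, so the waits overlap and the
    # total is close to one sleep rather than num_threads sleeps.
    return time.perf_counter() - start
```

With four threads sleeping 0.2 seconds each, the elapsed time should be much nearer to 0.2 seconds than to 0.8.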
3. The GIL's thread switch interval can be adjusted:
This interval defaults to 100, the number of bytecode instructions before a switch. It does not need to be reset for most programs, but it can be used to tune thread performance. Setting higher values means switches happen less often: threads incur less overhead but they are less responsive to events. Setting lower values makes threads more responsive to events but increases thread switch overhead.
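The excerpt describes the Python 2.5 setting, which counts bytecode instructions. As far as I know, current Python 3 releases instead expose a time-based interval through sys.getswitchinterval and sys.setswitchinterval, measured in seconds. The sketch below assumes Python 3 and uses those calls. The helper with_switch_interval is a hypothetical name of my own.

```python
import sys


def with_switch_interval(seconds, func):
    """Call func() with a temporary thread switch interval, then restore it."""
    old = sys.getswitchinterval()    # current interval in seconds
    sys.setswitchinterval(seconds)   # larger: fewer switches; smaller: more responsive
    try:
        return func()
    finally:
        sys.setswitchinterval(old)   # always put the previous value back
```

For example, with_switch_interval(0.01, some_threaded_job) would run a hypothetical some_threaded_job with switches requested less often, trading responsiveness for lower switching overhead, as the excerpt describes.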
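Overall, though, being unable to take full advantage of today's very common multi-CPU machines is a real pity for Python. I do have one question: Java is also based on a virtual machine, so why can it achieve true multithreading? And why can't Python's implementers borrow from its approach?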