Python Thread State and the Global Interpreter Lock

Thread State and the Global Interpreter Lock
The Python interpreter is not fully thread safe. To support multi-threaded Python programs, there is a global lock that the current thread must hold before it can safely access Python objects. Without the lock, even the simplest operations could cause problems in a multi-threaded program: for example, when two threads simultaneously increment the reference count of the same object, the reference count could end up being incremented only once instead of twice.
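The lost-update problem described above can be illustrated with a small standalone sketch. It is a toy model built on POSIX threads, not CPython code: toy_object, big_lock and run_locked_increfs are hypothetical names introduced only for this illustration. Two threads increment the same counter, and because each increment happens under one shared lock, no update is lost.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins: a toy object and one process-wide lock. */
typedef struct { long refcount; } toy_object;

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

typedef struct { toy_object *obj; long times; } worker_args;

/* Each worker takes the lock around every increment, so the
   read-modify-write of refcount is never interleaved. */
static void *incref_worker(void *p)
{
    worker_args *args = p;
    for (long i = 0; i < args->times; i++) {
        pthread_mutex_lock(&big_lock);
        args->obj->refcount++;
        pthread_mutex_unlock(&big_lock);
    }
    return NULL;
}

/* Run two workers against one object and return the final count. */
long run_locked_increfs(long times_per_thread)
{
    toy_object obj = { 0 };
    worker_args args = { &obj, times_per_thread };
    pthread_t a, b;
    int rc;

    rc = pthread_create(&a, NULL, incref_worker, &args);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, incref_worker, &args);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return obj.refcount;
}
```

With the lock in place, run_locked_increfs(n) always returns 2 * n. Removing the lock calls would make the result unpredictable, which is the failure the paragraph above warns about.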
Therefore, the rule exists that only the thread that has acquired the global interpreter lock may operate on Python objects or call Python/C API functions. To let other threads run, the interpreter regularly releases and reacquires the lock: by default, every 100 bytecode instructions (this can be changed with sys.setcheckinterval()). The lock is also released and reacquired around potentially blocking I/O operations such as reading or writing a file, so that other threads can run while the thread that requested the I/O waits for it to complete.
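The periodic release can be pictured with a simplified model. This is only a sketch under assumptions: toy_interp and its fields are made-up names, plain counters stand in for the real lock and bytecode loop, and no second thread is involved.

```c
#include <assert.h>

/* Toy model only: counters stand in for the real lock and bytecode loop. */
typedef struct {
    int check_interval;   /* instructions between lock handoffs */
    int ticker;           /* countdown to the next handoff */
    long handoffs;        /* how many times the lock was released */
    int lock_held;        /* 1 while this "thread" holds the lock */
} toy_interp;

void toy_interp_init(toy_interp *it, int check_interval)
{
    assert(check_interval > 0);
    it->check_interval = check_interval;
    it->ticker = check_interval;
    it->handoffs = 0;
    it->lock_held = 1;
}

/* Execute n "instructions"; every check_interval of them, release the
   lock (another thread could run here) and take it back. */
void toy_interp_run(toy_interp *it, long n)
{
    for (long i = 0; i < n; i++) {
        assert(it->lock_held);
        /* ... execute one instruction ... */
        if (--it->ticker == 0) {
            it->lock_held = 0;   /* release */
            it->handoffs++;
            it->lock_held = 1;   /* reacquire */
            it->ticker = it->check_interval;
        }
    }
}
```

With an interval of 100, running 1000 instructions in this model produces 10 handoffs. A smaller interval gives other threads more chances to run at the cost of more lock traffic.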
The Python interpreter needs to keep some bookkeeping information separate per thread; for this it uses a data structure called PyThreadState. There is one global variable, however: the pointer to the current PyThreadState structure. While most thread packages have a way to store "per-thread global data", Python's internal platform-independent thread abstraction doesn't support this yet. Therefore, the current thread state must be manipulated explicitly.
This is easy enough in most cases. Most code manipulating the global interpreter lock has the following simple structure:

    Save the thread state in a local variable.
    Release the interpreter lock.
    ...Do some blocking I/O operation...
    Reacquire the interpreter lock.
    Restore the thread state from the local variable.

This is so common that a pair of macros exists to simplify it:

    Py_BEGIN_ALLOW_THREADS
    ...Do some blocking I/O operation...
    Py_END_ALLOW_THREADS

The Py_BEGIN_ALLOW_THREADS macro opens a new block and declares a hidden local variable; the Py_END_ALLOW_THREADS macro closes the block. Another advantage of using these two macros is that when Python is compiled without thread support, they are defined empty, which skips the thread state and lock manipulations entirely.
When thread support is enabled, the block above expands to the following code:

    PyThreadState *_save;

    _save = PyEval_SaveThread();
    ...Do some blocking I/O operation...
    PyEval_RestoreThread(_save);

Using even lower level primitives, we can get roughly the same effect as follows:

    PyThreadState *_save;

    _save = PyThreadState_Swap(NULL);
    PyEval_ReleaseLock();
    ...Do some blocking I/O operation...
    PyEval_AcquireLock();
    PyThreadState_Swap(_save);

There are some subtle differences; in particular, PyEval_RestoreThread() saves and restores the value of the global variable errno, since the lock manipulation does not guarantee that errno is left alone. Also, when thread support is disabled, PyEval_SaveThread() and PyEval_RestoreThread() don't manipulate the lock; in this case, PyEval_ReleaseLock() and PyEval_AcquireLock() are not available. This is done so that dynamically loaded extensions compiled with thread support enabled can be loaded by an interpreter that was compiled with thread support disabled.
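To see why preserving errno matters, here is a minimal standalone sketch. Everything prefixed with toy_ or TOY_ is a hypothetical stand-in written for this illustration and is not the CPython implementation. The toy lock deliberately clobbers errno when acquired, and the toy restore function saves and restores it so that the error code from the blocking call survives.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins, not the CPython definitions. */
typedef struct { int id; } toy_tstate;

static toy_tstate *toy_current = NULL;  /* the one global pointer */
static int toy_lock_held = 0;

/* Pretend that acquiring the underlying lock may clobber errno. */
static void toy_acquire_lock(void)
{
    assert(!toy_lock_held);
    toy_lock_held = 1;
    errno = EAGAIN;
}

static void toy_release_lock(void)
{
    assert(toy_lock_held);
    toy_lock_held = 0;
}

toy_tstate *toy_save_thread(void)
{
    toy_tstate *ts = toy_current;   /* read the pointer while still locked */
    toy_current = NULL;
    toy_release_lock();
    return ts;
}

void toy_restore_thread(toy_tstate *ts)
{
    int err = errno;                /* remember the caller's errno */
    toy_acquire_lock();
    errno = err;                    /* undo any clobbering by the lock */
    toy_current = ts;
}

#define TOY_BEGIN_ALLOW_THREADS { toy_tstate *_save; _save = toy_save_thread();
#define TOY_END_ALLOW_THREADS   toy_restore_thread(_save); }

/* A "blocking call" that fails and reports the failure through errno. */
static int toy_blocking_io(void) { errno = EINTR; return -1; }

/* Returns the errno seen after the lock has been reacquired. */
int toy_io_with_threads_allowed(toy_tstate *me)
{
    int rc;
    toy_acquire_lock();
    toy_current = me;

    TOY_BEGIN_ALLOW_THREADS
    rc = toy_blocking_io();
    TOY_END_ALLOW_THREADS

    int seen = (rc < 0) ? errno : 0;
    assert(toy_current == me);
    toy_save_thread();              /* leave the toy lock released */
    return seen;
}
```

In this model the caller still sees EINTR after the block ends. Without the two errno lines in toy_restore_thread, it would see EAGAIN from the lock instead.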
The global interpreter lock is used to protect the pointer to the current thread state. When releasing the lock and saving the thread state, the current thread state pointer must be retrieved before the lock is released (since another thread could immediately acquire the lock and store its own thread state in the global variable). Conversely, when acquiring the lock and restoring the thread state, the lock must be acquired before storing the thread state pointer.
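The ordering rule can be checked with a small two-thread experiment. This is a sketch assuming POSIX threads; toy_tstate, toy_current and the helper functions are illustrative names, not part of the Python/C API. Each thread stores its own state only after taking the lock and reads it back before releasing, then counts how often it saw another thread's state.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct { int id; long mismatches; long rounds; } toy_tstate;

static pthread_mutex_t toy_lock = PTHREAD_MUTEX_INITIALIZER;
static toy_tstate *toy_current = NULL;  /* protected by toy_lock */

/* Acquire first, then store: the pointer is only written under the lock. */
static void toy_acquire_thread(toy_tstate *ts)
{
    pthread_mutex_lock(&toy_lock);
    toy_current = ts;
}

/* Read and reset first, then release: after the unlock another thread
   may overwrite toy_current at any moment. */
static toy_tstate *toy_release_thread(void)
{
    toy_tstate *ts = toy_current;
    toy_current = NULL;
    pthread_mutex_unlock(&toy_lock);
    return ts;
}

static void *toy_worker(void *p)
{
    toy_tstate *me = p;
    for (long i = 0; i < me->rounds; i++) {
        toy_acquire_thread(me);
        /* ... work that relies on toy_current ... */
        if (toy_release_thread() != me)
            me->mismatches++;
    }
    return NULL;
}

/* Returns how often a thread saw someone else's state (expected: 0). */
long toy_run_handoffs(long rounds)
{
    toy_tstate a = { 1, 0, rounds }, b = { 2, 0, rounds };
    pthread_t ta, tb;
    int rc;

    rc = pthread_create(&ta, NULL, toy_worker, &a);
    assert(rc == 0);
    rc = pthread_create(&tb, NULL, toy_worker, &b);
    assert(rc == 0);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return a.mismatches + b.mismatches;
}
```

Because every read and write of toy_current happens while the lock is held, the mismatch count stays at zero. Swapping the two statements in either helper would open a window in which the other thread's pointer could be observed.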
Why go into so much detail about this? Because when threads are created from C, they don't hold the global interpreter lock, nor is there a thread state data structure for them. Such threads must bootstrap themselves into existence by first creating a thread state data structure, then acquiring the lock, and finally storing their thread state pointer, before they can start using the Python/C API. When they are done, they should reset the thread state pointer, release the lock, and finally free their thread state data structure.
Beginning with version 2.3, threads can take advantage of the PyGILState_*() functions to do all of the above automatically. The typical idiom for calling into Python from a C thread is now:

    PyGILState_STATE gstate;
    gstate = PyGILState_Ensure();

    /* Perform Python actions here.  */
    result = CallSomeFunction();
    /* evaluate result */

    /* Release the thread. No Python API allowed beyond this point. */
    PyGILState_Release(gstate);

Note that the PyGILState_*() functions assume there is only one global interpreter (created automatically by Py_Initialize()). Python still supports the creation of additional interpreters (using Py_NewInterpreter()), but mixing multiple interpreters and the PyGILState_*() API is unsupported.

1. PyInterpreterState
This data structure represents the state shared by a number of cooperating threads. Threads belonging to the same interpreter share their module administration and a few other internal items. There are no public members in this structure.
Threads belonging to different interpreters initially share nothing, except process state like available memory, open file descriptors and such. The global interpreter lock is also shared by all threads, regardless of which interpreter they belong to.

2. PyThreadState
This data structure represents the state of a single thread. The only public data member is PyInterpreterState *interp, which points to this thread's interpreter state.

3. void PyEval_InitThreads()
Initialize and acquire the global interpreter lock. It should be called in the main thread before creating a second thread or engaging in any other thread operations such as PyEval_ReleaseLock() or PyEval_ReleaseThread(tstate). It is not needed before calling PyEval_SaveThread() or PyEval_RestoreThread().
This is a no-op when called for a second time. It is safe to call this function before calling Py_Initialize().
When only the main thread exists, no lock operations are needed. This is a common situation (most Python programs do not use threads), and the lock operations slow the interpreter down a bit. Therefore, the lock is not created initially. This situation is equivalent to having acquired the lock: when there is only a single thread, all object accesses are safe. Therefore, when this function initializes the lock, it also acquires it. Before the Python thread module creates a new thread, knowing that either it has the lock or the lock hasn't been created yet, it calls PyEval_InitThreads(). When this call returns, it is guaranteed that the lock has been created and that the calling thread has acquired it.
It is not safe to call this function when it is unknown which thread (if any) currently has the global interpreter lock.
This function is not available when thread support is disabled at compile time.

4. int PyEval_ThreadsInitialized()
Returns a non-zero value if PyEval_InitThreads() has been called. This function can be called without holding the lock, and therefore can be used to avoid calls to the locking API when running single-threaded. This function is not available when thread support is disabled at compile time. New in version 2.4.
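The lazy creation of the lock and the "initialized" check can be modelled in a few lines. This is a sketch under assumptions: plain flags stand in for the real lock object, and all toy_ names are hypothetical.

```c
#include <assert.h>

/* Toy model: flags stand in for the real lock object. */
static int toy_lock_created = 0;
static int toy_lock_held = 0;
static int toy_lock_creations = 0;

/* Create the lock on first use and leave it held by the caller.
   Later calls do nothing. */
void toy_init_threads(void)
{
    if (toy_lock_created)
        return;
    toy_lock_created = 1;
    toy_lock_creations++;
    toy_lock_held = 1;
}

/* Safe to call without holding the lock: it only reads a flag. */
int toy_threads_initialized(void)
{
    return toy_lock_created;
}

/* Skip lock traffic entirely while the program is single-threaded. */
void toy_release_if_needed(void)
{
    if (toy_threads_initialized()) {
        assert(toy_lock_held);
        toy_lock_held = 0;
    }
}
```

Code that may run before any second thread exists can guard its lock calls the way toy_release_if_needed does, which mirrors the use suggested for PyEval_ThreadsInitialized() above.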

5. void PyEval_AcquireLock()
Acquire the global interpreter lock. The lock must have been created earlier. If this thread already has the lock, a deadlock ensues. This function is not available when thread support is disabled at compile time.

6. void PyEval_ReleaseLock()
Release the global interpreter lock. The lock must have been created earlier. This function is not available when thread support is disabled at compile time.

7. void PyEval_AcquireThread(PyThreadState *tstate)
Acquire the global interpreter lock and set the current thread state to tstate, which should not be NULL. The lock must have been created earlier. If this thread already has the lock, a deadlock ensues. This function is not available when thread support is disabled at compile time.

8. void PyEval_ReleaseThread(PyThreadState *tstate)
Reset the current thread state to NULL and release the global interpreter lock. The lock must have been created earlier and must be held by the current thread. The tstate argument, which must not be NULL, is only used to check that it represents the current thread state; if it doesn't, a fatal error is reported. This function is not available when thread support is disabled at compile time.

9. PyThreadState* PyEval_SaveThread()
Release the interpreter lock (if it has been created and thread support is enabled) and reset the thread state to NULL, returning the previous thread state (which is not NULL). If the lock has been created, the current thread must have acquired it. (This function is available even when thread support is disabled at compile time.)

10. void PyEval_RestoreThread(PyThreadState *tstate)
Acquire the interpreter lock (if it has been created and thread support is enabled) and set the thread state to tstate, which must not be NULL. If the lock has been created, the current thread must not have acquired it, otherwise a deadlock ensues. (This function is available even when thread support is disabled at compile time.)
The following macros are normally used without a trailing semicolon; look for example usage in the Python source distribution.

11. Py_BEGIN_ALLOW_THREADS
This macro expands to "{ PyThreadState *_save; _save = PyEval_SaveThread();". Note that it contains an opening brace; it must be matched with a following Py_END_ALLOW_THREADS macro. See above for further discussion of this macro. It is a no-op when thread support is disabled at compile time.

12. Py_END_ALLOW_THREADS
This macro expands to "PyEval_RestoreThread(_save); }". Note that it contains a closing brace; it must be matched with an earlier Py_BEGIN_ALLOW_THREADS macro. See above for further discussion of this macro. It is a no-op when thread support is disabled at compile time.

13. Py_BLOCK_THREADS
This macro expands to "PyEval_RestoreThread(_save);": it is equivalent to Py_END_ALLOW_THREADS without the closing brace. It is a no-op when thread support is disabled at compile time.

14. Py_UNBLOCK_THREADS
This macro expands to "_save = PyEval_SaveThread();": it is equivalent to Py_BEGIN_ALLOW_THREADS without the opening brace and variable declaration. It is a no-op when thread support is disabled at compile time.
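One plausible use of the block/unblock pair is to retake the lock briefly in the middle of an allow-threads block, for example to report an error. The sketch below imitates the four macros with hypothetical TOY_ versions over a flag-based toy lock. It illustrates the expansion pattern only and is not code from the Python sources.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins, not the CPython definitions. */
typedef struct { int id; } toy_tstate;

static toy_tstate *toy_current = NULL;
static int toy_lock_held = 0;
static int toy_errors_reported = 0;

static toy_tstate *toy_save_thread(void)
{
    toy_tstate *ts = toy_current;
    toy_current = NULL;
    toy_lock_held = 0;
    return ts;
}

static void toy_restore_thread(toy_tstate *ts)
{
    toy_lock_held = 1;
    toy_current = ts;
}

#define TOY_BEGIN_ALLOW_THREADS { toy_tstate *_save; _save = toy_save_thread();
#define TOY_END_ALLOW_THREADS   toy_restore_thread(_save); }
#define TOY_BLOCK_THREADS       toy_restore_thread(_save);
#define TOY_UNBLOCK_THREADS     _save = toy_save_thread();

/* Needs the lock, like any call that touches interpreter state. */
static void toy_report_error(void)
{
    assert(toy_lock_held);
    toy_errors_reported++;
}

/* Process n chunks without the lock; retake it only to report failures.
   The caller must hold the toy lock on entry. */
int toy_process_chunks(const int *chunk_ok, int n)
{
    int done = 0;

    TOY_BEGIN_ALLOW_THREADS
    for (int i = 0; i < n; i++) {
        if (!chunk_ok[i]) {
            TOY_BLOCK_THREADS
            toy_report_error();
            TOY_UNBLOCK_THREADS
            continue;
        }
        done++;
    }
    TOY_END_ALLOW_THREADS
    return done;
}
```

The block and unblock macros reuse the hidden _save variable declared by the begin macro, which is why they only make sense between a matching begin/end pair.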
All of the following functions are only available when thread support is enabled at compile time, and must be called only when the interpreter lock has been created.

15. PyInterpreterState* PyInterpreterState_New()
Create a new interpreter state object. The interpreter lock need not be held, but may be held if it is necessary to serialize calls to this function.

16. void PyInterpreterState_Clear(PyInterpreterState *interp)
Reset all information in an interpreter state object. The interpreter lock must be held.

17. void PyInterpreterState_Delete(PyInterpreterState *interp)
Destroy an interpreter state object. The interpreter lock need not be held. The interpreter state must have been reset with a previous call to PyInterpreterState_Clear().

18. PyThreadState* PyThreadState_New(PyInterpreterState *interp)
Create a new thread state object belonging to the given interpreter object. The interpreter lock need not be held, but may be held if it is necessary to serialize calls to this function.

19. void PyThreadState_Clear(PyThreadState *tstate)
Reset all information in a thread state object. The interpreter lock must be held.

20. void PyThreadState_Delete(PyThreadState *tstate)
Destroy a thread state object. The interpreter lock need not be held. The thread state must have been reset with a previous call to PyThreadState_Clear().

21. PyThreadState* PyThreadState_Get()
Return the current thread state. The interpreter lock must be held. When the current thread state is NULL, this issues a fatal error (so that the caller needn't check for NULL).

22. PyThreadState* PyThreadState_Swap(PyThreadState *tstate)
Swap the current thread state with the thread state given by the argument tstate, which may be NULL. The interpreter lock must be held.

23. PyObject* PyThreadState_GetDict()
Return value: Borrowed reference.
Return a dictionary in which extensions can store thread-specific state information. Each extension should use a unique key to store its state in the dictionary. It is okay to call this function when no current thread state is available. If this function returns NULL, no exception has been raised and the caller should assume no current thread state is available. Changed in version 2.3: Previously this could only be called when a current thread is active, and NULL meant that an exception was raised.

24. int PyThreadState_SetAsyncExc(long id, PyObject *exc)
Asynchronously raise an exception in a thread. The id argument is the thread id of the target thread; exc is the exception object to be raised. This function does not steal any references to exc. To prevent naive misuse, you must write your own C extension to call this. Must be called with the GIL held. Returns the number of thread states modified; this is normally one, but will be zero if the thread id isn't found. If exc is NULL, the pending exception (if any) for the thread is cleared. This raises no exceptions. New in version 2.3.
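The return-value contract described above (one on a match, zero when the id is unknown, NULL to clear) can be mimicked with a toy table of thread states. This is a hedged sketch: toy_tstate and toy_set_async_exc are invented for illustration, exceptions are represented as plain strings, and nothing here reflects how the interpreter stores or delivers them.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for thread states; exceptions are plain strings here. */
typedef struct {
    long thread_id;
    const char *async_exc;   /* pending asynchronous exception, or NULL */
} toy_tstate;

/* Walk every thread state, set (or clear, when exc is NULL) the pending
   exception on those whose id matches, and return how many were changed. */
int toy_set_async_exc(toy_tstate *states, size_t n, long id, const char *exc)
{
    int modified = 0;
    for (size_t i = 0; i < n; i++) {
        if (states[i].thread_id != id)
            continue;
        states[i].async_exc = exc;
        modified++;
    }
    return modified;
}
```

A caller would typically treat a return of zero as "no such thread" and anything other than one as a sign that the id was not unique.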

25. PyGILState_STATE PyGILState_Ensure()
Ensure that the current thread is ready to call the Python C API regardless of the current state of Python, or of its thread lock. This may be called as many times as desired by a thread as long as each call is matched with a call to PyGILState_Release(). In general, other thread-related APIs may be used between PyGILState_Ensure() and PyGILState_Release() calls as long as the thread state is restored to its previous state before the Release(). For example, normal usage of the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros is acceptable.
The return value is an opaque "handle" to the thread state when PyGILState_Ensure() was called, and must be passed to PyGILState_Release() to ensure Python is left in the same state. Even though recursive calls are allowed, these handles cannot be shared: each unique call to PyGILState_Ensure() must save the handle for its call to PyGILState_Release().
When the function returns, the current thread will hold the GIL. Failure is a fatal error. New in version 2.3.

26. void PyGILState_Release(PyGILState_STATE)
Release any resources previously acquired. After this call, Python's state will be the same as it was prior to the corresponding PyGILState_Ensure() call (but generally this state will be unknown to the caller, hence the use of the GILState API).
Every call to PyGILState_Ensure() must be matched by a call to PyGILState_Release() on the same thread. New in version 2.3.
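Why each call needs its own handle becomes clearer with a single-threaded toy model of nested ensure/release pairs. The names toy_gilstate, toy_gil_ensure and toy_gil_release are assumptions made for this sketch, and a plain flag replaces the real lock. The handle simply records whether the lock was already held on entry.

```c
#include <assert.h>

/* Toy model for a single thread; the names are illustrative only. */
typedef enum { TOY_LOCKED, TOY_UNLOCKED } toy_gilstate;

static int toy_gil_held = 0;      /* does this thread hold the lock? */

/* Take the lock if needed and report the state found on entry. */
toy_gilstate toy_gil_ensure(void)
{
    toy_gilstate previous = toy_gil_held ? TOY_LOCKED : TOY_UNLOCKED;
    toy_gil_held = 1;
    return previous;
}

/* Put things back the way the matching ensure call found them. */
void toy_gil_release(toy_gilstate previous)
{
    assert(toy_gil_held);
    if (previous == TOY_UNLOCKED)
        toy_gil_held = 0;
}
```

In this model an inner release leaves the lock held and only the outermost release drops it. Passing the outer handle to the inner release would drop the lock too early, which is the kind of mix-up the "handles cannot be shared" rule prevents.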
