Advanced Python Programming and Async IO Concurrent Programming (Part 2)

I. Multithreading, Multiprocessing, and Thread Pool Programming

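1. GIL

GIL stands for global interpreter lock

In CPython, each Python thread corresponds to one C-level (operating system) thread

The GIL allows only one thread at a time to execute bytecode, so multiple threads cannot be mapped onto multiple CPUs to run Python code in parallel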

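Cases in which the GIL is released:

The GIL is released periodically: after a fixed number of bytecode instructions in Python 2, and after a time slice (the switch interval, 5 ms by default) in Python 3.2 and later
The GIL is released when a thread performs a blocking IO operation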
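
A minimal sketch of the second case, assuming time.sleep as a stand-in for blocking IO (it releases the GIL in the same way). The names blocking_io and run_in_threads are illustrative only. If the waits overlap, four threads should finish in roughly the time of one wait rather than four:

```python
import sys
import threading
import time

def blocking_io(seconds):
    # time.sleep releases the GIL, much like a blocking socket or file read
    time.sleep(seconds)

def run_in_threads(n_threads, seconds):
    """Run blocking_io in n_threads threads and return the elapsed wall time."""
    threads = [threading.Thread(target=blocking_io, args=(seconds,))
               for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

# Interval (in seconds) after which CPython asks the running thread to give up the GIL
print("switch interval:", sys.getswitchinterval())

elapsed = run_in_threads(4, 0.2)
# The four waits overlap, so this should be close to 0.2 s, not 0.8 s
print("elapsed: {:.2f}s".format(elapsed))
```

The same experiment with a CPU-bound function in place of blocking_io would not be expected to show this overlap, because only one thread can execute bytecode at a time.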

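2. Multithreaded programming: threading

The thread is the smallest unit the operating system can schedule

For IO-bound work, multithreading and multiprocessing perform about the same

setDaemon method: marks a thread as a daemon thread. When the main thread finishes, the program exits without waiting for daemon threads, which are then terminated abruptly (setDaemon is deprecated since Python 3.10 in favor of the daemon attribute)
join method: blocks the calling thread until the joined thread has finished. Calling join on the child threads from the main thread makes it wait for them before continuing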
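
The two settings can be compared side by side in a small sketch. The worker function and the thread names here are made up for illustration, and the daemon flag is passed through the constructor instead of setDaemon:

```python
import threading
import time

finished = []

def worker(name, seconds):
    time.sleep(seconds)
    finished.append(name)

# daemon=True: the interpreter will not wait for this thread at exit
background = threading.Thread(target=worker, args=("background", 0.1), daemon=True)
# default (daemon=False): the interpreter waits for this thread before exiting
foreground = threading.Thread(target=worker, args=("foreground", 0.1))

background.start()
foreground.start()

# join() blocks the calling thread until the target thread has finished
foreground.join()
background.join()

print(background.daemon, foreground.daemon)  # True False
print(sorted(finished))                      # ['background', 'foreground']
```

Without the explicit background.join(), the daemon thread could be cut off mid-work if the main thread happened to finish first.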

Two ways to create threads

1) Using Thread directly:

import time
import threading

def get_detail_html(url):
    print("get detail html started")
    time.sleep(2)
    print("get detail html end")

def get_detail_url(url):
    print("get detail url started")
    time.sleep(4)
    print("get detail url end")

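if __name__ == "__main__":
    # both functions take a url argument, so pass one through args
    thread1 = threading.Thread(target=get_detail_html, args=("",))
    thread2 = threading.Thread(target=get_detail_url, args=("",))
    # thread1.setDaemon(True)
    # thread2.setDaemon(True)  # daemon threads: the program exits without waiting for them

    start_time = time.time()
    thread1.start()
    thread2.start()

    thread1.join()
    thread2.join()   # block the main thread until the child thread finishes

    print("last time: {}".format(time.time() - start_time))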

2) Subclassing Thread:

This approach is recommended when the thread needs more logic of its own, since that logic can live in the class.

class GetDetailHtml(threading.Thread):
    def __init__(self, name):
        super().__init__(name=name)

    def run(self):   # override threading.Thread.run
        print("get detail html started")
        time.sleep(2)
        print("get detail html end")

class GetDetailUrl(threading.Thread):
    def __init__(self, name):
        super().__init__(name=name)

    def run(self): 
        print("get detail url started")
        time.sleep(4)
        print("get detail url end")

if  __name__ == "__main__":
    thread1 = GetDetailHtml("get_detail_html")
    thread2 = GetDetailUrl("get_detail_url")
    start_time = time.time()
    thread1.start()
    thread2.start()

    thread1.join()
    thread2.join()

    print ("last time: {}".format(time.time()-start_time))

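3. Communication between threads: shared variables and Queue

1) Shared variable: define a global variable that several child threads read and write. This is not thread-safe unless every access is guarded by a lock, so it is not recommended

2) Queue: thread-safe and recommended. The source of the queue module shows the commonly used Queue methods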
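
For comparison, approach 1) might look roughly like the following sketch. The names detail_url_list, list_lock and fetched are assumptions made for this example. Every access to the shared list has to go through the lock, and the consumers have no way to wait for new items, which is the bookkeeping a Queue does for you:

```python
import threading

detail_url_list = []          # shared by all threads
list_lock = threading.Lock()  # guards every access to the shared lists
fetched = []

def get_detail_url():
    # Producer: append URLs to the shared list
    for i in range(20):
        with list_lock:
            detail_url_list.append("http://projectsedu.com/{id}".format(id=i))

def get_detail_html():
    # Consumer: pop URLs until the shared list is empty
    while True:
        with list_lock:
            if not detail_url_list:
                return
            url = detail_url_list.pop()
            fetched.append(url)

producer = threading.Thread(target=get_detail_url)
producer.start()
producer.join()  # fill the list first; the consumers cannot block waiting for items

consumers = [threading.Thread(target=get_detail_html) for _ in range(3)]
for t in consumers:
    t.start()
for t in consumers:
    t.join()

print(len(fetched))  # 20
```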

# Synchronize threads through a Queue; safer than a shared variable
from queue import Queue
import time
import threading

def get_detail_html(queue):
    # Crawl article detail pages
    while True:
        url = queue.get()  # get() blocks until an item is available
        print("get detail html started")
        time.sleep(2)
        print("get detail html end")
        queue.task_done()  # mark this item as processed so queue.join() can return

def get_detail_url(queue):
    # Crawl the article list page once, so that the program can finish
    print("get detail url started")
    time.sleep(4)
    for i in range(20):
        queue.put("http://projectsedu.com/{id}".format(id=i))  # put() blocks while the queue is full
    print("get detail url end")

if __name__ == "__main__":
    detail_url_queue = Queue(maxsize=1000)

    thread_detail_url = threading.Thread(target=get_detail_url, args=(detail_url_queue,))
    thread_detail_url.start()
    for i in range(10):
        # daemon=True so the endless consumer loops do not keep the process alive
        html_thread = threading.Thread(target=get_detail_html, args=(detail_url_queue,), daemon=True)
        html_thread.start()
    start_time = time.time()

    thread_detail_url.join()  # wait until every URL has been queued
    detail_url_queue.join()   # block until task_done() has been called for every item

    print("last time: {}".format(time.time() - start_time))
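
One common way to stop Queue consumers cleanly, instead of relying on daemon threads, is to send each worker a sentinel value. The sketch below is only an illustration of that idea: STOP, worker and the doubling "work" are hypothetical, and the only Queue methods used are put, get, task_done, join and empty:

```python
from queue import Queue
import threading

STOP = None  # sentinel (an assumption of this sketch): tells a worker to exit

def worker(queue, results):
    while True:
        item = queue.get()      # blocks until an item is available
        if item is STOP:
            queue.task_done()
            break
        results.append(item * 2)
        queue.task_done()       # exactly one task_done() per get()

task_queue = Queue(maxsize=10)
results = []
workers = [threading.Thread(target=worker, args=(task_queue, results))
           for _ in range(3)]
for t in workers:
    t.start()

for i in range(5):
    task_queue.put(i)           # blocks while the queue is full
for _ in workers:
    task_queue.put(STOP)        # one sentinel per worker

task_queue.join()               # returns once every item has been marked done
for t in workers:
    t.join()

print(sorted(results))     # [0, 2, 4, 6, 8]
print(task_queue.empty())  # True
```

Calling task_done() more times than get() raises ValueError, which is why it belongs in the consumer, right after the work for one item is finished.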

4. Thread synchronization: Lock and RLock

1) Lock

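from threading import Lock

total = 0
lock = Lock()

def add():
    global total
    lock.acquire()     # acquire the lock
    # lock.acquire()   # a second acquire before release would block forever: Lock is not reentrant, so this deadlocks
    total += 1
    lock.release()     # release the lock; until then, other threads that call acquire are blocked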

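Notes: 1. Locking costs performance

       2. Locks can cause deadlock:

        1) Calling acquire twice in a row without a release in between deadlocks

        2) Resource contention: thread 1 holds lock A and waits for lock B, while thread 2 holds lock B and waits for lock A. Neither can proceed until the other releases, so both block forever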
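
Both deadlock cases can be explored without hanging the interpreter. The sketch below is illustrative only (lock_a, lock_b, counter and transfer are made-up names): it uses the timeout parameter of acquire to show that a second acquire on a plain Lock fails, and then avoids the two-lock deadlock by having every thread take the locks in the same order:

```python
import threading

lock = threading.Lock()

lock.acquire()
# A second acquire on a plain Lock cannot succeed until release();
# the timeout makes the call give up instead of hanging forever.
second = lock.acquire(timeout=0.1)
print(second)  # False
lock.release()

# Avoiding the two-lock deadlock: every thread takes the locks in the same order
lock_a = threading.Lock()
lock_b = threading.Lock()
counter = {"value": 0}

def transfer():
    for _ in range(1000):
        with lock_a:          # always lock_a first...
            with lock_b:      # ...then lock_b
                counter["value"] += 1

threads = [threading.Thread(target=transfer) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter["value"])  # 2000
```

The with statement releases the lock even if the body raises, which removes one common way of forgetting a release.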

2) RLock

A reentrant lock: within the same thread, acquire can be called several times in a row. The number of release calls must match the number of acquire calls

from threading import Lock, RLock, Condition

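# RLock is reentrant: the same thread may acquire it repeatedly, as long as each acquire is matched by a release

total = 0
lock = RLock()
def add():
    global lock
    global total
    for i in range(1000000):
        lock.acquire()   # acquire the lock
        lock.acquire()   # RLock: a second acquire in the same thread does not deadlock
        total += 1
        lock.release()   # release the lock
        lock.release()   # the number of releases must match the number of acquires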

RLock is considerably more forgiving to use than Lock. If you need a lock, RLock is the recommended choice

from threading import Lock, RLock, Condition


total = 0
lock = RLock()
def add():
    global lock
    global total
    for i in range(1000000):
        lock.acquire()   # acquire the lock
        lock.acquire()   # reentrant acquire by the same thread
        total += 1
        lock.release()   # release the lock
        lock.release()   # one release per acquire

def desc():
    global total
    global lock
    for i in range(1000000):
        lock.acquire()
        total -= 1
        lock.release()

import threading
thread1 = threading.Thread(target=add)
thread2 = threading.Thread(target=desc)
thread1.start()
thread2.start()

thread1.join()
thread2.join()
print(total)
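
In practice, reentrancy tends to matter when a function that already holds the lock calls another function that takes the same lock. A minimal sketch of that situation follows; the Counter class and its method names are hypothetical:

```python
import threading

class Counter:
    def __init__(self):
        self._lock = threading.RLock()
        self.value = 0

    def increment(self):
        with self._lock:          # acquired here, released automatically on exit
            self.value += 1

    def increment_twice(self):
        with self._lock:          # outer acquire
            self.increment()      # inner acquire by the same thread: fine with RLock
            self.increment()

counter = Counter()

def work():
    for _ in range(1000):
        counter.increment_twice()

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 8000
```

With a plain Lock in place of the RLock, the first call to increment_twice would block forever on the inner acquire.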
