Why is this daemon thread blocking? Answers

【Question title】: Why is this Daemon thread blocking?
【Posted】: 2016-03-11 08:47:05
【Question description】:

Why does the code below block on cc.start()? crawler.py contains code similar to http://doc.scrapy.org/en/latest/topics/practices.html#run-from-script

import scrapy
import threading
from subprocess import Popen, PIPE

def worker():
    crawler = Popen('python crawler.py', stdout=PIPE, stderr=PIPE, shell=True)
    while True:
        line = crawler.stderr.readline()
        print(line.strip())

cc = threading.Thread(target=worker())
cc.setDaemon(True)
cc.start()
print "Here" # This is not printed
# Do more stuff

crawler.py contains the following code:

from scrapy.crawler import CrawlerProcess
import scrapy

class MySpider(scrapy.Spider):
    name = 'stackoverflow'
    start_urls = ['http://stackoverflow.com/questions?sort=votes']

    def parse(self, response):
        for href in response.css('.question-summary h3 a::attr(href)'):
            full_url = response.urljoin(href.extract())
            yield scrapy.Request(full_url, callback=self.parse_question)

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(MySpider)
process.start() # the script will block here until the crawling is finished

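【Comments on the question】:

The code as posted would break on its own, because you have not imported subprocess.Popen.
Sorry, I added the import statement afterwards and have fixed it. That is not the problem.
Then I suggest you update your code accordingly, to rule out any other possible cause.
Also, you may want to explain why you are running a Python script through Popen(...), because that does not make any sense. You could easily wrap it in an import crawler.
I have added the contents of the script being called. I use Popen because I want to run the crawler as part of a project and then do more things after that (capturing packets, for one).

Tags: python-2.7 subprocess python-multithreading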

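【Solution 1】:

threading.Thread expects a callable as its target argument (for example, a function name). Writing worker() calls the function at the moment the Thread instance is created, in the main thread. Because worker loops forever, that call never returns, so the Thread object is never constructed and cc.start() is never reached: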

cc = threading.Thread(target=worker())
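
To make the evaluation order visible, here is a minimal sketch. It assumes a worker that returns immediately instead of looping forever, and the names ran_in and caller are illustrative only. It records which thread executes the function body when the call expression is passed as target.

```python
import threading

ran_in = []

def worker():
    # Record which thread actually executes the body.
    ran_in.append(threading.current_thread())

caller = threading.current_thread()

# worker() is evaluated first, in the calling thread; its return value (None)
# is what Thread receives as target.
t = threading.Thread(target=worker())
t.start()   # nothing left to run, because target is None
t.join()

ran_in_caller = (ran_in == [caller])
print(ran_in_caller)
```

With the infinite loop from the question, the line that constructs the Thread would never finish, which is why the program appears to hang before "Here" is printed.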

Instead, pass the function itself (no parentheses), so that the thread is the one that calls it:

cc = threading.Thread(target=worker)
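
Putting the fix together with the subprocess loop, a sketch might look like the following. It is not the asker's code: the child process here is a hypothetical one-liner standing in for 'python crawler.py', the lines list is added only so the result can be inspected, and the loop stops at end of file (readline() returns an empty value there, so the original while True loop would keep printing empty lines once the child exits).

```python
import sys
import threading
from subprocess import Popen, PIPE

def worker(lines):
    # Hypothetical stand-in for 'python crawler.py': a child process that
    # writes two lines to stderr and exits.
    child = Popen(
        [sys.executable, "-c", "import sys; sys.stderr.write('a\\nb\\n')"],
        stdout=PIPE, stderr=PIPE,
    )
    # readline() returns b"" at end of file, which ends the loop.
    for line in iter(child.stderr.readline, b""):
        lines.append(line.strip())
    child.stderr.close()
    child.stdout.close()
    child.wait()

lines = []
# Pass the function object; its arguments go in args, not in a call expression.
cc = threading.Thread(target=worker, args=(lines,))
cc.daemon = True   # same effect as cc.setDaemon(True)
cc.start()         # returns immediately; worker runs in the background
print("Here")      # reached without waiting for the child process
cc.join()          # only for this demo, so that lines can be inspected afterwards
```

In a real program the join() call would normally be dropped or moved to wherever the output is needed, since a daemon thread is abandoned when the main thread exits.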

【Discussion】: