【Question title】: Multithreading: How to stop/edit individual threads?
【Posted】: 2020-10-26 17:16:28
【Question】:

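I'm trying to write a method that creates threads and submits them to a thread pool. How do I stop individual threads once they have been created? Edit: This is for web scraping and needs to run in the background for days. There will be a dynamic number of processes and many other tasks (I've only included one for reference). I also don't want a process to end when it finishes (it will loop its task); it should end only when the user requests it.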

def Target(web,delay):
    log = ("starting")
    # gives headless option to chromedriver
    op = webdriver.ChromeOptions()
    op.add_argument('headless')
    driver = webdriver.Chrome(options=op)
    # launches driver with desired webpage
    driver.get(web)
    log = ("getting webpage")
    while True:
        try:
            # test to check if on correct page
            # looking for matching key
            log = ("checking stock")
            elem = driver.find_element_by_xpath('//*[@id="viewport"]/div[5]/div/div[2]/div[3]/div[1]/div/div[3]/div[1]/div[2]/button')
            if elem.is_displayed():
                log = ("instock")
                title = driver.title
                url = driver.current_url
                return (title, url)
        except NoSuchElementException:
            print("product is not in stock.... trying again")
            # retry delay
            time.sleep(float(delay))
            driver.get(web)

def multimethodv2(MethodToRun, url, delay,id):
    if __name__ == "__main__":
        pool = ThreadPoolExecutor()
        pool.submit(Target,url,delay)

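【Comments】:

  • Putting if __name__ == "__main__" inside a function is not very idiomatic.
  • If you are only running one task, there is no point in using a thread pool here. If you are running multiple tasks, see Python selenium multiprocessing for ideas on how to initialize the threads so that you don't re-create the driver over and over. In particular, see my refinement to this, which modifies the accepted answer to make sure the driver processes are terminated when you are done.
  • Right now you never call driver.quit(), which is not a good thing.
  • This is for web scraping and needs to run in the background for days. There will be a dynamic number of processes and many other tasks (I've only added one for reference).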

Tags: python multithreading


【Solution 1】:

I always add a variable called running and implement it as follows:

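from threading import Thread

def do_stuff():
    # do_stuff assigns to the module-level flag, so it must be declared global;
    # without this the assignment below makes `running` a local name
    global running
    while running:
        ...
        if running:
            ...
        else:
            break

        # let's say one thread found a solution:
        running = False

threads = 4
running = True

workers = []
for i in range(threads):
    t = Thread(target=do_stuff, daemon=True)
    t.start()
    workers.append(t)

# wait until every worker has seen the flag and left its loop
for t in workers:
    t.join()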

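Every thread checks the running variable. If one thread finds a solution, or the program should stop, it sets running to False and the threads exit their loops. This works well in most cases.

If that doesn't suit you, have a look at threading events.

See this Stack Overflow question for an explanation: Python Threading with Event object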
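
As a rough illustration of the event-based variant, the sketch below gives each worker its own threading.Event so that a single worker can be stopped while the others keep running. The names worker and run_demo and the short timeouts are made up for this example; the loop body is only a stand-in for real work.

```python
import threading
import time


def worker(name, stop_event, results):
    """Loop until this worker's own stop_event is set."""
    checks = 0
    while not stop_event.is_set():
        checks += 1  # stand-in for one polling pass
        # wait() acts as an interruptible sleep: it returns early
        # as soon as the event is set
        stop_event.wait(timeout=0.01)
    results[name] = checks


def run_demo():
    results, events, threads = {}, {}, {}
    for name in ("a", "b"):
        events[name] = threading.Event()
        threads[name] = threading.Thread(
            target=worker, args=(name, events[name], results), daemon=True
        )
        threads[name].start()

    time.sleep(0.05)
    events["a"].set()  # stop only worker "a"
    threads["a"].join(timeout=2)
    a_stopped = not threads["a"].is_alive()
    b_still_running = threads["b"].is_alive()

    events["b"].set()  # now stop worker "b" as well
    threads["b"].join(timeout=2)
    return a_stopped, b_still_running, results


if __name__ == "__main__":
    print(run_demo())
```

Using stop_event.wait(delay) in place of time.sleep(delay) means a stop request takes effect right away rather than after the current delay has elapsed.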

【Comments】:

  • The problem is that I need to create these tasks dynamically, which is why I leaned towards a thread pool.

【Solution 2】:

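You can't, and wouldn't want to, stop individual threads in a thread pool. They all terminate when the pool's shutdown is executed, which happens in one of two ways when using ThreadPoolExecutor:

  • You call the shutdown method explicitly.
  • You use a context manager: with ThreadPoolExecutor() as pool: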

def multimethodv2(MethodToRun, url, delay,id):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(Target,url,delay)
        result = future.result() # wait for submitted task to end first before terminating block

Or:

pool = ThreadPoolExecutor(max_workers=1)
future = pool.submit(Target,url,delay)
result = future.result() # wait for submitted task to end first before terminating pool
pool.shutdown()

Or:

pool = ThreadPoolExecutor(max_workers=1)
future = pool.submit(Target,url,delay)
pool.shutdown(wait=False) # return immediately
result = future.result() # if this is last completed future, the pool will now shutdown
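
To make the limitation concrete, here is a small sketch of what Future.cancel() does in a one-worker pool: it can remove a task that is still waiting in the queue, but it has no effect on a task that is already executing. The names demo_cancel and blocking_task are invented for this example, and the events only stand in for a long-running scrape loop.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def demo_cancel():
    release = threading.Event()
    started = threading.Event()

    def blocking_task():
        started.set()
        release.wait(timeout=5)  # stand-in for a long-running scrape loop
        return "finished"

    with ThreadPoolExecutor(max_workers=1) as pool:
        running = pool.submit(blocking_task)
        started.wait(timeout=5)              # make sure the first task is executing
        queued = pool.submit(blocking_task)  # queued behind it: only one worker

        cancelled_running = running.cancel()  # False: already executing
        cancelled_queued = queued.cancel()    # True: it never started

        release.set()                         # let the running task finish
        result = running.result(timeout=5)
    return cancelled_running, cancelled_queued, result


if __name__ == "__main__":
    print(demo_cancel())  # (False, True, 'finished')
```

A task that is already running therefore has to cooperate in its own termination, for example by checking a flag or an event inside its loop.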

In any case, you should still arrange for driver.quit() to be called; otherwise a driver process is left behind:

import threading
import gc
import time
from concurrent.futures import ThreadPoolExecutor
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException


threadLocal = threading.local()


class Driver:
    def __init__(self):
        log = ("starting")
        # gives headless option to chromedriver
        op = webdriver.ChromeOptions()
        op.add_argument('headless')
        self.driver = webdriver.Chrome(options=op)

    def __del__(self):
        self.driver.quit() # clean up driver when we are cleaned up
        #print('The driver has been "quitted".')

        
def create_driver():
    the_driver = getattr(threadLocal, 'the_driver', None)
    if the_driver is None:
        the_driver = Driver()
        setattr(threadLocal, 'the_driver', the_driver)
    return the_driver.driver



def Target(web,delay):
    driver = create_driver()
    # launches driver with desired webpage
    driver.get(web)
    log = ("getting webpage")
    while True:
        try:
            # test to check if on correct page
            # looking for matching key
            log = ("checking stock")
            elem = driver.find_element_by_xpath('//*[@id="viewport"]/div[5]/div/div[2]/div[3]/div[1]/div/div[3]/div[1]/div[2]/button')
            if elem.is_displayed():
                log = ("instock")
                title = driver.title
                url = driver.current_url
                return (title, url)
        except NoSuchElementException:
            print("product is not in stock.... trying again")
            # retry delay
            time.sleep(float(delay))
            driver.get(web)


# pool is now passed as an argument:
def multimethodv2(pool, MethodToRun, url, delay,id):
    future = pool.submit(Target,url,delay)
    return_value = future.result()

        
        
if __name__ == '__main__':
    N_THREADS = 1 # Put in a more realistic value when you have a more realistic example
    with ThreadPoolExecutor(max_workers=N_THREADS) as pool:
        # MethodToRun, url, delay and id are assumed to be defined by the caller
        multimethodv2(pool, MethodToRun, url, delay, id)
    threadLocal = None # clean up drivers
    gc.collect()
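
If tasks need to be started dynamically and ended individually on request, one possible approach (not part of the code above) is to hand every submitted task its own threading.Event and keep the events in a registry. The sketch below assumes this design; TaskManager, poll and the example URLs are hypothetical, and poll is a Selenium-free stand-in for Target.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class TaskManager:
    """Hypothetical helper: one stop Event per submitted task."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._tasks = {}  # task_id -> (future, stop_event)
        self._lock = threading.Lock()

    def start(self, task_id, func, *args):
        stop_event = threading.Event()
        future = self._pool.submit(func, stop_event, *args)
        with self._lock:
            self._tasks[task_id] = (future, stop_event)
        return future

    def stop(self, task_id, timeout=None):
        """Ask one task to finish and return its result."""
        with self._lock:
            future, stop_event = self._tasks.pop(task_id)
        stop_event.set()
        return future.result(timeout=timeout)

    def shutdown(self):
        """Ask every remaining task to finish, then shut the pool down."""
        with self._lock:
            for _, stop_event in self._tasks.values():
                stop_event.set()
            self._tasks.clear()
        self._pool.shutdown(wait=True)


def poll(stop_event, url, delay):
    """Stand-in for Target: loops until asked to stop."""
    attempts = 0
    while not stop_event.is_set():
        attempts += 1           # a real task would check the page here
        stop_event.wait(delay)  # interruptible replacement for time.sleep
    return url, attempts


if __name__ == "__main__":
    manager = TaskManager(max_workers=2)
    manager.start("first", poll, "https://example.com/a", 0.01)
    manager.start("second", poll, "https://example.com/b", 0.01)
    print(manager.stop("first", timeout=5))  # only "first" ends
    manager.shutdown()                       # stops whatever is left
```

Note that a task which loops until stopped occupies its worker thread for that whole time, so max_workers has to be at least the number of tasks expected to run concurrently; further submissions wait in the queue.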

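【Comments】:

  • Thanks for the quick reply. The main issue is that I'm creating these threads/futures dynamically and want the user to be able to terminate a thread on request.
  • @MichalPisarek So what is the real problem? The threads exist to process new tasks as they are submitted through the ThreadPoolExecutor instance's submit and map methods. Also see my comment on your question about efficiently reusing Chrome drivers in these threads. You can avoid creating and quitting a driver for every new URL and instead do it once per worker thread in the pool.
  • I've updated the answer with code showing how to have one Chrome driver per worker thread.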