【Question title】: Python Multiprocessing - Too Slow
【Posted】: 2016-02-14 10:23:33
【Question】:

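I built a multiprocessing password cracker (using a wordlist) for a specific function, and it halved the time needed compared with using a single process.

The original problem was that it would display the cracked password and terminate that worker, but the remaining workers would carry on until they ran out of words to hash! Not ideal.

My new step forward is to use Manager.Event() to terminate the remaining workers, which works as I had hoped (after some trial and error), but the application now takes far longer than it does as a single process. I'm sure this must be due to the if statement in pwd_find(), but I thought I would ask for some advice.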

#!/usr/bin/env python

import hashlib, os, time, math
from hashlib import md5
from multiprocessing import Pool, cpu_count, Manager

def screen_clear(): # Small function for clearing the screen on Unix or Windows
    if os.name == 'nt':
        return os.system('cls')
    else:
        return os.system('clear')

cores = cpu_count() # Var containing number of cores (Threads)

screen_clear()

print ""
print "Welcome to the Technicolor md5 cracker"
print ""

user = raw_input("Username: ")
print ""
nonce = raw_input("Nonce: ")
print ""
hash = raw_input("Hash: ")
print ""
file = raw_input("Wordlist: ")
screen_clear()
print "Cracking the password for \"" + user + "\" using " 
time1 = time.time() # Begins the 'Clock' for timing

realm = "Technicolor Gateway" # These 3 variables dont appear to change
qop = "auth"
uri = "/login.lp"

HA2 = md5("GET" + ":" + uri).hexdigest() # This hash doesn't contain any changing variables so doesn't need to be recalculated

file = open(file, 'r') # Opens the wordlist file
wordlist = file.readlines() # This enables us to use len()
length = len(wordlist)

screen_clear()
print "Cracking the password for \"" + user + "\" using " + str(length) + " words"

break_points = []  # List that will have start and stopping points
for i in range(cores):  # Creates start and stopping points based on length of word list
    break_points.append({"start":int(math.ceil((length+0.0)/cores * i)), "stop":int(math.ceil((length+0.0)/cores * (i + 1)))})

def pwd_find(start, stop, event):
    for number in range(start, stop):
        if not event.is_set():
            word = (wordlist[number])
            pwd = word.replace("\n","") # Removes newline character
            HA1 = md5(user + ":" + realm + ":" + pwd).hexdigest()
            hidepw = md5(HA1 + ":" + nonce +":" + "00000001" + ":" + "xyz" + ":" + qop + ":" + HA2).hexdigest()
            if hidepw == hash:
                screen_clear()
                time2 = time.time() # stops the 'Clock'
                timetotal = math.ceil(time2 - time1) # Calculates the time taken
                print "\"" + pwd + "\"" + " = " + hidepw + " (in " + str(timetotal) + " seconds)"
                print ""
                event.set()
                p.terminate
                p.join
        else:
            p.terminate
            p.join

if __name__ == '__main__':  # Added this because the multiprocessor module sometimes acts funny without it.

    p = Pool(cores)  # Number of processes to create.
    m = Manager()
    event = m.Event()
    for i in break_points:  # Cycles though the breakpoints list created above.
        i['event'] = event
        a = p.apply_async(pwd_find, kwds=i, args=tuple())  # This will start the separate processes.
    p.close() # Prevents any more processes being started
    p.join() # Waits for worker process to end

if event.is_set():
    end = raw_input("hit enter to exit")
    file.close() # Closes the wordlist file
    screen_clear()
    exit()
else:
    screen_clear()
    time2 = time.time() # Stops the 'Clock'
    totaltime = math.ceil(time2 - time1) # Calculates the time taken
    print "Sorry your password was not found (in " + str(totaltime) + " seconds) out of " + str(length) + " words"
    print ""
    end = raw_input("hit enter to exit")
    file.close() # Closes the wordlist file
    screen_clear()
    exit()

Edit (for @noxdafox):

def finisher(answer):
    if answer:
        p.terminate()
        p.join()
        end = raw_input("hit enter to exit")
        file.close() # Closes the wordlist file
        screen_clear()
        exit()

def pwd_find(start, stop):
    for number in range(start, stop):
        word = (wordlist[number])
        pwd = word.replace("\n","") # Removes newline character
        HA1 = md5(user + ":" + realm + ":" + pwd).hexdigest()
        hidepw = md5(HA1 + ":" + nonce +":" + "00000001" + ":" + "xyz" + ":" + qop + ":" + HA2).hexdigest()
        if hidepw == hash:
            screen_clear()
            time2 = time.time() # stops the 'Clock'
            timetotal = math.ceil(time2 - time1) # Calculates the time taken
            print "\"" + pwd + "\"" + " = " + hidepw + " (in " + str(timetotal) + " seconds)"
            print ""
            return True
        elif hidepw != hash:
            return False

if __name__ == '__main__':  # Added this because the multiprocessor module sometimes acts funny without it.

    p = Pool(cores)  # Number of processes to create.
    for i in break_points:  # Cycles though the breakpoints list created above.
        a = p.apply_async(pwd_find, kwds=i, args=tuple(), callback=finisher)  # This will start the separate processes.
    p.close() # Prevents any more processes being started
    p.join() # Waits for worker process to end

【Comments】:

  • This might help.

Tags: python multithreading python-2.7 passwords multiprocessing


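【Answer 1】:

You can use the Pool primitives to solve your problem. You don't need to share an Event object, access to which is synchronized and slow.

Here I gave an example of how to terminate a Pool once a worker returns the desired result.

You can simply signal the Pool by returning a specific value, and terminate the pool from within the callback.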
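
As a rough illustration of that approach, here is a minimal sketch in Python 3 syntax (the question uses Python 2.7). The names `hash_word`, `search_chunk`, `crack` and `on_result` are hypothetical, and `hash_word` is a plain MD5 stand-in for the question's HA1/HA2 calculation. Each task returns the matching word or `None`, and the callback terminates the pool on the first hit.

```python
import hashlib
from multiprocessing import Pool


def hash_word(word):
    """Assumption: plain MD5 stands in for the question's digest calculation."""
    return hashlib.md5(word.encode("utf-8")).hexdigest()


def search_chunk(words, target):
    """Return the matching word, or None once the whole chunk has been tried."""
    for word in words:
        if hash_word(word) == target:
            return word
    return None


def crack(wordlist, target, processes=2, chunk_size=1000):
    """Search the wordlist in chunks and stop every worker after the first hit."""
    pool = Pool(processes)
    found = []

    def on_result(word):
        # The callback runs in a helper thread of the parent process.
        if word is not None:
            found.append(word)
            pool.terminate()  # note the parentheses: this actually stops the workers

    for i in range(0, len(wordlist), chunk_size):
        chunk = wordlist[i:i + chunk_size]
        try:
            pool.apply_async(search_chunk, (chunk, target), callback=on_result)
        except ValueError:
            break  # the callback already terminated the pool
    pool.close()
    pool.join()
    return found[0] if found else None


if __name__ == "__main__":
    words = ["word%d" % n for n in range(20000)]
    print(crack(words, hash_word("word12345")))
```

Because the callback executes asynchronously in the parent process, anything interactive, such as the prompt to exit, is better placed after `pool.join()` than inside the callback. Passing slices of the wordlist to each task, rather than indices into a global list as the question does, is a choice made here so that the sketch does not depend on how worker processes are started.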

【Comments】:

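  • Thanks @noxdafox, I tried this approach before but couldn't get it to work, so I thought I would try again using your post as an example, and I ran into the same problem: using pdb I can see it run through the callback function once or twice and then terminate, even though answer is never True. Here is the function called via the callback: def finisher(answer): if answer: p.terminate p.join end = raw_input("hit enter to exit") file.close() screen_clear() exit()
  • I apologise that my comment looks so messy, I don't know the etiquette for posting updated code!
  • I'm not sure you can post code in comments; just add it to the original post. Please post the code you tried.
  • terminate and join are functions, not attributes: p.terminate() and p.join(). Also, don't use raw_input in the callback, because it is executed asynchronously. Just put it in the main block (after close and join).
  • Looking at your worker loop, it returns immediately. Move the if: ... False out of the for loop.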
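
The last comment can be illustrated with a small sketch in Python 3 syntax. The function names are hypothetical, and plain MD5 is an assumption standing in for the question's digest calculation. The first version mirrors the edited pwd_find(), which returns False as soon as the first word fails to match. The second returns False only after every word in the chunk has been tried.

```python
import hashlib


def hash_word(word):
    """Assumption: plain MD5 stands in for the question's digest calculation."""
    return hashlib.md5(word.encode("utf-8")).hexdigest()


def pwd_find_early_return(words, target):
    """Mirrors the edited question code: gives up after the first word."""
    for word in words:
        if hash_word(word) == target:
            return True
        else:
            return False  # bug: the loop never reaches the second word


def pwd_find_fixed(words, target):
    """Returns False only once the whole chunk has been checked."""
    for word in words:
        if hash_word(word) == target:
            return True
    return False


words = ["alpha", "beta", "gamma"]
target = hash_word("gamma")
print(pwd_find_early_return(words, target))  # False: only "alpha" was tried
print(pwd_find_fixed(words, target))         # True: "gamma" was reached
```

With the early return, each task finishes almost immediately and the callback receives False for nearly every chunk, which fits the behaviour described in the first comment.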
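【Answer 2】:

I think your hunch is correct. You are checking a synchronization primitive inside a fast loop. I would probably check whether the event is set only every so often. You can experiment to find the sweet spot: check often enough that workers don't do much wasted work, but not so often that it slows the program down.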
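
A minimal sketch of that idea, in Python 3 syntax, might look like the following. The names `search_chunk` and `check_every` are hypothetical, plain MD5 is an assumption standing in for the question's digest calculation, and the default of 1000 is an arbitrary starting point rather than a measured optimum. The demo uses `threading.Event` only because it exposes the same `is_set()` and `set()` methods; in the multiprocessing program the argument would be the `Manager().Event()` from the question.

```python
import hashlib
import threading


def hash_word(word):
    """Assumption: plain MD5 stands in for the question's digest calculation."""
    return hashlib.md5(word.encode("utf-8")).hexdigest()


def search_chunk(words, target, stop_event, check_every=1000):
    """Hash each word, but poll the shared event only every `check_every` words."""
    for index, word in enumerate(words):
        # Polling a Manager event costs a round trip to the manager process,
        # so it is done once per batch instead of once per word.
        if index % check_every == 0 and stop_event.is_set():
            return None  # another worker already found the password
        if hash_word(word) == target:
            stop_event.set()  # tell the other workers to stop
            return word
    return None


# Single-process demonstration with a stand-in event object.
event = threading.Event()
words = ["word%d" % n for n in range(5000)]
print(search_chunk(words, hash_word("word4321"), event))
print(event.is_set())
```

A larger `check_every` means fewer synchronized calls but more wasted hashing after the password has been found, which is the trade-off the answer suggests tuning by experiment.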

【Comments】:

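  • If I move if not event.is_set(): outside for number in range(start, stop): and increase the cores variable (raising the number of threads), will all the threads try to start at once, or will they wait until a slot is free (I don't fully understand how this multiprocessing works)? If they wait, then finding the answer would stop them from starting. Am I correct? I'll try it as soon as I get home.
  • All the worker threads will start at the same time, but each worker will work on one function and will not pick up a new one until it finishes or the function releases the GIL. Given that your code schedules only one task per worker, this point is moot. If you over-schedule your machine, creating more workers than is reasonable, they will have to time-share their CPU allotment and may end up slower. But the number of workers is something you can play with to see what makes it run as fast as possible.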
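
One knob to experiment with, offered as an assumption and not something the thread prescribes, is to keep the number of worker processes at the core count and split the wordlist into more ranges than workers. A Pool queues the extra tasks until a worker is free. The hypothetical helper below, in Python 3 syntax, computes the ranges with integer arithmetic, so the boundaries differ slightly from the question's math.ceil version while still covering every word exactly once.

```python
def break_points(length, parts):
    """Split range(length) into `parts` contiguous (start, stop) ranges."""
    return [(length * i // parts, length * (i + 1) // parts)
            for i in range(parts)]


# Example: 10 words split into 3 tasks.
for start, stop in break_points(10, 3):
    print(start, stop)
```

Each (start, stop) pair can then be submitted as one task, and the task count can be tuned independently of the number of worker processes.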