【Question Title】: Multi-scale template match vs. Text Detection
【Posted】: 2021-10-06 06:57:32
【Question】:

I am trying to use PyAutoGUI to automate navigating a website (by detecting images and buttons on screen) so that I can pull data and download files, but I run into problems when running it on other people's computers. As far as I can tell, matching images of text is the biggest obstacle here.

I suspect the problem is display scaling and resolution, so I tried multi-scale template matching, but I found that my enlarged template never produces a match at all. My shrunken template does not help either: it either finds no match, or finds the wrong match even within a narrow confidence band of 0.8-0.9.
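For reference, the core mechanism of multi-scale template matching (resize the template across a range of scales, keep the position and scale with the highest normalized correlation) can be sketched in plain NumPy on synthetic data. `nn_resize` and `match_score` below are illustrative stand-ins for `cv2.resize` and `cv2.matchTemplate`, not real library calls:

```python
import numpy as np

def nn_resize(img, scale):
    # Nearest-neighbour resize (stand-in for cv2.resize in this sketch).
    h, w = img.shape
    nh, nw = max(1, int(h * scale)), max(1, int(w * scale))
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    return img[rows[:, None], cols]

def match_score(screen, tpl):
    # Best normalized cross-correlation of tpl over all positions in screen
    # (stand-in for cv2.matchTemplate with TM_CCOEFF_NORMED).
    th, tw = tpl.shape
    t = tpl - tpl.mean()
    best = -1.0
    for y in range(screen.shape[0] - th + 1):
        for x in range(screen.shape[1] - tw + 1):
            win = screen[y:y + th, x:x + tw]
            w = win - win.mean()
            denom = np.sqrt((t * t).sum() * (w * w).sum())
            if denom > 0:
                best = max(best, float((t * w).sum() / denom))
    return best

# Synthetic demo: the template appears on screen at 2x its stored size.
rng = np.random.default_rng(0)
template = rng.random((8, 12))
screen = rng.random((40, 60))
screen[10:26, 20:44] = nn_resize(template, 2.0)  # paste a 2x-scaled copy

# Try several scales; the one matching the on-screen size wins.
scores = {s: match_score(screen, nn_resize(template, s)) for s in (0.5, 1.0, 2.0, 3.0)}
best_scale = max(scores, key=scores.get)
```

Since the pasted region was produced by the same resize at scale 2.0, the correlation at that scale is essentially 1.0, so `best_scale` comes out as `2.0`.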

This is the original image, at 74x17.

This is the enlarged image, at 348x80 (for some reason Windows Photos would not let me make it any smaller).

This is the shrunken image, at 40x8.

Currently, with the shrunken image, PyAutoGUI confuses the image above with this one:

Here is the code I wrote (some of it is borrowed from others).

The multi-scale code I borrowed:

# Functions to search for resized versions of images
import cv2
import imutils
import numpy as np
import pyautogui
import pyscreeze

def template_match_with_scaling(image, gs=True, confidence=0.8):
    # Locate an image and return a pyscreeze Box surrounding it.
    # Template matching is done in grayscale by default (gs=True).
    # Detect the image if the normalized correlation coefficient is > confidence (0.8 by default).
    templateim = pyscreeze._load_cv2(image, grayscale=gs)   # load the template image
    (tH, tW) = templateim.shape[:2]                         # template height and width
    screenim_color = pyautogui.screenshot()                 # screenshot of the whole screen
    screenim_color = cv2.cvtColor(np.array(screenim_color), cv2.COLOR_RGB2BGR)

    # Convert the screenshot to grayscale when matching with grayscale=True
    if gs is True:
        screenim = cv2.cvtColor(np.array(screenim_color), cv2.COLOR_BGR2GRAY)
    else:
        screenim = screenim_color

    # Try different scaling parameters and see which one matches best
    found = None  # bookkeeping variable for the best correlation coefficient, position and scale
    scalingrange = np.linspace(0.25, 5, num=150)

    for scale in scalingrange:
        print("Trying another scale")
        resizedtemplate = imutils.resize(templateim, width=int(templateim.shape[1] * scale))  # imutils.resize maintains the aspect ratio
        r = float(resizedtemplate.shape[1]) / templateim.shape[1]  # recompute the actual scaling factor
        result = cv2.matchTemplate(screenim, resizedtemplate, cv2.TM_CCOEFF_NORMED)  # template matching using the correlation coefficient
        (_, maxVal, _, maxLoc) = cv2.minMaxLoc(result)  # min value, max value, and the (x, y) coordinates of each
        if found is None or maxVal > found[0]:
            found = (maxVal, maxLoc, r)

    (maxVal, maxLoc, r) = found
    if maxVal > confidence:
        box = pyscreeze.Box(int(maxLoc[0]), int(maxLoc[1]), int(tW * r), int(tH * r))
        return box
    else:
        return None

def locate_center_with_scaling(image, gs=True):
    loc = template_match_with_scaling(image, gs=gs)
    if loc:
        return pyautogui.center(loc)
    else:
        raise Exception("Image not found")

My code to match the image and click on the text box next to its identifier:

while SKUnoCounter <= len(listOfSKUs):

    while pyautogui.locateOnScreen('DescriptionBox-RESIZEDsmall.png', grayscale=True, confidence=0.8) is None:
        print("Looking for Description Box.")

        # Call once and reuse the result instead of locating the image twice
        # (note: locate_center_with_scaling raises an exception when nothing is found)
        center = locate_center_with_scaling('DescriptionBox-RESIZEDsmall.png')
        if center is not None:
            print("Found a resized version of Description Box.")
            DB_x, DB_y = center

            # Click on the Description text box
            pyautogui.click(DB_x + 417, DB_y + 12, button='left')

            break
        time.sleep(0.5)

Given that my goal is to run this on a variety of computers, is it worth trying to improve the accuracy of multi-scale template matching? Or would it be better to detect text with OCR instead of matching images? My other idea here is to use PyTesseract to locate the text I am searching for and then click using those coordinates. Selenium will not work here because I need to operate on an existing IE browser.
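As a sketch of the PyTesseract route: `pytesseract.image_to_data(..., output_type=Output.DICT)` is the real API and returns parallel lists of recognized words and their bounding boxes; `find_word` and `click_on_text` below are hypothetical helper names, assuming pytesseract, a local Tesseract install, and pyautogui are available:

```python
def find_word(ocr_data, word, min_conf=60):
    """Return the center (x, y) of the first OCR word equal to `word`
    with confidence >= min_conf, or None. `ocr_data` is the dict from
    pytesseract.image_to_data(..., output_type=Output.DICT)."""
    for i, text in enumerate(ocr_data["text"]):
        if text.strip() == word and float(ocr_data["conf"][i]) >= min_conf:
            x, y = ocr_data["left"][i], ocr_data["top"][i]
            w, h = ocr_data["width"][i], ocr_data["height"][i]
            return (x + w // 2, y + h // 2)
    return None

def click_on_text(word):
    # Requires pytesseract (plus a local Tesseract binary) and pyautogui.
    import pyautogui
    import pytesseract
    data = pytesseract.image_to_data(pyautogui.screenshot(),
                                     output_type=pytesseract.Output.DICT)
    center = find_word(data, word)
    if center is not None:
        pyautogui.click(center[0], center[1], button='left')
    return center
```

Because this matches the rendered text rather than a pixel-perfect bitmap, it is insensitive to template size, as long as Tesseract can still read the text at the screen's resolution.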

Any input here is much appreciated!

【Comments】:

  • Your code works fine with DescriptionBox-RESIZEDlarge.png. It fails to detect the small description box because its aspect ratio differs from the original image's. You have to modify template_match_with_scaling to scale width and height independently.

Tags: python automation tesseract cv2 pyautogui


【Solution 1】:

Following up on my comment above, here is what the modified function looks like:

# Functions to search for resized versions of images
def template_match_with_scaling(image, gs=True, confidence=0.8, scalingrange=None):
    # Locate an image and return a pyscreeze Box surrounding it.
    # Template matching is done in grayscale by default (gs=True).
    # Detect the image if the normalized correlation coefficient is > confidence (0.8 by default).
    templateim = pyscreeze._load_cv2(image, grayscale=gs)   # load the template image
    (tH, tW) = templateim.shape[:2]                         # template height and width
    screenim_color = pyautogui.screenshot()                 # screenshot of the whole screen
    screenim_color = cv2.cvtColor(np.array(screenim_color), cv2.COLOR_RGB2BGR)

    # Convert the screenshot to grayscale when matching with grayscale=True
    if gs is True:
        screenim = cv2.cvtColor(np.array(screenim_color), cv2.COLOR_BGR2GRAY)
    else:
        screenim = screenim_color

    # Try different scaling parameters and see which combination matches best
    found = None  # bookkeeping variable for the best correlation coefficient, position and scales

    for scalex in scalingrange:
        width = int(templateim.shape[1] * scalex)
        for scaley in scalingrange:
            height = int(templateim.shape[0] * scaley)
            scaledsize = (width, height)

            # resize the template, scaling width and height independently
            resizedtemplate = cv2.resize(templateim, scaledsize)
            rx = float(resizedtemplate.shape[1]) / templateim.shape[1]  # actual width scaling factor
            ry = float(resizedtemplate.shape[0]) / templateim.shape[0]  # actual height scaling factor
            result = cv2.matchTemplate(screenim, resizedtemplate, cv2.TM_CCOEFF_NORMED)  # template matching using the correlation coefficient
            (_, maxVal, _, maxLoc) = cv2.minMaxLoc(result)  # min value, max value, and the (x, y) coordinates of each
            if found is None or maxVal > found[0]:
                found = (maxVal, maxLoc, rx, ry)

    (maxVal, maxLoc, rx, ry) = found
    print('maxVal= ', maxVal)
    if maxVal > confidence:
        box = pyscreeze.Box(int(maxLoc[0]), int(maxLoc[1]), int(tW * rx), int(tH * ry))
        return box
    else:
        return None

def locate_center_with_scaling(image, gs=True, **kwargs):
    loc = template_match_with_scaling(image, gs=gs, **kwargs)
    if loc:
        return pyautogui.center(loc)
    else:
        raise Exception("Image not found")

im = 'DescriptionBox.png'  # try to detect the small description box, whose width and height are scaled down by 0.54 and 0.47
unscaledLocation = pyautogui.locateOnScreen(im, grayscale=True, confidence=0.8)
srange = np.linspace(0.4, 0.6, num=20)  # scale width and height in this range
if unscaledLocation is None:
    print("Looking for Description Box.")
    scaledLocation = locate_center_with_scaling(im, scalingrange=srange)
    if scaledLocation is not None:
        print(f'Found a resized version of Description Box at ({scaledLocation[0]},{scaledLocation[1]})')
        pyautogui.moveTo(scaledLocation[0], scaledLocation[1])

Two things to note:

  • template_match_with_scaling now runs a double loop, one over each dimension, so detecting the template image takes some time. To amortize the detection time, we should save the width and height scaling parameters after the first detection and scale all subsequent template images by those parameters.
  • To detect the template efficiently, the scalingrange input of template_match_with_scaling needs to be set to an appropriate range of values. If the range is small or does not contain enough values, we will fail to detect the template; if it is too large, detection will take a long time.
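The first point, caching the discovered scales, could look something like the sketch below (the cache and helper names are illustrative, not part of the code above; template_match_with_scaling would then loop scalex over the first returned range and scaley over the second):

```python
# Cache of (scalex, scaley) per template image, filled after the first
# successful multi-scale detection. Names here are illustrative.
_scale_cache = {}

def remember_scale(image, scalex, scaley):
    """Store the scales that matched for this template."""
    _scale_cache[image] = (scalex, scaley)

def scaling_ranges_for(image, full_range):
    """Return 1-element search ranges if this template was seen before,
    otherwise the full range for both axes, so that later detections
    skip the expensive double loop."""
    if image in _scale_cache:
        sx, sy = _scale_cache[image]
        return [sx], [sy]
    return list(full_range), list(full_range)
```

After the first call succeeds, `remember_scale(im, rx, ry)` pins the search down to a single width/height pair, making every later detection a single matchTemplate call.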

【Discussion】:

  • Wow, this is awesome! I started looking into PyTesseract shortly after posting this question and it sort of works, so I will explore that avenue further before going down this road, since I feel text detection may be a bit more stable here. Thanks for your help!