【Question Title】: Multi-scale template match vs. Text Detection
【Posted】: 2021-10-06 06:57:32
【Question】:

I am trying to use PyAutoGUI to automate navigating a website (by detecting images and buttons on screen) so that I can pull data and download files, but I run into problems when running it on other people's computers. As far as I can tell, matching images of text is the biggest obstacle here.

I suspect the problem is display scaling and resolution, so I tried multi-scale template matching, but I found that my enlarged template never produces a match at all. My shrunken template does not help either: it either finds no match, or finds the wrong match even within a narrow confidence band of 0.8-0.9.
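For reference, the core mechanism of multi-scale template matching (resize the template across a range of scales, keep the position and scale with the highest normalized correlation) can be sketched in plain NumPy on synthetic data. `nn_resize` and `match_score` below are illustrative stand-ins for `cv2.resize` and `cv2.matchTemplate`, not real library calls:

```python
import numpy as np

def nn_resize(img, scale):
    # Nearest-neighbour resize (stand-in for cv2.resize in this sketch).
    h, w = img.shape
    nh, nw = max(1, int(h * scale)), max(1, int(w * scale))
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    return img[rows[:, None], cols]

def match_score(screen, tpl):
    # Best normalized cross-correlation of tpl over all positions in screen
    # (stand-in for cv2.matchTemplate with TM_CCOEFF_NORMED).
    th, tw = tpl.shape
    t = tpl - tpl.mean()
    best = -1.0
    for y in range(screen.shape[0] - th + 1):
        for x in range(screen.shape[1] - tw + 1):
            win = screen[y:y + th, x:x + tw]
            w = win - win.mean()
            denom = np.sqrt((t * t).sum() * (w * w).sum())
            if denom > 0:
                best = max(best, float((t * w).sum() / denom))
    return best

# Synthetic demo: the template appears on screen at 2x its stored size.
rng = np.random.default_rng(0)
template = rng.random((8, 12))
screen = rng.random((40, 60))
screen[10:26, 20:44] = nn_resize(template, 2.0)  # paste a 2x-scaled copy

# Try several scales; the one matching the on-screen size wins.
scores = {s: match_score(screen, nn_resize(template, s)) for s in (0.5, 1.0, 2.0, 3.0)}
best_scale = max(scores, key=scores.get)
```

Since the pasted region was produced by the same resize at scale 2.0, the correlation at that scale is essentially 1.0, so `best_scale` comes out as `2.0`.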

This is the original image, at 74x17.

This is the enlarged image, at 348x80 (for some reason Windows Photos would not let me make it any smaller).

This is the shrunken image, at 40x8.

Currently, with the shrunken image, PyAutoGUI confuses the image above with this one:

Here is the code I wrote (some of it is borrowed from others).

The multi-scale code I borrowed:

# Functions to search for resized versions of images
import cv2
import imutils
import numpy as np
import pyautogui
import pyscreeze

def template_match_with_scaling(image, gs=True, confidence=0.8):
    # Locate an image and return a pyscreeze Box surrounding it.
    # Template matching is done in grayscale by default (gs=True).
    # Detect the image if the normalized correlation coefficient is > confidence (0.8 by default).
    templateim = pyscreeze._load_cv2(image, grayscale=gs)   # load the template image
    (tH, tW) = templateim.shape[:2]                         # template height and width
    screenim_color = pyautogui.screenshot()                 # screenshot of the whole screen
    screenim_color = cv2.cvtColor(np.array(screenim_color), cv2.COLOR_RGB2BGR)

    # Convert the screenshot to grayscale when matching with grayscale=True
    if gs is True:
        screenim = cv2.cvtColor(np.array(screenim_color), cv2.COLOR_BGR2GRAY)
    else:
        screenim = screenim_color

    # Try different scaling parameters and see which one matches best
    found = None  # bookkeeping variable for the best correlation coefficient, position and scale
    scalingrange = np.linspace(0.25, 5, num=150)

    for scale in scalingrange:
        print("Trying another scale")
        resizedtemplate = imutils.resize(templateim, width=int(templateim.shape[1] * scale))  # imutils.resize maintains the aspect ratio
        r = float(resizedtemplate.shape[1]) / templateim.shape[1]  # recompute the actual scaling factor
        result = cv2.matchTemplate(screenim, resizedtemplate, cv2.TM_CCOEFF_NORMED)  # template matching using the correlation coefficient
        (_, maxVal, _, maxLoc) = cv2.minMaxLoc(result)  # min value, max value, and the (x, y) coordinates of each
        if found is None or maxVal > found[0]:
            found = (maxVal, maxLoc, r)

    (maxVal, maxLoc, r) = found
    if maxVal > confidence:
        box = pyscreeze.Box(int(maxLoc[0]), int(maxLoc[1]), int(tW * r), int(tH * r))
        return box
    else:
        return None

def locate_center_with_scaling(image, gs=True):
    loc = template_match_with_scaling(image, gs=gs)
    if loc:
        return pyautogui.center(loc)
    else:
        raise Exception("Image not found")

My code to match the image and click on the text box next to its identifier:

while SKUnoCounter <= len(listOfSKUs):

    while pyautogui.locateOnScreen('DescriptionBox-RESIZEDsmall.png', grayscale=True, confidence=0.8) is None:
        print("Looking for Description Box.")

        # Call once and reuse the result instead of locating the image twice
        # (note: locate_center_with_scaling raises an exception when nothing is found)
        center = locate_center_with_scaling('DescriptionBox-RESIZEDsmall.png')
        if center is not None:
            print("Found a resized version of Description Box.")
            DB_x, DB_y = center

            # Click on the Description text box
            pyautogui.click(DB_x + 417, DB_y + 12, button='left')

            break
        time.sleep(0.5)

Given that my goal is to run this on a variety of computers, is it worth trying to improve the accuracy of multi-scale template matching? Or would it be better to detect text with OCR instead of matching images? My other idea here is to use PyTesseract to locate the text I am searching for and then click using those coordinates. Selenium will not work here because I need to operate on an existing IE browser.
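As a sketch of the PyTesseract route: `pytesseract.image_to_data(..., output_type=Output.DICT)` is the real API and returns parallel lists of recognized words and their bounding boxes; `find_word` and `click_on_text` below are hypothetical helper names, assuming pytesseract, a local Tesseract install, and pyautogui are available:

```python
def find_word(ocr_data, word, min_conf=60):
    """Return the center (x, y) of the first OCR word equal to `word`
    with confidence >= min_conf, or None. `ocr_data` is the dict from
    pytesseract.image_to_data(..., output_type=Output.DICT)."""
    for i, text in enumerate(ocr_data["text"]):
        if text.strip() == word and float(ocr_data["conf"][i]) >= min_conf:
            x, y = ocr_data["left"][i], ocr_data["top"][i]
            w, h = ocr_data["width"][i], ocr_data["height"][i]
            return (x + w // 2, y + h // 2)
    return None

def click_on_text(word):
    # Requires pytesseract (plus a local Tesseract binary) and pyautogui.
    import pyautogui
    import pytesseract
    data = pytesseract.image_to_data(pyautogui.screenshot(),
                                     output_type=pytesseract.Output.DICT)
    center = find_word(data, word)
    if center is not None:
        pyautogui.click(center[0], center[1], button='left')
    return center
```

Because this matches the rendered text rather than a pixel-perfect bitmap, it is insensitive to template size, as long as Tesseract can still read the text at the screen's resolution.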

Any input here is much appreciated!

【Comments】:

  • Your code works fine with DescriptionBox-RESIZEDlarge.png. It fails to detect the small description box because its aspect ratio differs from the original image's. You have to modify template_match_with_scaling to scale width and height independently.

Tags: python automation tesseract cv2 pyautogui


【Solution 1】:

Following up on my comment above, here is what the modified function looks like:

# Functions to search for resized versions of images
def template_match_with_scaling(image, gs=True, confidence=0.8, scalingrange=None):
    # Locate an image and return a pyscreeze Box surrounding it.
    # Template matching is done in grayscale by default (gs=True).
    # Detect the image if the normalized correlation coefficient is > confidence (0.8 by default).
    templateim = pyscreeze._load_cv2(image, grayscale=gs)   # load the template image
    (tH, tW) = templateim.shape[:2]                         # template height and width
    screenim_color = pyautogui.screenshot()                 # screenshot of the whole screen
    screenim_color = cv2.cvtColor(np.array(screenim_color), cv2.COLOR_RGB2BGR)

    # Convert the screenshot to grayscale when matching with grayscale=True
    if gs is True:
        screenim = cv2.cvtColor(np.array(screenim_color), cv2.COLOR_BGR2GRAY)
    else:
        screenim = screenim_color

    # Try different scaling parameters and see which combination matches best
    found = None  # bookkeeping variable for the best correlation coefficient, position and scales

    for scalex in scalingrange:
        width = int(templateim.shape[1] * scalex)
        for scaley in scalingrange:
            height = int(templateim.shape[0] * scaley)
            scaledsize = (width, height)

            # resize the template, scaling width and height independently
            resizedtemplate = cv2.resize(templateim, scaledsize)
            rx = float(resizedtemplate.shape[1]) / templateim.shape[1]  # actual width scaling factor
            ry = float(resizedtemplate.shape[0]) / templateim.shape[0]  # actual height scaling factor
            result = cv2.matchTemplate(screenim, resizedtemplate, cv2.TM_CCOEFF_NORMED)  # template matching using the correlation coefficient
            (_, maxVal, _, maxLoc) = cv2.minMaxLoc(result)  # min value, max value, and the (x, y) coordinates of each
            if found is None or maxVal > found[0]:
                found = (maxVal, maxLoc, rx, ry)

    (maxVal, maxLoc, rx, ry) = found
    print('maxVal= ', maxVal)
    if maxVal > confidence:
        box = pyscreeze.Box(int(maxLoc[0]), int(maxLoc[1]), int(tW * rx), int(tH * ry))
        return box
    else:
        return None

def locate_center_with_scaling(image, gs=True, **kwargs):
    loc = template_match_with_scaling(image, gs=gs, **kwargs)
    if loc:
        return pyautogui.center(loc)
    else:
        raise Exception("Image not found")

im = 'DescriptionBox.png'  # try to detect the small description box, whose width and height are scaled down by 0.54 and 0.47
unscaledLocation = pyautogui.locateOnScreen(im, grayscale=True, confidence=0.8)
srange = np.linspace(0.4, 0.6, num=20)  # scale width and height in this range
if unscaledLocation is None:
    print("Looking for Description Box.")
    scaledLocation = locate_center_with_scaling(im, scalingrange=srange)
    if scaledLocation is not None:
        print(f'Found a resized version of Description Box at ({scaledLocation[0]},{scaledLocation[1]})')
        pyautogui.moveTo(scaledLocation[0], scaledLocation[1])

Two things to note:

  • template_match_with_scaling now runs a double loop, one over each dimension, so detecting the template image takes some time. To amortize the detection time, we should save the width and height scaling parameters after the first detection and scale all subsequent template images by those parameters.
  • To detect the template efficiently, the scalingrange input of template_match_with_scaling needs to be set to an appropriate range of values. If the range is small or does not contain enough values, we will fail to detect the template; if it is too large, detection will take a long time.
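The first point, caching the discovered scales, could look something like the sketch below (the cache and helper names are illustrative, not part of the code above; template_match_with_scaling would then loop scalex over the first returned range and scaley over the second):

```python
# Cache of (scalex, scaley) per template image, filled after the first
# successful multi-scale detection. Names here are illustrative.
_scale_cache = {}

def remember_scale(image, scalex, scaley):
    """Store the scales that matched for this template."""
    _scale_cache[image] = (scalex, scaley)

def scaling_ranges_for(image, full_range):
    """Return 1-element search ranges if this template was seen before,
    otherwise the full range for both axes, so that later detections
    skip the expensive double loop."""
    if image in _scale_cache:
        sx, sy = _scale_cache[image]
        return [sx], [sy]
    return list(full_range), list(full_range)
```

After the first call succeeds, `remember_scale(im, rx, ry)` pins the search down to a single width/height pair, making every later detection a single matchTemplate call.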

【Discussion】:

  • Wow, this is awesome! I started looking into PyTesseract shortly after posting this question and it sort of works, so I will explore that avenue further before going down this road, since I feel text detection may be a bit more stable here. Thanks for your help!