Ignore image name while getting hash: answers

【Question title】: Ignore image name while getting hash
【Posted】: 2022-01-20 20:23:54
【Question description】:

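I'm writing a program that takes an image as input, checks it against the images in a database, and outputs the image that has the same hash.

However, when I use hash("imagepath") on two identical images, I get different hash values even though the only difference is the image name, which makes me think the name is the problem.

Is there a way to easily ignore the image's name? (PNG)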

【Question discussion】:

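hash("imagepath") only hashes the file name, not the contents. You need to read the contents.
Then how do I get the contents?
Also, hash is not a cryptographic hash function. Depending on your needs, you may want to choose a different function.
Reading files: docs.python.org/3/tutorial/…
Actually, most hash libraries take a byte string, so it should be hash(open("imagepath","rb").read()). You may need to experiment.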
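Following the comments, here is a minimal sketch of hashing a file's contents rather than its path, using the standard-library `hashlib` with SHA-256 (an assumed choice; other `hashlib` algorithms work the same way). The built-in `hash()` of `bytes` is randomized per interpreter process by default, so its values are not suitable for storing in a database. The function name `file_content_hash` and the demo file names are hypothetical.

```python
import hashlib
import os
import tempfile


def file_content_hash(path, chunk_size=65536):
    """SHA-256 hex digest of the file's bytes; the file name plays no part."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large images do not have to fit in memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Demo: the same bytes saved under two different (hypothetical) names
with tempfile.TemporaryDirectory() as tmp:
    data = b"\x89PNG\r\n\x1a\n placeholder bytes"  # not a real PNG, just sample content
    paths = [os.path.join(tmp, name) for name in ("first.png", "second.png")]
    for p in paths:
        with open(p, "wb") as f:
            f.write(data)
    digest_a, digest_b = (file_content_hash(p) for p in paths)

print(digest_a == digest_b)
```

This only matches byte-identical files: the same picture saved with different compression settings or metadata produces a different digest.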

Tags: python image hash

【Solution 1】:

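How I solved it: in the end I didn't use a hash at all. Instead, piecing together code snippets, I used the average pixel value and then looked for the image with the same average (the averages are stored in a list, so the lookup returns an index, which is then used to find the image's name).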

import requests
from PIL import Image

# Database of possible image average pixels
clone_imgs = [88.0465, 46.2568, 102.6426, ...]

image = "<image url>"
img_data = requests.get(image).content
with open("image.png", "wb") as handler:  # Download the image as "image.png" (replace with the path where you want to save it)
    handler.write(img_data)
img = Image.open("image.png")  # Open the image for reading
img = img.resize((100, 100), Image.LANCZOS)  # Shrink to 100x100 (Image.ANTIALIAS, an alias of LANCZOS, was removed in Pillow 10)
img = img.convert("L")  # Convert to 8-bit grayscale
img_pixel_data = list(img.getdata())
img_avg_pixel = sum(img_pixel_data) / len(img_pixel_data)  # Average pixel value over all 10,000 pixels

clone_img_index = clone_imgs.index(img_avg_pixel)  # Find the same value in the database (raises ValueError if absent)
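Instead of a parallel list plus `.index()`, the averages could be kept in a dictionary keyed by value, which returns names directly and keeps every name when two images share an average. A minimal sketch, assuming averages are rounded to 4 decimals (the precision of the values shown, since 100x100 = 10,000 pixels); `build_index`, `lookup`, and the file names are hypothetical.

```python
from collections import defaultdict


def build_index(averages_by_name, ndigits=4):
    """Map each rounded average pixel value to every image name that has it."""
    index = defaultdict(list)
    for name, avg in averages_by_name.items():
        index[round(avg, ndigits)].append(name)
    return index


def lookup(index, avg, ndigits=4):
    """Return the names whose stored average matches avg (empty list if none)."""
    # .get() does not insert missing keys into the defaultdict
    return index.get(round(avg, ndigits), [])


# Hypothetical database: the values come from the answer, the file names are made up
db = {"cat.png": 88.0465, "dog.png": 46.2568, "bird.png": 102.6426}
index = build_index(db)

print(lookup(index, 880465 / 10000))  # ['cat.png']
print(lookup(index, 50.0))            # []
```

Returning a list rather than a single index makes collisions visible instead of silently picking the first match.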

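This worked for me, but it has some drawbacks:

The image colors must be 100% identical (a single pixel's deviation can ruin it).
A single average value can correspond to countless different images. My database only contains 800, so it still works (but I had to go from shrinking to 10x10 up to 100x100 before I finally had no duplicates).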
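Both drawbacks can be seen with a little arithmetic on flat pixel lists like the one `getdata()` returns. This is a sketch under assumptions: the "images" are hypothetical 10,000-value grayscale lists rather than real files, and `average_pixel` is a helper introduced here.

```python
def average_pixel(pixels):
    """Average of a flat list of grayscale values, computed as in the answer's code."""
    return sum(pixels) / len(pixels)


# Hypothetical 100x100 grayscale images as flat pixel lists (10,000 values each)
uniform = [100] * 10_000
one_off = [100] * 9_999 + [101]       # a single pixel is one gray level brighter
half_a = [0] * 5_000 + [255] * 5_000  # dark first half, bright second half
half_b = [255] * 5_000 + [0] * 5_000  # the same values in a different arrangement

database = [average_pixel(uniform)]   # stands in for clone_imgs

# Drawback 1: a one-pixel change shifts the average, so the exact lookup fails
try:
    database.index(average_pixel(one_off))
    exact_match_found = True
except ValueError:
    exact_match_found = False

# Drawback 2: different images can share exactly the same average
collision = average_pixel(half_a) == average_pixel(half_b) and half_a != half_b

print(average_pixel(uniform), average_pixel(one_off))  # 100.0 100.0001
print(exact_match_found, collision)                    # False True
```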

【Discussion】:

I guess this comes too late, but Python has a perceptual hash library. These are designed to produce matching or similar hashes even when images differ slightly.
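For comparison, here is a minimal average hash (aHash), one of the simplest perceptual hashes, sketched with Pillow alone. This is not the API of the library mentioned above; `average_hash`, `hamming_distance`, the 8x8 hash size, and the test images are assumptions for illustration. Each bit records whether a cell of the shrunken grayscale image is brighter than its mean, and hashes are compared by Hamming distance instead of exact equality.

```python
from PIL import Image


def average_hash(img, hash_size=8):
    """Minimal aHash sketch: shrink, convert to grayscale, threshold each cell at the mean."""
    small = img.convert("L").resize((hash_size, hash_size), Image.LANCZOS)
    pixels = list(small.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        # One bit per cell: 1 if brighter than the mean, else 0
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(h1, h2):
    """Number of differing bits; 0 means the two hashes are identical."""
    return bin(h1 ^ h2).count("1")


size = 64
# Hypothetical test images with the same average pixel value (127.5):
# left half black / right half white, and top half black / bottom half white
left_right = Image.new("L", (size, size))
left_right.putdata([0 if x < size // 2 else 255 for y in range(size) for x in range(size)])
top_bottom = Image.new("L", (size, size))
top_bottom.putdata([0 if y < size // 2 else 255 for y in range(size) for x in range(size)])

# A near-duplicate: the same image with one pixel changed
tweaked = left_right.copy()
tweaked.putpixel((0, 0), 255)

d_near = hamming_distance(average_hash(left_right), average_hash(tweaked))
d_far = hamming_distance(average_hash(left_right), average_hash(top_bottom))
print(d_near, d_far)
```

The near-duplicate should land at or near distance 0, while the two images that the average-pixel approach cannot tell apart end up far apart. In practice a match would be "distance below some threshold" rather than exact equality.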