【Title】: Numpy PIL Python: crop image on whitespace or crop text with histogram thresholds
【Posted】: 2014-09-01 11:46:24
【Question】:

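How would I find the bounding box, or window, of the whitespace region around the numbers in the image below?

Original image:

Height: 762 px, width: 1014 px

Goal:

Something like {x-bound:[x-upper,x-lower], y-bound:[y-upper,y-lower]}, so that I can crop to the text and feed it into tesseract or another OCR engine.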
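A minimal NumPy-only sketch of that goal, assuming the text is darker than a near-white background: mark every pixel darker than a threshold as ink, then take the first and last rows and columns that contain ink. The helper name `text_bounds` and the threshold of 200 are illustrative assumptions, and the bounds are inclusive `[min, max]` indices. Any other dark area in the image (a border, a shadow) also counts as ink here, so on a real photo the background may have to be isolated first.

```python
import numpy as np

def text_bounds(gray, threshold=200):
    """Hypothetical helper: inclusive bounds of all pixels darker than `threshold`.

    gray: 2-D array (rows = y, columns = x) of 0-255 intensities.
    Returns None when no pixel is darker than the threshold.
    """
    ink = gray < threshold                  # True where there is dark "ink"
    rows = np.flatnonzero(ink.any(axis=1))  # y indices of rows containing ink
    cols = np.flatnonzero(ink.any(axis=0))  # x indices of columns containing ink
    if rows.size == 0:
        return None
    return {"x-bound": [int(cols[0]), int(cols[-1])],
            "y-bound": [int(rows[0]), int(rows[-1])]}

# Synthetic 762 x 1014 white page with a dark block standing in for the digits
page = np.full((762, 1014), 255, dtype=np.uint8)
page[300:360, 400:620] = 0

box = text_bounds(page)
print(box)  # {'x-bound': [400, 619], 'y-bound': [300, 359]}

# Crop with plain slicing; slice ends are exclusive, hence the + 1
crop = page[box["y-bound"][0]:box["y-bound"][1] + 1,
            box["x-bound"][0]:box["x-bound"][1] + 1]
print(crop.shape)  # (60, 220)
```

The cropped array can be handed to PIL with `Image.fromarray(crop)` before passing it to an OCR tool.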

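Attempts:

I thought about cutting the image into hard-coded block sizes and analyzing them at random, but I think that would be too slow.

Example code using pyplot, adapted from (Using python and PIL how can I grab a block of text in an image?):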

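```python
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

im = Image.open('/home/jmunsch/Pictures/Aet62.png')
p = np.array(im)
p = p[:, :, 0:3]      # keep RGB, drop the alpha channel
p = 255 - p           # invert so that dark text gives large values
lx, ly, lz = p.shape  # rows (y), columns (x), color channels

plt.plot(p.sum(axis=1))  # per channel: ink summed along each row
plt.figure()
plt.plot(p.sum(axis=0))  # per channel: ink summed down each column

# I was thinking something like this.
# The image is a 3-dimensional ndarray indexed [y, x, channel].
# Collapse the channels, then set each profile value below its axis mean to 0
row_profile = p.sum(axis=(1, 2)).astype(float)
col_profile = p.sum(axis=(0, 2)).astype(float)
row_profile[row_profile < row_profile.mean()] = 0
col_profile[col_profile < col_profile.mean()] = 0

# and then some type of enumerated groupby for each axis,
# finding the index range of each non-zero group on that axis.
# Simplest case: the first and last non-zero index on each axis
rows = np.flatnonzero(row_profile)
cols = np.flatnonzero(col_profile)
plt.figure()
plt.imshow(255 - p[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1])  # undo the inversion
plt.show()
```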

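In these plots, each valley marks a place where a bound could go.

The first plot shows where the lines of text are.
The second plot shows where the characters are.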
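The "enumerated groupby" idea can be sketched with NumPy alone: threshold a projection profile, then find where it switches between zero and non-zero. Each `(start, end)` pair is one band, i.e. a text line for the row profile or a character for the column profile. The helper `find_runs` is hypothetical, and using the mean as the default cut-off simply follows the comment in the attempt; it is not a tested recipe for this image.

```python
import numpy as np

def find_runs(profile, cutoff=None):
    """Return (start, end) index pairs (end exclusive) where profile > cutoff.

    profile: 1-D array, e.g. ink summed along one axis of the image.
    cutoff: defaults to the profile's mean.
    """
    profile = np.asarray(profile, dtype=float)
    if cutoff is None:
        cutoff = profile.mean()
    above = profile > cutoff
    # Pad with False on both sides so every run has a rising and a falling edge
    padded = np.concatenate(([False], above, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # first index of each run
    ends = np.flatnonzero(edges == -1)    # one past the last index of each run
    return list(zip(starts.tolist(), ends.tolist()))

# Column profile with two "characters" separated by a valley
col_profile = np.array([0, 0, 9, 8, 9, 0, 0, 7, 9, 0])
print(find_runs(col_profile))  # [(2, 5), (7, 9)]
```

Each run can be used directly as a slice, e.g. `image[:, start:end]` for one character column band.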

Example plot of `plt.plot(p.sum(axis=1))`:

Example output of `plt.plot(p.sum(axis=0))`:

Update: HYRY's solution

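【Comments】:

What do you mean by "region"? Do you want the coordinates of the rectangle that contains the letters in the first image? What does this need to generalize to?
numpy only does arrays (plus some basic statistics and the like). It sounds like you need a computer vision library such as scikit-image, especially if you don't want to set the bounding box for this one image only (as you may have noticed).
I appreciate the tip about computer vision libraries, but numpy can manipulate the array, which can then be turned back into an image with PIL. For example: pix = np.array(im); cropped_to_corner = Image.fromarray(pix[0:200,0:200]). I just figured out how to get to the x axis.
So what is your question? What you're trying to accomplish might be clearer if you could rephrase it or say what you've tried. I don't see how the code you posted relates to your question.
@machow Apologies for the confusing question. After rereading it a few times, I've tried to rephrase it.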
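The NumPy cropping idea from the comments, sketched on a synthetic array so it runs without the original file. What it illustrates is the axis order: NumPy indexes `[row, column]`, i.e. `[y, x]`, while Pillow's `Image.crop` takes a `(left, upper, right, lower)` box, i.e. `(x0, y0, x1, y1)`, and `Image.size` is `(width, height)`. The bounds below are hypothetical.

```python
import numpy as np
from PIL import Image

# Synthetic 762 x 1014 grayscale image (height x width), as in the question
pix = (np.arange(762 * 1014) % 256).astype(np.uint8).reshape(762, 1014)
im = Image.fromarray(pix)
print(im.size)  # (1014, 762): PIL reports (width, height)

# Hypothetical bounds, end-exclusive: rows (y) 100..299, columns (x) 50..449
y0, y1, x0, x1 = 100, 300, 50, 450

# NumPy slicing is [y, x]
cropped_np = Image.fromarray(pix[y0:y1, x0:x1])
# Image.crop takes (left, upper, right, lower) = (x0, y0, x1, y1)
cropped_pil = im.crop((x0, y0, x1, y1))

print(cropped_np.size, cropped_pil.size)  # (400, 200) (400, 200)
print(np.array_equal(np.array(cropped_np), np.array(cropped_pil)))  # True
```

Either route gives the same pixels; the slicing version is convenient when the bounds were computed on the NumPy array anyway.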

Tags: python numpy matplotlib python-imaging-library

【Answer 1】:

I think you can use the morphology functions in scipy.ndimage. Here is an example:

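```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import ndimage

# imread returns floats in [0, 1] for a PNG; casting to uint8 keeps only pure
# white (1.0) as 1, so img is True on the white background, False elsewhere
img = plt.imread("Aet62.png")[:, :, 0].astype(np.uint8).astype(bool)

# Opening (erode, then dilate) removes small or thin white areas,
# leaving only large white regions
img2 = ndimage.binary_erosion(img, iterations=40)
img3 = ndimage.binary_dilation(img2, iterations=40)

# Keep the largest connected white region (label 0 is the non-white part)
labels, n = ndimage.label(img3)
counts = np.bincount(labels.ravel())
counts[0] = 0
img4 = labels == np.argmax(counts)
img5 = ndimage.binary_fill_holes(img4)  # fill the holes left by the text

# Non-white pixels inside that region are the text; a small opening removes specks
result = ~img & img5
result = ndimage.binary_erosion(result, iterations=3)
result = ndimage.binary_dilation(result, iterations=3)

plt.imshow(result, cmap="gray")
plt.show()
```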

The output is:
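To get from a text mask like `result` to boxes that can be cropped for tesseract, one option (not part of the original answer) is to label the connected text blobs with `scipy.ndimage.label` and ask `scipy.ndimage.find_objects` for their bounding slices. This is sketched on a small synthetic mask standing in for `result`; the final dictionary follows the format asked for in the question, with inclusive bounds.

```python
import numpy as np
from scipy import ndimage

# Synthetic stand-in for `result`: True on text pixels, False elsewhere
result = np.zeros((100, 200), dtype=bool)
result[20:60, 30:50] = True   # first "digit"
result[25:70, 80:110] = True  # second "digit"

labels, n = ndimage.label(result)     # n connected blobs, labelled 1..n
boxes = ndimage.find_objects(labels)  # one (row_slice, col_slice) per label
for rows, cols in boxes:
    print((int(rows.start), int(rows.stop)), (int(cols.start), int(cols.stop)))
# (20, 60) (30, 50)
# (25, 70) (80, 110)

# One box around all the text, with inclusive bounds as in the question
y0 = int(min(r.start for r, _ in boxes))
y1 = int(max(r.stop for r, _ in boxes)) - 1
x0 = int(min(c.start for _, c in boxes))
x1 = int(max(c.stop for _, c in boxes)) - 1
bounds = {"x-bound": [x0, x1], "y-bound": [y0, y1]}
print(bounds)  # {'x-bound': [30, 109], 'y-bound': [20, 69]}
```

The per-blob slices can also be used directly, e.g. `result[rows, cols]`, to crop each digit separately.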

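【Comments】:

Very cool. I'll probably use -result for tesseract. Is trying to use histograms just a bad idea in general (e.g. scipy-lectures.github.io/packages/scikit-image/… ) for someone like me who isn't much of a math person?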
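On the histogram route raised in that comment: one standard histogram-based technique, which the answer above does not use, is Otsu's method. It picks the gray level that best splits the intensity histogram into two classes (ink and paper) by maximizing the between-class variance. A minimal NumPy sketch, assuming an 8-bit grayscale array; `otsu_threshold` is a hypothetical helper, and scikit-image provides a maintained version as `skimage.filters.threshold_otsu`.

```python
import numpy as np

def otsu_threshold(gray):
    """Return t such that pixels <= t form one class and pixels > t the other.

    gray: array of uint8 gray levels (0-255). Otsu's method: choose the t
    that maximizes the between-class variance of the histogram.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # weight of the class <= t
    mu = np.cumsum(prob * np.arange(256))  # cumulative mean up to t
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    between = np.nan_to_num(between)       # 0/0 where one class is empty
    return int(np.argmax(between))

# Synthetic page: light paper (around 230) with a block of dark ink (around 20)
rng = np.random.default_rng(0)
page = rng.normal(230, 5, size=(100, 100))
page[40:60, 20:80] = rng.normal(20, 5, size=(20, 60))
gray = np.clip(page, 0, 255).astype(np.uint8)

t = otsu_threshold(gray)
ink = gray <= t
print(ink.sum())  # 1200
```

The resulting boolean `ink` mask can then be reduced to row and column bounds in the same way as any other text mask.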
