【Question title】: Increase space between text lines in image
【Posted】: 2020-01-11 09:35:24
【Question】:

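I have an input image of a paragraph of text with single line spacing. I'm trying to implement something like the line spacing option in Microsoft Word, which increases or decreases the space between lines of text. The current image is single-spaced. How can I convert the text to double spacing, or, say, 1.5 spacing? Essentially, I'm trying to dynamically restructure the spacing between text lines, preferably with an adjustable parameter. Something like this:

Input image

Desired result

My current attempt is below. I've been able to increase the spacing slightly, but the text detail seems to be eroded and there is random noise between the lines.

Any ideas on how to improve the code, or any better approaches?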

import numpy as np 
import cv2

img = cv2.imread('text.png')
H, W = img.shape[:2]
grey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
threshed = cv2.threshold(grey, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]

hist = cv2.reduce(threshed, 1, cv2.REDUCE_AVG).reshape(-1)
spacing = 2
delimeter = [y for y in range(H - 1) if hist[y] <= spacing < hist[y + 1]]
arr = []
y_prev, y_curr = 0, 0
for y in delimeter:
    y_prev = y_curr
    y_curr = y
    arr.append(threshed[y_prev:y_curr, 0:W])

arr.append(threshed[y_curr:H, 0:W])
space_array = np.zeros((10, W))
result = np.zeros((1, W))

for im in arr:
    v = np.concatenate((space_array, im), axis=0)
    result = np.concatenate((result, v), axis=0)

result = (255 - result).astype(np.uint8)
cv2.imshow('result', result)
cv2.waitKey()

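【Comments】:

You can blur horizontally or use morphology with a horizontal kernel, then threshold. Then get the contours of each region, which should correspond to the lines of text. Then extract them and write them onto a new, clean background image with larger spacing.

Tags: python opencv image-processing text computer-vision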

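【Solution 1】:

Method #1: Pixel analysis

Obtain binary image. Load the image, convert to grayscale, and apply Otsu's threshold.
Sum row pixels. The idea is that the pixel sum of a row can be used to determine whether it corresponds to text or to white space.
Create new image and add additional white space. We iterate through the pixel array and add additional white space.

Binary image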

# Load image, grayscale, Otsu's threshold
image = cv2.imread('1.png')
h, w = image.shape[:2]
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

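Now we iterate through each row and sum the white pixels to generate a pixel array. We can profile the column of data generated from the sum of all the pixels in each row to determine which rows correspond to text. Sections of the data that equal 0 represent rows of the image that consist entirely of white space. Here's a visualization of the data array: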

# Sum white pixels in each row
# Create blank space array and final image
pixels = np.sum(thresh, axis=1).tolist()
space = np.ones((2, w), dtype=np.uint8) * 255
result = np.zeros((0, w), dtype=np.uint8)  # start empty so no stray black row is added
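To make the row-sum idea concrete, here is a small sketch that turns the profile into explicit start/stop rows for each block of text. It is not part of the original answer: `text_row_runs` is a name made up for illustration, and it assumes a binary image where text is non-zero and background is 0, like `thresh` above.

```python
import numpy as np

def text_row_runs(binary):
    """Return (start, stop) row ranges whose row sum is non-zero.

    Assumes `binary` is a 2-D array with text pixels > 0 and background == 0.
    `stop` is exclusive, so binary[start:stop] is one block of text rows.
    """
    has_ink = binary.sum(axis=1) > 0
    # Pad with False on both sides so every run has a rising and a falling edge
    padded = np.concatenate(([False], has_ink, [False]))
    edges = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), stops.tolist()))

# Tiny synthetic "page": two text lines separated by blank rows
demo = np.zeros((8, 5), dtype=np.uint8)
demo[1:3, 1:4] = 255   # first line occupies rows 1-2
demo[5:7, 0:2] = 255   # second line occupies rows 5-6
runs = text_row_runs(demo)
print(runs)
```

On a real scan the blank rows may not sum to exactly 0 because of noise, so a small threshold instead of `> 0` may be needed; that is a judgment call for your data.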

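We convert the data to a list and iterate through it to build the final image. If a row is determined to be white space, we concatenate a blank space array onto the final image. By adjusting the size of the blank array, we can change the amount of space added to the image.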

# Iterate through each row and add space if entire row is empty
# otherwise add original section of image to final image
for index, value in enumerate(pixels):
    if value == 0:
        result = np.concatenate((result, space), axis=0)
    row = gray[index:index+1, 0:w]
    result = np.concatenate((result, row), axis=0)

Here's the result

Code

import cv2
import numpy as np 
import matplotlib.pyplot as plt
# import pandas as pd

# Load image, grayscale, Otsu's threshold
image = cv2.imread('1.png')
h, w = image.shape[:2]
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Sum white pixels in each row
# Create blank space array and final image
pixels = np.sum(thresh, axis=1).tolist()
space = np.ones((1, w), dtype=np.uint8) * 255
result = np.zeros((0, w), dtype=np.uint8)

# Iterate through each row and add space if entire row is empty
# otherwise add original section of image to final image
for index, value in enumerate(pixels):
    if value == 0:
        result = np.concatenate((result, space), axis=0)
    row = gray[index:index+1, 0:w]
    result = np.concatenate((result, row), axis=0)

# Uncomment for plot visualization
'''
x = range(len(pixels))[::-1]
df = pd.DataFrame({'y': x, 'x': pixels})
df.plot(x='x', y='y', xlim=(-2000,max(pixels) + 2000), legend=None, color='teal')
'''
cv2.imshow('result', result)
cv2.imshow('thresh', thresh)
plt.show()
cv2.waitKey()
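Since the question asks for an adjustable parameter, the loop above can be wrapped in a function. This is only a sketch under my own assumptions, not the code from the answer: `add_row_spacing`, `extra`, and `fill` are names introduced here, and the function expects the grayscale image plus the inverted binary image (text non-zero) as NumPy arrays. It also collects the pieces in a list and stacks once, which avoids re-copying the growing image on every row.

```python
import numpy as np

def add_row_spacing(gray, binary, extra=1, fill=255):
    """Insert `extra` rows of `fill` before every blank row of `gray`.

    Assumes `gray` and `binary` are same-sized 2-D uint8 arrays and that
    `binary` is non-zero on text pixels (like `thresh` above).
    """
    blank = binary.sum(axis=1) == 0
    space = np.full((extra, gray.shape[1]), fill, dtype=gray.dtype)
    pieces = []
    for index, is_blank in enumerate(blank):
        if is_blank:
            pieces.append(space)
        pieces.append(gray[index:index + 1])
    if not pieces:
        return gray.copy()
    # Stack once at the end instead of concatenating inside the loop
    return np.vstack(pieces)

# Synthetic example: white page with two dark "lines" of text
gray = np.full((6, 4), 255, dtype=np.uint8)
gray[1:3, 1:3] = 0      # first line, rows 1-2
gray[4:5, 0:2] = 0      # second line, row 4
binary = 255 - gray     # text becomes non-zero, background zero
out = add_row_spacing(gray, binary, extra=2)
print(gray.shape, "->", out.shape)
```

Because every blank row gets `extra` new rows, wide margins grow along with the gaps between lines; if that is unwanted, Method #2 below spaces whole lines instead.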

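Method #2: Individual line extraction

For a more dynamic approach, we can find the contour of each line and then add space between the contours. We append the extra white space the same way as in the first method.

Obtain binary image. Load the image, convert to grayscale, apply a Gaussian blur, and apply Otsu's threshold.
Connect text contours. We create a horizontally shaped kernel and dilate to connect the words of each line into a single contour.
Extract each line contour. We find contours, sort them from top to bottom using imutils.contours.sort_contours(), and extract each line ROI.
Append white space between each line. We create an empty array and build the new image by appending white space between each line contour.

Binary image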

# Load image, grayscale, blur, Otsu's threshold
image = cv2.imread('1.png')
original = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
invert = 255 - thresh  
height, width = image.shape[:2]

Create horizontal kernel and dilate

# Dilate with a horizontal kernel to connect text contours
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (10,2))
dilate = cv2.dilate(thresh, kernel, iterations=2)

Extracted individual line contours highlighted in green

# Extract each line contour
lines = []
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
(cnts, _) = contours.sort_contours(cnts, method="top-to-bottom")
for c in cnts:
    x,y,w,h = cv2.boundingRect(c)
    cv2.rectangle(image, (0, y), (width, y+h), (36,255,12), 2)
    line = original[y:y+h, 0:width]
    line = cv2.cvtColor(line, cv2.COLOR_BGR2GRAY)
    lines.append(line)
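If you'd rather not depend on imutils, the top-to-bottom ordering can be done on the bounding boxes directly. The sketch below is my own variation, not what the answer does: `group_boxes_into_lines` is a hypothetical helper, and besides sorting it merges boxes that overlap vertically, on the assumption that dilation may sometimes leave one text line split into several contours.

```python
def group_boxes_into_lines(boxes):
    """Sort (x, y, w, h) boxes top-to-bottom and merge vertically overlapping ones.

    Returns a list of (top, bottom) row ranges, one per text line, with
    `bottom` exclusive. `boxes` is assumed to come from cv2.boundingRect().
    """
    bands = []
    for x, y, w, h in sorted(boxes, key=lambda b: b[1]):
        if bands and y < bands[-1][1]:
            # Overlaps the previous band vertically: extend it
            bands[-1][1] = max(bands[-1][1], y + h)
        else:
            bands.append([y, y + h])
    return [tuple(b) for b in bands]

# Made-up boxes: the 2nd and 3rd (after sorting) belong to the same text line
boxes = [(40, 30, 50, 12), (0, 2, 80, 10), (5, 31, 30, 10), (0, 50, 90, 11)]
bands = group_boxes_into_lines(boxes)
print(bands)
```

With the variables from the snippet above, the input would be `[cv2.boundingRect(c) for c in cnts]`, and each line crop would be `original[top:bottom, 0:width]`.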

Append white space between each line. Here's the result with a 1-pixel-high space array

Result with a 5-pixel-high space array

# Append white space in between each line
space = np.ones((1, width), dtype=np.uint8) * 255
result = np.zeros((0, width), dtype=np.uint8)
result = np.concatenate((result, space), axis=0)
for line in lines:
    result = np.concatenate((result, line), axis=0)
    result = np.concatenate((result, space), axis=0)

Full code

import cv2
import numpy as np 
from imutils import contours

# Load image, grayscale, blur, Otsu's threshold
image = cv2.imread('1.png')
original = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
invert = 255 - thresh  
height, width = image.shape[:2]

# Dilate with a horizontal kernel to connect text contours
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (10,2))
dilate = cv2.dilate(thresh, kernel, iterations=2)

# Extract each line contour
lines = []
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
(cnts, _) = contours.sort_contours(cnts, method="top-to-bottom")
for c in cnts:
    x,y,w,h = cv2.boundingRect(c)
    cv2.rectangle(image, (0, y), (width, y+h), (36,255,12), 2)
    line = original[y:y+h, 0:width]
    line = cv2.cvtColor(line, cv2.COLOR_BGR2GRAY)
    lines.append(line)

# Append white space in between each line
space = np.ones((1, width), dtype=np.uint8) * 255
result = np.zeros((0, width), dtype=np.uint8)
result = np.concatenate((result, space), axis=0)
for line in lines:
    result = np.concatenate((result, line), axis=0)
    result = np.concatenate((result, space), axis=0)

cv2.imshow('result', result)
cv2.imshow('image', image)
cv2.imshow('dilate', dilate)
cv2.waitKey()
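To get closer to the Word-style "1.5" or "double" spacing from the question, the gap can be derived from the line height instead of being hard-coded. This is a rough sketch under my own assumptions, not part of the answer: `stack_lines` and `gap_for_spacing` are hypothetical names, and I'm assuming that single spacing corresponds to a gap of 0 and that each extra unit of the factor adds one median line height of white space.

```python
import numpy as np

def stack_lines(lines, gap, fill=255):
    """Stack 2-D grayscale line crops vertically with `gap` rows of `fill` around each."""
    if not lines:
        raise ValueError("no lines to stack")
    width = lines[0].shape[1]
    space = np.full((gap, width), fill, dtype=np.uint8)
    pieces = [space]
    for line in lines:
        pieces.extend([line, space])
    return np.vstack(pieces)

def gap_for_spacing(lines, factor):
    """Hypothetical helper: turn a Word-style factor into a pixel gap.

    Assumes factor 1.0 means no extra gap and each additional 1.0 adds
    one median line height of white space.
    """
    median_height = float(np.median([line.shape[0] for line in lines]))
    return max(0, int(round((factor - 1.0) * median_height)))

# Three fake line crops, each 10 px tall and 40 px wide
lines = [np.zeros((10, 40), dtype=np.uint8) for _ in range(3)]
gap = gap_for_spacing(lines, 1.5)
page = stack_lines(lines, gap)
print(gap, page.shape)
```

In the full code above, `lines` is already the list of grayscale crops, so `stack_lines(lines, gap_for_spacing(lines, 2.0))` would stand in for the final concatenation loop. Whether a factor maps to pixels this way is a design choice, so tune it to taste.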

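【Discussion】:

One minor misnomer: what you call a histogram is not really a histogram. It is a profile of a column of data generated from the sum of all the pixels in each row. (The sum is similar to averaging all the columns down to one column, i.e. a block resize.) A histogram is a probability distribution; it shows counts versus possible values. You are computing counts versus position in the column. Nevertheless, excellent solution!
Thanks, you're right, it's not a histogram. It's the sum of the pixel data in each row. I've updated the post.
Great explanation!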
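For anyone unsure about the distinction raised in the first comment, here is a tiny illustration on a made-up 3x3 array (not data from the post). The row profile keeps one value per row position, while a histogram keeps one count per possible pixel value and throws the position away.

```python
import numpy as np

img = np.array([[0,   0,   0],
                [255, 0,   255],
                [255, 255, 255]], dtype=np.uint8)

# Row profile: one value per row position (what the answer computes)
profile = img.sum(axis=1)

# Histogram: one count per possible pixel value, position is discarded
hist = np.bincount(img.ravel(), minlength=256)

print(profile.tolist())
print(int(hist[0]), int(hist[255]))
```

Only the profile can tell you which rows are blank, which is why the answer relies on it.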
