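[Note: this greedy algorithm does not guarantee the shortest solution]
By remembering all previous occurrences of each character, you can directly find the first occurrence of a repeating substring (the one with the minimal end index including all repetitions, which is the same as the one leaving the longest remaining string after all repetitions) and replace it with an RLE (Python 3 code):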
def singleRLE_v1(s):
    occ = dict() # for each character remember all previous indices of occurrences
    for idx,c in enumerate(s):
        if not c in occ: occ[c] = []
        for c_occ in occ[c]:
            s_c = s[c_occ:idx]
            i = 1
            while s[idx+(i-1)*len(s_c) : idx+i*len(s_c)] == s_c:
                i += 1
            if i > 1:
                rle_pars = ('(',')') if len(s_c) > 1 else ('','')
                rle = ('%d'%i) + rle_pars[0] + s_c + rle_pars[1]
                s_RLE = s[:c_occ] + rle + s[idx+(i-1)*len(s_c):]
                return s_RLE
        occ[c].append(idx)
    return s # no repeating substring found
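To make this robust for iterative application, we have to exclude a few cases where RLE must not be applied (e.g. '11' or '))'). We also have to make sure that the RLE does not make the string longer, which can happen with a two-character substring that occurs twice, as in 'abab':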
def singleRLE(s):
    "find first occurrence of a repeating substring and replace it with RLE"
    occ = dict() # for each character remember all previous indices of occurrences
    for idx,c in enumerate(s):
        if idx>0 and s[idx-1] in '0123456789': continue # no RLE for e.g. '11' or other parts of previous inserted RLE
        if c == ')': continue # no RLE for '))...)'
        if not c in occ: occ[c] = []
        for c_occ in occ[c]:
            s_c = s[c_occ:idx]
            i = 1
            while s[idx+(i-1)*len(s_c) : idx+i*len(s_c)] == s_c:
                i += 1
            if i > 1:
                print("found %d*'%s'" % (i,s_c))
                rle_pars = ('(',')') if len(s_c) > 1 else ('','')
                rle = ('%d'%i) + rle_pars[0] + s_c + rle_pars[1]
                if len(rle) <= i*len(s_c): # in case of a tie prefer RLE
                    s_RLE = s[:c_occ] + rle + s[idx+(i-1)*len(s_c):]
                    return s_RLE
        occ[c].append(idx)
    return s # no repeating substring found
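The length check can also be looked at in isolation. The following is only a sketch: `rle_token` and `saving` are hypothetical helper names, not part of the code above. It builds the same token as the `rle` expression and reports how many characters a replacement would save.

```python
def rle_token(unit, count):
    """Build the token for `count` repetitions of `unit` (hypothetical helper).

    Same convention as the code above: parentheses only when the
    repeated unit is longer than one character.
    """
    pars = ('(', ')') if len(unit) > 1 else ('', '')
    return str(count) + pars[0] + unit + pars[1]

def saving(unit, count):
    """Characters saved by the token; a negative value means the string grows."""
    return count * len(unit) - len(rle_token(unit, count))

# 'abab' -> '2(ab)' is one character longer, so the replacement is rejected
print(rle_token('ab', 2), saving('ab', 2))
# 'aa' -> '2a' is a tie, which the check above resolves in favour of RLE
print(rle_token('a', 2), saving('a', 2))
# 'defdef' -> '2(def)' is also a tie
print(rle_token('def', 2), saving('def', 2))
```

Under this convention only a two-character unit repeated exactly twice makes the string longer. Every other case is a tie or a gain.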
Now we can safely call singleRLE on its previous output for as long as a repeating string is found:
def iterativeRLE(s):
    s_RLE = singleRLE(s)
    while s != s_RLE:
        print(s_RLE)
        s, s_RLE = s_RLE, singleRLE(s_RLE)
    return s_RLE
With the print statements inserted above we get, for example, the following trace and result:
>>> iterativeRLE('xyabcdefdefabcdefdef')
found 2*'def'
xyabc2(def)abcdefdef
found 2*'def'
xyabc2(def)abc2(def)
found 2*'abc2(def)'
xy2(abc2(def))
'xy2(abc2(def))'
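To check that such a result still represents the input, it can be expanded again. This is a minimal sketch under two assumptions: `expandRLE` is a hypothetical name, not part of the code above, and the encoded string is well formed (the original text contains no digits or parentheses, and every count is followed by a single character or a parenthesized group).

```python
def expandRLE(encoded):
    """Expand e.g. '2(abc2(def))' back into the plain string (hypothetical helper)."""
    def parse(pos):
        # Parse items until the end of the string or an unmatched ')'.
        out = []
        while pos < len(encoded) and encoded[pos] != ')':
            if encoded[pos].isdigit():
                start = pos
                while encoded[pos].isdigit():
                    pos += 1
                count = int(encoded[start:pos])
                if encoded[pos] == '(':
                    unit, pos = parse(pos + 1)  # nested group
                    pos += 1                    # skip the closing ')'
                else:
                    unit = encoded[pos]         # single repeated character
                    pos += 1
                out.append(unit * count)
            else:
                out.append(encoded[pos])        # literal character
                pos += 1
        return ''.join(out), pos
    return parse(0)[0]

print(expandRLE('xy2(abc2(def))') == 'xyabcdefdefabcdefdef')
```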
But this greedy algorithm fails for the following input:
>>> iterativeRLE('abaaabaaabaa')
found 3*'a'
ab3abaaabaa
found 3*'a'
ab3ab3abaa
found 2*'b3a'
a2(b3a)baa
found 2*'a'
a2(b3a)b2a
'a2(b3a)b2a'
whereas one of the shortest solutions is 3(ab2a).
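For short inputs, a shortest encoding can be found by trying all possibilities instead of choosing greedily. The following is only a sketch of one such approach, a memoized search over all substrings, and not the method used above. `shortest_rle` is a hypothetical name, and the input is assumed to contain no digits or parentheses. Ties are resolved in favour of the less encoded form, so the result may differ from 3(ab2a) while having the same length.

```python
from functools import lru_cache

def shortest_rle(s):
    """Return one shortest encoding of s in the notation above (hypothetical helper)."""
    @lru_cache(maxsize=None)
    def best(i, j):
        sub = s[i:j]
        n = j - i
        if n <= 1:
            return sub
        result = sub  # option 1: leave the substring as it is
        # option 2: split into two parts that are encoded independently
        for k in range(i + 1, j):
            cand = best(i, k) + best(k, j)
            if len(cand) < len(result):
                result = cand
        # option 3: the whole substring is a unit of length p repeated n // p times
        for p in range(1, n // 2 + 1):
            if n % p == 0 and sub == sub[:p] * (n // p):
                unit = best(i, i + p)
                cand = str(n // p) + (unit if p == 1 else '(' + unit + ')')
                if len(cand) < len(result):
                    result = cand
        return result
    return best(0, len(s))

print(shortest_rle('abaaabaaabaa'))
```

The search visits every substring and every split point, so it takes roughly cubic time in the length of the input and is only practical for short strings.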