Python string append: answers

【Question title】: Python string append
【Posted】: 2011-12-24 19:01:53
【Question description】:

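I have a Python method that takes a list of tuples of the form (string, float) and returns a list of strings that, when combined, should not exceed a certain limit. I do not split sentences to hit the output length exactly; instead, I make sure the result stays within one sentence's length of the desired output length.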

For example:
s: [('Where are you',1),('What about the next day',2),('When is the next event',3)]

Max length: 5
Output: 'Where are you What about the next day'

Max length: 3
Output: 'Where are you'

This is what I am doing:

l=0
output = []
for s in s_tuples:
    if l <= max_length:
        output.append(s[0])
        l+=len(get_words_from(s[0]))
return ''.join(output)

Is there a smarter way to make sure the output word count does not exceed max_length, other than stopping when the length is reached?

【Question comments】:

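I don't understand the output for max_length 5. Isn't "When is the next event" of length 5? Edit: OK, I get it.
@atlantis: Your variable name "max_length", your "would not exceed a certain limit", and your "make sure the output word count does not exceed max_length" all contradict what you say in the comments. Please edit your question so that it is consistent with what you actually want to do.
So, are you looking for the shortest set of strings that contains at least the given number of words? That is what your example seems to be doing. Also, what do the numbers in the pairs mean? Do we have to pick the first string first?
@10100: "seems to be" != "is"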

Tags: python string append

【Solution 1】:

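A smarter approach is to break out of the loop as soon as max_length has been exceeded, so that you do not iterate over the rest of the list for no reason: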

for s in s_tuples:
    if l > max_length:
        break
    output.append(s[0])
    l += len(get_words_from(s[0]))
return ''.join(output)
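
For reference, the loop above can be wrapped into a self-contained function. This is only a minimal sketch: `get_words_from` is never shown in the question, so it is assumed here to split on whitespace, and the wrapper name `join_until_over` is hypothetical.

```python
def get_words_from(text):
    # Assumption: the question does not show this helper; whitespace splitting is a guess.
    return text.split()


def join_until_over(s_tuples, max_length):
    # Hypothetical wrapper around the loop from this answer.
    l = 0
    output = []
    for s in s_tuples:
        if l > max_length:
            break  # the running word count already exceeds the limit
        output.append(s[0])
        l += len(get_words_from(s[0]))
    return ''.join(output)  # no separator, exactly as in the snippet


s_tuples = [('Where are you', 1),
            ('What about the next day', 2),
            ('When is the next event', 3)]
print(join_until_over(s_tuples, 5))
```

Under these assumptions the sentences are concatenated without spaces, and a limit of 3 still returns two sentences, because the `>` check runs before each append.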

【Comments】:

【Solution 2】:

First, I see no reason to defer breaking out of the loop to the next iteration once the maximum length has been reached.

So, modifying your code, I came up with the following:

s_tuples = [('Where are you',1),('What about the next day',2),('When is the next event',3)]


def get_words_number(s):
    return len(s.split())


def truncate(s_tuples, max_length):
    tot_len = 0
    output = []
    for s in s_tuples:
        output.append(s[0])
        tot_len += get_words_number(s[0])
        if tot_len >= max_length:
            break
    return ' '.join(output)


print(truncate(s_tuples, 3))

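Second, I really do not like creating the temporary object output. We can feed the join method a generator that iterates over the initial list without copying its contents.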

def truncate(s_tuples, max_length):

    def stop_iterator(s_tuples):
        tot_len = 0
        for s,num in s_tuples:
            yield s
            tot_len += get_words_number(s)
            if tot_len >= max_length:
                break

    return ' '.join(stop_iterator(s_tuples))


print(truncate(s_tuples, 3))

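Also, in your example the output is somewhat larger than the configured maximum number of words. If you want the word count never to exceed the limit (while still being as large as possible), just put the yield after the check against the limit: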

def truncate(s_tuples, max_length):

    def stop_iterator(s_tuples):
        tot_len = 0
        for s,num in s_tuples:
            tot_len += get_words_number(s)
            if tot_len >= max_length:
                if tot_len == max_length:
                    yield s
                break
            yield s

    return ' '.join(stop_iterator(s_tuples))


print(truncate(s_tuples, 5))
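
One way to check the corner cases that come up in this answer's comments is to run the strict variant with limits on either side of the cumulative word counts, which are 3, 8 and 13 for the sample data. The sketch below is a restructured but behaviourally equivalent version of the last snippet; the name `truncate_strict` is hypothetical.

```python
def get_words_number(s):
    return len(s.split())


def truncate_strict(s_tuples, max_length):
    # Hypothetical name; same results as the last snippet of this answer.
    def stop_iterator(pairs):
        tot_len = 0
        for s, _num in pairs:
            tot_len += get_words_number(s)
            if tot_len > max_length:
                break  # this sentence would overshoot the limit, so drop it
            yield s
            if tot_len == max_length:
                break  # limit hit exactly; no need to inspect the next item

    return ' '.join(stop_iterator(s_tuples))


s_tuples = [('Where are you', 1),
            ('What about the next day', 2),
            ('When is the next event', 3)]
for limit in (2, 3, 5, 8):
    print(limit, repr(truncate_strict(s_tuples, limit)))
```

With a limit of 2 nothing fits and the result is an empty string; limits of 3 and 5 both return only the first sentence; a limit of 8 returns the first two.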

【Comments】:

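In your final snippet, you will never get exactly max_length.
@JohnMachin You are right. I will edit the answer. Checking a solution against the corner cases is generally good practice (which I did not do here).
Now re-read the first sentence of your answer and apply it to the final snippet.
@JohnMachin Sorry, but that is already implemented. The OP's problem was that he computed l+=... and did nothing with it in the current iteration; its use was deferred to the next iteration, where he compared it with max_length. In my code these operations are combined.
Sorry*2, but l+=... is irrelevant. The problem is that in your final snippet, when tot_len == max_length, it does not break after yielding; it goes around again (if the input is not exhausted) and uselessly computes the length of the next item. This behaviour is called "deferring breaking out of the loop to the next iteration once the maximum length has been reached".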

【Solution 3】:

If NumPy is available, the following solution, which uses a list comprehension, works.

import numpy as np

# Get the index of the last clause to append.
s_cumlen = np.cumsum([len(s[0].split()) for s in s_tuples])
append_until = np.sum(s_cumlen < max_length)

return ' '.join([s[0] for s in s_tuples[:append_until+1]])

For clarity: s_cumlen contains the cumulative sum of the strings' word counts.

>>> s_cumlen
array([ 3,  8, 13])
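
If NumPy is not available, the same cumulative-sum idea can be sketched with the standard library's `itertools.accumulate` (Python 3) in place of `np.cumsum`. This is a swapped-in stand-in rather than the answerer's code, and the function name `truncate_cumulative` is hypothetical.

```python
from itertools import accumulate


def truncate_cumulative(s_tuples, max_length):
    # Running word totals, e.g. [3, 8, 13] for the sample data.
    cumlen = list(accumulate(len(s.split()) for s, _ in s_tuples))
    # Number of sentences whose running total is still below the limit.
    append_until = sum(total < max_length for total in cumlen)
    # Take one more sentence, mirroring append_until + 1 in the NumPy snippet.
    return ' '.join(s for s, _ in s_tuples[:append_until + 1])


s_tuples = [('Where are you', 1),
            ('What about the next day', 2),
            ('When is the next event', 3)]
print(truncate_cumulative(s_tuples, 5))
print(truncate_cumulative(s_tuples, 3))
```

Because the running totals never decrease, counting the entries below the limit gives the index of the first sentence that reaches or crosses it, and that sentence is still included.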

【Comments】:

【Solution 4】:

What is max_length supposed to control? The total number of words in the returned list? I would have expected a max_length of 5 to produce only 5 words, not 8.

Edit: I would keep two lists, because I think that is easy to read, though some people may not like the extra overhead:

def restrictWords(givenList, whenToStop):
    outputList = []
    wordList = []
    for pair in givenList:
        stringToCheck = pair[0]
        listOfWords = stringToCheck.split()
        for word in listOfWords:
            wordList.append(word)
        outputList.append( stringToCheck )
        if len( wordList ) >= whenToStop:
            break
    return outputList

So, with:

testList = [ ('one two three',1),
             ('four five',2),
             ('six seven eight nine',3) ]

2 should give you ['one two three']
3 should give you ['one two three']
4 should give you ['one two three', 'four five']

【Comments】:

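Yes, max_length does control the number of words in the output, but not exactly. I do not split sentences to hit the output length exactly; instead, I make sure the result stays within one sentence's length of the desired output length. For an output length of 5, I cannot split the second sentence.
When you post an answer, it should be an actual attempt at answering the question. As it stands, this should be a comment. If it is just some sort of placeholder answer, you should never do that.
@atlantis OK, I see, so it is not a maximum length; it indicates which string will be your last one.
@JeffMercado Well, I was still typing up the answer I meant to give, but I wanted to see whether I could get more information from the OP while I finished it. I do not have the ability to comment yet. I guess I should not answer like this in the future, sorry.
Understandable. Now that you have the information you need, please try to complete your answer. Otherwise it will eventually be deleted.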

【Solution 5】:

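Your code does not stop when the limit is reached. "max_length" is a bad name: it is not a "maximum length", because your code allows it to be exceeded (as in your first example). Is that intentional? "l" is a bad name too; let's call it tot_len. You even keep going when tot_len == max_length. Your example shows the strings joined with spaces, but your code does not do that.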

You probably need something like this:

tot_len = 0
output = []
for s in s_tuples:
    if tot_len >= max_length:
        break
    output.append(s[0])
    tot_len += len(get_words_from(s[0]))
return ' '.join(output)
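
To try this loop against the two examples from the question, it can be wrapped in a function. A minimal sketch, assuming `get_words_from` simply splits on whitespace (the question never shows it); the wrapper name `join_up_to_limit` is hypothetical.

```python
def get_words_from(text):
    # Assumption: helper not shown in the question; split on whitespace.
    return text.split()


def join_up_to_limit(s_tuples, max_length):
    # Hypothetical wrapper name; the loop body is the one from this answer.
    tot_len = 0
    output = []
    for s in s_tuples:
        if tot_len >= max_length:
            break  # limit reached or passed: stop before adding more
        output.append(s[0])
        tot_len += len(get_words_from(s[0]))
    return ' '.join(output)


s_tuples = [('Where are you', 1),
            ('What about the next day', 2),
            ('When is the next event', 3)]
print(join_up_to_limit(s_tuples, 5))  # Where are you What about the next day
print(join_up_to_limit(s_tuples, 3))  # Where are you
```

Under that assumption, the `>=` check together with the space separator reproduces both outputs listed in the question, including the overshoot to 8 words for a limit of 5.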

【Comments】: