Python: truncate a long string (answers)

【Question title】: Python truncate a long string
【Posted】: 2010-05-20 09:37:03
【Question】:

How do I truncate a string to 75 characters in Python?

This is how it is done in JavaScript:

var data="saddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddsaddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddsadddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd"
var info = (data.length > 75) ? data.substring(0, 75) + '..' : data;

【Comments on the question】:

Tags: python

【Solution 1】:

info = (data[:75] + '..') if len(data) > 75 else data

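【Comments】:

I would probably change the condition to len(data) > 77 to account for the two dots (it is pointless to cut off just the last character only to replace it with a dot).
@hasenj: That wouldn't match the original code, but it's a good suggestion that I should have pointed out in the first place.
Note that the enclosing parentheses are, of course, optional.
@TaylorEdmiston True, but they are quite helpful for those who don't remember all the precedence rules of the 5-10 languages they use every day.
@Anthony a slice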
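The suggestion about raising the threshold can be sketched as follows. This is only an illustration: the function name `truncate_with_slack` and its parameters are made up for this example and do not appear in the thread. The idea is to cut only when the cut actually makes the result shorter.

```python
def truncate_with_slack(data, limit=75, suffix=".."):
    """Hypothetical helper: truncate only when it really shortens the string."""
    # A string of limit+1 or limit+2 characters is returned whole, because
    # cutting it and appending the suffix would not make it any shorter.
    if len(data) > limit + len(suffix):
        return data[:limit] + suffix
    return data


print(len(truncate_with_slack("x" * 76)))   # returned unchanged
print(len(truncate_with_slack("x" * 200)))  # cut to limit + len(suffix)
```

With the defaults the result is at most 77 characters long, the same upper bound as the one-liner in this answer.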

【Solution 2】:

Shorter:

info = data[:75] + (data[75:] and '..')

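【Comments】:

Interesting approach, although it is still a compound one-liner. ^^
Doesn't this solution produce 77 characters once you include the '..'?
Doesn't this perform two slice operations? I wonder how it performs, when performance is critical, compared with stackoverflow.com/a/52279347/1834057
Sure, it's a nice original answer, but Marcelo's answer is better because it is more explicit and therefore more readable (and thus Pythonic).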
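For anyone who wants to check the performance question on their own machine, a rough measurement could look like the sketch below. The sample input and the function names are assumptions made for this example, and the timings depend on the interpreter and hardware, so no numbers are quoted here.

```python
import timeit

data = "x" * 200  # hypothetical sample input


def with_conditional(s):
    # Solution 1: one slice, taken only when the string is too long
    return (s[:75] + '..') if len(s) > 75 else s


def with_two_slices(s):
    # Solution 2: always slices twice; the second slice is only tested for emptiness
    return s[:75] + (s[75:] and '..')


for fn in (with_conditional, with_two_slices):
    seconds = timeit.timeit(lambda: fn(data), number=100_000)
    print(f"{fn.__name__}: {seconds:.4f} s per 100,000 calls")
```

Both functions return the same result for any input; only the amount of work differs.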

【Solution 3】:

Even simpler:

data = data[:75]

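If the string is shorter than 75 characters, it is left unchanged.

【Comments】:

He probably wants an ellipsis appended if the string gets truncated.
You're right, I never noticed that. I can't think of a better way to do it than the other answers.

【Solution 4】:

If you are using Python 3.4+, you can use textwrap.shorten from the standard library:

Collapse and truncate the given text to fit in the given width.

First the whitespace in the text is collapsed (all whitespace is replaced by single spaces). If the result fits in the width, it is returned. Otherwise, enough words are dropped from the end so that the remaining words plus the placeholder fit within the width: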
>>> textwrap.shorten("Hello  world!", width=12)
'Hello world!'
>>> textwrap.shorten("Hello  world!", width=11)
'Hello [...]'
>>> textwrap.shorten("Hello world", width=10, placeholder="...")
'Hello...'

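【Comments】:

It seems to choke on really long strings (with no spaces) and outputs only the ellipsis.
@elBradford (and anyone else interested): that is because shorten() truncates words, not single characters. I searched, but there doesn't seem to be a way to configure shorten() or a TextWrapper instance to clip single characters rather than words.
It also has the annoying side effect of removing line breaks
This does not solve the OP's problem. It truncates by word and even removes whitespace.
Hard wrap (ignoring whitespace): def shorten(s, width, placeholder='[...]'): return s[:width] if len(s) <= width else s[:width-len(placeholder)] + placeholder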
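The behavior described in these comments can be reproduced with a short script. The sketch below compares textwrap.shorten with a character-level fallback along the lines of the last comment; `hard_shorten` is a name chosen for this example, not a standard-library function.

```python
import textwrap


def hard_shorten(s, width, placeholder="[...]"):
    """Hypothetical character-level counterpart to textwrap.shorten."""
    if len(s) <= width:
        return s
    # Cut in the middle of a word if necessary, leaving room for the placeholder
    return s[:max(0, width - len(placeholder))] + placeholder


no_spaces = "a" * 100

# shorten() drops whole words, so a single long word leaves only the placeholder
print(textwrap.shorten(no_spaces, width=20))
# The character-level version keeps the start of the string
print(hard_shorten(no_spaces, width=20))
# shorten() also collapses newlines and repeated spaces into single spaces
print(textwrap.shorten("line one\nline   two", width=40))
```

Unlike shorten(), the fallback leaves whitespace and line breaks inside the string untouched.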

【Solution 5】:

For a Django solution (not mentioned in the question):

from django.utils.text import Truncator
value = Truncator(value).chars(75)

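Take a look at Truncator's source code to appreciate the problem: https://github.com/django/django/blob/master/django/utils/text.py#L66

On truncation with Django: Django HTML truncation

【Comments】:

This needlessly couples low-level logic to Django. Not recommended.

【Solution 6】:

Using a regular expression:

re.sub(r'^(.{75}).*$', r'\g<1>...', data)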

Long strings are truncated:

>>> data="11111111112222222222333333333344444444445555555555666666666677777777778888888888"
>>> re.sub(r'^(.{75}).*$', r'\g<1>...', data)
'111111111122222222223333333333444444444455555555556666666666777777777788888...'

Shorter strings are never truncated:

>>> data="11111111112222222222333333"
>>> re.sub(r'^(.{75}).*$', r'\g<1>...', data)
'11111111112222222222333333'

This way you can also "cut out" the middle of the string, which is nicer in some cases:

re.sub(r'^(.{5}).*(.{5})$', r'\g<1>...\g<2>', data)

>>> data="11111111112222222222333333333344444444445555555555666666666677777777778888888888"
>>> re.sub(r'^(.{5}).*(.{5})$', r'\g<1>...\g<2>', data)
'11111...88888'

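【Comments】:

This doesn't work when your string contains spaces
Why use a regex for such a simple case?
It does work with spaces. For example, the last output would be: '111111111 222222222 333333333 444444444 55555555556666666666777777777 88888...'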
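A possible source of the disagreement above: by default `.` matches a space but not a newline, so the pattern leaves multi-line input untouched. The sketch below uses the `re.DOTALL` flag to cover that case and `.+` instead of `.*` so that a string of exactly 75 characters gets no suffix. The helper name `truncate_re` and its parameters are assumptions made for this example.

```python
import re


def truncate_re(data, limit=75, suffix="..."):
    """Hypothetical regex-based truncation that also handles newlines."""
    # re.DOTALL lets '.' match newline characters as well;
    # '.+' requires at least one character beyond the limit.
    pattern = r"^(.{%d}).+$" % limit
    return re.sub(pattern, lambda m: m.group(1) + suffix, data, flags=re.DOTALL)


multiline = "ab\n" * 40  # 120 characters, with a newline every third character

# Pattern from the answer: no match because of the newlines, nothing is cut
print(len(re.sub(r'^(.{75}).*$', r'\g<1>...', multiline)))
# With re.DOTALL the string is cut after 75 characters
print(len(truncate_re(multiline)))
```

Passing the replacement as a function avoids any special treatment of backslashes in the suffix.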

【Solution 7】:

limit = 75
info = data[:limit] + '..' * (len(data) > limit)

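【Comments】:

This is the most elegant solution. In addition, I would extract the character limit (75 in this case) into a variable to avoid inconsistencies. limit = 75; info = data[:limit] + '..' * (len(data) > limit)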
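For readers wondering why multiplying a string by a comparison works: in Python, bool is a subclass of int, so True behaves as 1 and False as 0 when used as a repeat count. A small sketch, with sample inputs invented for this example:

```python
limit = 75

for data in ("short", "x" * 100):
    exceeds = len(data) > limit          # a bool: True acts as 1, False as 0
    info = data[:limit] + '..' * exceeds  # '..' * 0 is the empty string
    print(len(data), exceeds, len(info))
```

The suffix is therefore repeated once for long strings and zero times for short ones.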

【Solution 8】:

This just in:

n = 8
s = '123'
print(s[:n-3] + (s[n-3:], '...')[len(s) > n])
s = '12345678'
print(s[:n-3] + (s[n-3:], '...')[len(s) > n])
s = '123456789'
print(s[:n-3] + (s[n-3:], '...')[len(s) > n])
s = '123456789012345'
print(s[:n-3] + (s[n-3:], '...')[len(s) > n])

123
12345678
12345...
12345...

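【Comments】:

All of the previous answers fail to consider what the OP really wanted: an output string no longer than 75 characters. Kudos for understanding the "don't do what I say, do what I want" programming principle. For completeness, you could fix the n < 3 case by appending: if n > 2 else s[:n]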
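The point made in this answer, that the result including the suffix should not exceed the limit, can also be written as a small function. This is a sketch, not the answerer's code: the name `truncate_to` is hypothetical, and it assumes n is not negative.

```python
def truncate_to(s, n, suffix='...'):
    """Hypothetical helper: return s cut so the result is never longer than n."""
    if len(s) <= n:
        return s
    if n < len(suffix):
        # Too little room for the suffix: fall back to a plain slice
        return s[:n]
    # Reserve len(suffix) characters of the budget for the suffix itself
    return s[:n - len(suffix)] + suffix


for sample in ('123', '12345678', '123456789', '123456789012345'):
    print(truncate_to(sample, 8))
```

For n = 8 this prints the same four lines as the code in the answer.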

【Solution 9】:

info = data[:min(len(data), 75)

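【Comments】:

Code-only answers are generally considered low quality. Could you add an explanation to your answer?
It should be info = data[:min(len(data), 75)]; the final ] is missing

【Solution 10】:

This approach does not use any if: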

data[:75] + bool(data[75:]) * '..'

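【Comments】:

I wrote it only to show that it is possible. It goes against Python's philosophy of readability. It has no performance advantage over the other "if"-based approaches. I never use it, and I don't suggest you use it either.

【Solution 11】:

Actually, you cannot "truncate" a Python string the way you can a dynamically allocated C string. Strings in Python are immutable. What you can do is slice the string as described in the other answers, producing a new string that contains only the characters defined by the slice offsets and step. In some (non-practical) cases this can be a little annoying, for example when you choose Python as your interview language and the interviewer asks you to remove duplicate characters from a string. Heh.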

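【Comments】:

The question was prompted by JavaScript, not C, and strings are immutable there too.
The question is "How do I truncate a string to 75 characters in Python?". The answer is "you can't". That the OP thinks JavaScript substring == truncate is beside the point. Besides, the point of my answer is that the Pythonic idiom to use is string "slicing", where in C, for example, you might truncate the string. It saves allocation and duplication by merely using a couple of pointers into the existing string.
I think it is pretty clear that the OP did not mean truncating the original string; he/she obviously wants the same behavior as in JavaScript. Nevertheless, your answer is quite correct and can help others understand that strings are immutable in Python too, and that you are not transforming the original string. (+1)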
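The immutability discussed here is easy to observe. In the sketch below (sample data invented for this example), slicing yields a separate, shorter string while the original keeps its length, and an attempt to modify the string in place raises TypeError.

```python
data = "x" * 100
info = data[:75]             # slicing builds a new string object

print(len(data), len(info))  # the original is untouched by the slice

try:
    data[75:] = ""           # str does not support in-place modification
except TypeError as exc:
    print("TypeError:", exc)
```

"Truncating" in Python therefore always means rebinding a name to a new string, as in data = data[:75].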

【Solution 12】:

info = data[:75] + ('..' if len(data) > 75 else '')

【Comments】:

【Solution 13】:

Yet another solution. With True and False, you get a little feedback about the test at the end.

data = {True: data[:75] + '..', False: data}[len(data) > 75]

【Comments】:

【Solution 14】:

>>> info = lambda data: len(data)>10 and data[:10]+'...' or data
>>> info('sdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdf')
'sdfsdfsdfs...'
>>> info('sdfsdf')
'sdfsdf'

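【Comments】:

Could you explain your answer?
A similar example of this function: def info2(data): if len(data)>10: return data[:10]+'...' else: return data. A lambda is an anonymous function written in functional style: ex = lambda x:x+1 is equivalent to def ex(x): return x+1
Why are you abusing lambda to define a named function? Python has the def statement, which is better suited for that.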
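As the last comment suggests, the same logic reads more clearly as a def. The sketch below is one way to write it; the `limit` and `suffix` parameters are additions for this example. One caveat with the `cond and a or b` idiom used in the lambda: it falls through to b whenever a is falsy. That cannot happen here, because the suffix keeps a non-empty, but a conditional expression or an if statement avoids the trap altogether.

```python
def info(data, limit=10, suffix='...'):
    """Named-function version of the lambda in this answer (illustrative)."""
    if len(data) > limit:
        return data[:limit] + suffix
    return data


print(info('sdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdf'))
print(info('sdfsdf'))

# The pitfall of 'cond and a or b': an empty a silently selects b
print(repr(True and '' or 'fallback'))
```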

【Solution 15】:

A simple and short helper function:

def truncate_string(value, max_length=255, suffix='...'):
    string_value = str(value)
    # Short enough already: return it whole, so that strings just under
    # max_length do not lose characters
    if len(string_value) <= max_length:
        return string_value
    # Reserve room for the suffix within max_length
    return string_value[:max(0, max_length - len(suffix))] + suffix

Usage examples:

# Example 1 (default):

long_string = ""
for number in range(1, 1000): 
    long_string += str(number) + ','    

result = truncate_string(long_string)
print(result)


# Example 2 (custom length):

short_string = 'Hello world'
result = truncate_string(short_string, 8)
print(result) # > Hello... 


# Example 3 (not truncated):

short_string = 'Hello world'
result = truncate_string(short_string)
print(result) # > Hello world

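【Comments】:

【Solution 16】:

Very late to the party, but I'd like to add my solution, which trims text at the character level while handling whitespace properly.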

def trim_string(s: str, limit: int, ellipsis='…') -> str:
    s = s.strip()
    if len(s) > limit:
        return s[:limit].strip() + ellipsis
    return s

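Simple, but it makes sure that hello world with limit=6 does not end up as an ugly hello …, but as hello… instead.

It also strips leading and trailing whitespace, but not whitespace inside the string. If you also want to remove whitespace inside, check out this stackoverflow post

【Comments】:

【Solution 17】:

No regex is needed, but you do want to use string formatting rather than the string concatenation in the accepted answer.

This is probably the most canonical, Pythonic way to truncate the string data to 75 characters.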

>>> data = "saddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddsaddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddsadddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd"
>>> info = "{}..".format(data[:75]) if len(data) > 75 else data
>>> info
'111111111122222222223333333333444444444455555555556666666666777777777788888...'

【Comments】:

I find it funny that your saddddddd... string turns into 111111... :) I know it's a copy-paste typo, though, and I agree with you about the regex.
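For reference, here is what the formatting approach produces on an input that actually starts with "sa". The sketch uses an f-string (Python 3.6+) instead of str.format, and a shortened stand-in for the question's long string; both choices are assumptions made for this example.

```python
data = "sa" + "d" * 100  # hypothetical stand-in for the question's long string

info = f"{data[:75]}.." if len(data) > 75 else data

print(info)
print(len(info))
```

The result keeps the first 75 characters of data and appends the two dots, 77 characters in total.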

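【Solution 18】:

Here is a function I wrote as part of a new String class... It lets you add a suffix (if the string is at the size limit after trimming and is long enough to take it, although you don't have to enforce the absolute size)

I was in the middle of changing a few things, so there is some useless logic cost (for example, if _truncate ...) that is no longer needed, and there is a return at the top...

But it is still a good function for truncating data...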

##
## Truncate characters of a string after the _max_len'th char, if necessary... If _max_len is less than 0, don't truncate anything... Note: If you attach a suffix, and you enable absolute max length then the suffix length is subtracted from max length... Note: If the suffix length is longer than the output then no suffix is used...
##
## Usage: Where _text = 'Testing', _width = 4
##      _data = String.Truncate( _text, _width )                        == Test
##      _data = String.Truncate( _text, _width, '..', True )            == Te..
##
## Equivalent Alternates: Where _text = 'Testing', _width = 4
##      _data = String.SubStr( _text, 0, _width )                       == Test
##      _data = _text[  : _width ]                                      == Test
##      _data = ( _text )[  : _width ]                                  == Test
##
def Truncate( _text, _max_len = -1, _suffix = False, _absolute_max_len = True ):
    ## Length of the string we are considering for truncation
    _len            = len( _text )

    ## Whether or not we have to truncate ( a negative _max_len disables truncation )
    _truncate       = ( False, True )[ _max_len >= 0 and _len > _max_len ]

    ## Note: If we don't need to truncate, there's no point in proceeding...
    if ( not _truncate ):
        return _text

    ## The suffix in string form
    _suffix_str     = ( '',  str( _suffix ) )[ _truncate and _suffix != False ]

    ## The suffix length
    _len_suffix     = len( _suffix_str )

    ## Whether or not we add the suffix
    _add_suffix     = ( False, True )[ _truncate and _suffix != False and _max_len > _len_suffix ]

    ## Suffix Offset
    _suffix_offset = _max_len - _len_suffix
    _suffix_offset  = ( _max_len, _suffix_offset )[ _add_suffix and _absolute_max_len != False and _suffix_offset > 0 ]

    ## The truncate point.... If not necessary, then length of string.. If necessary then the max length with or without subtracting the suffix length... Note: It may be easier ( less logic cost ) to simply add the suffix to the calculated point, then truncate - if point is negative then the suffix will be destroyed anyway.
    ## If we don't need to truncate, then the length is the length of the string.. If we do need to truncate, then the length depends on whether we add the suffix and offset the length of the suffix or not...
    _len_truncate   = ( _len, _max_len )[ _truncate ]
    _len_truncate   = ( _len_truncate, _max_len )[ _len_truncate <= _max_len ]

    ## If we add the suffix, add it... Suffix won't be added if the suffix is the same length as the text being output...
    if ( _add_suffix ):
        _text = _text[ 0 : _suffix_offset ] + _suffix_str + _text[ _suffix_offset: ]

    ## Return the text after truncating...
    return _text[ : _len_truncate ]

【Comments】:

What's with all the underscores in every parameter and variable?