【Question title】: Python: split text into several parts
【Posted】: 2021-12-24 07:40:53
【Question】:

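I return plain multi-line text from a function, and I have to post it to Telegram or Discord. The problem is the character limit for a single message, and the text may only be split between lines. For example: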

limit = 50

text = """Line1 - some text
Line2  - some text
Line3 - some text, limit here
Line4 - some text"""

I need some way to get:

text1 = """Line1 - some text
Line2  - some text"""

text2 = """Line3 - some text, limit here
Line4 - some text"""

or any other way of splitting a long string into several parts, as long as it only splits between lines.

This is a wrong result (the split falls in the middle of Line3):

text1 = """Line1 - some text
Line2  - some text
Line3 - some"""

text2 = """text, limit here
Line4 - some text"""
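A minimal sketch of the behaviour described above, assuming Python 3 and that the line boundaries recognised by `str.splitlines()` are acceptable split points: whole lines are packed greedily into each part, counting the `\n` between lines, and a new part is started only when the next line would push the current one past `limit`. The function name `split_by_lines` is hypothetical, not taken from the question or the answers.

```python
from typing import List


def split_by_lines(text: str, limit: int) -> List[str]:
    """Greedily pack whole lines into parts of at most `limit` characters.

    A single line longer than `limit` is kept whole in its own part (it is not cut).
    """
    parts: List[str] = []
    current: List[str] = []  # lines of the part being built
    size = 0                 # len("\n".join(current))
    for line in text.splitlines():
        # Adding a line also costs one "\n" separator unless the part is empty.
        extra = len(line) + (1 if current else 0)
        if current and size + extra > limit:
            parts.append("\n".join(current))
            current, size = [line], len(line)
        else:
            current.append(line)
            size += extra
    if current:
        parts.append("\n".join(current))
    return parts


text = ("Line1 - some text\n"
        "Line2  - some text\n"
        "Line3 - some text, limit here\n"
        "Line4 - some text")
for part in split_by_lines(text, 50):
    print(repr(part))
```

With `limit = 50` this yields the two parts the question asks for: Line1 and Line2 together (36 characters), then Line3 and Line4 (47 characters).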

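【Discussion】:

  • What happens if a single line is longer than the character limit?
  • It sounds like you might want to use a regular expression to break at word boundaries.
  • In my case, no single line is longer than 68 characters. But sometimes the whole message is longer than 2000 characters, which exceeds the Telegram or Discord limit.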
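Since lines here stay under 68 characters, packing whole lines is enough. Should a line ever exceed the limit, one option (a sketch, not from either answer) is to wrap that line first with the standard library's `textwrap.wrap`, which breaks at whitespace in the spirit of the word-boundary comment, and then pack the pieces as usual. The default `limit=2000` is taken from the comment above and is an assumption; check your platform's actual limit. `pieces` and `split_message` are hypothetical names.

```python
import textwrap
from typing import Iterator, List


def pieces(text: str, limit: int) -> Iterator[str]:
    """Yield each line of `text`; a line longer than `limit` is first wrapped
    into several shorter pieces."""
    for line in text.splitlines():
        if len(line) <= limit:
            yield line
        else:
            # textwrap breaks at whitespace (dropping it at the break points)
            # and, by default, also splits words longer than `limit`
            yield from textwrap.wrap(line, width=limit)


def split_message(text: str, limit: int = 2000) -> List[str]:
    """Greedily pack the pieces into chunks of at most `limit` characters."""
    chunks: List[str] = []
    current: List[str] = []  # pieces of the chunk being built
    size = 0                 # len("\n".join(current))
    for piece in pieces(text, limit):
        extra = len(piece) + (1 if current else 0)  # +1 for the "\n" separator
        if current and size + extra > limit:
            chunks.append("\n".join(current))
            current, size = [piece], len(piece)
        else:
            current.append(piece)
            size += extra
    if current:
        chunks.append("\n".join(current))
    return chunks


# 40 lines of 68 characters each: 2,759 characters including the newlines
report = "\n".join("x" * 68 for _ in range(40))
parts = split_message(report)
print([len(p) for p in parts])
```

Because no line in `report` needs wrapping, joining the chunks back with `"\n"` reproduces the original text exactly.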

Tags: python string string-length


【Solution 1】:

A simple example of splitting the data into buffers:

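import re

limit = 50
text = "Line1 - some text\nLine2  - some text\nLine3 - some text, limit here\nLine4 - some text"
# Split on "\n" or "\r\n"; the separators are not kept in the result
string_array = re.split(r'\r?\n', text)

message = ""
for current_str in string_array:
    # +1 for the '\n' appended after each line
    if len(message) + len(current_str) + 1 <= limit:
        message += current_str + '\n'
    else:
        if len(message) == 0:
            # a single line (plus its newline) does not fit in the limit
            print("buffer too small or empty string")
            break
        else:
            # flush the current buffer and start a new one with this line
            print("Message: %sSize: %d" % (message, len(message)))
            message = current_str + '\n'

# print whatever is left in the buffer
if len(message) > 0:
    print("Message: %sSize: %d" % (message, len(message)))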

Result:

Message: Line1 - some text
Line2  - some text
Size: 37
Message: Line3 - some text, limit here
Line4 - some text
Size: 48

【Discussion】:

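  • Naming a variable the same as a built-in function is bad practice. It can lead to errors that are hard to debug.
  • Thanks, I changed it to a regular variable name.
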
【Solution 2】:

A simple solution looks like this:

def send(x):
    # put your sending code here
    print(x)

s = "10\n1\n101\n10\n1"  # example input

lines = s.split("\n")  # divide the string into lines
print(lines)
# send as many lines as possible per message without the sent string exceeding limit
limit = 3  # make this whatever you want
chunk = []  # lines collected for the next message
size = 0    # length of "\n".join(chunk)

for line in lines:
    # adding a line costs its length plus one "\n" separator if chunk is non-empty
    extra = len(line) + (1 if chunk else 0)
    if chunk and size + extra > limit:
        send("\n".join(chunk))
        chunk, size = [line], len(line)
    else:
        chunk.append(line)
        size += extra
# send the final message, which is still in the buffer after the loop
if chunk:
    send("\n".join(chunk))

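I suspect there is a better way to do this in a few lines with some clever splitting or a regular expression, but this is a brute-force way to break the text into smaller messages by line. Note that a line that is longer than the limit on its own will still be sent as-is, over the limit, so this can certainly be improved.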
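On the question of how to send the final string after the loop: one option (a sketch, not the answerer's code) is to turn the loop into a generator, so the last pending message is yielded inside the function and the caller only iterates. `chunks` is a hypothetical name, and `send` is the same placeholder for the real Telegram or Discord call used in the answer.

```python
from typing import Iterator, Optional


def chunks(text: str, limit: int) -> Iterator[str]:
    """Yield newline-joined groups of whole lines, each at most `limit`
    characters long unless a single line is already longer than `limit`."""
    current: Optional[str] = None  # message being built; None means nothing pending
    for line in text.split("\n"):
        if current is None:
            current = line
        elif len(current) + 1 + len(line) <= limit:  # +1 for the "\n"
            current += "\n" + line
        else:
            yield current
            current = line
    if current is not None:  # the final flush happens inside the generator
        yield current


def send(x: str) -> None:
    # Placeholder: put your sending code here.
    print(x)


for message in chunks("10\n1\n101\n10\n1", limit=4):
    send(message)
```

With `limit=4` this sends `"10\n1"`, `"101"` and `"10\n1"`; each message is at most four characters including its newline.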

【Discussion】: