【Question title】: Why can't I replace new line separator?
【Posted】: 2019-12-29 14:56:27
【Question】:

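I am developing a Python Telegram client that forwards messages from the app to our API, and I want to exclude some words. In this case, some @logins and #tags should be removed:

Here is my code: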

for w in app.config['EXCLUDED_WORDS']:
    if w in data:
        data = data.replace(w, '')

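Simple enough, right? The result I get (a lot of new lines):

I tried several different NL separators, such as #YoCrypto\n, #YoCrypto\r and #YoCrypto\r\n, but none of them worked. So here is the output of print(data.encode('utf-8')):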

#TAG\n#YoCrypto\xd0\xa0laced \xd0\xb0dditional signal for Bitmex. I will remember to include both exchanges on the same signal for btcusd now on. My apologies for inconvenience.\xef\xbb\xbf@grandcchat\n@grandcsign\n@grandcmargin

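What am I doing wrong?

UPD 01.01.2020: here are some of the excluded words: ['@grandcmargin\n', '@grandcsign\n', '@grandcchat\n', '#YoCrypto\n', 'По всем вопросам (For all questions, please contact): @NickolchenkoGCS']

We should leave one line break at the start and one at the end of the replaced area, so the expected output should look like this: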

#TAG\n\nPlaced additional signal for Bitmex. I will remember to include both exchanges on the same signal for btcusd now on. My apologies for inconvenience.\n\n[Picture from message]

【Comments】:

  • ..and what does WAIDW stand for?
  • @ZF007, What Am I Doing Wrong
  • .. that's what I guessed

Tags: python-3.x replace utf-8 newline telegram


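【Solution 1】:

One possible solution is to use the re module and replace each word, together with any newline characters that follow it, with a single newline; a second pass then collapses runs of three or more newlines. For example: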

import re

data = b'''#TAG\n#YoCrypto\xd0\xa0laced \xd0\xb0dditional signal for Bitmex. I will remember to include both exchanges on the same signal for btcusd now on. My apologies for inconvenience.\xef\xbb\xbf@grandcchat\n@grandcsign\n@grandcmargin'''

words_to_remove = {'@grandcmargin', '@grandcsign', '@grandcchat', '#YoCrypto', 'По всем вопросам (For all questions, please contact): @NickolchenkoGCS'}

# decode the data (if not decoded already)
data = data.decode('utf-8')

# replace each word (plus an optional leading BOM and any trailing newlines) with one newline:
data = re.sub('|'.join(r'(?:[\ufeff]*{}\n*)'.format(re.escape(w)) for w in words_to_remove), '\n', data)
data = re.sub(r'\n{3,}', r'\n\n', data) # remove excessive new-lines

print(data)

Prints:

#TAG

Рlaced аdditional signal for Bitmex. I will remember to include both exchanges on the same signal for btcusd now on. My apologies for inconvenience.
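
As a side note on why the plain replace() in the question probably missed some entries: the bytes in the printed output decode to a few characters that are easy to overlook. The sketch below is only a diagnostic, not part of the answer's code. It uses a shortened copy of the question's byte string, and the helper name describe_non_ascii is made up for illustration.

```python
import unicodedata

# Shortened copy of the bytes printed in the question (middle sentences dropped).
raw = (b'#TAG\n#YoCrypto\xd0\xa0laced \xd0\xb0dditional signal.'
       b'\xef\xbb\xbf@grandcchat\n@grandcsign\n@grandcmargin')
text = raw.decode('utf-8')


def describe_non_ascii(s):
    """Return (index, code point, Unicode name) for every non-ASCII character."""
    return [(i, 'U+{:04X}'.format(ord(ch)), unicodedata.name(ch, 'UNKNOWN'))
            for i, ch in enumerate(s) if ord(ch) > 127]


for entry in describe_non_ascii(text):
    print(entry)

# The exclusion entries in the question end with '\n', so a plain substring
# test fails wherever that newline is missing in the message.
print('#YoCrypto\n' in text)      # False: the tag is followed directly by a letter
print('@grandcmargin\n' in text)  # False: the message ends right after the login
print('@grandcchat\n' in text)    # True
```

The listing shows two Cyrillic look-alike letters and a U+FEFF (byte order mark) sitting directly in front of @grandcchat, which is why the regular expression above allows an optional \ufeff before each word.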

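【Comments】:

  • It's not that simple: we need to remove some spaces and NLs, but keep all the rest
  • @marperia Can you edit your question and put some input and expected output there?
  • Is that enough? I added some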
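
For readers who want to check the regex approach against the exclusion list from the question's update, here is a minimal sketch. It assumes the entries may carry a trailing newline (as they do in the update) and strips it before building the pattern. The function name strip_excluded and the shortened sample message are hypothetical, chosen only for this illustration.

```python
import re


def strip_excluded(text, excluded):
    """Remove excluded words, leaving one blank line where a run of them stood."""
    # Entries may end with a newline; match on the bare word instead.
    words = [w.rstrip('\r\n') for w in excluded]
    # Longest first, so a word that is a prefix of another cannot shadow it.
    words.sort(key=len, reverse=True)
    # Each alternative: optional BOM, the escaped word, any trailing newlines.
    pattern = '|'.join(r'\ufeff*{}\n*'.format(re.escape(w)) for w in words)
    text = re.sub(pattern, '\n', text)
    # Collapse three or more newlines into a single blank line.
    return re.sub(r'\n{3,}', '\n\n', text)


# Shortened sample modelled on the message in the question.
sample = ('#TAG\n#YoCrypto\u0420laced signal.'
          '\ufeff@grandcchat\n@grandcsign\n@grandcmargin')
excluded = ['@grandcmargin\n', '@grandcsign\n', '@grandcchat\n', '#YoCrypto\n']

result = strip_excluded(sample, excluded)
print(repr(result))
```

On this sample the tag and the three logins are removed and a blank line is left on each side of the remaining sentence, which matches the shape of the expected output in the question. Whether other whitespace in real messages should also be kept or trimmed is not something the thread settles, so treat this as a starting point.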