【Question title】: Remove invalid symbols from a text in python
【Posted】: 2013-11-20 20:05:38
【Question】:

I'm trying to remove invalid symbols from a text. I have this code:

def parse_documentation(filename):
    filename=open(filename)
    invalidsymbols=["`","~","!", "@","#","$"]
    for lines in filename:
        print(lines)
        for word in lines:
            print(word)
            for letter in word:
                if invalidsymbols==letter:
                    print(letter)

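For now I'm only testing it by printing the letters; later I'll add code to remove them (del()). I have more invalid symbols than the ones in the list, but there are a lot of them, so I wanted to check with only 5 or 6. The problem is that it prints not only the invalid symbols but every letter in the text. Also, for some reason, it prints extra characters before my text. How can I fix this?

The text I'm using is: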

he's a jolly good fellow#
I want pizza!
I'm driving to school$

【Comments】:

  • That's not how for works on strings.
  • @IgnacioVazquez-Abrams How should I access the letters in each line?
  • Perhaps you should check more carefully what for is doing.
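
To make the comments above concrete, here is a minimal sketch of what the loops in the question actually see. Iterating over a string yields single characters, so the "word" loop already walks the line letter by letter, and comparing the whole list to one character with == is always false. The variable names below are illustrative only.

```python
invalid_symbols = ["`", "~", "!", "@", "#", "$"]
line = "I want pizza!"

# Iterating over a string yields one character at a time, not words.
chars = [ch for ch in line]

# Comparing the whole list to a single character is always False ...
list_equals_char = (invalid_symbols == "!")

# ... whereas a membership test checks the character against each entry.
found = [ch for ch in line if ch in invalid_symbols]

print(chars[:3])         # ['I', ' ', 'w']
print(list_equals_char)  # False
print(found)             # ['!']
```

So the test in the question would need to be `letter in invalidsymbols` rather than `invalidsymbols == letter`.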

Tags: python string python-3.x


【Solution 1】:

You can use str.translate to remove all the unwanted symbols in one go:

>>> txt = """he's a jolly good fellow#
... I want pizza!
... I'm driving to school$"""
>>> print(txt.translate(str.maketrans("", "", "`~!@#$")))
he's a jolly good fellow
I want pizza
I'm driving to school

So your code could look like this:

def parse_documentation(filename, invalid_symbols):
    # Python 3: build a table that maps every invalid symbol to None (deletion)
    table = str.maketrans('', '', ''.join(invalid_symbols))
    with open(filename, 'r') as in_file:
        for line in in_file:
            safe_line = line.translate(table)
            # <here comes code to do something with safe_line>

You would call the function like this:

parse_documentation(filename, ["`","~","!", "@","#","$"])
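
As a rough, self-contained illustration of the same idea on Python 3 (the version the question is tagged with), the sketch below returns the cleaned lines instead of leaving a placeholder. The helper name clean_lines and the temporary file are assumptions made for the example, not part of the original answer.

```python
import os
import tempfile


def clean_lines(filename, invalid_symbols):
    """Return the lines of `filename` with every invalid symbol removed."""
    # str.maketrans("", "", chars) builds a table that deletes `chars`.
    table = str.maketrans("", "", "".join(invalid_symbols))
    with open(filename, "r") as in_file:
        return [line.translate(table) for line in in_file]


# Write the sample text from the question to a temporary file.
sample = "he's a jolly good fellow#\nI want pizza!\nI'm driving to school$\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

cleaned = clean_lines(path, ["`", "~", "!", "@", "#", "$"])
os.remove(path)
print("".join(cleaned), end="")
```

Building the table once, outside the loop, avoids rebuilding it for every line.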

【Comments】:

    【Solution 2】:
    def parse_documentation(filename):
        invalidsymbols = ["`", "~", "!", "@", "#", "$"]
        with open(filename, "r") as infile:  # open the file; it is closed automatically
            for line in infile:  # iterate over the file line by line
                for x in invalidsymbols:  # loop through the list of invalid symbols
                    if x in line:  # if the invalid symbol is in the line
                        print(line)  # print out the line
                        print(x)  # and also print out the invalid symbol found in that line
                        print(line.replace(x, ""))  # print the line with that symbol removed
    

    How about this?

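    【Comments】:

    • Welcome :) You're still printing the line every time you hit an invalid symbol (if there is more than one of them)
    • Just added another line to the code; you can remove unwanted characters from a string with string.replace(character, "")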
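
One way to address the comment about repeated printing is to apply every replacement first and only then use the result. The following is a minimal sketch of that idea; the function name strip_symbols and the sample string are illustrative assumptions.

```python
def strip_symbols(line, invalid_symbols):
    """Remove every occurrence of each invalid symbol from `line`."""
    for symbol in invalid_symbols:
        # str.replace removes all occurrences, not just the first one.
        line = line.replace(symbol, "")
    return line


invalid_symbols = ["`", "~", "!", "@", "#", "$"]
result = strip_symbols("I want pizza!! #now", invalid_symbols)
print(result)  # I want pizza now
```
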
    【Solution 3】:

    JoeC has already answered, but I'd like to add that if your invalid symbols occur more than once in a line, you're better off doing the following:

    def parse_documentation(filename):
        invalidsymbols = ["`", "~", "!", "@", "#", "$"]
        with open(filename) as filelines:  # the file is closed automatically
            for line in filelines:
                print(line)  # print the current line once
                for symbol in invalidsymbols:
                    if symbol in line:
                        print("Above line contains %s symbol" % symbol)
    

    For replacing the symbols, please refer to JoeC's answer.
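
If the goal is to collect the findings rather than print them, the same check can be sketched as a function that returns a report. The name find_invalid_symbols and the dictionary layout are assumptions chosen for this example.

```python
def find_invalid_symbols(lines, invalid_symbols):
    """Map 1-based line numbers to the invalid symbols found on that line."""
    report = {}
    for number, line in enumerate(lines, start=1):
        # Each symbol is listed once per line, however often it occurs.
        hits = [s for s in invalid_symbols if s in line]
        if hits:
            report[number] = hits
    return report


text = "he's a jolly good fellow#\nI want pizza!\nI'm driving to school$"
report = find_invalid_symbols(text.splitlines(), ["`", "~", "!", "@", "#", "$"])
print(report)  # {1: ['#'], 2: ['!'], 3: ['$']}
```
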

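    【Comments】:

      【Solution 4】:

      Try the textcleaner library for this task.
      Home page and documentation: https://pypi.org/project/textcleaner/
      Call the remove_symbols function and it will return the cleaned text. It only uses regular expressions.
      Function description: https://yugantm.github.io/textcleaner/documentation.html#remove_symbols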
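
For readers who prefer not to add a dependency, a comparable regular-expression approach can be sketched with the standard re module alone. This is not textcleaner's implementation or API; the function name remove_symbols_re is hypothetical and the behaviour is only an approximation of the idea described above.

```python
import re


def remove_symbols_re(text, invalid_symbols):
    """Delete every character listed in `invalid_symbols` from `text`."""
    if not invalid_symbols:
        return text  # an empty character class would be an invalid pattern
    # re.escape keeps characters such as $, ^ or ] literal inside the class.
    pattern = "[" + re.escape("".join(invalid_symbols)) + "]"
    return re.sub(pattern, "", text)


text = "he's a jolly good fellow#\nI want pizza!\nI'm driving to school$"
cleaned_text = remove_symbols_re(text, ["`", "~", "!", "@", "#", "$"])
print(cleaned_text)
```
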

      【Comments】:
