Remove only special characters from a list (answers)

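【Question title】: Remove only special characters from the list
【Posted】: 2022-01-25 18:58:12
【Question description】:

I extract all the text from a PDF file as a string and convert it to a list by splitting on newlines (one or more), on runs of two or more whitespace characters, and on every period (.). Now, if a value in my list consists only of special characters, that value should be excluded.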

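import re
import PyPDF2

# Read the first page of the PDF (legacy PyPDF2 API, pre-3.0) and extract its text
pdfFileObj = open('Python String.pdf', 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
pageObj = pdfReader.getPage(0)
text = pageObj.extractText()

# Split on newlines, periods, and runs of two or more whitespace characters
z = re.split(r"\n+|[.]|\s{2,}", text)
# Drop the empty strings left over by the split
while "" in z:
    z.remove("")
print(z)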

My output is

['split()', 'method in Python split a string into a list of strings after breaking the', 'given string by the specified separator', 'Syntax', ':', 'str', 'split(separator, maxsplit)', 'Parameters', ':', 'separator', ':', 'This is a delimiter', ' The string splits at this specified separator', ' If is', 'no', 't provided then any white space is a separator', 'maxsplit', ':', 'It is a number, which tells us to split the string into maximum of provi', 'ded number of times', ' If it is not provided then the default is', '-', '1 that means there', 'is no limit', 'Returns', ':', 'Returns a list of s', 'trings after breaking the given string by the specifie', 'd separator']

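Some of these values contain only special characters, and I want to remove them. Thanks.

【Question discussion】:

Tags: python regex list

【Solution 1】:

Use a regular expression to test whether the string contains any letter or digit.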

import re

z = [x for x in z if re.search(r'[a-z\d]', x, flags=re.I)]

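In the regular expression, a-z matches letters and \d matches digits, so [a-z\d] matches any letter or digit (the re.I flag makes it case-insensitive). The list comprehension therefore keeps every element of z that contains at least one letter or digit.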
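To make the behavior concrete, here is a minimal sketch that applies the same filter to a few values taken from the output in the question. The helper name has_alnum is hypothetical, and the str.isalnum variant at the end is an alternative that is not part of the original answer; it would also keep elements made of non-ASCII letters.

```python
import re

def has_alnum(s):
    """Return True if s contains at least one letter a-z (either case) or a digit."""
    return re.search(r'[a-z\d]', s, flags=re.I) is not None

# A few values copied from the question's output
sample = ['Syntax', ':', 'str', '-', '1 that means there', 'split()']

# Same list comprehension as in the answer, using the helper
filtered = [x for x in sample if has_alnum(x)]
print(filtered)  # ['Syntax', 'str', '1 that means there', 'split()']

# Alternative (an assumption, not from the answer): a Unicode-aware check
filtered_unicode = [x for x in sample if any(ch.isalnum() for ch in x)]
```

Elements such as ':' and '-' are dropped because they contain no letter or digit, while 'split()' is kept because it contains letters alongside the punctuation.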

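【Discussion】:

【Solution 2】:

Remove the special characters before converting the text to a list. Delete while("" in z) : z.remove("") and add the following line right after the text variable is read: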

text = re.sub('(a|b|c)', '', text)

In this example, my special characters are a, b, and c.
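As a rough sketch of how this approach might look with actual punctuation instead of a, b, and c: the character class below (colon and hyphen) and the sample text are assumptions chosen for illustration, not something the answer specifies. The sketch still filters out empty strings after the split, since the split can leave them behind at the start or end of the text.

```python
import re

# Hypothetical sample text standing in for the extracted PDF page
text = "Syntax\n:\nstr\nmaxsplit  :  default is -1"

# Assumed set of "special characters" to strip before splitting
SPECIAL = r'[:\-]'
cleaned = re.sub(SPECIAL, '', text)

# Same split pattern as in the question, then drop empty strings
parts = [x for x in re.split(r"\n+|[.]|\s{2,}", cleaned) if x]
print(parts)  # ['Syntax', 'str', 'maxsplit', 'default is 1']
```

Note that this also strips the characters from values that contain other text: '-1' becomes '1' here. That differs from removing only the elements that consist entirely of special characters.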

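【Discussion】:

The question is not how to remove special characters, but how to remove the list elements that contain only special characters.
Maybe you're right. But the title and the end of the question both mention removing "special characters", so I think my answer is valid too. Sometimes a question has many solutions :) In any case, your answer is correct and was accepted as the best answer. I have no objection. Congratulations :)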