Remove an '\n\t\t\t'-element from list: answers

【Question title】: Remove an '\n\t\t\t'-element from list
【Posted】: 2023-03-25 17:42:01
【Question description】:

I have the following list, named phonenumbers, and I am struggling to remove the elements that contain '\n\t\t\t' and '\n\t\t\t\t'. I tried a try/except approach and remove('\n\t\t\t\t'), but could not get either to work. Any suggestions?

['(02271) 6 79', ' 70', '\n\t\t\t', '(02271) 6 79', '\n\t\t\t\t', ' 70', '\n\t\t\t', '\n\t\t\t', '(02181) 27 0', '\n\t\t\t\t', '3-0', '\n\t\t\t', '\n\t\t\t', '(02181) 27 0', '\n\t\t\t\t', '3-0', '\n\t\t\t', '\n\t\t\t', '(02161) 24 19', '\n\t\t\t\t', ' 40', '\n\t\t\t', '\n\t\t\t', '(02161) 24 19', '\n\t\t\t\t', ' 40', '\n\t\t\t', '\n\t\t\t', '(02131) 66 67', '\n\t\t\t\t', ' 10', '\n\t\t\t', '\n\t\t\t', '(02131) 66 67', '\n\t\t\t\t', ' 10', '\n\t\t\t', '\n\t\t\t', '(02103) 39 00', '\n\t\t\t\t', ' 93', '\n\t\t\t', '\n\t\t\t', '(02103) 39 00', '\n\t\t\t\t', ' 93', '\n\t\t\t', '\n\t\t\t', '(02173) 2 04 7', '\n\t\t\t\t', '3-0', '\n\t\t\t', '\n\t\t\t', '(02173) 2 04 7', '\n\t\t\t\t', '3-0', '\n\t\t\t', '\n\t\t\t', '(02235) 9 23 04', '\n\t\t\t\t', ' 30', '\n\t\t\t', '\n\t\t\t', '(02235) 9 23 04', '\n\t\t\t\t', ' 30', '\n\t\t\t', '\n\t\t\t', '\n\t\t\t\t', '(0221) 3 46 79 40', '\n\t\t\t', '\n\t\t\t', '\n\t\t\t\t', '(0221) 3 46 79 40', '\n\t\t\t', '\n\t\t\t', '(02232) 4 23', '\n\t\t\t\t', ' 05', '\n\t\t\t', '\n\t\t\t', '(02232) 4 23', '\n\t\t\t\t', ' 05', '\n\t\t\t', '\n\t\t\t', '(0157) 86 85 74', '\n\t\t\t\t', ' 43', '\n\t\t\t', '\n\t\t\t', '(0157) 86 85 74', '\n\t\t\t\t', ' 43', '\n\t\t\t', '\n\t\t\t', '(02181) 2 78 11', '\n\t\t\t\t', ' 47', '\n\t\t\t', '\n\t\t\t', '(02181) 2 78 11', '\n\t\t\t\t', ' 47', '\n\t\t\t', '\n\t\t\t', '(02181) 47 49 0', '\n\t\t\t\t', '0-0', '\n\t\t\t', '\n\t\t\t', '(02181) 47 49 0', '\n\t\t\t\t', '0-0', '\n\t\t\t', '\n\t\t\t', '(02202) 1 88', '\n\t\t\t\t', ' 60', '\n\t\t\t', '\n\t\t\t', '(02202) 1 88', '\n\t\t\t\t', ' 60', '\n\t\t\t', '\n\t\t\t', '(0211) 23 80', '\n\t\t\t\t', ' 70', '\n\t\t\t', '\n\t\t\t', '(0211) 23 80', '\n\t\t\t\t', ' 70', '\n\t\t\t', '\n\t\t\t', '(02235) 9 23 0', '\n\t\t\t\t', '4-0', '\n\t\t\t', '\n\t\t\t', '(02235) 9 23 0', '\n\t\t\t\t', '4-0', '\n\t\t\t']

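【Question discussion】:

Post what you have tried, and someone may be able to help you fix it.
Perhaps you should change the code that generates the list so that it doesn't insert these items in the first place, rather than removing them afterwards. How is the list generated?
@Bryan Oakley The page is first rendered with Qt, then the list is extracted with lxml via tree.xpath: phonenumbers = tree.xpath('//span[@class="text nummer_ganz "]//text()') -- the URL is: gelbeseiten.de/schluesselfertigbau/bergheim,,,,,umkreis-50000/…
str.strip() removes all of the '\n\t\t\t\t', so you can use [e for e in ur_lst if e.strip()] to filter out every whitespace-only element. No regex needed.
@dawg: Good point, though I would even use lst = [number for item in lst for number in [item.strip()] if number] so that the stripped items end up in the resulting list. I have updated my answer below.

Tags: python regex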

【Solution 1】:

Try it like this:

result = [i for i in lst if not i.endswith('\t\t')]
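A quick check on a short sample may help show what this filter does and does not catch. The `sample` list below is made up for illustration and is not the asker's data. Because `endswith('\t\t')` looks only at the last two characters, it also drops a real entry that happens to end in two tabs, and it keeps a whitespace item that ends differently:

```python
# Hypothetical sample, shortened from the kind of list in the question.
sample = ['(02271) 6 79', ' 70', '\n\t\t\t', '\n\t\t\t\t', '\n ', '3-0\t\t']

# Keep only the items that do not end with two tab characters.
result = [i for i in sample if not i.endswith('\t\t')]
print(result)
```

For the list in the question this is enough, since every unwanted item ends in tabs and no phone number does.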

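【Discussion】:

【Solution 2】:

You can use a list comprehension to build a list of the strings in which every character (c) passes the test of not being in '\t\n'. I think this is the most efficient general solution for lists whose unwanted strings contain only tabs and newlines, and it is also very readable in Python: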

[i for i in lst if all(c not in '\t\n' for c in i)]

which gives the correct result:

['(02271) 6 79', ' 70', '(02271) 6 79', ' 70', '(02181) 27 0', '3-0', '(02181) 27 0', '3-0', '(02161) 24 19', ' 40', '(02161) 24 19', ' 40', '(02131) 66 67', ' 10', '(02131) 66 67', ' 10', '(02103) 39 00', ' 93', '(02103) 39 00', ' 93', '(02173) 2 04 7', '3-0', '(02173) 2 04 7', '3-0', '(02235) 9 23 04', ' 30', '(02235) 9 23 04', ' 30', '(0221) 3 46 79 40', '(0221) 3 46 79 40', '(02232) 4 23', ' 05', '(02232) 4 23', ' 05', '(0157) 86 85 74', ' 43', '(0157) 86 85 74', ' 43', '(02181) 2 78 11', ' 47', '(02181) 2 78 11', ' 47', '(02181) 47 49 0', '0-0', '(02181) 47 49 0', '0-0', '(02202) 1 88', ' 60', '(02202) 1 88', ' 60', '(0211) 23 80', ' 70', '(0211) 23 80', ' 70', '(02235) 9 23 0', '4-0', '(02235) 9 23 0', '4-0']

You could also use the shorter str.isspace(), but it may (I'm not 100% sure) be slightly slower, since it checks for all whitespace characters:

[i for i in lst if not i.isspace()]

which gives the same result.
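The two filters agree on the list from the question, but they are not equivalent in general. The sketch below uses a made-up `sample` with a few edge cases (a spaces-only string, a string with an embedded tab, and an empty string) to show where they differ:

```python
# Hypothetical sample with a few edge cases; not the asker's data.
sample = ['(0211) 23 80', ' 70', '\n\t\t\t', '   ', 'a\tb', '']

# Variant 1: drop every string that contains a tab or newline anywhere.
no_tabs_or_newlines = [i for i in sample if all(c not in '\t\n' for c in i)]

# Variant 2: drop only the strings that consist entirely of whitespace.
not_only_whitespace = [i for i in sample if not i.isspace()]

print(no_tabs_or_newlines)
print(not_only_whitespace)
```

Variant 1 keeps the spaces-only string and drops 'a\tb', while variant 2 does the opposite. Both keep the empty string, because `''.isspace()` is False and `all()` over no characters is True.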

【Discussion】:

【Solution 3】:

You can use a simple expression such as

^\s+$

In Python:

import re

lst = ['(02271) 6 79', ' 70', '\n\t\t\t', '(02271) 6 79', '\n\t\t\t\t', ' 70', '\n\t\t\t', '\n\t\t\t', '(02181) 27 0', '\n\t\t\t\t', '3-0', '\n\t\t\t', '\n\t\t\t', '(02181) 27 0', '\n\t\t\t\t', '3-0', '\n\t\t\t', '\n\t\t\t', '(02161) 24 19', '\n\t\t\t\t', ' 40', '\n\t\t\t', '\n\t\t\t', '(02161) 24 19', '\n\t\t\t\t', ' 40', '\n\t\t\t', '\n\t\t\t', '(02131) 66 67', '\n\t\t\t\t', ' 10', '\n\t\t\t', '\n\t\t\t', '(02131) 66 67', '\n\t\t\t\t', ' 10', '\n\t\t\t', '\n\t\t\t', '(02103) 39 00', '\n\t\t\t\t', ' 93', '\n\t\t\t', '\n\t\t\t', '(02103) 39 00', '\n\t\t\t\t', ' 93', '\n\t\t\t', '\n\t\t\t', '(02173) 2 04 7', '\n\t\t\t\t', '3-0', '\n\t\t\t', '\n\t\t\t', '(02173) 2 04 7', '\n\t\t\t\t', '3-0', '\n\t\t\t', '\n\t\t\t', '(02235) 9 23 04', '\n\t\t\t\t', ' 30', '\n\t\t\t', '\n\t\t\t', '(02235) 9 23 04', '\n\t\t\t\t', ' 30', '\n\t\t\t', '\n\t\t\t', '\n\t\t\t\t', '(0221) 3 46 79 40', '\n\t\t\t', '\n\t\t\t', '\n\t\t\t\t', '(0221) 3 46 79 40', '\n\t\t\t', '\n\t\t\t', '(02232) 4 23', '\n\t\t\t\t', ' 05', '\n\t\t\t', '\n\t\t\t', '(02232) 4 23', '\n\t\t\t\t', ' 05', '\n\t\t\t', '\n\t\t\t', '(0157) 86 85 74', '\n\t\t\t\t', ' 43', '\n\t\t\t', '\n\t\t\t', '(0157) 86 85 74', '\n\t\t\t\t', ' 43', '\n\t\t\t', '\n\t\t\t', '(02181) 2 78 11', '\n\t\t\t\t', ' 47', '\n\t\t\t', '\n\t\t\t', '(02181) 2 78 11', '\n\t\t\t\t', ' 47', '\n\t\t\t', '\n\t\t\t', '(02181) 47 49 0', '\n\t\t\t\t', '0-0', '\n\t\t\t', '\n\t\t\t', '(02181) 47 49 0', '\n\t\t\t\t', '0-0', '\n\t\t\t', '\n\t\t\t', '(02202) 1 88', '\n\t\t\t\t', ' 60', '\n\t\t\t', '\n\t\t\t', '(02202) 1 88', '\n\t\t\t\t', ' 60', '\n\t\t\t', '\n\t\t\t', '(0211) 23 80', '\n\t\t\t\t', ' 70', '\n\t\t\t', '\n\t\t\t', '(0211) 23 80', '\n\t\t\t\t', ' 70', '\n\t\t\t', '\n\t\t\t', '(02235) 9 23 0', '\n\t\t\t\t', '4-0', '\n\t\t\t', '\n\t\t\t', '(02235) 9 23 0', '\n\t\t\t\t', '4-0', '\n\t\t\t']

rx = re.compile(r'^\s+$')

lst = [item.strip() for item in lst if not rx.match(item)]
print(lst)

This keeps, and strips, every item that does not consist solely of whitespace from start to end:

['(02271) 6 79', '70', '(02271) 6 79', '70', '(02181) 27 0', '3-0', '(02181) 27 0', '3-0', '(02161) 24 19', '40', '(02161) 24 19', '40', '(02131) 66 67', '10', '(02131) 66 67', '10', '(02103) 39 00', '93', '(02103) 39 00', '93', '(02173) 2 04 7', '3-0', '(02173) 2 04 7', '3-0', '(02235) 9 23 04', '30', '(02235) 9 23 04', '30', '(0221) 3 46 79 40', '(0221) 3 46 79 40', '(02232) 4 23', '05', '(02232) 4 23', '05', '(0157) 86 85 74', '43', '(0157) 86 85 74', '43', '(02181) 2 78 11', '47', '(02181) 2 78 11', '47', '(02181) 47 49 0', '0-0', '(02181) 47 49 0', '0-0', '(02202) 1 88', '60', '(02202) 1 88', '60', '(0211) 23 80', '70', '(0211) 23 80', '70', '(02235) 9 23 0', '4-0', '(02235) 9 23 0', '4-0']

As @dawg pointed out, a regular expression isn't really needed:

lst = [number for item in lst for number in [item.strip()] if number]
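The inner `for number in [item.strip()]` is a way to give the stripped value a name, so that `strip()` is called once per item and the same value is both tested and kept. As a sketch on a shortened, made-up list, the same idea can be written with an assignment expression, assuming Python 3.8 or newer:

```python
# Hypothetical short sample; not the asker's full list.
lst = ['(02271) 6 79', ' 70', '\n\t\t\t', '\n\t\t\t\t', '3-0']

# Original idiom: iterate over a one-element list to name the stripped value.
nested = [number for item in lst for number in [item.strip()] if number]

# Same idea with an assignment expression (requires Python 3.8 or newer).
walrus = [number for item in lst if (number := item.strip())]

print(nested)
print(walrus)
```

Whitespace-only items strip down to the empty string, which is falsy, so they are filtered out in both versions.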

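【Discussion】:

Thanks for all the answers. I tried them all, but none of them worked for me. Maybe my list isn't a "real" list? @Jan When I use the list "lst" as you defined it, it works. But when I write lst = phonenumbers it doesn't... My list is created by first rendering the page with Qt and then extracting the list with lxml via tree.xpath: phonenumbers = tree.xpath('//span[@class= "text nummer_ganz"]//text()') -- the website is: gelbeseiten.de/schluesselfertigbau/bergheim,,,,,umkreis-50000/s1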
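The thread does not say why the filters fail on the extracted list, so what follows is only a diagnostic sketch, not a known fix. One way to narrow it down is to look at the element types and at the exact characters in each element. The `describe` helper and the stand-in data are hypothetical; in the asker's case `phonenumbers` would be the list returned by tree.xpath. It is also worth confirming that the result of the comprehension is assigned to a name, because a comprehension builds a new list and leaves the original unchanged.

```python
def describe(items):
    """Return the set of element type names and the repr of each element."""
    type_names = {type(item).__name__ for item in items}
    reprs = [repr(item) for item in items]
    return type_names, reprs

# Stand-in data; in the asker's case this would be the extracted list.
phonenumbers = ['(02271) 6 79', ' 70', '\n\t\t\t', '\n\t\t\t\t']

type_names, reprs = describe(phonenumbers)
print(type_names)  # which element types are actually present
print(reprs)       # exact characters, shown with their escape sequences

# A comprehension builds a new list, so assign the result to a name.
# str(item) is a defensive conversion in case the elements are not plain str.
cleaned = [text for item in phonenumbers for text in [str(item).strip()] if text]
print(cleaned)
```

If the reprs show characters other than '\n' and '\t' (for example a non-breaking space), a filter written for those two characters alone would miss them, whereas `strip()` with no arguments removes any leading or trailing whitespace.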