Python: Why doesn't .strip() work on an entire file? [closed]

【Question title】: Python: Why doesn't .strip() work on an entire file? [closed]
【Posted】: 2017-03-06 03:06:53
【Question】:

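So I have a large text file (a book), and I'm trying to strip all punctuation, special characters, and whitespace from the entire text so that I can build a dictionary containing all of the words. For some reason, when I use the .strip() method, it doesn't actually do anything.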

with open(filename, 'r') as file:
    entire = file.read()
    entire = entire.lower() #lower case the entire text (this works)
    entire = entire.strip(string.punctuations + string.digit) #this however does nothing

How can I remove the punctuation and digits from the whole book so that I can build the dictionary?

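【Comments】:

Because it's not supposed to do that. Why do you think it should? You won't find any tutorial or documentation that claims anything of the sort.
I've just started programming in Python, so this is all a bit new to me. I'd appreciate any insight into how to solve this! Cheers! :)
I'm voting to close this question as off-topic because SO proper is not a documentation site.
string.punctuations + string.digit should be string.punctuation + string.digits (not that that line would do what you want)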

Tags: python python-3.x file dictionary strip

【Solution 1】:

You can use str.translate() to remove characters:

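import string

# Map each punctuation character and digit (by Unicode ordinal) to None, i.e. delete it.
table = {ord(k): None for k in string.punctuation + string.digits}
with open(filename, 'r') as f:
    entire = f.read().lower()  # lowercase the entire text
    entire = entire.translate(table)  # remove every character listed in table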

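table specifies which characters to delete by mapping their Unicode ordinals to None. A dictionary comprehension is used to construct table, and str.translate() is then called to perform the deletion across the whole string, not just at its ends.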
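
As a minimal sketch of how this fits the asker's goal of building a dictionary of words, the snippet below applies the same table to a string and counts the resulting words. The sample text and the helper name word_counts are made up for illustration; for a real book you would pass in the string returned by f.read().

```python
import string
from collections import Counter

# Map the ordinal of every punctuation character and digit to None (delete it).
table = {ord(ch): None for ch in string.punctuation + string.digits}

def word_counts(text):
    """Hypothetical helper: lowercase, drop punctuation and digits, count words."""
    cleaned = text.lower().translate(table)
    # split() with no arguments splits on any run of whitespace.
    return Counter(cleaned.split())

sample = "The cat, the hat; and 2 more cats!"
counts = word_counts(sample)
print(counts["the"])   # 2
print(sorted(counts))  # ['and', 'cat', 'cats', 'hat', 'more', 'the']
```

Note that apostrophes are in string.punctuation, so a word like "bar's" is counted as "bars". Whether that is acceptable depends on what the dictionary is for.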

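【Comments】:

I didn't know str.translate worked in Python 3 without str.maketrans, thanks.
@Blender: It does, but I initially had a bug: the table must map Unicode ordinals to None to have any effect. str.maketrans() does that for you, or you can use ord() in the dictionary comprehension.
str.maketrans does exactly that. I guess str.translate just ignores invalid keys.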
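
To illustrate the point made in these comments, here is a small sketch comparing three ways of building the table. The variable names and the sample string are invented for the example. It relies on the three-argument form of str.maketrans, whose third argument lists characters to delete.

```python
import string

chars = string.punctuation + string.digits

by_ord = {ord(ch): None for ch in chars}     # keys are integer code points
by_maketrans = str.maketrans("", "", chars)  # third argument: characters to delete
by_str = {ch: None for ch in chars}          # keys are 1-character strings

text = "it's 4 o'clock!"
print(by_ord == by_maketrans)   # True
print(text.translate(by_ord))   # its  oclock
print(text.translate(by_str))   # it's 4 o'clock!  (unchanged)
```

The string-keyed dictionary has no effect because str.translate looks characters up by their integer ordinal, and a failed lookup leaves the character as it is.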

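【Solution 2】:

str.strip only removes characters from the two ends of the string. It stops at the first character on each side that is not in the given set and never touches the interior. For example: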

>>> 'abcXYZabcXYZbca'.strip('abc')
'XYZabcXYZ'
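
This is probably why the call in the question seemed to do nothing: text read from a file typically begins with a letter and ends with a newline, and neither is in the set being stripped, so strip() stops immediately at both ends. A small illustration with a made-up string:

```python
import string

chars = string.punctuation + string.digits

text = "chapter 1: it was a dark, stormy night...\n"

# Both ends ('c' and '\n') are outside chars, so nothing is removed.
print(text.strip(chars) == text)       # True

# With the trailing newline gone, only the run of dots at the right end is
# stripped; the interior '1', ':' and ',' are left alone.
trimmed = text.rstrip("\n").strip(chars)
print(trimmed)                         # chapter 1: it was a dark, stormy night
```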

Instead, you can build a translation table and use str.translate:

>>> import string
>>> table = str.maketrans({c: None for c in string.punctuation + string.digits})
>>> "Foo bar's baz, 123 abc".translate(table)
'Foo bars baz  abc'
