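【Posted】: 2015-09-07 08:05:52
【Question】:
I'm taking a sample of the Declaration of Independence and counting the frequency of the word lengths in it.
Sample text from the file: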
"When in the Course of human events it becomes necessary for one people to dissolve the political bands which have connected them with another and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires
that they should declare the causes which impel them to the separation."
Note: a word's length must not include any punctuation, i.e. anything in string.punctuation.
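To make the note above concrete, here is a minimal sketch of stripping punctuation from a single token before measuring it. The helper name clean_word is hypothetical and not part of the original attempt; it relies on str.maketrans and str.translate from the standard library.

```python
import string

# Translation table that deletes every character found in string.punctuation.
STRIP_PUNCT = str.maketrans('', '', string.punctuation)

def clean_word(word):
    """Return `word` with all string.punctuation characters removed (hypothetical helper)."""
    return word.translate(STRIP_PUNCT)

print(len(clean_word("earth,")))    # 5, the trailing comma is not counted
print(len(clean_word("Nature's")))  # 7, the apostrophe is not counted
```

Under this rule the apostrophe in "Nature's" is dropped as well, because it is part of string.punctuation.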
Expected result (sample):
Length Count
1 16
2 267
3 267
4 169
5 140
6 112
7 99
8 68
9 61
10 56
11 35
12 13
13 9
14 7
15 2
I'm currently stuck on removing the punctuation from the file after it has been converted to a list.
Here is what I've tried so far:
import sys
import string

def format_text(fname):
    punc = set(string.punctuation)
    words = fname.read().split()
    return ''.join(word for word in words if word not in punc)

try:
    with open(sys.argv[1], 'r') as file_arg:
        file_arg.read()
except IndexError:
    print('You need to provide a filename as an argument.')
    sys.exit()

fname = open(sys.argv[1], 'r')
formatted_text = format_text(fname)
print(formatted_text)
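For context on why the attempt above does not strip punctuation: `word not in punc` tests whether the whole token is itself a single punctuation character, so a token such as "earth," passes through unchanged, and `''.join(...)` then glues all the words together without separators. One possible way to get from the text to the length table is sketched below. The function names word_length_counts and format_table are assumptions made for illustration, and the sample string is just a fragment of the text quoted in the question.

```python
import string
from collections import Counter

def word_length_counts(text):
    """Map word length -> number of words of that length in `text`."""
    strip_punct = str.maketrans('', '', string.punctuation)
    counts = Counter()
    for token in text.split():
        word = token.translate(strip_punct)  # remove punctuation inside the token
        if word:  # skip tokens that consisted only of punctuation
            counts[len(word)] += 1
    return counts

def format_table(counts):
    """Render the counts as 'Length Count' lines, sorted by length."""
    lines = ['Length Count']
    for length in sorted(counts):
        lines.append(f'{length} {counts[length]}')
    return '\n'.join(lines)

sample = "the separate and equal station to which the Laws of Nature and of Nature's God entitle them,"
print(format_table(word_length_counts(sample)))
```

To run this against a file, the text could be read with `open(sys.argv[1]).read()` inside a `with` block and passed to word_length_counts; that wiring is left out here to keep the sketch short.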
【Comments】:
-
What exactly is the problem?