Python - Unable to split lines from a txt file into words: answers

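【Question title】: Python - Unable to split lines from a txt file into words
【Posted】: 2013-11-19 19:25:30
【Question description】:

My goal is to open a file, split it into unique words, and display that list (along with a count). I think I have to split the file into lines, then split those lines into words and add them all to a list.

The problem is that my program either runs in an infinite loop and displays no results, or reads only a single line and then stops. The file being read is the Gettysburg Address.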

def uniquify( splitz, uniqueWords, lineNum ):
for word in splitz:
    word = word.lower()        
    if word not in uniqueWords:
        uniqueWords.append( word )

def conjunctionFunction():

    uniqueWords = []

    with open(r'C:\Users\Alex\Desktop\Address.txt') as f :
        getty = [line.rstrip('\n') for line in f]
    lineNum = 0
    lines = getty[lineNum]
    getty.append("\n")
    while lineNum < 20 :
        splitz = lines.split()
        lineNum += 1

        uniquify( splitz, uniqueWords, lineNum )
    print( uniqueWords )


conjunctionFunction()

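【Question discussion】:

Is your indentation correct, or is this just a copy/paste problem from creating the question here?
Why do you need lineNum as a parameter of the uniquify function?
I use lineNum to refer to each line in the file. For the uniquify function, I tried putting lineNum += 1 inside uniquify together with the if statement.
You are not advancing your while loop; you just keep processing the same line until the counter reaches 20.
@turbo Fixed it right after realizing this simple mistake, thanks!

Tags: python list file-io split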

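【Solution 1】:

With your current code, the line:

lines = getty[lineNum]

should be moved inside the while loop, so that a new line is fetched on each iteration.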
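
A minimal sketch of that fix, assuming the lines have already been read into a list. The function name `collect_unique_words` and the sample `getty` list are made up for illustration, and the loop is bounded by `len(lines)` rather than the hard-coded 20 so it cannot run past the end of the file:

```python
def uniquify(splitz, unique_words):
    """Append each lowercased word to unique_words if it is not there yet."""
    for word in splitz:
        word = word.lower()
        if word not in unique_words:
            unique_words.append(word)


def collect_unique_words(lines):
    """Return the unique lowercased words of all lines, in first-seen order."""
    unique_words = []
    line_num = 0
    while line_num < len(lines):
        # Fetch the current line on every pass; this is the step that
        # sat outside the loop in the question's code.
        splitz = lines[line_num].split()
        uniquify(splitz, unique_words)
        line_num += 1
    return unique_words


# Hypothetical stand-in for the lines read from Address.txt.
getty = ["Four score and seven years ago", "our fathers brought forth", "Four score"]
print(collect_unique_words(getty))
```

A plain `for line in lines:` loop would do the same job without a counter; the while loop is kept here only to stay close to the original code.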

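【Discussion】:

@user3010284 This is the correct answer, which is why I upvoted it, but you should also take a look at my answer. You are overcomplicating this task.

【Solution 2】:

You have found what was wrong with your code; however, I would do this slightly differently. Since you need to keep track of the unique words and their counts, you should use a dictionary for this task: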

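wordHash = {}

# Raw string, so the backslashes in the Windows path are not treated as escapes.
with open(r'C:\Users\Alex\Desktop\Address.txt', 'r') as f:
    for line in f:
        line = line.rstrip().lower()

        # Split the line into words; iterating over the string itself
        # would yield single characters instead.
        for word in line.split():
            if word not in wordHash:
                wordHash[word] = 1
            else:
                wordHash[word] += 1

print(wordHash)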

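【Discussion】:

【Solution 3】:

def splitData(filename):
    # read() returns the whole file as one string; split() breaks it on whitespace.
    with open(filename) as f:
        return f.read().split()

The simplest way to split a file into words :)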

【Discussion】:

【Solution 4】:

Assume inp has been read from a file:

inp = """Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense."""


data = inp.splitlines()

print data

_d = {}

for line in data:
    word_lst = line.split()
    for word in word_lst:
        if word in _d:
            _d[word] += 1
        else:
            _d[word] = 1

print _d.keys()

Output

['Beautiful', 'Flat', 'Simple', 'is', 'dense.', 'Explicit', 'better', 'nested.', 'Complex', 'ugly.', 'Sparse', 'implicit.', 'complex.', 'than', 'complicated.']
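
Note that this output keeps punctuation attached ('dense.', 'ugly.') and treats 'Complex' and 'complex.' as different words. One possible refinement, not part of the original answer, is to lowercase each word and strip surrounding punctuation with `str.strip(string.punctuation)`. The helper name `count_words` is made up for illustration:

```python
import string


def count_words(text):
    """Count words case-insensitively, ignoring leading/trailing punctuation."""
    counts = {}
    for line in text.splitlines():
        for word in line.split():
            # Normalize: 'Complex' and 'complex.' both become 'complex'.
            word = word.lower().strip(string.punctuation)
            if not word:
                # The token was pure punctuation; skip it.
                continue
            counts[word] = counts.get(word, 0) + 1
    return counts


inp = """Simple is better than complex.
Complex is better than complicated."""
print(count_words(inp))
```

This only trims punctuation at the edges of a word, so an apostrophe or hyphen inside a word is left in place.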

【Discussion】:

【Solution 5】:

I recommend:

#!/usr/local/cpython-3.3/bin/python

import pprint
import collections

def genwords(file_):
    for line in file_:
        for word in line.split():
            yield word

def main():
    with open('gettysburg.txt', 'r') as file_:
        result = collections.Counter(genwords(file_))

    pprint.pprint(result)

main()

...but you could use re.findall rather than str.split to handle punctuation better.
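
As a rough sketch of that suggestion: the pattern `[a-z']+` below is just one possible choice, assumed here for illustration, and `count_words` is a hypothetical helper. It extracts runs of letters and apostrophes from each lowercased line, so trailing commas and periods never become part of a word:

```python
import collections
import re

# Assumed pattern: runs of lowercase letters and apostrophes (keeps "can't" whole).
# A stray apostrophe on its own would also match, so adjust the pattern as needed.
WORD_RE = re.compile(r"[a-z']+")


def genwords(lines):
    """Yield each word found in an iterable of lines (e.g. an open file)."""
    for line in lines:
        for word in WORD_RE.findall(line.lower()):
            yield word


def count_words(lines):
    """Return a Counter mapping each word to its number of occurrences."""
    return collections.Counter(genwords(lines))


# Hypothetical sample lines; an open file object can be passed the same way.
sample = ["Four score and seven years ago,", "our fathers brought forth, four score."]
print(count_words(sample).most_common(3))
```

Because `genwords` accepts any iterable of strings, the same function works with `open('gettysburg.txt')` or with a list of test lines.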

【Discussion】: