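【Posted】: 2016-11-17 11:18:52
【Question】:
I'm trying to write a program that takes two .txt files named by the user. It reads the keywords file and splits each line into a word and a value, then reads the tweets file and splits each line into a location and the tweet/time.
Example keywords file (a single-spaced .txt file):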
love,10
like,5
best,10
hate,1
haha,10
better,10
Example tweets file (only four lines are shown here; the actual .txt file has several hundred):
[41.298669629999999, -81.915329330000006] 6 2011-08-28 19:02:36 Work needs to fly by... I'm so excited to see Spy Kids 4 with the love of my life... ARREIC
[33.702900329999999, -117.95095704000001] 6 2011-08-28 19:03:13 Today is going to be the greatest day of my life. Hired to take pictures at my best friend's grandparents' 50th anniversary. 60 old people. Woo.
[38.809954939999997, -77.125144050000003] 6 2011-08-28 19:07:05 I just put my life in 5 suitcases
[27.994195699999999, -82.569434900000005] 6 2011-08-28 19:08:02 @Miss_mariiix3 is the love of my life
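For reference, one way a single tweet line could be split into its parts is sketched below. It assumes every line follows the layout of the samples above (a bracketed location, one integer field whose meaning is not stated here, a date, a time, then non-empty tweet text). The function name `parse_tweet_line` is hypothetical.

```python
from datetime import datetime


def parse_tweet_line(line):
    """Split one tweet line into (latitude, longitude, timestamp, text)."""
    # Everything before the first "]" is the location; the rest is the tweet.
    location_part, rest = line.split("]", 1)
    lat_text, lon_text = location_part.strip().lstrip("[").split(",")
    # The remainder is: an integer field, a date, a time, then the tweet text.
    _field, date_text, time_text, text = rest.split(maxsplit=3)
    timestamp = datetime.strptime(date_text + " " + time_text, "%Y-%m-%d %H:%M:%S")
    return float(lat_text), float(lon_text), timestamp, text.strip()


sample = (
    "[38.809954939999997, -77.125144050000003] 6 2011-08-28 19:07:05 "
    "I just put my life in 5 suitcases"
)
print(parse_tweet_line(sample))
```

Passing `1` to `split("]", 1)` keeps any later `]` characters inside the tweet text intact. A line with no tweet text would fail at the unpacking step, so real input may need a guard for that case.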
So far my program looks like this:
# prompt the user for the file name of keywords file
keywordsinputfile = input("Please input file name: ")
tweetsinputfile = input("Please input tweets file name: ")
# try to open given input file
try:
    k = open(keywordsinputfile, "r")
except IOError:
    print("{} file not found".format(keywordsinputfile))
try:
    t = open(tweetsinputfile, "r")
except IOError:
    print("{} file not found".format(tweetsinputfile))
    exit()

def main():  # main function
    kinputfile = open(keywordsinputfile, "r")  # Opens file for keywords
    tinputfile = open(tweetsinputfile, "r")  # Opens file for tweets
    HappyWords = {}
    HappyValues = {}
    for line in kinputfile:  # splits keywords
        entries = line.split(",")
        hvwords = str(entries[0])
        hvalues = int(entries[1])
        HappyWords["keywords"] = hvwords  # stores Happiness keywords
        HappyValues["values"] = hvalues  # stores Happiness values
    for line in tinputfile:
        twoparts = line.split("]")  # splits tweet file by ] creating a location and tweet parts, tweets are ignored for now
        startlocation = (twoparts[0])  # takes the first part (the locations)
    def testing(startlocation):
        for line in startlocation:
            intlocation = line.split("[")  # then gets rid of the "[" at the beginning of the locations
            print(intlocation)
    testing(startlocation)

main()
What I expect to get from this is (for any number of lines; the actual file contains far more than the four shown above):
41.298669629999999, -81.915329330000006
33.702900329999999, -117.95095704000001
38.809954939999997, -77.125144050000003
27.994195699999999, -82.569434900000005
What I get instead is:
['', '']
['2']
['7']
['.']
['9']
['9']
['4']
['1']
['9']
['5']
['6']
['9']
['9']
['9']
['9']
['9']
['9']
['9']
['9']
[',']
[' ']
['-']
['8']
['2']
['.']
['5']
['6']
['9']
['4']
['3']
['4']
['9']
['0']
['0']
['0']
['0']
['0']
['0']
['0']
['5']
In other words, it only processes the last line of the txt file and splits it into individual characters.
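A likely reading of this output, assuming `testing(startlocation)` runs only after the tweets loop has finished: `startlocation` is reassigned on every pass, so only the last line survives, and iterating over that one string with `for line in startlocation` yields single characters. Below is a minimal sketch that collects every location inside the loop instead. The function name `extract_locations` is hypothetical.

```python
def extract_locations(lines):
    """Return the 'lat, lon' text of every tweet line."""
    locations = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Text before the first "]" is "[lat, lon"; drop the leading "[".
        location = line.split("]", 1)[0].lstrip("[")
        locations.append(location)  # append inside the loop, once per line
    return locations


sample_lines = [
    "[41.298669629999999, -81.915329330000006] 6 2011-08-28 19:02:36 first tweet",
    "[33.702900329999999, -117.95095704000001] 6 2011-08-28 19:03:13 second tweet",
]
for location in extract_locations(sample_lines):
    print(location)
```

An open file object can be passed in place of `sample_lines`, since iterating over a file also yields one line at a time.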
After this, I need to store them in a way that lets me split them again, with the first part going into one list and the second part into another, for example:
for line in locations:
    entries = line.split(",")
    latitude = float(entries[0])  # coordinates are decimals, so float rather than int
    longitude = float(entries[1])
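A sketch of that second step, assuming the locations are already available as strings of the form "lat, lon". It uses `float()` because the coordinates are decimal strings, on which `int()` would raise a `ValueError`. The function name `split_coordinates` is hypothetical.

```python
def split_coordinates(locations):
    """Turn 'lat, lon' strings into two parallel lists of floats."""
    latitudes = []
    longitudes = []
    for location in locations:
        lat_text, lon_text = location.split(",")
        latitudes.append(float(lat_text))  # float() tolerates surrounding spaces
        longitudes.append(float(lon_text))
    return latitudes, longitudes


locations = [
    "41.298669629999999, -81.915329330000006",
    "27.994195699999999, -82.569434900000005",
]
latitudes, longitudes = split_coordinates(locations)
print(latitudes)
print(longitudes)
```

The two lists stay aligned by index, so `latitudes[i]` and `longitudes[i]` always belong to the same tweet.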
Thanks in advance!
【Comments】:
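- You are overwriting HappyWords["keywords"] and HappyValues["values"] again and again inside the loop, so you only ever see the last line of the keywords file.
- Thanks for the comment. I've fixed that, but the part of the code I'm trying to get output from doesn't use those values at all. I still get the same error.
- Use print() to see what is in all your variables; that way you can find where it goes wrong.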
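The overwriting described in the first comment can be avoided by using each keyword itself as the dictionary key. This is a minimal sketch, assuming every non-blank line has the form word,value as in the sample keywords file. The function name `load_keywords` is hypothetical.

```python
def load_keywords(lines):
    """Build a {word: value} dict from 'word,value' lines."""
    keywords = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word, value = line.split(",")
        keywords[word] = int(value)  # one entry per word, so nothing is overwritten
    return keywords


sample = ["love,10", "like,5", "hate,1"]
print(load_keywords(sample))
```

A single dictionary also replaces the separate HappyWords and HappyValues containers, because each word maps directly to its score.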
Tags: python python-3.x pycharm