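【Posted】: 2018-08-23 08:44:03
【Question】:
I've been following a chatbot tutorial and I'm stuck. I've included a link to the exact step I'm on at the bottom of this post, in case you're curious what my code looks like (I got frustrated, so I copied his code word for word).
While running, my code processed more than 26,000 rows before throwing an exception. The code is below. As you can see, I've tried various fixes, including replacing the \r and \n characters with nothing and adding the argument strict=False, which is supposed to let unterminated strings through the json parser, but that didn't work either.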
with open('C:/Python34/stuff/chatbot/{}/RC_{}'.format(timeframe.split('-')[0], timeframe), buffering=1000) as f:
    for row in f:
        row_counter += 1
        if row_counter > start_row:
            try:
                row = json.loads(row.replace('\n','').replace('\r',''), strict=False)
                # ---------blah blah blah blah------------
            except Exception as e:
                print("RUH ROH " + str(e))
The exact error message is:
RUH ROH Unterminated string starting at: line 1 column 368 (char 367)
Link: https://pythonprogramming.net/building-database-chatbot-deep-learning-python-tensorflow/
EDIT:
Removing the try/except so the error propagates gave me more information, shown below:
Traceback (most recent call last):
  File "C:/Python34/stuff/chatbot/chatbot_db2.py", line 103, in <module>
    row = json.loads(row.replace('\n','').replace('\r',''), strict=False)
  File "C:\Python34\lib\json\__init__.py", line 331, in loads
    return cls(**kw).decode(s)
  File "C:\Python34\lib\json\decoder.py", line 343, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Python34\lib\json\decoder.py", line 359, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Unterminated string starting at: line 1 column 368 (char 367)
EDIT2:
Following up on the comments, which suggested printing the row that raises the exception. It did shed some light. The failing row:
{"subreddit":"sydney","author_flair_text":null,"id":"cqugtij","gilded":0,"removal_reason":null,"downs":0,"archived":false,"created_utc":"1430439358","link_id":"t3_34e5fd","ups":6,"subreddit_id":"t5_2qkob","name":"t1_cqugtij","score_hidden":false,"author_flair_css_class":null,"parent_id":"t1_cqttsc3","controversiality":0,"score":6,"author":"SilverMeteor9798","body":"As state transport minister almost every press release from Gladys had something in there about how the liberals were \"getting on with the job\" and blaming Labor for something. It wasn't necessarily false, it just got tiresome after a while particular
Whereas a row that parses successfully looks like this:
{"created_utc":"1430438400","ups":4,"subreddit_id":"t5_378oi","link_id":"t3_34di91","name":"t1_cqug90g","score_hidden":false,"author_flair_css_class":null,"author_flair_text":null,"subreddit":"soccer_jp","id":"cqug90g","removal_reason":null,"gilded":0,"downs":0,"archived":false,"author":"rx109","score":4,"retrieved_on":1432703079,"body":"\u304f\u305d\n\u8aad\u307f\u305f\u3044\u304c\u8cb7\u3063\u305f\u3089\u8ca0\u3051\u306a\u6c17\u304c\u3059\u308b\n\u56f3\u66f8\u9928\u306b\u51fa\u306d\u30fc\u304b\u306a","distinguished":null,"edited":false,"controversiality":0,"parent_id":"t3_34di91"}
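Honestly, I'm more confused now, but it does look like every object is supposed to end with "}. So either this one has no ending, or there's a character that can't be parsed?
EDIT3 - SOLVED
I had assumed the file was complete, but I guess something went wrong during the download and the file was truncated, leaving the last entry as an incomplete JSON object. Simply deleting that entry solved the problem.
Thanks for the help, everyone.
【Comments】: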
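For readers hitting the same message, here is a minimal sketch of why a cut-off line produces it. The sample strings and the helper name try_parse are made up for illustration; only json.loads comes from the post.

```python
import json

# A complete line and the same line cut off in the middle of the "body" string,
# which is roughly what the last row of a truncated download looks like.
complete = '{"author": "rx109", "body": "As state transport minister"}'
truncated = '{"author": "rx109", "body": "As state transport'


def try_parse(line):
    """Return (parsed, error_message); exactly one of the two is None."""
    try:
        return json.loads(line), None
    except ValueError as e:  # json.JSONDecodeError is a subclass of ValueError
        return None, str(e)


ok, ok_err = try_parse(complete)
bad, bad_err = try_parse(truncated)
print(ok_err)   # None: the complete line parses
print(bad_err)  # an "Unterminated string starting at: ..." message
```

The strict=False argument does not help here: it relaxes the rule about control characters inside strings, but a string that never gets its closing quote is still invalid JSON.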
-
How about except ... print(row.replace('\n','').replace('\r',''))? That should give you an idea of what is tripping you up. -
Do you have the input that fails?
-
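The JSON document is 20000 lines long? OK, you obviously don't want to post that here. If you can strip it down to something small enough that still produces the same error, that would be great, but chances are you can't. So link to it in a repo or wherever it came from, or at least tell us which of the generated pathnames has the error. Also: it would help if you could try a standalone
json.load directly on that file (in the REPL or a one-line script) and verify that you get the same error. -
Ah, a truncated file is much simpler. So now you get to figure out how to write the error handling so that the next time you get an incomplete file, debugging is less painful. :)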
-
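In answer to your earlier question: when the data is messy, discarding the bad rows is quite common. You can put an exception handler around the whole per-row processing. Usually, when people do this in production code, they log the bad row so that a human can look at it and decide whether there's a bug or just bad data.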
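A rough sketch of that skip-and-log pattern, assuming one JSON object per line as in the question. The function name parse_rows, the logger name, and the sample lines are hypothetical; start_row mirrors the variable in the original loop.

```python
import json
import logging

logger = logging.getLogger("chatbot_db")  # hypothetical logger name


def parse_rows(lines, start_row=0):
    """Yield (row_number, parsed_object) for each line that is valid JSON.

    Rows numbered <= start_row are skipped, as in the original loop.
    Lines that fail to parse are logged with their row number and skipped.
    """
    for row_number, line in enumerate(lines, start=1):
        if row_number <= start_row:
            continue
        try:
            yield row_number, json.loads(line)
        except ValueError as e:
            # Log the position, the parser's message, and the start of the bad line
            logger.warning("skipping row %d: %s | %r", row_number, e, line[:200])


# Made-up sample input: the second line is cut off mid-string.
sample = ['{"score": 6}\n', '{"body": "it just got tiresome', '{"score": 4}\n']
rows = list(parse_rows(sample))
print(rows)
```

Since an open file object iterates line by line, something like parse_rows(f, start_row) should slot into the existing with block. Catching ValueError rather than Exception keeps unrelated bugs in the row-processing code from being silently swallowed.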