python json.loads Unterminated string error: answers

【Question title】: python json.loads Unterminated string error
【Posted】: 2018-08-23 08:44:03
【Question description】:

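I've been following a chatbot tutorial and I'm stuck. I've included the exact step I'm on as a link at the bottom of this post, in case you're curious what my code looks like (I got frustrated, so I copied his code word for word).

My code processed more than 26,000 rows before throwing an exception. The code is below. As you can see, I've tried various fixes, including replacing the \r and \n characters with nothing and adding the argument strict=False, which I understood was supposed to allow unterminated strings into the JSON, but that didn't work either.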

import json

# timeframe, row_counter and start_row are defined earlier in the script (not shown)
with open('C:/Python34/stuff/chatbot/{}/RC_{}'.format(timeframe.split('-')[0], timeframe), buffering=1000) as f:
    for row in f:
        row_counter += 1

        if row_counter > start_row:
            try:
                row = json.loads(row.replace('\n','').replace('\r',''), strict=False)

                # ... rest of the row processing omitted ...

            except Exception as e:
                print("RUH ROH " + str(e))

The exact error message is:

RUH ROH Unterminated string starting at: line 1 column 368 (char 367)

Link: https://pythonprogramming.net/building-database-chatbot-deep-learning-python-tensorflow/

Edit:

Removing the try/except so that the error propagates gave me more information, shown below:

Traceback (most recent call last):
  File "C:/Python34/stuff/chatbot/chatbot_db2.py", line 103, in <module>
    row = json.loads(row.replace('\n','').replace('\r',''), strict=False)
  File "C:\Python34\lib\json\__init__.py", line 331, in loads
    return cls(**kw).decode(s)
  File "C:\Python34\lib\json\decoder.py", line 343, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Python34\lib\json\decoder.py", line 359, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Unterminated string starting at: line 1 column 368 (char 367)

EDIT2:

Following up on the comments, which suggested that I print the row that raises the exception. It did shed some light.

{"subreddit":"sydney","author_flair_text":null,"id":"cqugtij","gilded":0,"removal_reason":null,"downs":0,"archived":false,"created_utc":"1430439358","link_id":"t3_34e5fd","ups":6,"subreddit_id":"t5_2qkob","name":"t1_cqugtij","score_hidden":false,"author_flair_css_class":null,"parent_id":"t1_cqttsc3","controversiality":0,"score":6,"author":"SilverMeteor9798","body":"As state transport minister almost every press release from Gladys had something in there about how the liberals were \"getting on with the job\" and blaming Labor for something. It wasn't necessarily false, it just got tiresome after a while particular

A row that parses successfully, by contrast, looks like this:

{"created_utc":"1430438400","ups":4,"subreddit_id":"t5_378oi","link_id":"t3_34di91","name":"t1_cqug90g","score_hidden":false,"author_flair_css_class":null,"author_flair_text":null,"subreddit":"soccer_jp","id":"cqug90g","removal_reason":null,"gilded":0,"downs":0,"archived":false,"author":"rx109","score":4,"retrieved_on":1432703079,"body":"\u304f\u305d\n\u8aad\u307f\u305f\u3044\u304c\u8cb7\u3063\u305f\u3089\u8ca0\u3051\u306a\u6c17\u304c\u3059\u308b\n\u56f3\u66f8\u9928\u306b\u51fa\u306d\u30fc\u304b\u306a","distinguished":null,"edited":false,"controversiality":0,"parent_id":"t3_34di91"}

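Honestly, I'm even more confused now, but it does look like every object ends with "}. So either this one is never terminated, or there's a character that can't be parsed?

EDIT3 - Solved

I assumed the file was complete, but I guess something went wrong while downloading it and the file was truncated, leaving an incomplete JSON object as the last entry. Simply deleting that entry fixed the problem.

Thanks, everyone, for the help.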

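【Question discussion】:

How about except ... print(row.replace('\n','').replace('\r',''))? That should give you an idea of what's tripping you up.
Do you have the input that fails?
The JSON document is 20000 lines long? OK, you obviously don't want to post that here. If you can strip it down to something small enough that still produces the same error, great, but odds are you can't. So link to it in a repo or wherever it came from, or at least tell us which generated pathname has the error. Also: it would help if you could try a standalone json.load directly on that file (in the REPL or a one-line script) and verify that you get the same error.
Ah, a truncated file is much simpler. So now you get to figure out how to write error handling so that the next time you get an incomplete file, debugging isn't so painful. :)
In answer to your earlier question: discarding bad rows is very common when the data is messy. You can put an exception handler around the whole row. Usually, when people do this in production code, they log the bad row so that a human can look at it and decide whether there is a bug or just bad data.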
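
The per-row handler described in the last comment could look roughly like the sketch below. It assumes one JSON object per line, as in the question; the function name parse_json_lines, the logger name and the sample rows are made up for illustration and are not part of the original script.

```python
import json
import logging

logger = logging.getLogger("json_lines")  # illustrative logger name


def parse_json_lines(lines):
    """Yield parsed objects, logging and skipping lines that are not valid JSON."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            yield json.loads(line)
        except ValueError as exc:  # json.JSONDecodeError (3.5+) subclasses ValueError
            # Record where the bad row is and how it ends, so a human can review it.
            logger.warning("skipping line %d: %s; line ends with %r",
                           line_number, exc, line[-40:])


# Hypothetical sample: one complete row and one row cut off mid-string.
sample = ['{"id": "a", "score": 4}', '{"id": "b", "body": "cut off mid-str']
rows = list(parse_json_lines(sample))
print(rows)
```

Logging the line number and the tail of the row makes a truncated final record easy to spot, because the logged text will not end with a closing brace.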

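Tags: python json sqlite

【Solution 1】:

As I explained in EDIT2, I printed the row that was giving me trouble and saw that it did not end with the } that every JSON object should end with. I then went into the file, found the exact row with a simple search, and discovered that the row was not only truncated but was also the last line of my file.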
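
The manual check described above can also be done in code. The following is only a sketch, assuming a UTF-8 file with one JSON object per line; the helper name last_line_is_valid_json and the temporary demo file are illustrative.

```python
import json
import os
import tempfile


def last_line_is_valid_json(path):
    """Return True if the final non-empty line of the file parses as JSON."""
    last = None
    with open(path, encoding="utf-8") as f:
        for line in f:  # streams the file, so memory use stays small
            if line.strip():
                last = line
    if last is None:
        return False  # empty file
    try:
        json.loads(last)
        return True
    except ValueError:
        return False


# Demo on a small temporary file whose last record is cut off.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write('{"id": "a"}\n{"id": "b", "body": "partic')
    demo_path = tmp.name

last_line_ok = last_line_is_valid_json(demo_path)
os.remove(demo_path)
print(last_line_ok)
```

Running a check like this right after downloading or extracting a dump would flag a truncated file before the main processing loop starts.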

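Something must have gone wrong when I downloaded or extracted the file, and it appears to have cut the file short. That in turn caused the error I was getting, for which there seemed to be no solution.

For anyone who hits this error and finds that the .replace() fix doesn't work: look at your data and make sure there is actually something there to replace or edit. In my case, a truncation error during download or extraction made that kind of fix impossible.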
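
To see why neither .replace() nor strict=False could help here, it may be useful to compare the two failure modes directly. In the standard json module, strict=False only permits raw control characters (such as a literal newline) inside strings; it does not supply a missing closing quote or brace. The sample strings below are invented for the demonstration.

```python
import json

# A raw newline inside a string is a control character: rejected by default,
# accepted with strict=False.
with_control_char = '{"body": "line one\nline two"}'
try:
    json.loads(with_control_char)
    strict_accepts = True
except ValueError:
    strict_accepts = False
relaxed = json.loads(with_control_char, strict=False)

# A truncated document has no closing quote or brace, so it fails either way.
truncated = '{"body": "it just got tiresome after a while particular'
try:
    json.loads(truncated, strict=False)
    truncated_error = None
except ValueError as exc:
    truncated_error = str(exc)

print(strict_accepts, relaxed, truncated_error)
```

The second case reproduces the same "Unterminated string starting at" message as in the question, which points to missing data rather than to characters that need escaping or stripping.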

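Many thanks to abarnert, Michael Robellard and Anton Kachurin.

【Discussion】:

【Solution 2】:

I found that the good people at Luminoso have written a library to deal with this kind of problem.

Apparently, you sometimes have to deal with text that comes out of other code. The text has often passed through several different pieces of software, each with its own quirks, possibly with Microsoft Office somewhere in the chain; see this blog post.

This is where ftfy comes to the rescue.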

from ftfy import fix_text
import json
# text = some text source with a potential unicode problem
fixed_text = fix_text(text)
data = json.loads(fixed_text)

【Discussion】: