How to write a JSON file in Python without using dumps: answers

【Question title】: How to write a JSON file in Python without using dumps
【Posted】: 2018-10-10 17:00:41
【Question description】:

I have the following BSON data from MongoDB. I need to convert it to valid JSON so that I can create a PySpark DataFrame.

"\"{u'_raja': ObjectId('XXXXXX'),\\n u'ram': datetime.datetime(XXx,xx14, xx, xx, xxx),\\n u'createUserId': u'praja-policy',\\n u'raja': u'I5',\\n u'udatedTime': datetime.datetime(XXx, xx, xx, xx, xx, xx, xxxx),\\n u'lastupdatedid': u'raja_id',\\n u'plt': u'123r32'}\""

I have written the following code.

import json
from bson import json_util

with open("/XXXXX6/bi/XXXXX/XXXXX3/v0/test/bson.json", "rb") as f:
    bson = f.read()
data = bson.replace('u\'', '')  # removal of Unicode
data1 = data.replace('\n', '')  # removal of \n
json.dump(json_util.dumps(data), open("bson1.json", "w"))

Using json.dump gives me valid JSON, but the output is full of backslash escapes (\").

How can I extract the values from the unicode strings, so that I can create a PySpark DataFrame?
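
The backslashes described in the question are consistent with the data being encoded twice: json_util.dumps already returns a JSON string, and json.dump then serializes that string a second time, escaping every inner quote. A minimal sketch of the effect, using only the standard json module (the sample dict is made up for illustration):

```python
import json

# Hypothetical sample document, for illustration only.
doc = {"createUserId": "praja-policy", "plt": "123r32"}

once = json.dumps(doc)    # a JSON object
twice = json.dumps(once)  # a JSON *string* whose content is that object

print(once)
print(twice)

# Decoding the double-encoded text once gives back a str, not a dict.
assert isinstance(json.loads(twice), str)
assert json.loads(json.loads(twice)) == doc
```

Calling only one of the two serializers on the data should avoid the extra escaping.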

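【Discussion】:

How to remove the unicode u character, using Python 2.7.
Do you want to extract the string inside u'string'?
@prazy I want to remove the unicode characters and get valid JSON so that I can create a DataFrame.
Why is the data in this format in the first place? If you control how the string is created, you should fix the problem there.
I have a feeling this comes from MongoDB.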
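
As the comments suggest, the string in the question looks like the Python repr of a document rather than BSON or JSON, so it is usually easier to serialize the document object itself before it is turned into text. With PyMongo, bson.json_util.dumps(document) is meant for this. The sketch below shows the same idea with only the standard library; `FakeObjectId` and `to_jsonable` are hypothetical stand-ins, not PyMongo APIs, and the field values are invented.

```python
import datetime
import json


class FakeObjectId(object):
    """Hypothetical stand-in for bson.ObjectId so the sketch runs without PyMongo."""

    def __init__(self, oid):
        self.oid = oid

    def __str__(self):
        return self.oid


def to_jsonable(value):
    """Fallback used by json.dumps for types it cannot serialize itself."""
    if isinstance(value, (datetime.datetime, datetime.date)):
        return value.isoformat()
    if isinstance(value, FakeObjectId):
        return str(value)
    raise TypeError("cannot serialize %r" % (value,))


# Invented values; the real ones are elided in the question.
doc = {
    "_id": FakeObjectId("5bbe2f0c0000000000000000"),
    "createUserId": "praja-policy",
    "updatedTime": datetime.datetime(2018, 10, 10, 17, 0, 41),
}

text = json.dumps(doc, default=to_jsonable)
print(text)
```

The result contains no u'...' prefixes or constructor calls, because it never passes through repr.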

Tags: python json dataframe pyspark bson

【Solution 1】:

Use ensure_ascii=False in json.dumps:

bson = f.read()
json.dumps(bson, ensure_ascii=False).encode('utf8')

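This avoids \uXXXX-escaped unicode output. The encode method can then be used to encode the result in whatever encoding you want; in most cases 'utf8' is a safe choice.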
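
To make the effect of ensure_ascii concrete, here is a small sketch (Python 3 syntax; the sample value is made up). Note that ensure_ascii only controls how non-ASCII characters are written; the u'...' prefixes in the question belong to Python's repr, not to JSON, so this option does not touch them.

```python
import json

value = {"name": "Zoë"}  # made-up sample containing a non-ASCII character

escaped = json.dumps(value)                  # non-ASCII written as \uXXXX
raw = json.dumps(value, ensure_ascii=False)  # non-ASCII written as-is

print(escaped)
print(raw)

# Both forms decode to the same data.
assert json.loads(escaped) == json.loads(raw) == value
```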

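【Discussion】:

No, I still get this unicode error. Otherwise, can you guide me on replacing the u? I am using the line data=bson.replace('u\'','') to remove the unicode prefix, but this line removes the u together with the first single quote. We should remove the u and keep the single quote.
Then you need to add it back. @ra
@Prazy The goal is that I need to load it into a DataFrame. What should I do with my JSON? When I validate my JSON, it turns out to be valid, but Spark does not let me create a DataFrame. It shows corrupt records.
I load it into PySpark with the following command: test=hiveContext.read.option("multiline","true").json(sc.wholeTextFiles('file:/xxx/xxxx.json').values())
@Raja hiveContext.read() will create a DataFrame. Can you paste into the question the output you get when you enter test?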
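
One possible way to get from repr-style text to something Spark can read, sketched under several assumptions: the text contains only plain literals (ast.literal_eval accepts the u'...' prefix, so no string replacement is needed, but it rejects calls such as ObjectId(...) or datetime.datetime(...), which would have to be handled separately), and the output is written as one JSON object per line, which is the layout Spark's JSON reader expects when the multiline option is not set. The sample strings and the file name are invented for illustration.

```python
import ast
import json
import os
import tempfile

# Invented repr-style records containing only plain literals.
raw_records = [
    "{u'createUserId': u'praja-policy', u'raja': u'I5'}",
    "{u'createUserId': u'someone-else', u'raja': u'I6'}",
]

# literal_eval parses Python literals safely, including u'...' strings.
records = [ast.literal_eval(r) for r in raw_records]

# Write JSON Lines: one compact JSON object per line.
out_path = os.path.join(tempfile.mkdtemp(), "records.jsonl")
with open(out_path, "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

with open(out_path) as f:
    lines = f.read().splitlines()
print(lines)
```

A file in this layout could then be passed to hiveContext.read.json(path) without the multiline option; whether that removes the corrupt records depends on what the real data contains.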