Python Response API JSON to CSV table

【Question title】: Python Response API JSON to CSV table
【Posted】: 2017-09-09 22:19:45
【Question description】:

Below is the code I use to collect some data through IBM's API. However, I am having trouble saving the output to a CSV table with Python.

These are the columns (and their values) I want:

emotion__document__emotion__anger
emotion__document__emotion__joy
emotion__document__emotion__sadness
emotion__document__emotion__fear
emotion__document__emotion__disgust
sentiment__document__score
sentiment__document__label
language
entities__relevance
entities__text
entities__type
entities__count
concepts__relevance
concepts__text
concepts__dbpedia_resource
usage__text_characters
usage__features
usage__text_units
retrieved_url

This is the code I use to collect the data:

response = natural_language_understanding.analyze(
  url=url,
  features=[
  Features.Emotion(),
  Features.Sentiment(),
  Features.Concepts(limit=1),
  Features.Entities(limit=1)
          ]
  )


data = json.load(response)
rows_list = []
cols = []

for ind,row in enumerate(data):

    if ind == 0:
        cols.append(["usage__{}".format(i) for i in row["usage"].keys()])
        cols.append(["emotion__document__emotion__{}".format(i) for i in row["emotion"]["document"]["emotion"].keys()])
        cols.append(["sentiment__document__{}".format(i) for i in row["sentiment"]["document"].keys()])
        cols.append(["concepts__{}".format(i) for i in row["concepts"].keys()])
        cols.append(["entities__{}".format(i) for i in row["entities"].keys()])
        cols.append(["retrieved_url"])

    d = OrderedDict()


    d.update(row["usage"])
    d.update(row["emotion"]["document"]["emotion"])
    d.update(row["sentiment"]["document"])
    d.update(row["concepts"])
    d.update(row["entities"])
    d.update({"retrieved_url":row["retrieved_url"]})

    rows_list.append(d)


df = pd.DataFrame(rows_list)
df.columns = [i for subitem in cols for i in subitem]
df.to_csv("featuresoutput.csv", index=False)

Changing the two lines to

cols.append(["concepts__{}".format(i) for i in row["concepts"][0].keys()])
cols.append(["entities__{}".format(i) for i in row["entities"][0].keys()])

did not solve the problem.

【Question discussion】:

Tags: python json csv python-requests ibm-cloud

【Solution 1】:

If you are getting it from an API, the response will be in JSON format. You can write it out to a CSV like this:

import csv, json
response = the json response you get from the API
attributes = [emotion__document__emotion__anger, emotion__document__emotion__joy.....attributes you want]
data = json.load(response)
with open('output.csv', 'w') as f:
    writer = csv.writer(f, delimiter=',')
    for attribute in attributes:   
        writer.writerow(data[attribute][0])
    f.close()

Make sure data is a dictionary rather than a string; Python 3.6 should return a dictionary. Print a few lines to see how the data you need is stored.
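As a rough, runnable sketch of the idea above: the nested response is flattened into one dict whose keys match the column names from the question, and `csv.DictWriter` then writes the selected attributes. The `data` dict here is a hand-written, hypothetical sample shaped after those columns, and `flatten` is an illustrative helper rather than part of any library; a real response may contain different keys.

```python
import csv
import io

# Hypothetical sample shaped like the columns listed in the question.
data = {
    "usage": {"text_units": 1, "text_characters": 1536, "features": 4},
    "retrieved_url": "https://example.com/article",
    "language": "en",
    "sentiment": {"document": {"score": 0.13, "label": "positive"}},
    "emotion": {"document": {"emotion": {"anger": 0.1, "joy": 0.5}}},
    "entities": [{"type": "Company", "text": "IBM", "relevance": 0.8, "count": 3}],
}


def flatten(obj, prefix=""):
    """Flatten nested dicts into one dict with '__'-joined keys.

    For a list of dicts only the first element is kept, which matches
    limit=1 in the question; other lists are skipped.
    """
    flat = {}
    for key, value in obj.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "__"))
        elif isinstance(value, list):
            if value and isinstance(value[0], dict):
                flat.update(flatten(value[0], name + "__"))
        else:
            flat[name] = value
    return flat


row = flatten(data)

# Column names must be strings (quoted), not bare identifiers.
attributes = [
    "emotion__document__emotion__anger",
    "sentiment__document__score",
    "entities__text",
    "retrieved_url",
]

# An in-memory buffer keeps the sketch self-contained; to write a file,
# use open("output.csv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=attributes, extrasaction="ignore")
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
```

`extrasaction="ignore"` lets the writer drop any flattened keys that are not in `attributes`, so the column list alone decides what ends up in the file.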

【Discussion】:

Does this also work for Python 2.7? I am getting the message: attributes=[emotion__document__emotion__anger,emotion__document__emotion__joy, NameError: name 'emotion__document__emotion__anger' is not defined
The attributes should be quoted as strings: 'emotion__document__emotion__anger'
A trick I used with the Azure Face API is data = response.read() followed by ast.literal_eval(data) to turn it into a Python dictionary. With the dictionary I can then do something like dictionary['isIdentical'] == True
After the adjustment I still get the same error: attributes='usage__text_units', .... 'retrieved_url'] data = response.read() ast.literal_eval(data) dictionary[' isIdentical'] == True with open('0output.csv', 'w') as f: ...
That means you are still getting a string rather than a dictionary. Try executing it line by line to see where it goes wrong.

【Solution 2】:

This line assigns a string to data:

data=(json.dumps(datas, indent=2))

So here you are iterating over the characters of a string:

for ind,row in enumerate(data):

In that case row is a string, not a dictionary, so an expression such as row["usage"] raises an error.

Maybe you meant to iterate over datas?
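The difference can be checked with a small sketch. The `datas` dict below is a hypothetical stand-in for the parsed response. One assumption worth stating: if `datas` is a single dict, iterating over it directly yields its keys, so the sketch wraps it in a list to loop over records.

```python
import json

# Hypothetical stand-in for the parsed API response.
datas = {"usage": {"text_units": 1}, "retrieved_url": "https://example.com"}

# json.dumps serializes the dict to a str.
data = json.dumps(datas, indent=2)

# Enumerating a str yields single characters, not records.
first_chars = [row for ind, row in enumerate(data)][:3]

# Indexing a character with a string key fails.
error = None
try:
    first_chars[0]["usage"]
except TypeError as exc:
    error = exc

# Treat the single response as one record by wrapping it in a list.
for ind, row in enumerate([datas]):
    usage = row["usage"]
```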

Update

The code has some other problems as well, for example:

cols.append(["concepts__{}".format(i) for i in row["concepts"].keys()])

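Here you probably want row["concepts"][0].keys() to get the keys of the first element, because row["concepts"] is an array.

I am not very familiar with pandas, but I suggest looking at json_normalize, which is included in pandas and helps flatten JSON structures. One problem you may face is that concepts and entities contain arrays of documents. This means you have to include the same document at least max(len(concepts), len(entities)) times.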
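One way this could look, as a sketch rather than a definitive implementation: `pd.json_normalize` (available as a top-level function since pandas 1.0) flattens the document-level fields, the concepts and entities arrays are normalized separately, and the document row is repeated max(len(concepts), len(entities)) times so it lines up with them. The `response` dict is a hypothetical sample with two concepts and one entity, and the side-by-side layout is only one possible choice.

```python
import pandas as pd

# Hypothetical sample: two concepts, one entity.
response = {
    "usage": {"text_units": 1, "text_characters": 1536, "features": 4},
    "retrieved_url": "https://example.com/article",
    "language": "en",
    "sentiment": {"document": {"score": 0.13, "label": "positive"}},
    "emotion": {"document": {"emotion": {"anger": 0.1, "joy": 0.5}}},
    "concepts": [
        {"text": "A", "relevance": 0.9, "dbpedia_resource": "http://dbpedia.org/resource/A"},
        {"text": "B", "relevance": 0.7, "dbpedia_resource": "http://dbpedia.org/resource/B"},
    ],
    "entities": [{"type": "Company", "text": "IBM", "relevance": 0.8, "count": 3}],
}

# Nested dicts become columns joined by "__"; list values stay as objects,
# so the two array columns are dropped and handled separately.
doc = pd.json_normalize(response, sep="__").drop(columns=["concepts", "entities"])

# One row per array element, with prefixed column names.
concepts = pd.json_normalize(response["concepts"]).add_prefix("concepts__")
entities = pd.json_normalize(response["entities"]).add_prefix("entities__")

# Place both arrays side by side; the shorter one is padded with NaN.
items = pd.concat([concepts, entities], axis=1)

# Repeat the document-level row so it lines up with every item row.
n_rows = max(len(concepts), len(entities))
doc_rows = doc.loc[doc.index.repeat(n_rows)].reset_index(drop=True)

table = pd.concat([doc_rows, items], axis=1)
csv_text = table.to_csv(index=False)  # pass a path instead to write a file
```

With `limit=1` for both features, as in the question, `n_rows` is 1 and the result is a single row per URL.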

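【Discussion】:

After removing datas, using data instead, and changing the line to data = json.load(response), I now get the error: "AttributeError: 'dict' object has no attribute 'read'". The response here is the output of the API.
Thanks for the suggestion. I updated it. The output is: 'dict' object has no attribute 'read'
Changed it, but that did not solve the problem. I still get the error AttributeError: 'dict' object has no attribute 'read'. (Alternatively, storing the response as CSV and then using pandas would also be fine with me.) This is after updating the lines to cols.append(["concepts__{}".format(i) for i in row["concepts"][0].keys()]) cols.append(["entities__{}".format(i) for i in row["entities"][0].keys()])
Where should I add the other point: max(len(concepts), len(entities))?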
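The AttributeError reported in this thread is what `json.load` raises when it is handed a dict: it expects a file-like object and calls `.read()` on it. The message itself suggests the response is already parsed, in which case no JSON call is needed. A minimal sketch with a hypothetical `response` dict:

```python
import io
import json

# Hypothetical, already-parsed response.
response = {"language": "en", "usage": {"text_units": 1}}

# json.load expects a file-like object and calls .read() on it.
message = None
try:
    json.load(response)
except AttributeError as exc:
    message = str(exc)

# The response is already a dict, so it can be used directly.
data = response

# For comparison: json.loads parses a str, json.load parses a file object.
from_text = json.loads('{"language": "en"}')
from_file = json.load(io.StringIO('{"language": "en"}'))
```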