Export JSON data from MongoDB to CSV

[Question title]: Export JSON data to CSV from MongoDB
[Posted]: 2016-10-25 21:57:07
[Question description]:

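When exporting data from MongoDB to CSV, my Python script fails on a missing field name. The type field is present in the first record but absent from the remaining records. How can I write the Python script so that it assigns an empty value to the type field when the field does not exist?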

Sample MongoDB collection:

"stages": [
    {
        "interview": false,
        "hmNotification": false,
        "hmStage": false,
        "type": "new",
        "isEditable": false,
        "order": 0,
        "name": {
            "en": "New"
        },
        "stageId": "51d1a2f4c0d9887b214f3694"
    },
    {
        "interview": false,
        "hmNotification": true,
        "isEditable": true,
        "order": 1,
        "hmStage": true,
        "name": {
            "en": "Pre-Screen"
        },
        "stageId": "51f0078d7297363f62059699"
    },
    {
        "interview": false,
        "hmNotification": false,
        "hmStage": false,
        "isEditable": true,
        "order": 2,
        "name": {
            "en": "Phone Screen"
        },
        "stageId": "51d1a326c0d9887721778eae"
    }]

Sample Python script:

import csv
cursor = db.workflows.find( {}, {'_id': 1, 'stages.interview': 1, 'stages.hmNotification': 1, 'stages.hmStage': 1, 'stages.type':1, 'stages.isEditable':1, 'stages.order':1,   
'stages.name':1, 'stages.stageId':1 })
flattened_records = []
for stages_record in cursor:
    stages_record_id = stages_record['_id']
    for stage_record in stages_record['stages']:
        flattened_record = {
            '_id': stages_record_id,
            'stages.interview': stage_record['interview'],
            'stages.hmNotification': stage_record['hmNotification'],
            'stages.hmStage': stage_record['hmStage'],
            'stages.type': stage_record['type'],  
            'stages.isEditable': stage_record['isEditable'],
            'stages.order': stage_record['order'],
            'stages.name': stage_record['name'],
            'stages.stageId': stage_record['stageId']}                   
        flattened_records.append(flattened_record)

When I run the Python script, it raises KeyError: 'type'. Please help me handle the missing field name in the script.

[Question discussion]:

Tags: python arrays mongodb csv

[Solution 1]:

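When you need a value from a Python dictionary that may not contain the key, you can use the dict class's .get() method.

For example, suppose you have a dictionary like this: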

my_dict = {'a': 1, 
           'b': 2,
           'c': 3}

You can use the get method to fetch one of the keys that exists:

>>> print(my_dict.get('a'))
1

But if you try to get a key that does not exist (such as does_not_exist), you get None by default:

>>> print(my_dict.get("does_not_exist"))
None

As described in the documentation, you can also supply a default value, which is returned when the key does not exist:

>>> print(my_dict.get("does_not_exist", "default_value"))
default_value

This default is not used when the key does exist in the dictionary; in that case you get the key's own value:

>>> print(my_dict.get("a", "default_value"))
1
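
For contrast, here is a minimal sketch of why the script in the question fails: indexing with square brackets raises KeyError for a missing key, while .get() falls back to a default. The stage_record dictionary is a trimmed-down stand-in for the second stage in the question, and the 'lookup failed' marker is only there for illustration.

```python
# A stage without a 'type' key, trimmed from the question's sample data.
stage_record = {'interview': False, 'order': 1}

try:
    stage_type = stage_record['type']      # square brackets raise KeyError
except KeyError:
    stage_type = 'lookup failed'           # illustrative marker only

safe_type = stage_record.get('type', '')   # .get() returns the default instead
print(repr(stage_type), repr(safe_type))
```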

Knowing this, when you build flattened_record you can do:

'stages.hmStage': stage_record['hmStage'],
'stages.type': stage_record.get('type', ""),  
'stages.isEditable': stage_record['isEditable'],

So if the stage_record dictionary does not contain the key type, get('type', "") returns an empty string.

You can also try:

'stages.hmStage': stage_record['hmStage'],
'stages.type': stage_record.get('type'),  
'stages.isEditable': stage_record['isEditable'],

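Then, when stage_record does not contain the type key, stage_record.get('type') returns None, which the csv module writes as an empty field.

Or you can set the default to "UNKNOWN":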

'stages.type': stage_record.get('type', "UNKNOWN"),
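
Putting the pieces together, the following is a rough end-to-end sketch rather than the asker's actual script. It assumes Python 3, replaces the MongoDB cursor with an in-memory list (the '_id' value is made up), and applies .get() to every stage field instead of only type, which goes slightly beyond what the answer above suggests. The names documents, STAGE_FIELDS and flatten are hypothetical.

```python
import csv
import io

# Stand-in for the documents returned by db.workflows.find(...).
# Stage values come from the question's sample; the _id is made up.
documents = [{
    '_id': 'workflow-1',
    'stages': [
        {'interview': False, 'hmNotification': False, 'hmStage': False,
         'type': 'new', 'isEditable': False, 'order': 0,
         'name': {'en': 'New'}, 'stageId': '51d1a2f4c0d9887b214f3694'},
        {'interview': False, 'hmNotification': True, 'hmStage': True,
         'isEditable': True, 'order': 1,
         'name': {'en': 'Pre-Screen'}, 'stageId': '51f0078d7297363f62059699'},
    ],
}]

STAGE_FIELDS = ['interview', 'hmNotification', 'hmStage', 'type',
                'isEditable', 'order', 'name', 'stageId']


def flatten(docs):
    """Yield one flat dict per stage; missing stage fields become ''."""
    for doc in docs:
        for stage in doc.get('stages', []):
            row = {'_id': doc['_id']}
            for field in STAGE_FIELDS:
                row['stages.' + field] = stage.get(field, '')
            yield row


flattened_records = list(flatten(documents))

# Write to an in-memory buffer; use open('stages.csv', 'w', newline='')
# instead to produce a real file.
buffer = io.StringIO()
fieldnames = ['_id'] + ['stages.' + f for f in STAGE_FIELDS]
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(flattened_records)
csv_text = buffer.getvalue()
print(csv_text)
```

The second stage has no type key, so its stages.type column is simply empty in the output instead of stopping the export with a KeyError.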

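[Discussion]:

It works fine after adding 'stages.type': stage_record.get('type') to the Python script. The type values now appear in the CSV file. Thank you very much.
Thanks for your help. How can I remove the u prefix, the brackets, and the sub-field names in Python? Thanks.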
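
One possible way to address the follow-up above, sketched under the assumption that the unwanted output comes from writing the nested name dictionary as a whole: its text form includes the braces, the en sub-field name and, on Python 2, the u prefix of unicode strings. Pulling out the inner string first avoids all three. The choice of the 'en' key and the empty-string fallback are assumptions based on the sample data.

```python
# A stage trimmed from the question's sample data.
stage_record = {'order': 1, 'name': {'en': 'Pre-Screen'}}

# Writing the dict itself puts its text representation into the CSV cell.
as_dict = str(stage_record['name'])

# Pulling out the inner string gives a plain value; the chained .get()
# calls also tolerate a missing 'name' or a missing 'en' key.
as_text = stage_record.get('name', {}).get('en', '')

print(as_dict)
print(as_text)
```

In the flattened record this would mean storing stage_record.get('name', {}).get('en', '') under 'stages.name' instead of the whole dictionary.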