【Question Title】: How to write a dictionary that stores non-ascii characters into a json file in python
【Posted】: 2017-09-14 19:53:21
【Question Description】:

This is my first time using Python, so I apologize if the question is obvious.

Here I have a dictionary that I want to write out as a JSON file. To do that, I do the following:

import io
import json

result = {}
result["c2i"] = c2i  # c2i is a dictionary
result["data"] = data  # data is a list that stores integers
with io.open("json_test.json", 'w', encoding='utf-8') as outfile:
    json.dump(result, outfile)

Unfortunately, when I do this, I get the following error:

Traceback (most recent call last):
  File "dataprocess.py", line 140, in <module>
    json.dump(result, outfile)
  File "/home/anaconda2/lib/python2.7/json/__init__.py", line 190, in dump
    fp.write(chunk)
TypeError: write() argument 1 must be unicode, not str

The contents of the i2c dictionary are as follows:

>>> i2c
{49: '&', 50: '|',  56: '^', 57: '=', 58: '<', 59: '*', 60: '\xc2', 61: '\xa3', 62: '$', 63: '\xc3', 64: '\xa2', 65: '\xe2', 66: '\x82', 67: '\xac', 68: '\xef', 69: '\xbc', 70: '\xa6', 71: '\xaf', 72: '\xb7', 73: '>', 74: '+', 75: '\xab', 76: '\x97', 77: '~', 78: '\xad', 79: '\x98', 80: '\x86', 81: '\xb3', 82: ']', 83: '\x84', 84: '\x83', 85: '\xf0', 86: '\x9f', 87: '\x87', 88: '\xb1', 89: '\xb4', 90: '\xc4', 91: '\xb0', 92: '\xb6', 93: '[', 94: '\\', 95: '\xf3', 96: '\xbe', 97: '\x8d', 98: '\x81', 99: '\xe3', 100: '\xbb', 101: '\x8b', 102: '\xc5', 103: '\x93', 104: '\x85', 105: '\xe4', 106: '\xbd', 107: '\xa0', 108: '\xe5', 109: '\xe7', 110: '\xae', 111: '\xe9', 112: '\x9a', 113: '\x94', 114: '\xe6', 115: '\x88', 116: '\x91', 117: '\xa5', 118: '\xe8', 119: '\xb2', 120: '}', 121: '\xe0', 122: '\xb8', 123: '\xa7', 124: ':broken_heart:', 125: ':loudly_crying_face:', 126: ':black_rightwards_arrow:', 127: ':white_left_pointing_backhand_index:', 128: ':dizzy_face:', 129: ':cloud:', 130: ':white_right_pointing_backhand_index:', 131: ':heavy_black_heart:', 132: ':smiling_face_with_smiling_eyes:', 133: ':sparkling_heart:', 134: ':smiling_cat_face_with_heart-shaped_eyes:', 135: ':oncoming_bus:', 136: ':man_with_turban:', 137: ':confused_face:', 138: ':cross_mark:', 139: ':smiling_face_with_open_mouth_and_tightly-closed_eyes:', 140: ':party_popper:', 141: ':open_hands_sign:', 142: ':earth_globe_asia-australia:', 143: ':sleepy_face:', 144: ':pensive_face:', 145: ':weary_face:', 146: ':smiling_face_with_sunglasses:', 147: ':droplet:', 148: ':persevering_face:', 149: ':crown:', 150: ':sleeping_face:', 151: ':musical_score:', 152: ':teacup_without_handle:', 153: ':hot_beverage:', 154: ':awe|boy:', 155: ':cocktail_glass:', 156: ':worried_face:', 157: ':thought_balloon:', 158: ':cat_face:', 159: ':personal_computer:', 160: ':splashing_sweat_symbol:', 161: ':electric_plug:', 162: ':kiss_mark:', 163: ':trophy:', 164: ':airplane:', 165: ':face_with_no_good_gesture:', 166: 
':princess:', 167: ':disappointed_face:', 168: ':pouting_face:', 169: ':sparkles:', 170: ':high_voltage_sign:', 171: ':bomb:', 172: ':purple_heart:', 173: ':christmas_tree:', 174: ':black_heart_suit:', 175: ':speak-no-evil_monkey:', 176: ':woman_with_bunny_ears:', 177: ':person_bowing_deeply:', 178: ':smiling_face_with_halo:', 179: ':smiling_face_with_heart-shaped_eyes:', 180: ':beating_heart:', 181: ':unamused_face:', 182: ':ok_hand_sign:', 183: ':smiling_face_with_open_mouth:', 184: ':see-no-evil_monkey:', 185: ':face_without_mouth:', 186: ':musical_note:', 187: ':hocho:', 188: ':violin:', 189: ':smiling_face_with_open_mouth_and_cold_sweat:', 190: ':basketball_and_hoop:', 191: ':person_raising_both_hands_in_celebration:', 192: ':books:', 193: ':pistol:', 194: ':happy_person_raising_one_hand:', 195: ':thumbs_up_sign:', 196: ':heart_with_arrow:', 197: ':thumbs_down_sign:', 198: ':grinning_face_with_smiling_eyes:', 199: ':weary_cat_face:', 200: ':snowflake:', 201: ':multiple_musical_notes:', 202: ':frog_face:', 203: ':umbrella_with_rain_drops:', 204: ':runner:', 205: ':winking_face:', 206: ':fire_engine:', 207: ':face_with_medical_mask:', 208: ':green_heart:', 209: ':face_with_ok_gesture:', 210: ':camera:', 211: ':french_fries:', 212: ':tropical_drink:', 213: ':smiling_face_with_open_mouth_and_smiling_eyes:', 214: ':astonished_face:', 215: ':hundred_points_symbol:', 216: ':palm_tree:', 217: ':face_with_open_mouth_and_cold_sweat:', 218: ':clinking_beer_mugs:', 219: ':dash_symbol:', 220: ':flag_for_faroe_islands:', 221: ':face_with_stuck-out_tongue:', 222: ':pedestrian:', 223: ':face_throwing_a_kiss:', 224: ':raised_hand:', 225: ':confounded_face:', 226: ':dog_face:', 227: ':police_car:', 228: ':bath:', 229: ':face_screaming_in_fear:', 230: ':bust_in_silhouette:', 231: ':baseball:', 232: ':ambulance:', 233: ':squared_sos:', 234: ':wine_glass:', 235: ':imagined...re:', 236: ':face_with_tears_of_joy:', 237: ':dancer:', 238: ':clapping_hands_sign:', 239: 
':heavy_large_circle:', 240: ':face_with_stuck-out_tongue_and_winking_eye:', 241: ':hatching_chick:', 242: ':open_book:', 243: ':white_smiling_face:', 244: ':fisted_hand_sign:', 245: ':tired_face:', 246: ':face_with_stuck-out_tongue_and_tightly-closed_eyes:', 247: ':snowman_without_snow:', 248: ':information_desk_person:', 249: ':two_women_holding_hands:', 250: ':two_hearts:', 251: ':angry_face:', 252: ':headphone:', 253: ':white_heavy_check_mark:', 254: ':wrapped_present:', 255: ':floppy_disk:', 256: ':soon_with_rightwards_arrow_above:', 257: ':white_frowning_face:', 258: ':grinning_face:', 259: ':black_sun_with_rays:', 260: ':crying_face:', 261: ':aubergine:', 262: ':face_savouring_delicious_food:', 263: ':victory_hand:', 264: ':flag_for_united_kingdom:', 265: ':flushed_face:', 266: ':mouse:', 267: ':rocket:', 268: ':person_with_folded_hands:', 269: ':father_christmas:', 270: ':face_with_look_of_triumph:', 271: ':nail_polish:', 272: ':skull:', 273: ':fork_and_knife:', 274: ':expressionless_face:', 275: ':growing_heart:', 276: ':microphone:', 277: ':fire:', 278: ':sleeping_symbol:', 279: ':money_bag:', 280: ':grimacing_face:', 281: ':flexed_biceps:', 282: ':smirking_face:', 283: ':pile_of_poo:', 284: ':slice_of_pizza:', 285: ':neutral_face:'}

I believe entries like '\xc2' are causing the problem, but I can't find a way to handle them. I will later use these JSON files from other programming languages.

EDIT: I am using Python 2.7

EDIT 2

As one of the answers suggested, I went with the second option:

import io
import json

result = {}
result[u"i2c"] = {k: v.decode('iso-8859-1') for k, v in i2c.items()}
result["data"] = encoded_data
with io.open("deneme_jstonout.json", 'w') as outfile:
    json.dump(result, outfile)

But in that case, I get the following error:

Traceback (most recent call last):
  File "dataprocess.py", line 137, in <module>
    result[u"i2c"] = {k:v.decode('iso-8859-1') for k, v in i2c.items()}
  File "dataprocess.py", line 137, in <dictcomp>
    result[u"i2c"] = {k:v.decode('iso-8859-1') for k, v in i2c.items()}
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc2' in position 0: ordinal not in range(128)

I also get an error when I try to convert each value in the dictionary to unicode:

EDIT 3

>>> for i in i2c:
...     i2c[i] = unicode(i2c[i])
... 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

【Comments】:

  • Unicode in Python changed a lot between versions 2 and 3. Which version are you using?
  • @PauloScardine 2.7
  • @PauloScardine When I type >>> for i in i2c: print type(i2c[i]), all of them print as <type 'str'>
  • @PauloScardine With your last comment I get the following error: in json.dump({k: v.decode('iso-8859-1') for k, v in result.items()}, outfile) AttributeError: 'dict' object has no attribute 'decode'
  • Sorry, try c2i = {k: v.decode('iso-8859-1') for k, v in c2i.items()}

Tags: python json dictionary unicode


【Solution 1】:

The problem is that JSON is Unicode, so the json module needs string data to be of the unicode data type, since it is unwilling to guess the encoding of your strings for you. This is a Python 2 problem; in Python 3 all strings are already unicode. You have 3 options:
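A small Python 3 sketch of the same point: json will serialize unicode text (escaping it by default), but refuses raw bytes outright rather than guess a codec.

```python
import json

# A str (unicode) value serializes fine; non-ASCII gets \uXXXX-escaped
# by default because ensure_ascii defaults to True.
ok = json.dumps({"60": "\xc2"})

# The same value as bytes is rejected: json will not guess its encoding.
try:
    json.dumps({"60": b"\xc2"})
    raised = False
except TypeError:
    raised = True
```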

1) Use unicode literals

i2c = {
    49: u'&', 
    50: u'|',  
    56: u'^', 
    57: u'=', 
    58: u'<',
    ...
    285: u':neutral_face:',
}

If you import data from other sources (APIs, databases, text files), the best practice is to always decode data into unicode as it enters your application, and encode it as it leaves.
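A minimal Python 3 sketch of that decode-on-the-way-in, encode-on-the-way-out discipline (the raw bytes and the Latin-1 codec here are assumptions for illustration):

```python
# Hypothetical raw bytes arriving from a file or API, assumed Latin-1.
raw = b"\xa3 and \xc2"

# Decode immediately at the boundary...
text = raw.decode("latin-1")

# ...work only with real (unicode) strings inside the application...
assert "\u00a3" in text  # the pound sign survived the decode

# ...and encode only when the data leaves again, e.g. as UTF-8.
out = text.encode("utf-8")
```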

2) Convert the string data to unicode

 result[u"i2c"] = {k:v.decode('iso-8859-1') for k, v in i2c.items()}
 result[u"data"] = data

Your sample doesn't look like UTF-8, so my guess is Latin-1, but you have to know the real codec, because it may well not be Latin-1 (and anything decodes successfully as Latin-1).
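A quick Python 3 sketch of why decoding as Latin-1 never fails, even when it is the wrong codec:

```python
# Latin-1 maps every byte 0x00-0xFF straight to the code point with the
# same number, so *any* byte string decodes without error.
every_byte = bytes(range(256))
text = every_byte.decode("latin-1")

# The same bytes are not valid UTF-8, so that decode fails:
try:
    every_byte.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```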

3) Use Python 3

Python 3 makes everything about unicode much more explicit. Any I/O operation gives you either unicode strings or bytes, so there is no ambiguity. In Python 2, a program seems to work, but occasionally catches fire.
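For example, here is a minimal Python 3 version of the original snippet (the file name and sample values are illustrative; string keys are used because JSON object keys are always strings):

```python
import json
import os
import tempfile

data = {"i2c": {"60": "\u00c2", "124": ":broken_heart:"}}

# In Python 3, built-in open() takes an encoding and json.dump writes
# str; ensure_ascii=False keeps non-ASCII characters readable on disk.
path = os.path.join(tempfile.gettempdir(), "json_test.json")
with open(path, "w", encoding="utf-8") as outfile:
    json.dump(data, outfile, ensure_ascii=False)

# Round-trip check: the file reads back to an equal dictionary.
with open(path, encoding="utf-8") as infile:
    assert json.load(infile) == data
```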

【Comments】:

  • I edited my question with the result I got from your second solution, could you take a look?
  • Try open instead of io.open; the json module will handle the encoding for you.