【Posted】: 2015-07-02 09:55:13
【Problem description】:
I have a PySpark broadcast value with the following contents:
[('b000jz4hqo', {'rom': 2.4051362683438153, 'clickart': 56.65432098765432, '950': 254.94444444444443, 'image': 3.6948470209339774, 'premier': 9.27070707070707, '000': 6.218157181571815, 'dvd': 1.287598204264871, 'broderbund': 22.169082125603865, 'pack': 2.98180636777128}), ('b0006zf55o', {'laptops': 11.588383838383837, 'desktops': 12.74722222222222, 'backup': 2.8015873015873014, 'win': 0.501859142607174, 'ca': 9.10515873015873, 'v11': 50.98888888888888, '30u': 84.98148148148148, '30pk': 254.94444444444443, 'desktop': 2.23635477582846, '1': 0.3231235037318687, 'arcserve': 24.28042328042328, 'computer': 0.6965695203400122, 'lap': 127.47222222222221, 'oem': 46.35353535353535, 'international': 9.44238683127572, 'associates': 7.284126984126985})]
So it is a key->list broadcast variable.
Trying to convert broadcast.value to a dictionary results in
TypeError: unhashable type: 'dict'
when using code like this:
from itertools import izip
amazonWeightsBroadcast = sc.broadcast(amazonWeightsRDD.collect())
i = iter(amazonWeightsBroadcast.value)
amazonWeightsDict = dict(izip(i, i))
I also tried this (it gives the same "unhashable type" error):
amazonWeightsDict = dict(amazonWeightsBroadcast.value[i:i+2] for i in range(0, len(amazonWeightsBroadcast.value), 2))
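The failure can be reproduced without Spark. The sketch below is only an illustration under stated assumptions: `value` is a shortened, hand-written stand-in for `amazonWeightsBroadcast.value`, `pair_consecutive` is a hypothetical helper name, and the built-in `zip` replaces `izip` so the snippet runs on Python 3. Pairing consecutive elements makes the entire first `(key, dict)` tuple the would-be dictionary key, and a tuple that contains a dict cannot be hashed.

```python
# Shortened stand-in for amazonWeightsBroadcast.value (assumption: same shape,
# a list of (key, dict) tuples; most weights omitted for brevity).
value = [
    ('b000jz4hqo', {'rom': 2.4051362683438153, 'dvd': 1.287598204264871}),
    ('b0006zf55o', {'ca': 9.10515873015873, 'oem': 46.35353535353535}),
]


def pair_consecutive(items):
    """Pair element 0 with element 1, element 2 with element 3, and so on.

    This mirrors izip(i, i) over a single shared iterator (hypothetical helper).
    """
    it = iter(items)
    return list(zip(it, it))


pairs = pair_consecutive(value)

# The first member of each pair would become the dict key. Here it is a whole
# (key, dict) tuple, not the string key.
would_be_key = pairs[0][0]

try:
    dict(pairs)
    error_message = None
except TypeError as exc:
    # Hashing the tuple hashes its elements, and the inner dict is unhashable.
    error_message = str(exc)

print(type(would_be_key).__name__)
print(error_message)
```

The slicing variant fails for the same reason: each slice `value[i:i+2]` is a two-element list whose first element is again a `(key, dict)` tuple.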
If a broadcast variable cannot be converted to a dictionary, what would be a better way to look up the list of values by key?
Python 2.7.6, Spark 1.3.1
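For reference, a minimal sketch of the lookup being asked for, assuming the broadcast value really is a list of `(key, dict)` tuples as printed above. In that case each element is already a key/value pair, so `dict()` can consume the list directly without re-pairing elements. The names `value` and `weights_by_key` are illustrative, and the data is shortened.

```python
# Shortened stand-in for amazonWeightsBroadcast.value (assumption: a list of
# (key, dict) tuples, as printed in the question).
value = [
    ('b000jz4hqo', {'rom': 2.4051362683438153, 'dvd': 1.287598204264871}),
    ('b0006zf55o', {'ca': 9.10515873015873, 'oem': 46.35353535353535}),
]

# Each tuple is already (key, value), so dict() accepts the list as-is.
# The string IDs become the keys and the weight dicts become the values.
weights_by_key = dict(value)

# Look up the weights for one ID, then a single token weight.
oem_weight = weights_by_key['b0006zf55o']['oem']
print(sorted(weights_by_key))
print(oem_weight)

# With Spark available, one option would be to broadcast the dict itself,
# for example sc.broadcast(amazonWeightsRDD.collectAsMap()). That line is not
# executed here because this sketch has no SparkContext.
```

Broadcasting the dictionary rather than the list would let each task do a constant-time key lookup on `broadcast.value` instead of scanning a list.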
【Comments】:
Tags: python dictionary apache-spark pyspark