【Posted】: 2016-01-15 14:35:05
【Question】:
import pandas as pd
import numpy as np
import random
labels = ["c1","c2","c3"]
c1 = ["one","one","one","two","two","three","three","three","three"]
c2 = [random.random() for i in range(len(c1))]
c3 = ["alpha","beta","gamma","alpha","gamma","alpha","beta","gamma","zeta"]
DF = pd.DataFrame(np.array([c1,c2,c3])).T
DF.columns = labels
The DataFrame looks like this:
      c1               c2     c3
0    one   0.440958516531  alpha
1    one   0.476439953723   beta
2    one   0.254235673552  gamma
3    two   0.882724336464  alpha
4    two    0.79817899139  gamma
5  three   0.677464637887  alpha
6  three   0.292927670096   beta
7  three  0.0971956881825  gamma
8  three   0.993934915508   zeta
The only way I can think of to build the dictionary is:
D_greek_value = {}
for greek in set(DF["c3"]):
    D_c1_c2 = {}
    for i in range(DF.shape[0]):
        row = DF.iloc[i,:]
        if row[2] == greek:
            D_c1_c2[row[0]] = row[1]
    D_greek_value[greek] = D_c1_c2
D_greek_value
The resulting dictionary looks like this:
{'alpha': {'one': '0.67919712421',
           'three': '0.67171020684',
           'two': '0.571150669821'},
 'beta': {'one': '0.895090207979', 'three': '0.489490074662'},
 'gamma': {'one': '0.964777504708',
           'three': '0.134397632659',
           'two': '0.10302290374'},
 'zeta': {'three': '0.0204226923557'}}
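I don't want to assume that the c1 values come in contiguous blocks (that all the "one" rows are always together). I'm doing this on a CSV of a few hundred MB, and I feel like I'm doing it wrong. If you have any ideas, please help!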
【Discussion】:
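One possible direction, offered as a rough sketch rather than a definitive answer: the nested loop in the question scans the whole DataFrame once per distinct c3 value, whereas groupby partitions the rows in a single pass and does not require c1 or c3 to be contiguous. The small DataFrame below is a made-up stand-in for the real data, and it stores c2 as floats (the question's np.array construction turns them into strings). Like the original loop, a repeated c1 within one c3 group keeps the last value seen.

```python
import pandas as pd

# Hypothetical stand-in for the question's DataFrame; the values are illustrative only.
DF = pd.DataFrame({
    "c1": ["one", "one", "two", "three", "three"],
    "c2": [0.1, 0.2, 0.3, 0.4, 0.5],
    "c3": ["alpha", "beta", "alpha", "alpha", "zeta"],
})

# Group the rows by c3, then map c1 -> c2 inside each group.
# dict(zip(...)) keeps the last c2 for a repeated c1, as the loop in the question does.
D_greek_value = {
    greek: dict(zip(group["c1"], group["c2"]))
    for greek, group in DF.groupby("c3")
}
```

For a CSV of a few hundred MB, the same comprehension should apply to the frame returned by pd.read_csv, assuming the file fits in memory.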
Tags: python pandas hash machine-learning dataframe