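【Posted】:2018-03-26 20:19:17
【Question】:
I would like to know how to count the number of unique symbols (characters) that occur in a single column of a DataFrame. For example: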
import pandas as pd
df = pd.DataFrame({'col1': ['a', 'bbb', 'cc', ''], 'col2': ['ddd', 'eeeee', 'ff', 'ggggggg']})
df
  col1     col2
0    a      ddd
1  bbb    eeeee
2   cc       ff
3       ggggggg
The result should be that col1 contains 3 unique symbols and col2 contains 4 unique symbols.
My code so far (though it is probably wrong):
unique_symbols = [0]*203
i = 0
for col in df.columns:
    observed_symbols = []
    df_temp = df[[col]]
    df_temp = df_temp.astype('str')
    # This part is where I am not so sure
    for index, row in df_temp.iterrows():
        # Walk through every character of the cell value
        for symbol in row[col]:
            if symbol not in observed_symbols:
                observed_symbols.append(symbol)
    unique_symbols[i] = len(observed_symbols)
    i += 1
Thanks in advance.
【Comments】:
-
So, to be clear, if col3 were {col3: eee, rrr, ere}, it would return 2?
-
Exactly :)
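The behaviour confirmed in the comments can be checked with a short sketch. This is only one possible approach, not an accepted answer from the thread: it joins all cells of a column into one string and counts the distinct characters with a set. The helper name count_unique_symbols and the col3 column (padded with an empty string so all columns have equal length) are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'bbb', 'cc', ''],
    'col2': ['ddd', 'eeeee', 'ff', 'ggggggg'],
    # Hypothetical column from the comment, padded with '' to match the length
    'col3': ['eee', 'rrr', 'ere', ''],
})

def count_unique_symbols(column):
    """Return the number of distinct characters across all cells of a column."""
    # Cast to str so non-string cells do not break the join,
    # then let a set collapse repeated characters.
    return len(set(''.join(column.astype(str))))

# apply() calls the helper once per column and returns a Series
# indexed by column name.
unique_counts = df.apply(count_unique_symbols)
print(unique_counts)
```

Note that astype(str) turns missing values into the literal text 'nan', so columns containing NaN would probably need a fillna('') first.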
Tags: python pandas dataframe set unique