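【Posted】: 2019-08-06 18:18:27
【Question】:
Task:
In a DataFrame with the columns product, day, pgroup, value (logical key = product, day), pgroup is empty in some rows. If another row for the same product contains a value, that value should be used to fill the empty rows.
At the moment I loop over the products and look up the unique pgroup values for each product. I would like to do this in a faster way.
Example:
Data: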
import pandas as pd

df = pd.DataFrame([['a','2018-02-03','G1',47],
                   ['a','2018-02-04',None,25],
                   ['a','2018-02-05','G1',10],
                   ['a','2018-02-06',None,22],
                   ['a','2018-02-07',None,84],
                   ['b','2018-02-03',None,10],
                   ['b','2018-02-04',None,21],
                   ['b','2018-02-05',None,2],
                   ['b','2018-02-06','G2',18],
                   ['b','2018-02-07','G2',11],
                   ['c','2018-02-03','G2',63],
                   ['c','2018-02-04','G2',83],
                   ['c','2018-02-05',None,20],
                   ['c','2018-02-06',None,68],
                   ['c','2018-02-07',None,33]])
df.columns = ['product','day','pgroup', 'value']
Code:
# Loop for each product
for xprod in df['product'].unique().tolist():
    # find unique values for pgroup
    unique_values = df[df['product'] == xprod]['pgroup'].unique()
    # Change datatypes because of NaN values in the Series
    # (a missing value becomes 'nan' or 'None', depending on how it is stored)
    unique_values_str = [str(i) for i in unique_values]
    # 2 values, first is missing => take second
    if len(unique_values_str) == 2 and unique_values_str[0] in ('nan', 'None'):
        df.loc[df['product'] == xprod, 'pgroup'] = unique_values_str[1]
    # 2 values, second is missing => take first
    elif len(unique_values_str) == 2 and unique_values_str[1] in ('nan', 'None'):
        df.loc[df['product'] == xprod, 'pgroup'] = unique_values_str[0]
Expected result:
    product         day  pgroup  value
0         a  2018-02-03      G1     47
1         a  2018-02-04      G1     25
2         a  2018-02-05      G1     10
3         a  2018-02-06      G1     22
4         a  2018-02-07      G1     84
5         b  2018-02-03      G2     10
6         b  2018-02-04      G2     21
7         b  2018-02-05      G2      2
8         b  2018-02-06      G2     18
9         b  2018-02-07      G2     11
10        c  2018-02-03      G2     63
11        c  2018-02-04      G2     83
12        c  2018-02-05      G2     20
13        c  2018-02-06      G2     68
14        c  2018-02-07      G2     33
Note:
According to my measurements, the most time-consuming part is the first two statements:
# Loop for each product
for xprod in df['product'].unique().tolist():
    # find unique values for pgroup
    unique_values = df[df['product'] == xprod]['pgroup'].unique()
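One possible way to avoid the per-product loop is to let pandas compute the fill value for every group in a single pass. The following is only a sketch, not a benchmarked solution. It assumes that each product has at most one distinct non-null pgroup; if a product had several, this version would fill the gaps with the first one found, whereas the loop above leaves such products untouched. The small frame and the name first_group are made up for illustration.

```python
import pandas as pd

# Small illustrative sample in the same shape as the question's data.
df = pd.DataFrame(
    [['a', '2018-02-03', 'G1', 47],
     ['a', '2018-02-04', None, 25],
     ['b', '2018-02-03', None, 10],
     ['b', '2018-02-04', 'G2', 21],
     ['c', '2018-02-03', None, 63]],
    columns=['product', 'day', 'pgroup', 'value'],
)

# First non-null pgroup per product, broadcast back to every row of that
# product. A product without any non-null pgroup gets a missing value here.
first_group = df.groupby('product')['pgroup'].transform('first')

# Fill only the gaps; rows that already have a pgroup are left unchanged.
df['pgroup'] = df['pgroup'].fillna(first_group)

print(df)
```

Product c has no known pgroup in this sample, so its row stays empty, which matches what the loop does when there is nothing to copy from.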
【Comments】:
- Your MWE is broken. What is col_1?
- Thanks, and sorry. col_1 should be pgroup. I have changed this.
Tags: python loops dataframe group-by