Posted: 2021-01-07 17:10:24
Question:
I have a dataframe (sample shown below):
Type       SKU  Description  FullDescription  Size      Price
Variable   2    Boots        Shoes on sale    XL,S,M
Variation  2.5  Boots        XL               XL        330
Variation  2.6  Boots        S                S         330
Variation  2.7  Boots        M                M         330
Variable   3    Helmet       Helmet Sizes     E42,E41
Variation  3.8  Helmet       E42              E42       89
Variation  3.2  Helmet       E41              E41       89
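To make the sample reproducible, it can be built directly in pandas. This is a minimal sketch that assumes the blank Price on the `Variable` rows is a missing value (`None`, stored as NaN) and that `SKU` is numeric.

```python
import pandas as pd

# First sample from the question; the blank Price on Variable rows becomes NaN
df = pd.DataFrame({
    'Type': ['Variable', 'Variation', 'Variation', 'Variation',
             'Variable', 'Variation', 'Variation'],
    'SKU': [2, 2.5, 2.6, 2.7, 3, 3.8, 3.2],
    'Description': ['Boots'] * 4 + ['Helmet'] * 3,
    'FullDescription': ['Shoes on sale', 'XL', 'S', 'M', 'Helmet Sizes', 'E42', 'E41'],
    'Size': ['XL,S,M', 'XL', 'S', 'M', 'E42,E41', 'E42', 'E41'],
    'Price': [None, 330, 330, 330, None, 89, 89],
})
print(df)
```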
What I want to do is sort the values by size, so the final dataframe should look like this:
Type       SKU  Description  FullDescription  Size      Price
Variable   2    Boots        Shoes on sale    S,M,XL
Variation  2.6  Boots        S                S         330
Variation  2.7  Boots        M                M         330
Variation  2.5  Boots        XL               XL        330
Variable   3    Helmet       Helmet Sizes     E41,E42
Variation  3.2  Helmet       E41              E41       89
Variation  3.8  Helmet       E42              E42       89
I was able to get this result with the following code:
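import pandas as pd

# Encode each letter size as a digit string whose string order matches size order.
# 'XL' is listed before 'L' so it is encoded first.
sizes, dig = ['S', 'M', 'XL', 'L'], ['000', '111', '333', '222']  # make sure dig values do not exist as a substring anywhere in your dataframe
df = (df.assign(Size=df['Size'].replace(sizes, dig, regex=True))  # encode sizes as digits
        .assign(grp=(df['Type'] == 'Variable').cumsum())          # one group per Variable row
        .sort_values(['grp', 'Type', 'Size'])                     # 'Variable' sorts before 'Variation', then by code
        .drop('grp', axis=1))
# Sort the comma-separated list in each cell, then decode the digits back to sizes
df['Size'] = df['Size'].apply(lambda x: ','.join(sorted(x.split(',')))).replace(dig, sizes, regex=True)
df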
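The `grp` column in the code above relies on a running count: comparing `Type` with `'Variable'` is `True` at each parent row, and `cumsum()` turns that into a group number shared by the parent and the `Variation` rows that follow it. A small sketch with a hypothetical `types` Series; note the comparison is case-sensitive.

```python
import pandas as pd

# Hypothetical Type column: two parents, each followed by its variations
types = pd.Series(['Variable', 'Variation', 'Variation', 'Variable', 'Variation'])

# True at every parent row; the running sum gives each child its parent's group id
grp = (types == 'Variable').cumsum()
print(grp.tolist())  # [1, 1, 1, 2, 2]

# Case matters: 'variable' never matches 'Variable', so every row stays in group 0
print((types == 'variable').cumsum().tolist())  # [0, 0, 0, 0, 0]
```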
The problem is that the same code does not work on this dataframe:
Type       SKU  Description  FullDescription  Size      Price
Variable   2    Boots        Shoes on sale    XL,S,3XL
Variation  2.5  Boots        XL               XL        330
Variation  2.6  Boots        3XL              3XL       330
Variation  2.7  Boots        S                S         330
Variable   3    Helmet       Helmet Sizes     S19, S9
Variation  3.8  Helmet       E42              S19       89
Variation  3.2  Helmet       E41              S9        89
It gives 'S,3XL,XL' and 'S19,S9', whereas the result I want is:
Type       SKU  Description  FullDescription  Size      Price
Variable   2    Boots        Shoes on sale    S,XL,3XL
Variation  2.7  Boots        S                S         330
Variation  2.5  Boots        XL               XL        330
Variation  2.6  Boots        3XL              3XL       330
Variable   3    Helmet       Helmet Sizes     S9,S19
Variation  3.2  Helmet       E41              S9        89
Variation  3.8  Helmet       E42              S19       89
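The 'S19,S9' order is what ordinary string comparison produces: strings are compared character by character, so '1' sorts before '9', and digits sort before uppercase letters. A minimal illustration with Python's built-in `sorted`:

```python
# Built-in string comparison goes character by character (by code point)
print(sorted(['S9', 'S19']))       # ['S19', 'S9']: '1' < '9'
print(sorted(['XL', 'S', '3XL']))  # ['3XL', 'S', 'XL']: digits sort before letters
```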
For the larger range of sizes, the order should be 'XXS,XS,S,M,L,XL,XXL,3XL,4XL,5XL'; for the second example, it should be 'S9,S19,M9,M19,L9' and so on.
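One way to express this ordering is a sort key that ranks plain letter sizes by their position in the list above, and ranks letter-plus-number labels such as 'S19' by the letter first and the number second, compared numerically so 9 comes before 19. This is a sketch under assumptions: `LETTER_ORDER` and `size_key` are hypothetical names, and placing unknown prefixes such as `E41` after all known sizes (ordered by prefix, then number) is my choice, not something the question specifies.

```python
import re

# Order taken from the question; labels not listed here are treated as unknown
LETTER_ORDER = ['XXS', 'XS', 'S', 'M', 'L', 'XL', 'XXL', '3XL', '4XL', '5XL']
RANK = {size: i for i, size in enumerate(LETTER_ORDER)}
UNKNOWN = len(LETTER_ORDER)


def size_key(label):
    """Return a sortable tuple (rank, prefix, number, label) for one size label."""
    label = label.strip()
    if label in RANK:                            # plain letter size, e.g. '3XL'
        return (RANK[label], '', 0, label)
    m = re.fullmatch(r'([A-Z]+)(\d+)', label)
    if m:                                        # letter + number, e.g. 'S19' or 'E41'
        prefix, number = m.group(1), int(m.group(2))
        if prefix in RANK:
            return (RANK[prefix], '', number, label)
        return (UNKNOWN, prefix, number, label)  # unknown prefix: after known sizes
    return (UNKNOWN + 1, '', 0, label)           # anything else: last, alphabetically


print(sorted(['XL', 'S', '3XL'], key=size_key))                # ['S', 'XL', '3XL']
print(sorted(['L9', 'M19', 'S19', 'M9', 'S9'], key=size_key))  # ['S9', 'S19', 'M9', 'M19', 'L9']
```

Returning a tuple lets Python's normal tuple comparison do the work: the size rank decides first, and the integer part only breaks ties within the same letter.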
This is what I have tried so far, but it does not work and produces the wrong order:
sizes, dig = ['XS','S','M','L','XL','XXL','3XL','4XL','5XL'], ['000','111','222','333','444','555','666','777','888'] #make sure dig values do not exist as a substring anywhere in your dataframe
df = (df.assign(Size=df['Size'].replace(sizes, dig, regex=True))
.assign(grp=(df['Type'] == 'variable').cumsum())
.sort_values(['grp', 'Type', 'Size']).drop('grp', axis=1))
df['Size'] = df['Size'].apply(lambda x: ','.join(sorted(x.split(',')))).replace(dig, sizes, regex=True)
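Two details in this attempt likely contribute to the wrong order. First, `df['Type'] == 'variable'` is case-sensitive and never matches `'Variable'`, so every row gets `grp` 0 and the Variation rows are no longer kept with their parent. Second, the replacements are substring substitutions applied in list order, so `'L'` is encoded before the `'XL'`, `'XXL'` and `'3XL'` patterns are tried (for example, `'XL'` ends up as `'X333'`), and labels such as `S9` and `S19` are still compared as strings after encoding. The sketch below avoids the digit encoding and sorts with an explicit rank key instead. It is a sketch under assumptions: `size_key`, `sort_size_list` and `sort_by_size` are hypothetical names, the sample frame is the second example from the question, and unknown prefixes (such as `E41`) are placed after known sizes.

```python
import re
import pandas as pd

# Size order taken from the question; extend as needed
LETTER_ORDER = ['XXS', 'XS', 'S', 'M', 'L', 'XL', 'XXL', '3XL', '4XL', '5XL']
RANK = {size: i for i, size in enumerate(LETTER_ORDER)}
UNKNOWN = len(LETTER_ORDER)


def size_key(label):
    """Sortable tuple: known letter rank first, then the numeric suffix."""
    label = label.strip()
    if label in RANK:
        return (RANK[label], '', 0, label)
    m = re.fullmatch(r'([A-Z]+)(\d+)', label)
    if m:
        prefix, number = m.group(1), int(m.group(2))
        if prefix in RANK:
            return (RANK[prefix], '', number, label)
        return (UNKNOWN, prefix, number, label)
    return (UNKNOWN + 1, '', 0, label)


def sort_size_list(cell):
    """Sort a comma-separated size list such as 'XL,S,3XL' (spaces are dropped)."""
    labels = [s.strip() for s in str(cell).split(',')]
    return ','.join(sorted(labels, key=size_key))


def sort_by_size(df):
    # A new group starts at every 'Variable' row (exact, case-sensitive match)
    grp = (df['Type'] == 'Variable').cumsum().tolist()
    is_child = (df['Type'] != 'Variable').tolist()
    sizes = df['Size'].astype(str).tolist()
    # Parent row first in each group, then its Variation rows by size_key
    order = sorted(range(len(df)),
                   key=lambda i: (grp[i], is_child[i], size_key(sizes[i])))
    out = df.iloc[order].copy()
    # Rewrite the size list stored on each parent row
    parent = out['Type'] == 'Variable'
    out.loc[parent, 'Size'] = out.loc[parent, 'Size'].map(sort_size_list)
    return out


df = pd.DataFrame({
    'Type': ['Variable', 'Variation', 'Variation', 'Variation',
             'Variable', 'Variation', 'Variation'],
    'SKU': [2, 2.5, 2.6, 2.7, 3, 3.8, 3.2],
    'Description': ['Boots'] * 4 + ['Helmet'] * 3,
    'FullDescription': ['Shoes on sale', 'XL', '3XL', 'S', 'Helmet Sizes', 'E42', 'E41'],
    'Size': ['XL,S,3XL', 'XL', '3XL', 'S', 'S19, S9', 'S19', 'S9'],
    'Price': [None, 330, 330, 330, None, 89, 89],
})
print(sort_by_size(df))
```

The row order is computed with Python's `sorted` over row positions rather than `sort_values`, which keeps the tuple keys out of the DataFrame; the result is then selected with `iloc`.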
Comments:
Tags: python python-3.x pandas dataframe