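【Question title】: Compare a list of data with a CSV file and sort the matches
【Posted】: 2018-04-13 06:36:24
【Question description】:

I have a dataset of product names and a list of brands. I need to find out how many of the products in my list belong to those brands.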

**Brands sample :** ['HM International', 'Sara', 'Wildcraft', 'Nike']
**Product name sample :** [Attache backpack11Green Waterproof Backpack
Simba BTSPOKEMON POKÈMON POKÈ BALLS 18 BP Waterproof S...
HM International HMHTPB 24304MK Waterproof Multipurpos...
Chris & Kate CKB_122SS Waterproof School Bag
Simba BTSPRINCESS FOLLOW YOUR DREAMS 16 BP Waterproof ...
Kuber Industries School Bag, Backpack Waterproof School...
Minnie Trio School Bag Waterproof School Bag
Thomas School Bag Waterproof School Bag
Sara Green 002 Shoulder Bag
Disney Frozen Anna & Elsa Pink Sequins 16' ' Backpack
Disney Princess Pink Flap 18' ' Backpack
My Baby Excel Peppa Side Sling Bag Sling Bag
Ranger Black School Bag with laptop compartment Waterpr...
HM International HMHTPB 73279AV Waterproof Multipurpos...
Peppa Peppa Pig Pink Plush Toy Wallet Round Shape Plush...
Disney Frozen Anna & Elsa Pink Sequins 14' ' Backpack
Disney Frozen Magic Blue 16' ' School Bag
Good Friends stylish Waterproof School Bag
ZEVORA Pink 3D Design Children Travel & School Bag, 1 L...
Gleam A103 School Bag
SARA BAGS TG15 Waterproof Backpack
Despicable Me Favourite Subject School Bag 16 inches Tr...
AARIP LTB037 Waterproof School Bag
Simba BTSSMURFS FOOTBALL 18 BP Waterproof School Bag
Gleam JB0402C Waterproof School Bag
Simba BTSSMURFS SMURFETTE SINGING STAR 18 BP Waterproo... ]

【Comments】:

Tags: python pandas analytics data-analysis text-analysis


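【Solution 1】:

I suggest using str.findall with a word-boundary regex to search for multiple values at once, then flattening the nested lists and counting the matches with Counter: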

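import re
from collections import Counter

Brands = ['HM International', 'Sara', 'Wildcraft', 'Nike']
# Group the alternatives so that \b applies to every brand, not only the
# first and last one, and escape any regex metacharacters in the names.
pat = r'\b(?:{})\b'.format('|'.join(map(re.escape, Brands)))

# str.findall returns one list of matches per row; flatten them and count.
d = Counter([y for x in df['Product'].str.findall(pat) for y in x])
print(d)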

Counter({'HM International': 2, 'Sara': 1})
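The same idea can be sketched with only the standard library, which may help to see what the pattern does without pandas. This is a minimal sketch rather than the answer's exact code: the short `products` list is a subset of the sample data, the helper name `count_brands` is hypothetical, and the case-insensitive variant assumes that a name such as 'SARA BAGS' should count as the brand 'Sara', which the question does not state.

```python
import re
from collections import Counter

brands = ['HM International', 'Sara', 'Wildcraft', 'Nike']
products = [
    'HM International HMHTPB 24304MK Waterproof Multipurpos...',
    'Sara Green 002 Shoulder Bag',
    'HM International HMHTPB 73279AV Waterproof Multipurpos...',
    'SARA BAGS TG15 Waterproof Backpack',
    'Gleam A103 School Bag',
]

# Escape each brand and group the alternatives so \b guards every brand.
alternatives = '|'.join(map(re.escape, brands))
pattern = re.compile(r'\b(?:{})\b'.format(alternatives))


def count_brands(names, compiled):
    """Count every brand occurrence across all product names."""
    return Counter(m for name in names for m in compiled.findall(name))


counts = count_brands(products, pattern)
print(counts.most_common())  # brands sorted by number of matches

# Assumption: treat 'SARA' and 'Sara' as the same brand.
ci_pattern = re.compile(pattern.pattern, re.IGNORECASE)
canonical = {b.lower(): b for b in brands}
ci_counts = Counter(
    canonical[m.lower()] for name in products for m in ci_pattern.findall(name)
)
print(ci_counts.most_common())
```

`Counter.most_common()` returns the brands ordered from most to least frequent, which covers the "sort the matches" part of the title.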

Or, if you want a Series as output, use Series.value_counts:

s = pd.Series(np.concatenate(df['Product'].str.findall(pat))).value_counts()
print (s)
HM International    2
Sara                1
dtype: int64
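A variant that stays entirely within pandas is sketched below, assuming pandas 0.25 or later, where `Series.explode` is available. The small DataFrame is a subset of the sample data, and the `reindex` step, which lists brands with zero matches, is an addition for illustration and not part of the answer above.

```python
import re
import pandas as pd

brands = ['HM International', 'Sara', 'Wildcraft', 'Nike']
df = pd.DataFrame({'Product': [
    'HM International HMHTPB 24304MK Waterproof Multipurpos...',
    'Sara Green 002 Shoulder Bag',
    'HM International HMHTPB 73279AV Waterproof Multipurpos...',
    'Gleam A103 School Bag',
]})

pat = r'\b(?:{})\b'.format('|'.join(map(re.escape, brands)))

# One row per match; rows without any match become NaN and are dropped.
matches = df['Product'].str.findall(pat).explode().dropna()
s = matches.value_counts()

# Optional: include brands that never matched, then sort by count.
brand_counts = s.reindex(brands, fill_value=0).sort_values(ascending=False)
print(brand_counts)
```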

Setup:

import numpy as np
import pandas as pd

d = {'Product': ['Attache backpack11Green Waterproof Backpack', 'Simba BTSPOKEMON POKÈMON POKÈ BALLS 18 BP Waterproof S...', 'HM International HMHTPB 24304MK Waterproof Multipurpos...', 'Chris & Kate CKB_122SS Waterproof School Bag', 'Simba BTSPRINCESS FOLLOW YOUR DREAMS 16 BP Waterproof ...', 'Kuber Industries School Bag, Backpack Waterproof School...', 'Minnie Trio School Bag Waterproof School Bag', 'Thomas School Bag Waterproof School Bag', 'Sara Green 002 Shoulder Bag', "Disney Frozen Anna & Elsa Pink Sequins 16' ' Backpack", "Disney Princess Pink Flap 18' ' Backpack", 'My Baby Excel Peppa Side Sling Bag Sling Bag', 'Ranger Black School Bag with laptop compartment Waterpr...', 'HM International HMHTPB 73279AV Waterproof Multipurpos...', 'Peppa Peppa Pig Pink Plush Toy Wallet Round Shape Plush...', "Disney Frozen Anna & Elsa Pink Sequins 14' ' Backpack", "Disney Frozen Magic Blue 16' ' School Bag", 'Good Friends stylish Waterproof School Bag', 'ZEVORA Pink 3D Design Children Travel & School Bag, 1 L...', 'Gleam A103 School Bag', 'SARA BAGS TG15 Waterproof Backpack', 'Despicable Me Favourite Subject School Bag 16 inches Tr...', 'AARIP LTB037 Waterproof School Bag', 'Simba BTSSMURFS FOOTBALL 18 BP Waterproof School Bag', 'Gleam JB0402C Waterproof School Bag', 'Simba BTSSMURFS SMURFETTE SINGING STAR 18 BP Waterproo']}
df = pd.DataFrame(d)
print (df.head())
                                             Product
0        Attache backpack11Green Waterproof Backpack
1  Simba BTSPOKEMON POKÈMON POKÈ BALLS 18 BP Wate...
2  HM International HMHTPB 24304MK Waterproof Mul...
3       Chris & Kate CKB_122SS Waterproof School Bag
4  Simba BTSPRINCESS FOLLOW YOUR DREAMS 16 BP Wat...
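Since the question mentions a CSV file and asks how many products are branded, the following sketch shows one way this might look when the names come from a file. The column name `Product` and the in-memory `csv_file` standing in for a real file are assumptions; with an actual file, a path would be passed to `pd.read_csv` instead.

```python
import io
import re
import pandas as pd

# Stand-in for a real CSV file (hypothetical contents, one 'Product' column).
csv_file = io.StringIO(
    'Product\n'
    'HM International HMHTPB 24304MK Waterproof Multipurpos...\n'
    'Sara Green 002 Shoulder Bag\n'
    '"Kuber Industries School Bag, Backpack Waterproof School..."\n'
    'Gleam A103 School Bag\n'
)
df = pd.read_csv(csv_file)

brands = ['HM International', 'Sara', 'Wildcraft', 'Nike']
alternatives = '|'.join(map(re.escape, brands))

# str.extract needs a capture group; it keeps the first brand found per row
# and leaves NaN where no brand matches.
df['Brand'] = df['Product'].str.extract(
    r'(\b(?:{})\b)'.format(alternatives), expand=False
)

n_branded = int(df['Brand'].notna().sum())  # number of branded products
per_brand = df['Brand'].value_counts()      # counts, most frequent first
print(n_branded)
print(per_brand)
```

Unlike the `findall` approach, this counts each product at most once, which may be closer to "how many products are branded" when a name could mention a brand more than once.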

【Comments】:
