Applying a function to a dataframe column: answers

【Question title】: Applying a function to dataframe column
【Posted】: 2019-06-01 03:31:16
【Question description】:

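I am trying to apply a function to a column of a dataframe, but it keeps throwing an error. I need your help.
The function is supposed to remove the rows that do not contain any of the items in the keywordz array.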

The function:

def get_restuarant_business(data):
    keywordz=['food','restuarant','bakery','deli','fast', 
                  'food','bars','coffee']

    data=data.lower()
    while((data != '' or pd.isnull(data)==False ) and isinstance(data, 
    str)):  
       flag= False
       for i in keywordz:
          if i in data:
             flag=True
             break
          else:
             continue
    return flag

rest_biz = business.copy().loc[business['categories'].head(1).apply(
                                     get_restuarant_business) == True]

This is the exception that is thrown.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-13-8da5e44c6072> in <module>()
1 print(business.head(5))
----> 2 business['categories'].apply(get_restuarant_business)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
764         key = com._apply_if_callable(key, self)
765         try:
766             result = self.index.get_value(self, key)
767 
768             if not is_scalar(result):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
3101         try:
3102             return self._engine.get_value(s, k,
3103                                           tz=getattr(series.dtype, 'tz', None))
3104         except KeyError as e1:
3105             if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

KeyError: 'categories'

0    'tours, breweries, pizza, restaurants, food, h...
1    'chicken wings, burgers, caterers, street vend...
2    'breakfast & brunch, restaurants, french, sand...
3    'home & garden, nurseries & gardening, shoppin...
4                                 'coffee & tea, food'
Name: categories, dtype: object

Can you help me?

【Question comments】:

Tags: python pandas

【Solution 1】:

Try this!

import numpy as np
import pandas as pd

business = pd.DataFrame({'categories':['tours, breweries, pizza, restaurants, food',
                                        'chicken wings, burgers, caterers, street vend',
                                       'breakfast & brunch, restaurants, french, sand',
                                       'home & garden, nurseries & gardening, shopping']})

keywordz=['food','restaurants','bakery','deli','fast','food','bars','coffee']

# Keep the rows where at least one comma-separated category is in keywordz
rest_biz = business[business['categories'].apply(
    lambda x: np.any([w.lower() in keywordz for w in x.split(', ')]))]

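# output (row 2 also matches, because it lists 'restaurants')
    categories
0   tours, breweries, pizza, restaurants, food
2   breakfast & brunch, restaurants, french, sand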
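
As an alternative sketch (not part of the answer above), the same kind of filter can be written without `apply`, using pandas' vectorized `Series.str.contains`. Note the assumptions: this matches substrings rather than whole comma-separated entries, the keyword list uses the singular 'restaurant', and the `None` row is a made-up missing value added only to show the `na=False` handling.

```python
import re

import pandas as pd

business = pd.DataFrame({'categories': [
    'tours, breweries, pizza, restaurants, food',
    'chicken wings, burgers, caterers, street vend',
    'breakfast & brunch, restaurants, french, sand',
    'home & garden, nurseries & gardening, shopping',
    None,  # hypothetical missing value, added for illustration
]})

# Assumed keyword list (singular 'restaurant', duplicate 'food' dropped)
keywordz = ['food', 'restaurant', 'bakery', 'deli', 'fast', 'bars', 'coffee']

# One regex alternation; re.escape guards against special characters
pattern = '|'.join(re.escape(k) for k in keywordz)

# case=False ignores case; na=False treats missing values as "no match"
mask = business['categories'].str.contains(pattern, case=False, na=False)
rest_biz = business[mask]

print(rest_biz)
```

Because this is substring matching, 'fast' also matches 'breakfast'. If only whole entries should count, the `split(', ')` approach from the answer is the safer choice.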

【Comments】:

【Solution 2】:

I think the function below will serve your purpose.

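def get_restuarant_business(data):
    keywordz=['food','restaurant','bakery','deli','fast food','bars','coffee']

    # Missing values (NaN) are not strings and cannot match
    if not isinstance(data, str):
        return False

    data=data.lower()
    # True if any keyword occurs as a substring of the categories string
    return any(k in data for k in keywordz)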

Call it like this:

business_df['food_cat'] = business_df['categories'].apply(
    get_restuarant_business)

Then filter on the rows where the result is True.
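
A minimal sketch of that filtering step, assuming `food_cat` already holds the boolean result of the `apply` call. The two sample rows are hypothetical and only stand in for the real data.

```python
import pandas as pd

# Hypothetical sample data; food_cat stands in for the flag column
# produced by business_df['categories'].apply(get_restuarant_business)
business_df = pd.DataFrame({
    'categories': ['coffee & tea, food', 'home & garden, shopping'],
    'food_cat': [True, False],
})

# Boolean indexing keeps only the rows whose flag is True;
# .copy() gives an independent frame rather than a view of the original
rest_biz = business_df[business_df['food_cat']].copy()

print(rest_biz)
```

Comparing with `== True` is not needed here, since a boolean column can be used directly as the mask.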

【Comments】:

This worked for me. I still kept my own function, but then I added the filter.