【Question title】: Applying a function to dataframe column
【Posted】: 2019-06-01 03:31:16
【Question description】:

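I am trying to apply a function to a column of a dataframe, but it keeps throwing an error. I need your help.
The function is supposed to drop the rows that do not contain any of the items in the keywordz list.

The function: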

def get_restuarant_business(data):
    keywordz=['food','restuarant','bakery','deli','fast', 
                  'food','bars','coffee']

    data=data.lower()
    while((data != '' or pd.isnull(data)==False ) and isinstance(data, 
    str)):  
       flag= False
       for i in keywordz:
          if i in data:
             flag=True
             break
          else:
             continue
    return flag

rest_biz = business.copy().loc[business['categories'].head(1).apply(
                                     get_restuarant_business) == True]

This is the exception that is thrown.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-13-8da5e44c6072> in <module>()
1 print(business.head(5))
----> 2 business['categories'].apply(get_restuarant_business)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
764         key = com._apply_if_callable(key, self)
765         try:
766             result = self.index.get_value(self, key)
767 
768             if not is_scalar(result):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
3101         try:
3102             return self._engine.get_value(s, k,
3103                                           tz=getattr(series.dtype, 'tz', None))
3104         except KeyError as e1:
3105             if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:

 pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

KeyError: 'categories'
0    'tours, breweries, pizza, restaurants, food, h...
1    'chicken wings, burgers, caterers, street vend...
2    'breakfast & brunch, restaurants, french, sand...
3    'home & garden, nurseries & gardening, shoppin...
4                                 'coffee & tea, food'
 Name: categories, dtype: object

Can you help me?
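
One possible reading of the traceback above, offered as a sketch rather than a confirmed diagnosis: the KeyError is raised inside `series.py` in `__getitem__`, and the printed `business.head(5)` ends with `Name: categories, dtype: object`, which is how a Series prints. If `business` is already the `categories` Series rather than a DataFrame, then `business['categories']` looks for a row labelled 'categories' and fails. The sample values below are made up for illustration.

```python
import pandas as pd

# Hypothetical data: a Series named 'categories', as the printed output suggests.
business = pd.Series(
    ["coffee & tea, food", "home & garden, shopping"],
    name="categories",
)

try:
    business["categories"]  # looks up a row label, not a column
    raised = False
except KeyError:
    raised = True

print(raised)

# Applying the check directly to the Series avoids the label lookup.
has_food = business.apply(lambda s: "food" in s.lower())
print(has_food.tolist())
```

If this reading is right, calling `.apply` on `business` itself (or reloading the data as a DataFrame) would sidestep the KeyError.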

【Comments】:

    Tags: python pandas


    【Solution 1】:

    Try this!

    import numpy as np
    import pandas as pd

    business = pd.DataFrame({'categories':['tours, breweries, pizza, restaurants, food',
                                            'chicken wings, burgers, caterers, street vend',
                                           'breakfast & brunch, restaurants, french, sand',
                                           'home & garden, nurseries & gardening, shopping']})
    
    keywordz=['food','restaurants','bakery','deli','fast','food','bars','coffee']
    
    # Keep a row if any of its comma-separated categories is a keyword.
    rest_biz = business[business['categories'].apply(lambda x: np.any([w.lower() in keywordz for w in x.split(', ')]))]
    
    # output (rows 0 and 2 both list 'restaurants')
    #                                       categories
    # 0     tours, breweries, pizza, restaurants, food
    # 2  breakfast & brunch, restaurants, french, sand
    
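
As an alternative to the `apply` with a lambda, the same kind of filter can be sketched with the vectorized `Series.str.contains`. This is not what the answer above does: it matches keywords as substrings anywhere in the string instead of comparing whole comma-separated entries, so for example 'fast' also matches 'breakfast'. The `pattern` and `mask` names are introduced here for illustration only.

```python
import re

import pandas as pd

business = pd.DataFrame({"categories": [
    "tours, breweries, pizza, restaurants, food",
    "chicken wings, burgers, caterers, street vend",
    "breakfast & brunch, restaurants, french, sand",
    "home & garden, nurseries & gardening, shopping",
]})

keywordz = ["food", "restaurants", "bakery", "deli", "fast", "bars", "coffee"]

# Build one alternation pattern; re.escape guards against regex metacharacters.
pattern = "|".join(re.escape(k) for k in keywordz)

# case=False ignores case; na=False treats missing values as non-matches.
mask = business["categories"].str.contains(pattern, case=False, na=False)
rest_biz = business[mask]

print(rest_biz.index.tolist())
```

Whether substring matching or whole-entry matching is the right choice depends on how the category strings are written in the real data.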

    【Comments】:

      【Solution 2】:

      I think the function below will serve your purpose.

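      def get_restuarant_business(data):
          # Keywords that mark a category string as food-related.
          keywordz=['food','restaurant','bakery','deli','fast food','bars','coffee']
      
          # Missing values (NaN) are not strings and cannot match.
          if not isinstance(data, str):
              return False
      
          data=data.lower()
          # True if any keyword occurs anywhere in the category string.
          return any(k in data for k in keywordz)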
      

      Call it like this:

      business_df['food_cat'] = business_df['categories'].apply(
          get_restuarant_business)
      

      Then filter for the rows where the result is True.
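
The filtering step could look roughly like the sketch below. The sample rows, the `is_food_business` helper, and the keyword list are assumptions made for illustration; only the `food_cat` column name and the `apply` call come from the answer above.

```python
import pandas as pd

# Hypothetical sample data, including a missing value.
business_df = pd.DataFrame({"categories": [
    "coffee & tea, food",
    "home & garden, nurseries & gardening, shopping",
    None,
]})


def is_food_business(data):
    """Return True if the category string mentions any food keyword."""
    keywords = ["food", "restaurant", "bakery", "deli", "bars", "coffee"]
    if not isinstance(data, str):  # None/NaN cannot match
        return False
    data = data.lower()
    return any(k in data for k in keywords)


# Step 1: compute the boolean flag column.
business_df["food_cat"] = business_df["categories"].apply(is_food_business)

# Step 2: keep only the rows where the flag is True.
rest_biz = business_df[business_df["food_cat"]].copy()

print(rest_biz.index.tolist())
```

Indexing with the boolean column directly is enough here; comparing it with `== True` is not needed.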

      【Comments】:

      • This worked for me. I kept my own function, but then added the filter.