【Question title】: Extract rows from pandas DataFrame based on column value
【Posted】: 2017-06-26 23:53:26
【Question】:

How can I extract rows from a DataFrame, created from an Excel file, where a column matches a specific value?

Here are a few rows from the DataFrame:

    Food            Men     Women
0   Total fruit     86.20   88.26
1   Apples, Total   89.01   89.66
2   Apples as fruit 89.18   90.42
3   Apple juice     88.78   88.42
4   Bananas         95.42   94.18
5   Berries         84.21   81.73
6   Grapes          88.79   88.13

Here is the code I use to read the Excel file, select the columns I need, and rename them appropriately:

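import pandas as pd

# Current keyword names; the older sheetname/skip_footer/parse_cols were removed in pandas 1.0
data1 = pd.read_excel('USFoodCommodity.xls', sheet_name='94-98 FAH', skiprows=76, skipfooter=142, usecols='A,H,K')
data1.columns = ['Food', 'Men', 'Women']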

# Try 1: data1 = data1[data1['Food'].isin(['Total fruit']) == True] works
# Try 2: data1 = data1[data1['Food'].isin(['Apple, Total']) == True] doesn't work
# Try 3: data1 = data1.iloc[[1]] returns Apples, Total but not appropriate to use integer index
# Try 4: data1[data1['Food'] == 'Berries'] doesn't work

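So far, following answers like this, this, and here, I can only return the first index, where Food == "Total fruit". When I try the other approaches above, I get only the column names, like this: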

Food    Men Women

I'm new to pandas and can't see where I'm going wrong. Why can I extract the first row, where Food == Total fruit, but not any other row?

【Question comments】:

    Tags: python-3.x pandas jupyter-notebook


    【Solution 1】:

    It works fine for me. The problem is probably leading or trailing whitespace in the Food values; remove it with str.strip:

    print (data1.Food.tolist())
    ['Total fruit', 'Apples, Total ', 'Apples as fruit', 
    'Apple juice', 'Bananas', ' Berries', 'Grapes']
    
    data1['Food'] = data1['Food'].str.strip()
    
    print (data1.Food.tolist())
    ['Total fruit', 'Apples, Total', 'Apples as fruit', 
    'Apple juice', 'Bananas', 'Berries', 'Grapes']
    
    data2 = data1[data1['Food'].isin(['Total fruit'])]
    print (data2)
              Food   Men  Women
    0  Total fruit  86.2  88.26
    
    data3 = data1[data1['Food'].isin(['Apples, Total'])]
    print (data3)
                Food    Men  Women
    1  Apples, Total  89.01  89.66
    
    data3 = data1[data1['Food'].isin(['Berries'])]
    print (data3)
          Food    Men  Women
    5  Berries  84.21  81.73
    
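A self-contained way to reproduce the symptom and the fix, assuming a small hand-built DataFrame in place of the Excel file. The stray spaces in 'Apples, Total ' and ' Berries' are an assumption that mirrors the tolist() output above:

```python
import pandas as pd

# Hypothetical sample mirroring the question's table, with stray
# spaces added to two labels to reproduce the problem.
data1 = pd.DataFrame({
    "Food": ["Total fruit", "Apples, Total ", "Apples as fruit",
             "Apple juice", "Bananas", " Berries", "Grapes"],
    "Men": [86.20, 89.01, 89.18, 88.78, 95.42, 84.21, 88.79],
    "Women": [88.26, 89.66, 90.42, 88.42, 94.18, 81.73, 88.13],
})

# An exact comparison misses labels that carry hidden whitespace.
before = data1[data1["Food"] == "Berries"]
print(before.empty)  # True

# Strip leading/trailing whitespace, then compare again.
data1["Food"] = data1["Food"].str.strip()
after = data1[data1["Food"] == "Berries"]
print(after)
```

Matching stays case-sensitive after stripping, so "berries" would still return nothing.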

    【Comments】:

      【Solution 2】:

      Use this code:

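      import pandas as pd

      data1 = pd.read_excel('USFoodCommodity.xls', sheet_name='94-98 FAH', skiprows=76, skipfooter=142, usecols='A,H,K')
      data1.columns = ['Food', 'Men', 'Women']  # rename as in the question so row['Food'] exists
      list_of_strings_to_match = ['Total fruit', 'Berries', 'Grape']
      for index, row in data1.iterrows():
          if row['Food'] in list_of_strings_to_match:
              print(row)  # Python 3 print function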
      
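Looping with iterrows() is slow on large frames, and an exact `in` test still misses labels with stray spaces; it also matches whole strings only, so 'Grape' never equals 'Grapes'. A vectorized sketch, assuming the same whitespace problem as in Solution 1 and hypothetical sample rows:

```python
import pandas as pd

# Hypothetical sample data with stray whitespace, as in the question.
data1 = pd.DataFrame({
    "Food": ["Total fruit", "Apples, Total ", " Berries", "Grapes"],
    "Men": [86.20, 89.01, 84.21, 88.79],
    "Women": [88.26, 89.66, 81.73, 88.13],
})

wanted = ["Total fruit", "Berries", "Grapes"]

# Build one boolean mask: strip whitespace, then test membership.
# Only the mask uses the stripped values; data1 itself is unchanged.
mask = data1["Food"].str.strip().isin(wanted)
matches = data1[mask]
print(matches)
```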

      【Comments】:

      • This returns no rows for Berries or Grapes.

      【Solution 3】:

      This question may be old, but here is a simpler, more intuitive approach.

      Note: this solution requires pandas >= 0.13.

      You can use the .query() method to select rows from a DataFrame with a boolean expression string.

      It's as simple as:

      df.query('column == value')  # any comparison operator works: ==, !=, <, <=, >, >=
      
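A short sketch of different comparison operators inside query(), assuming a few hypothetical rows copied from the question's table:

```python
import pandas as pd

# Hypothetical sample rows from the question's table.
data1 = pd.DataFrame({
    "Food": ["Total fruit", "Apples, Total", "Bananas", "Berries"],
    "Men": [86.20, 89.01, 95.42, 84.21],
    "Women": [88.26, 89.66, 94.18, 81.73],
})

high_men = data1.query("Men > 88")               # numeric comparison
not_berries = data1.query("Food != 'Berries'")   # string inequality
between = data1.query("85 < Women < 90")         # chained comparison
print(high_men, not_berries, between, sep="\n\n")
```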

      For example, in your case you could query like this:

      data1.query('Food == "Total fruit"')  # matching is case-sensitive
      

      data1.query('Food == "Berries"')  # string values must be quoted
      
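Why the quotes and the exact case matter, sketched with hypothetical sample rows. An unquoted word inside the expression is treated as a column name, so pandas raises its UndefinedVariableError, which is a NameError subclass:

```python
import pandas as pd

# Hypothetical sample rows from the question's table.
data1 = pd.DataFrame({
    "Food": ["Total fruit", "Berries", "Grapes"],
    "Men": [86.20, 84.21, 88.79],
})

# String literals must be quoted inside the expression.
berries = data1.query('Food == "Berries"')

# Case-sensitive: "Total Fruit" does not match "Total fruit".
wrong_case = data1.query('Food == "Total Fruit"')
print(wrong_case.empty)  # True

# Unquoted, Berries is looked up as a column and is not found.
try:
    data1.query("Food == Berries")
    raised = False
except NameError:
    raised = True
print(raised)  # True
```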

      To refer to a Python variable, prefix it with @:

      fruit = "Berries"
      data1.query('Food == @fruit')
      
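A minimal sketch of @ references, assuming hypothetical sample rows; a list variable combined with `in` selects several values at once:

```python
import pandas as pd

# Hypothetical sample rows from the question's table.
data1 = pd.DataFrame({
    "Food": ["Total fruit", "Berries", "Grapes"],
    "Men": [86.20, 84.21, 88.79],
})

fruit = "Berries"               # case must match the data exactly
one = data1.query("Food == @fruit")

fruits = ["Berries", "Grapes"]  # a list works with the `in` operator
several = data1.query("Food in @fruits")
print(one, several, sep="\n\n")
```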

      You can also combine multiple conditions with &:

      data1.query('column1 == value1 & column2 == value2')
      
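A concrete sketch of combined conditions on hypothetical sample rows, next to the equivalent boolean-mask form. In query() the comparisons bind more tightly than & and |; with plain masks each comparison needs its own parentheses:

```python
import pandas as pd

# Hypothetical sample rows from the question's table.
data1 = pd.DataFrame({
    "Food": ["Total fruit", "Apples, Total", "Bananas", "Berries"],
    "Men": [86.20, 89.01, 95.42, 84.21],
    "Women": [88.26, 89.66, 94.18, 81.73],
})

# & means both conditions hold; | means either one does.
both = data1.query("Men > 85 & Women < 90")
either = data1.query('Food == "Berries" | Men > 95')

# Same selection as `both`, written as a boolean mask.
both_mask = data1[(data1["Men"] > 85) & (data1["Women"] < 90)]
print(both, either, sep="\n\n")
```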

      Hope it helps.

      【Comments】:
