【Question title】: Python, Pandas: Filter rows of data frame based on function
【Posted】: 2018-03-27 01:26:04
【Question】:

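I am trying to filter a pandas DataFrame based on a substring in one of its columns.

If the two-digit number at positions 13 and 14 of the ID field is greater than 9, I want to drop the row.

Example:

ABCD-3Z-A93Z-01A-11R-A37O-07 -> keep

ABCD-3Z-A93Z-11A-11R-A37O-07 -> drop

I have managed to come up with the following solution, but I think there must be a faster, more efficient way.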

import pandas as pd

# Enter some data. We want to filter out all rows where the number at pos 13,14 > 9
df = {'ID': ['ABCD-3Z-A93Z-01A-11R-A37O-07', 'ABCD-6D-AA2E-11A-11R-A37O-07', 'ABCD-6D-AA2E-01A-11R-A37O-07',
             'ABCD-A3-3307-01A-01R-0864-07', 'ABCD-6D-AA2E-01A-11R-A37O-07', 'ABCD-6D-AA2E-10A-11R-A37O-07',
             'ABCD-6D-AA2E-09A-11R-A37O-07'],
      'year': [2012, 2012, 2013, 2014, 2014, 2017, 2015]
}
# convert to df
df = pd.DataFrame(df)

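# Define a function that checks whether the number at positions 13 and 14 is <= 9.
# It is named keep_row rather than filter so it does not shadow the built-in filter().
def keep_row(x):
    # Only strings can be sliced and parsed; anything else is marked for removal.
    if isinstance(x, str):
        return int(float(x[13:15])) <= 9
    return False

# apply function
df['KeepRow'] = df['ID'].apply(keep_row)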
print(df)

# Now filter out rows where "KeepRow" = False
df = df.loc[df['KeepRow']]
print(df)
# drop the column "KeepRow" as we don't need it anymore
df = df.drop('KeepRow', axis=1)
print(df)

【Comments】:

    Tags: python pandas filter


    【Solution 1】:

    I think you can filter on the character at index 13 of the string:

    import pandas as pd

    # Enter some data. We want to filter out all rows where the number at pos 13,14 > 9
    df = pd.DataFrame({
        'ID': ['ABCD-3Z-A93Z-01A-11R-A37O-07',
               'ABCD-6D-AA2E-11A-11R-A37O-07',
               'ABCD-6D-AA2E-01A-11R-A37O-07',
               'ABCD-A3-3307-01A-01R-0864-07',
               'ABCD-6D-AA2E-01A-11R-A37O-07',
               'ABCD-6D-AA2E-10A-11R-A37O-07',
               'ABCD-6D-AA2E-09A-11R-A37O-07'],
        'year': [2012, 2012, 2013, 2014, 2014, 2017, 2015]
    })
    # flag the rows whose character at index 13 is '0' (two-digit number <= 9)
    
    df['KeepRow'] = df['ID'].apply(lambda x: x[13] == '0')
    

    Or simply:

    df[df['ID'].apply(lambda x: x[13] == '0')]
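The same check can probably be written without `apply` by using the `.str` accessor. This is only a sketch on a shortened copy of the sample data. It assumes that positions 13 and 14 always hold a two-digit number, so "number <= 9" is the same as "the character at index 13 is '0'". The names `mask` and `kept` are illustrative and not part of the original answer.

```python
import pandas as pd

# Shortened copy of the sample data from the question.
df = pd.DataFrame({
    'ID': ['ABCD-3Z-A93Z-01A-11R-A37O-07',
           'ABCD-6D-AA2E-11A-11R-A37O-07',
           'ABCD-6D-AA2E-09A-11R-A37O-07'],
    'year': [2012, 2012, 2015],
})

# .str[13] picks the character at index 13 of every ID in one call.
# Assumption: positions 13-14 are always two digits, so a leading '0' means <= 9.
mask = df['ID'].str[13] == '0'

# Boolean indexing keeps only the rows where the mask is True.
kept = df[mask]
print(kept)
```

If the field could ever hold something other than two digits, comparing the parsed number is the safer choice.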
    

    【Comments】:

      【Solution 2】:

      Use indexing with str to extract the values by position, then convert them to float and filter with boolean indexing:

      df = df[df['ID'].str[13:15].astype(float) <= 9]
      print(df)
                                   ID  year
      0  ABCD-3Z-A93Z-01A-11R-A37O-07  2012
      2  ABCD-6D-AA2E-01A-11R-A37O-07  2013
      3  ABCD-A3-3307-01A-01R-0864-07  2014
      4  ABCD-6D-AA2E-01A-11R-A37O-07  2014
      6  ABCD-6D-AA2E-09A-11R-A37O-07  2015
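The function in the question returns False for values that are not strings, whereas `astype(float)` raises an error on a substring that cannot be parsed. A possible variant, sketched here under assumptions, uses `pd.to_numeric` with `errors='coerce'` so that missing or malformed IDs become NaN and are dropped. The `None` entry and the `'ABCD-6D-AA2E-XXA'` entry are hypothetical test values, not data from the question, and dropping malformed rows silently is a design choice that may not suit every dataset.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['ABCD-3Z-A93Z-01A-11R-A37O-07',
           'ABCD-6D-AA2E-11A-11R-A37O-07',
           None,                    # hypothetical missing ID
           'ABCD-6D-AA2E-XXA'],     # hypothetical malformed ID
    'year': [2012, 2012, 2013, 2014],
})

# Slice positions 13-14, then parse; unparseable or missing values become NaN.
codes = pd.to_numeric(df['ID'].str[13:15], errors='coerce')

# A comparison with NaN is False, so those rows are dropped along with codes > 9.
keep = codes.le(9)
robust = df[keep]
print(robust)
```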
      

      Details (the extracted substrings, shown for the original, unfiltered DataFrame):

      print(df['ID'].str[13:15])
      0    01
      1    11
      2    01
      3    01
      4    01
      5    10
      6    09
      Name: ID, dtype: object
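Since the question asks for a faster approach, it may be worth measuring rather than assuming. The sketch below checks that an `apply`-based filter and the `.str` slicing filter select the same rows, then times both with `timeit`. The repeated two-ID dataset and the function names `with_apply` and `with_str` are made up for illustration, and the timings depend on the pandas version and the data, so no particular result is claimed here.

```python
import timeit

import pandas as pd

# Synthetic data: two IDs from the question repeated many times (illustrative only).
ids = ['ABCD-3Z-A93Z-01A-11R-A37O-07', 'ABCD-6D-AA2E-11A-11R-A37O-07'] * 5000
df = pd.DataFrame({'ID': ids})

def with_apply():
    # Row-by-row Python function, as in the question.
    return df[df['ID'].apply(lambda x: int(x[13:15]) <= 9)]

def with_str():
    # .str slicing followed by a numeric comparison, as in Solution 2.
    return df[df['ID'].str[13:15].astype(float) <= 9]

# Both filters should select exactly the same rows.
same = with_apply().equals(with_str())
print('same result:', same)

# Time each approach over 20 runs; absolute numbers vary by machine.
print('apply:', timeit.timeit(with_apply, number=20))
print('str:  ', timeit.timeit(with_str, number=20))
```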
      

      【Comments】:
