【Question title】: Pandas: remove a row when a particular kind of value appears in a column
【Posted】: 2016-05-01 03:33:27
【Question】:

I have a DataFrame like this:

         UNIT  EXITSn_hourly           Interval
1867     R081            104  00:00:00-04:00:00
1868     R081              0  04:00:00-04:00:00
1869     R081            129  04:00:00-08:00:00
1870     R081            521  08:00:00-12:00:00
1871     R081           1048  12:00:00-16:00:00
2838     R032             38  00:00:00-04:00:00
2839     R032              0  04:00:00-04:00:00
2840     R032             89  04:00:00-08:00:00
2841     R032            470  08:00:00-12:00:00

I need to drop the entire row whenever Interval has this form, with identical start and end times:

1868     R081              0  04:00:00-04:00:00

I want to remove not only 04:00:00-04:00:00 but any similar value, for example

01:00:00-01:00:00

This is my original df, from which I built the Interval column:

    C/A  UNIT       SCP     DATEn     TIMEn    DESCn  ENTRIESn   EXITSn
0  A002  R051  02-00-00  06-29-13  00:00:00  REGULAR   4174592  1433672
1  A002  R051  02-00-00  06-29-13  04:00:00  REGULAR   4174628  1433675
2  A002  R051  02-00-00  06-29-13  08:00:00  REGULAR   4174641  1433706
3  A002  R051  02-00-00  06-29-13  12:00:00  REGULAR   4174741  1433775
4  A002  R051  02-00-00  06-29-13  16:00:00  REGULAR   4174936  1433826
5  A002  R051  02-00-00  06-29-13  20:00:00  REGULAR   4175270  1433877
6  A002  R051  02-00-00  06-30-13  00:00:00  REGULAR   4175403  1433908
7  A002  R051  02-00-00  06-30-13  04:00:00  REGULAR   4175441  1433914
8  A002  R051  02-00-00  06-30-13  08:00:00  REGULAR   4175457  1433928
9  A002  R051  02-00-00  06-30-13  12:00:00  REGULAR   4175520  1433981

I created the interval with this code:

import copy

df = copy.deepcopy(turnstile_data)
pdf = df.shift(periods=1)

df['ENTRIESn_hourly'] = df['ENTRIESn'] - pdf['ENTRIESn'].fillna(0)
df['EXITSn_hourly'] = df['EXITSn'] - pdf['EXITSn'].fillna(0)
df['Interval'] = pdf['TIMEn']+'-'+ df['TIMEn'].fillna(0)
df.loc[(df['ENTRIESn'] == 0), 'ENTRIESn_hourly'] = 0
df.loc[(df['EXITSn'] == 0), 'EXITSn_hourly'] = 0
df.loc[(df['C/A'] != pdf['C/A']) | (df['UNIT'] != pdf['UNIT']) | (df['SCP'] != pdf['SCP']), ['ENTRIESn_hourly', 'EXITSn_hourly','Interval']] = 0

df = df[df.Interval != 0]
print(df.head(20))

head7=copy.deepcopy(df)
required_df=head7[['UNIT','EXITSn_hourly','Interval']].groupby(head7.UNIT)
print(required_df.head(5))

【Comments】:

    Tags: python pandas


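    【Solution 1】:

    You can compare substrings of Interval (the start hour against the end hour) and keep only the rows where they differ: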

    print(df.Interval.str[0:2])
    1867    00
    1868    04
    1869    04
    1870    08
    1871    12
    2838    00
    2839    04
    2840    04
    2841    08
    Name: Interval, dtype: object
    
    print(df.Interval.str[0:2] != df.Interval.str[9:11])
    1867     True
    1868    False
    1869     True
    1870     True
    1871     True
    2838     True
    2839    False
    2840     True
    2841     True
    Name: Interval, dtype: bool
    
    print(df[df.Interval.str[0:2] != df.Interval.str[9:11]])
          UNIT  EXITSn_hourly           Interval
    1867  R081            104  00:00:00-04:00:00
    1869  R081            129  04:00:00-08:00:00
    1870  R081            521  08:00:00-12:00:00
    1871  R081           1048  12:00:00-16:00:00
    2838  R032             38  00:00:00-04:00:00
    2840  R032             89  04:00:00-08:00:00
    2841  R032            470  08:00:00-12:00:00
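
The comparison above looks only at the hour digits, so an interval such as 04:00:00-04:30:00 would be dropped as well. If that matters, a minimal sketch that compares the full start and end times might look like the following. The sample frame is made up for illustration (the 04:00:00-04:30:00 row in particular is hypothetical), and it assumes every Interval is a fixed-width HH:MM:SS-HH:MM:SS string.

```python
import pandas as pd

# Illustrative data only; the last row is a hypothetical same-hour interval.
df = pd.DataFrame(
    {
        "UNIT": ["R081", "R081", "R081", "R032", "R032"],
        "EXITSn_hourly": [104, 0, 129, 38, 0],
        "Interval": [
            "00:00:00-04:00:00",
            "04:00:00-04:00:00",
            "04:00:00-08:00:00",
            "00:00:00-04:00:00",
            "04:00:00-04:30:00",
        ],
    },
    index=[1867, 1868, 1869, 2838, 2839],
)

# Compare the whole HH:MM:SS on each side of the dash, not only the hour.
start = df["Interval"].str[:8]
end = df["Interval"].str[9:17]
filtered = df[start != end]
print(filtered)
```

With this mask only the row whose start and end are exactly equal is removed; the same-hour row is kept.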
    

    Edit:

    I looked at your code; you can probably drop copy.deepcopy and use the DataFrame's own copy method instead:

    df = turnstile_data.copy(deep=True)
    
    df['ENTRIESn_hourly'] = (df['ENTRIESn'] - df['ENTRIESn'].shift(periods=1)).fillna(0)
    df['EXITSn_hourly'] = (df['EXITSn'] - df['EXITSn'].shift(periods=1)).fillna(0)
    df['Interval'] = (df['TIMEn'].shift(periods=1)+'-'+ df['TIMEn']).fillna(0)
    
    df.loc[(df['ENTRIESn'] == 0), 'ENTRIESn_hourly'] = 0
    df.loc[(df['EXITSn'] == 0), 'EXITSn_hourly'] = 0
    df.loc[(df['C/A'] != df['C/A'].shift(periods=1)) | 
           (df['UNIT'] != df['UNIT'].shift(periods=1)) | 
           (df['SCP'] != df['SCP'].shift(periods=1)), 
    ['ENTRIESn_hourly', 'EXITSn_hourly','Interval']] = 0
    
    print(df.head(5))
       ENTRIESn_hourly  EXITSn_hourly           Interval  
    0                0              0                  0  
    1               36              3  00:00:00-04:00:00  
    2               13             31  04:00:00-08:00:00  
    3              100             69  08:00:00-12:00:00  
    4              195             51  12:00:00-16:00:00  
    
    required_df=df[['UNIT','EXITSn_hourly','Interval']].groupby(df.UNIT)
    
    print(required_df.head(5))
       UNIT  EXITSn_hourly           Interval
    0  R051              0                  0
    1  R051              3  00:00:00-04:00:00
    2  R051             31  04:00:00-08:00:00
    3  R051             69  08:00:00-12:00:00
    4  R051             51  12:00:00-16:00:00
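
As a possible alternative to comparing each key column with its shifted copy, the per-turnstile differences can be computed with a groupby, so that the first row of every turnstile comes out as missing on its own. This is only a sketch: the small turnstile_data frame below is invented for illustration, and the step that zeroes rows where ENTRIESn or EXITSn is 0 is left out.

```python
import pandas as pd

# Tiny stand-in for turnstile_data with two turnstiles (values are made up).
turnstile_data = pd.DataFrame(
    {
        "C/A": ["A002"] * 3 + ["A006"] * 2,
        "UNIT": ["R051"] * 3 + ["R079"] * 2,
        "SCP": ["02-00-00"] * 5,
        "TIMEn": ["00:00:00", "04:00:00", "08:00:00", "00:00:00", "04:00:00"],
        "ENTRIESn": [4174592, 4174628, 4174641, 100, 150],
        "EXITSn": [1433672, 1433675, 1433706, 20, 25],
    }
)

df = turnstile_data.copy()
grouped = df.groupby(["C/A", "UNIT", "SCP"], sort=False)

# diff() within each group is NaN on the group's first row,
# so readings from different turnstiles are never subtracted.
df["ENTRIESn_hourly"] = grouped["ENTRIESn"].diff().fillna(0).astype(int)
df["EXITSn_hourly"] = grouped["EXITSn"].diff().fillna(0).astype(int)

# shift() within each group likewise leaves the first row missing.
df["Interval"] = grouped["TIMEn"].shift(1) + "-" + df["TIMEn"]

# Drop the first reading of each turnstile, which has no preceding interval.
df = df.dropna(subset=["Interval"])
print(df[["UNIT", "EXITSn_hourly", "Interval"]])
```

Because missing intervals stay as NaN rather than 0, the Interval column holds only strings after dropna, which keeps later .str operations straightforward.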
    

    【Comments】:

    • Is this an efficient approach, or is there a better way?
    【Solution 2】:

    You may want to split Interval into Interval_start and Interval_end and check whether they are equal:

    df['Interval_start'] = df['Interval'].map(lambda s: s.split('-')[0])
    df['Interval_end'] = df['Interval'].map(lambda s: s.split('-')[1])
    df.query("Interval_start != Interval_end")
    
          UNIT  EXITSn_hourly           Interval Interval_start Interval_end
    1867  R081            104  00:00:00-04:00:00       00:00:00     04:00:00
    1869  R081            129  04:00:00-08:00:00       04:00:00     08:00:00
    1870  R081            521  08:00:00-12:00:00       08:00:00     12:00:00
    1871  R081           1048  12:00:00-16:00:00       12:00:00     16:00:00
    2838  R032             38  00:00:00-04:00:00       00:00:00     04:00:00
    2840  R032             89  04:00:00-08:00:00       04:00:00     08:00:00
    2841  R032            470  08:00:00-12:00:00       08:00:00     12:00:00
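
If the two helper columns are not wanted in the result, the same idea can be sketched with a single vectorized split into a temporary frame. The sample data and the names parts and kept are illustrative, and the sketch assumes each Interval contains exactly one dash.

```python
import pandas as pd

# Illustrative subset of the question's data.
df = pd.DataFrame(
    {
        "UNIT": ["R081", "R081", "R081"],
        "EXITSn_hourly": [104, 0, 129],
        "Interval": ["00:00:00-04:00:00", "04:00:00-04:00:00", "04:00:00-08:00:00"],
    },
    index=[1867, 1868, 1869],
)

# Split once on the dash; expand=True returns a frame with columns 0 and 1.
parts = df["Interval"].str.split("-", n=1, expand=True)

# Keep rows whose start and end differ; df itself gains no extra columns.
kept = df[parts[0] != parts[1]]
print(kept)
```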
    

    【Comments】:
