【Question title】: Fill NaN values based on operators from another column
【Posted】: 2021-01-22 22:42:36
【Question description】:

I have a dataset (a pd.DataFrame) like this:

    condition     odometer
0    new           NaN
1    bad           1100
2    excellent     110
3    NaN           200
4    NaN           2000
5    new           20
6    bad           NaN

I want to fill the NaN values in "condition" based on the value of "odometer":

new: odometer >0 and <= 100 
excellent: odometer >100 and <= 1000
bad: odometer >1000

I tried the following, but it does not work:

for i in range(len(database)): 
   if math.isnan(database['condition'][i]) == True:
      odometer = database['odometer'][i] 
      if   odometer > 0 & odometer <= 100:       value = 'new'
      elif odometer > 100 & odometer <= 1000:    value = 'excellent'
      elif odometer > 1000:                      value = 'bad'
      database['condition'][i] = value

I also tried this as the first "if" condition:

database['condition'][i] == np.nan

But that did not work either.

【Question comments】:

    Tags: python dataframe for-loop operators nan


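    【Solution 1】:

    You can use DataFrame.apply() to build a new condition column from your function and then replace the old one. I am not sure what dtype your columns have; df['condition'].dtype will tell you. The condition column appears to hold either the literal string 'NaN' or real missing values in an object column, and using the wrong check for either case will break your logic. If the missing entries are the string 'NaN', compare directly with == 'NaN'. If they are real NaN values in an object column, test them with pd.isna(); math.isnan() and np.isnan() raise a TypeError on strings such as 'new'. I have included a sample DataFrame for each case below. You may also want to check the dtype of the odometer column.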

    import numpy as np
    import pandas as pd
    
    # condition column as string
    df = pd.DataFrame({'condition':['new','bad','excellent','NaN','NaN','new','bad'], 'odometer':np.array([np.nan, 1100, 110, 200, 2000, 20, np.nan], dtype=object)})
    # condition column as object
    # df = pd.DataFrame({'condition':np.array(['new','bad','excellent',np.nan,np.nan,'new','bad'], dtype=object), 'odometer':np.array([np.nan, 1100, 110, 200, 2000, 20, np.nan], dtype=object)})
    def f(row):
        # If the column holds real NaN values, use: if pd.isna(row['condition']):
        if row['condition'] == 'NaN':
            odometer = row['odometer']
            # Chained comparisons avoid the precedence bug of mixing `>` with `&`
            if   0 < odometer <= 100:      return 'new'
            elif 100 < odometer <= 1000:   return 'excellent'
            elif odometer > 1000:          return 'bad'
        # Keep the existing value (also when the odometer itself is missing)
        return row['condition']
    
    df['condition'] = df.apply(f, axis=1)
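
The two pitfalls in the question's loop can be checked directly with a short sketch. It assumes pandas and NumPy are installed, and the variable names are illustrative only. A comparison with `== np.nan` never matches because NaN is not equal to itself, and `&` binds more tightly than `>` and `<=`, so `odometer > 0 & odometer <= 100` is not evaluated as two separate range checks.

```python
import numpy as np
import pandas as pd

# NaN never compares equal to anything, including itself.
nan_equals_itself = (np.nan == np.nan)          # False

# pd.isna accepts strings, floats and None alike, whereas math.isnan and
# np.isnan raise TypeError when given a string such as 'new'.
missing_flags = [pd.isna(v) for v in ['new', np.nan, None]]

# `&` has higher precedence than comparisons, so the expression below parses as
# odometer > (0 & odometer) <= 100, i.e. (odometer > 0) and (0 <= 100).
odometer = 2000
buggy = odometer > 0 & odometer <= 100          # True, although 2000 > 100
fixed = 0 < odometer <= 100                     # False, as intended

print(nan_equals_itself, missing_flags, buggy, fixed)
```

With a float odometer value the buggy expression does not even evaluate, because `0 & 2000.0` raises a TypeError, which is likely what happens on a float64 column.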
    

    【Comments】:

      【Solution 2】:

      I have a neat one-liner solution.

      First, create a sample DataFrame:

      import pandas as pd
      
      df = pd.DataFrame({'condition':['new','bad',None,None,None], 'odometer':[None,1100,50,500,2000]})
      df
      Out:    
        condition odometer
      0   new     NaN
      1   bad     1100.0
      2   None    50.0
      3   None    500.0
      4   None    2000.0
      

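      Solution:

      # Chained comparisons match the stated ranges, work for non-integer readings, and leave NaN odometers unfilled
      df.condition = df.condition.fillna(df.odometer.apply(lambda number: 'new' if 0 < number <= 100 else 'excellent' if 100 < number <= 1000 else 'bad' if number > 1000 else None))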
      df
      Out:    
        condition  odometer
      0       new        NaN
      1       bad     1100.0
      2       new       50.0
      3 excellent      500.0
      4       bad     2000.0
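
As a vectorized alternative to a per-row lambda, the same ranges can be expressed with `pd.cut`. This is a different technique from the answer's one-liner, shown as a minimal sketch; the sample frame and the name `derived` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'condition': ['new', 'bad', None, None, None, None],
    'odometer': [np.nan, 1100, 50, 500, 2000, np.nan],
})

# Right-closed bins: (0, 100] -> new, (100, 1000] -> excellent, (1000, inf] -> bad.
# Values that fall in no bin (NaN, or odometer <= 0) map to NaN.
derived = pd.cut(
    df['odometer'],
    bins=[0, 100, 1000, np.inf],
    labels=['new', 'excellent', 'bad'],
)

# Cast the categorical result to object so fillna works with plain strings.
# fillna aligns on the index and only touches rows where condition is missing.
df['condition'] = df['condition'].fillna(derived.astype(object))
print(df)
```

Rows where both columns are missing stay missing, since there is nothing to derive the condition from.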
      

      【Comments】:
