[Question title]: Pandas: fill NaN using previous value and interpolating
[Posted]: 2018-03-17 14:17:31
[Question description]:

I have the following dataframe df:

    time            col_A
0   1520582580.000  79.000
1   1520582880.000  22.500
2   1520583180.000  29.361
3   1520583480.000  116.095
4   1520583780.000  19.972
5   1520584080.000  36.857
6   1520584380.000  15.167
7   1520584680.000  nan
8   1520584980.000  nan
9   1520585280.000  nan
10  1520585580.000  34.500
11  1520585880.000  17.583
12  1520586180.000  nan
13  1520586480.000  48.833
14  1520586780.000  18.806
15  1520587080.000  18.583

col_A has some missing data. I want to create a col_B that takes the previous non-missing value for each missing record, i.e.

6   1520584380.000  15.167
7   1520584680.000  15.167
8   1520584980.000  15.167
9   1520585280.000  15.167
10  1520585580.000  34.500
11  1520585880.000  17.583
12  1520586180.000  17.583
13  1520586480.000  48.833

and a col_C that interpolates between the nearest non-missing points before and after each gap, i.e.

6   1520584380.000  15.167
7   1520584680.000  20.001
8   1520584980.000  24.834
9   1520585280.000  29.667
10  1520585580.000  34.500
11  1520585880.000  17.583
12  1520586180.000  33.208
13  1520586480.000  48.833

Rather than looping over the dataframe and computing each record one by one, is there a built-in function that does this elegantly? Thanks!

[Question discussion]:

    Tags: python-3.x pandas dataframe


    [Solution 1]:

    I think you need ffill and interpolate:

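    # Forward fill: carry the last valid value into each NaN
    df['colB'] = df['col_A'].ffill()
    # Linear interpolation between the nearest valid neighbours
    df['colc'] = df['col_A'].interpolate()
    print(df)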
                time    col_A     colB       colc
    0   1.520583e+09   79.000   79.000   79.00000
    1   1.520583e+09   22.500   22.500   22.50000
    2   1.520583e+09   29.361   29.361   29.36100
    3   1.520583e+09  116.095  116.095  116.09500
    4   1.520584e+09   19.972   19.972   19.97200
    5   1.520584e+09   36.857   36.857   36.85700
    6   1.520584e+09   15.167   15.167   15.16700
    7   1.520585e+09      NaN   15.167   20.00025
    8   1.520585e+09      NaN   15.167   24.83350
    9   1.520585e+09      NaN   15.167   29.66675
    10  1.520586e+09   34.500   34.500   34.50000
    11  1.520586e+09   17.583   17.583   17.58300
    12  1.520586e+09      NaN   17.583   33.20800
    13  1.520586e+09   48.833   48.833   48.83300
    14  1.520587e+09   18.806   18.806   18.80600
    15  1.520587e+09   18.583   18.583   18.58300
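
For readers who want to reproduce the result above from scratch, here is a minimal, self-contained sketch. It rebuilds the question's frame (the timestamps are 300 seconds apart) and names the new columns col_B and col_C as the question asked; those column names and the final slice are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: timestamps 300 s apart, NaN where col_A is missing.
df = pd.DataFrame({
    "time": 1520582580.0 + 300.0 * np.arange(16),
    "col_A": [79.000, 22.500, 29.361, 116.095, 19.972, 36.857, 15.167,
              np.nan, np.nan, np.nan, 34.500, 17.583, np.nan, 48.833,
              18.806, 18.583],
})

# Forward fill: each NaN takes the last non-missing value above it.
df["col_B"] = df["col_A"].ffill()

# Default interpolation is linear and treats rows as equally spaced,
# so a run of NaNs is split into equal steps between its two neighbours.
df["col_C"] = df["col_A"].interpolate()

# Show only the rows the question used as its example.
print(df.loc[6:13, ["col_A", "col_B", "col_C"]])
```

Row 7, for example, becomes 15.167 + (34.500 - 15.167) / 4 = 20.00025, which matches the colc column printed above.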
    

    If you want to interpolate using method 'time':

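    import pandas as pd

    # Convert epoch seconds to datetimes and use them as the index;
    # interpolate('time') needs a DatetimeIndex to work.
    df['time'] = pd.to_datetime(df['time'], unit='s')
    df = df.set_index('time')
    df['colB'] = df['col_A'].ffill()
    df['colc'] = df['col_A'].interpolate('time')
    print(df)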
                           col_A     colB       colc
    time                                            
    2018-03-09 08:03:00   79.000   79.000   79.00000
    2018-03-09 08:08:00   22.500   22.500   22.50000
    2018-03-09 08:13:00   29.361   29.361   29.36100
    2018-03-09 08:18:00  116.095  116.095  116.09500
    2018-03-09 08:23:00   19.972   19.972   19.97200
    2018-03-09 08:28:00   36.857   36.857   36.85700
    2018-03-09 08:33:00   15.167   15.167   15.16700
    2018-03-09 08:38:00      NaN   15.167   20.00025
    2018-03-09 08:43:00      NaN   15.167   24.83350
    2018-03-09 08:48:00      NaN   15.167   29.66675
    2018-03-09 08:53:00   34.500   34.500   34.50000
    2018-03-09 08:58:00   17.583   17.583   17.58300
    2018-03-09 09:03:00      NaN   17.583   33.20800
    2018-03-09 09:08:00   48.833   48.833   48.83300
    2018-03-09 09:13:00   18.806   18.806   18.80600
    2018-03-09 09:18:00   18.583   18.583   18.58300
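
In the output above the 'time' method gives the same numbers as the default linear method, because the samples are evenly spaced at five-minute intervals. The two methods differ only when the spacing is irregular. The following sketch uses a small made-up series (the timestamps and values are hypothetical, chosen for easy arithmetic) to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical irregular samples: 5 minutes before the gap, 15 minutes after it.
idx = pd.to_datetime([
    "2018-03-09 08:00:00",
    "2018-03-09 08:05:00",
    "2018-03-09 08:20:00",
])
s = pd.Series([10.0, np.nan, 50.0], index=idx)

# Linear ignores the index and places the NaN halfway between its neighbours.
linear = s.interpolate()

# 'time' weights by elapsed time: 5 of the 20 minutes have passed at the NaN.
by_time = s.interpolate(method="time")

print(linear.iloc[1])   # 30.0
print(by_time.iloc[1])  # 20.0
```

If the timestamps in your data are not evenly spaced, the 'time' method is probably the one you want; otherwise the default is enough.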
    

    [Discussion]:
