【Question title】:python scipy griddata does not do linear interpolation as expected
【Posted】:2020-08-19 15:38:45
【Question description】:

Here is my data:

date_num,expiry_num,strike,value,interp
731988,731988,0.02501,0.0095094,0.0095094
731988,731988,0.03001,0.0091658,0.0096807
731988,731988,0.03501,0.0089164,0.009852
731988,731988,0.03751,0.0088471,0.00993765
731988,731988,0.04001,0.0088244,0.0100233
731988,731988,0.04251,0.008853,0.01010895
731988,731988,0.04501,0.00898,0.0101946
731988,731988,0.04751,0.009066,0.01028025
731988,731988,0.05001,0.0092429,0.0103659
731988,731988,0.05251,0.009458,0.01045155
731988,731988,0.05501,0.0097043,0.0105372
731988,731988,0.06001,0.010264,0.0107085
731988,731988,0.06501,0.0108798,0.0108798
731988,732018,0.02501,0.0095094,0.0095094
731988,732018,0.03001,0.0091658,0.0096807
731988,732018,0.03501,0.0089164,0.009852
731988,732018,0.03751,0.0088471,0.00993765
731988,732018,0.04001,0.0088244,0.0100233
731988,732018,0.04251,0.008853,0.01010895
731988,732018,0.04501,0.00898,0.0101946
731988,732018,0.04751,0.009066,0.01028025
731988,732018,0.05001,0.0092429,0.0103659
731988,732018,0.05251,0.009458,0.01045155
731988,732018,0.05501,0.0097043,0.0105372
731988,732018,0.06001,0.010264,0.0107085
731988,732018,0.06501,0.0108798,0.0108798
731988,732079,0.02543,0.0094153,0.0094153
731988,732079,0.03043,0.0090666,0.009585463
731988,732079,0.03543,0.0088118,0.009755625
731988,732079,0.03793,0.0087399,0.009840706
731988,732079,0.04043,0.0087152,0.009925788
731988,732079,0.04293,0.0087425,0.010010869
731988,732079,0.04543,0.0088643,0.01009595
731988,732079,0.04793,0.0089551,0.010181031
731988,732079,0.05043,0.0091326,0.010266113
731988,732079,0.05293,0.0093489,0.010351194
731988,732079,0.05543,0.0095964,0.010436275
731988,732079,0.06043,0.0101587,0.010606438
731988,732079,0.06543,0.0107766,0.0107766
731988,732170,0.02597,0.0095394,0.0095394
731988,732170,0.03097,0.0091987,0.009711525
731988,732170,0.03597,0.0089515,0.00988365
731988,732170,0.03847,0.008883,0.009969713
731988,732170,0.04097,0.008861,0.010055775
731988,732170,0.04347,0.0088902,0.010141838
731988,732170,0.04597,0.0090131,0.0102279
731988,732170,0.04847,0.0091035,0.010313963
731988,732170,0.05097,0.0092803,0.010400025
731988,732170,0.05347,0.0094953,0.010486088
731988,732170,0.05597,0.0097414,0.01057215
731988,732170,0.06097,0.0103008,0.010744275
731988,732170,0.06597,0.0109164,0.0109164
731988,732353,0.04685,0.0091422,0.0091422

Here is my script:

import pandas as pd
from scipy.interpolate import griddata
df = pd.read_csv("base_data.csv")
df["interp"] = griddata(
    df[["expiry_num","strike"]].values, 
    df["value"].values,df[["expiry_num","strike"]].values, 
    method='linear')

import matplotlib.pyplot as plt
plt.scatter(df.loc[df["expiry_num"] == 732018,"strike"],df.loc[df["expiry_num"] == 732018,"value"])
plt.scatter(df.loc[df["expiry_num"] == 732018,"strike"],df.loc[df["expiry_num"] == 732018,"interp"])
plt.show()

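The result looks like this (scatter plot of value and interp against strike for expiry_num 732018; image not available):

Why doesn't griddata perform the interpolation?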

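【Comments】:

  • Does this answer your question? Python/Scipy 2D Interpolation (Non-uniform Data)
  • Not really, no. Why doesn't griddata do what it is supposed to do, namely interpolate irregular gridded data (in more than 2-D)?
  • No, but it should be linear between the blue dots. At the very least the orange dots should match the blue dots, since those are the inputs.
  • I think the problem may be in the call to griddata. Why are you using griddata(df[["expiry_num","strike"]].values? Shouldn't points just be a 1-D array, griddata(df["expiry_num"].values, as in docs.scipy.org/doc/scipy/reference/generated/…
  • My inputs are expiry, strike, value, so to interpolate I need to pass it expiry, strike.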

Tags: python scipy interpolation


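【Solution 1】:
  • The data appears to be 1-dimensional, y=f(x), not multi-dimensional, z=f(x, y)
  • The point of interpolation is to create new information from existing information. As a result, the interpolated data is longer than the existing data.
    • In this example, num=41 means that 41 data points were created with the interpolation function, which was built from the original 13 points.
  • It seems you are interested in grouping the data by expiry_num, although the data in this example is essentially the same for each one
    • I will create a dict of dataframes with expiry_num as the keys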
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import interp1d

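# using your data (base_data.csv from the question), but not the one row of expiry_num = 732353
df = pd.read_csv("base_data.csv")
df = df[df.expiry_num != 732353]  # a single point cannot be interpolated
df_dict = {key: df[['strike', 'value']][df.expiry_num == key] for key in df.expiry_num.unique()}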

# plot
plt.figure(figsize=(10, 10))

for i, (k, v) in enumerate(df_dict.items(), start=221):
    plt.subplot(i)

    # interpolate function
    f = interp1d(v.strike, v.value, kind='cubic')

    # create x-axis values, num can be as many points as you want
    xnew = (np.linspace(v.strike.min(), v.strike.max(), num=41, endpoint=True))

    # calculate y values
    ynew = f(xnew)

    # plot
    plt.plot(v.strike, v.value, 'o', xnew, ynew, '--')
    plt.legend(['data', 'cubic'], loc='best')
    plt.title(f'expiry_num: {k}')

plt.tight_layout()
plt.show()
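
The grouping idea above can also be sketched without plotting. This is a minimal illustration, not the answer's own code: it copies only ten rows from the question's data, builds the frame inline instead of reading base_data.csv, and uses kind="linear" rather than the cubic interpolation used above. The name `interpolators` is introduced here for illustration.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import interp1d

# A few rows copied from the question's data (two expiries, five strikes each).
df = pd.DataFrame({
    "expiry_num": [732018] * 5 + [732079] * 5,
    "strike": [0.02501, 0.03501, 0.04501, 0.05501, 0.06501,
               0.02543, 0.03543, 0.04543, 0.05543, 0.06543],
    "value": [0.0095094, 0.0089164, 0.00898, 0.0097043, 0.0108798,
              0.0094153, 0.0088118, 0.0088643, 0.0095964, 0.0107766],
})

# One 1-D interpolation function per expiry, keyed by expiry_num.
interpolators = {
    key: interp1d(group["strike"].to_numpy(), group["value"].to_numpy(), kind="linear")
    for key, group in df.groupby("expiry_num")
}

# 41 evaluation points spanning the strikes of one expiry.
strikes_732018 = df.loc[df["expiry_num"] == 732018, "strike"].to_numpy()
xnew = np.linspace(strikes_732018.min(), strikes_732018.max(), num=41)
ynew = interpolators[732018](xnew)
```

Here `ynew` holds 41 interpolated values built from the five input points of that expiry, and calling the same function at the original strikes returns the original values.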

Notes addressing the comments:

  • The following line:
df["interp"] = griddata(df[["expiry_num","strike"]].values, df["value"].values, df[["expiry_num","strike"]].values, method='linear')
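  • ...is not correct, because df[["expiry_num","strike"]].values does not provide any new values to compute and, more importantly, the interpolation function does not depend on expiry_num
    • For example, if xnew = exp_732018.strike, then test = exp_732018.value
  • For 1-D input, griddata falls back to interp1d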
from scipy.interpolate import griddata

exp_732018 = df[['strike', 'value']][df.expiry_num == 732018]

# 41 x-values to calculate
xnew = (np.linspace(exp_732018.strike.min(), exp_732018.strike.max(), num=41, endpoint=True))

# 41 new y-values
test = griddata(exp_732018.strike.values, exp_732018.value.values, xnew, method='linear')

# plot
plt.scatter(xnew, test, label='griddata')
plt.scatter(exp_732018.strike.values, exp_732018.value.values, label='existing data')
plt.legend()
plt.ylim(0.008, 0.012)
plt.show()
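
The claim that 1-D griddata returns the inputs when asked for them can be checked directly. The sketch below uses the 13 strike/value pairs of expiry_num 732018 from the question; evaluating at the midpoints between neighbouring strikes is a choice made here for illustration, since linear interpolation there should give the mean of the two neighbouring values.

```python
import numpy as np
from scipy.interpolate import griddata

# The 13 (strike, value) pairs for expiry_num 732018, copied from the question.
strike = np.array([0.02501, 0.03001, 0.03501, 0.03751, 0.04001, 0.04251, 0.04501,
                   0.04751, 0.05001, 0.05251, 0.05501, 0.06001, 0.06501])
value = np.array([0.0095094, 0.0091658, 0.0089164, 0.0088471, 0.0088244, 0.008853,
                  0.00898, 0.009066, 0.0092429, 0.009458, 0.0097043, 0.010264, 0.0108798])

# Asking for the input strikes gives back the input values.
at_inputs = griddata(strike, value, strike, method="linear")

# Asking for new strikes (midpoints between neighbours) gives new values.
midpoints = (strike[:-1] + strike[1:]) / 2
at_midpoints = griddata(strike, value, midpoints, method="linear")

# For linear interpolation, each midpoint value is the mean of its two neighbours.
expected_midpoints = (value[:-1] + value[1:]) / 2
```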

Sample data used

date_num,expiry_num,strike,value
731988,731988,0.02501,0.0095094
731988,731988,0.030010000000000002,0.009165799999999998
731988,731988,0.03501,0.0089164
731988,731988,0.03751,0.0088471
731988,731988,0.040010000000000004,0.0088244
731988,731988,0.04251,0.008853
731988,731988,0.04501,0.00898
731988,731988,0.047510000000000004,0.009066
731988,731988,0.05001,0.0092429
731988,731988,0.05251,0.009458
731988,731988,0.05501,0.009704299999999999
731988,731988,0.06001,0.010264
731988,731988,0.06501,0.010879799999999998
731988,732018,0.02501,0.0095094
731988,732018,0.030010000000000002,0.009165799999999998
731988,732018,0.03501,0.0089164
731988,732018,0.03751,0.0088471
731988,732018,0.040010000000000004,0.0088244
731988,732018,0.04251,0.008853
731988,732018,0.04501,0.00898
731988,732018,0.047510000000000004,0.009066
731988,732018,0.05001,0.0092429
731988,732018,0.05251,0.009458
731988,732018,0.05501,0.009704299999999999
731988,732018,0.06001,0.010264
731988,732018,0.06501,0.010879799999999998
731988,732079,0.02543,0.0094153
731988,732079,0.030430000000000002,0.0090666
731988,732079,0.03543,0.0088118
731988,732079,0.03793,0.0087399
731988,732079,0.04043,0.0087152
731988,732079,0.04293,0.0087425
731988,732079,0.04543,0.008864299999999999
731988,732079,0.04793,0.0089551
731988,732079,0.05043,0.009132600000000001
731988,732079,0.05293,0.009348899999999999
731988,732079,0.05542999999999999,0.0095964
731988,732079,0.06043,0.0101587
731988,732079,0.06543,0.0107766
731988,732170,0.02597,0.0095394
731988,732170,0.030969999999999998,0.0091987
731988,732170,0.03597,0.0089515
731988,732170,0.03847,0.008883
731988,732170,0.04097,0.008860999999999999
731988,732170,0.04347,0.008890200000000001
731988,732170,0.04597,0.0090131
731988,732170,0.04847,0.0091035
731988,732170,0.05097,0.0092803
731988,732170,0.053470000000000004,0.009495299999999998
731988,732170,0.055970000000000006,0.0097414
731988,732170,0.06097,0.010300799999999999
731988,732170,0.06597,0.0109164

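【Discussion】:

  • I want a solution with griddata. Why can't I pass in the whole dataframe and have griddata work it out? I will also be interpolating between expiry_num values, so it really is at least 2-D.
  • I used that particular expiry as an example of griddata not doing its job. My real use case will be an expiry that is not in the source dataframe.
  • Thanks for the notes on my comments; they do address why griddata does not interpolate in my case (no new points were given). Would you mind giving me the interpolated value you get for (732107, 0.0492)?
  • I'm still a bit puzzled about why this behavior exists. If you ask for them, why doesn't it give you back the values you used as the base data? During a calculation you may well land on an existing point of the surface. In that case, wouldn't griddata return the actual data used to set up the interpolation?
  • It was late and I forgot to write part of the sentence: ...and, more importantly, the interpolation function does not depend on expiry_num. Also, for example, if xnew = exp_732018.strike, then test = exp_732018.value; the interpolation function produces the expected output for x values that are contained in the data.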
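
For the 2-D case raised in the discussion (a query at an expiry that is not in the source data), the following is a hedged sketch only. It does not use the question's values: the surface is a hypothetical plane, chosen so that the correct answer is known exactly, and the strike grid and the helper `plane` are made up for illustration. It also passes rescale=True, a documented griddata option that rescales the points to the unit cube before interpolating; whether that option resolves the behaviour reported in the question is not confirmed anywhere in this thread.

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical inputs, NOT the question's data: the four expiries from the
# question combined with an assumed regular strike grid.
expiries = np.array([731988.0, 732018.0, 732079.0, 732170.0])
strikes = np.linspace(0.025, 0.065, 9)
E, K = np.meshgrid(expiries, strikes, indexing="ij")
points = np.column_stack([E.ravel(), K.ravel()])  # shape (36, 2)


def plane(expiry, strike):
    """Hypothetical linear surface, so linear interpolation should be exact."""
    return 1e-6 * (expiry - 731988.0) + 0.05 * strike + 0.008


values = plane(points[:, 0], points[:, 1])

# A query point at an expiry that is not among the inputs.
query = np.array([[732107.0, 0.0492]])

# rescale=True puts both axes on a comparable scale before triangulating;
# the two axes here differ by several orders of magnitude.
result = griddata(points, values, query, method="linear", rescale=True)
```

Because the surface is a plane, `result[0]` can be compared against `plane(732107.0, 0.0492)` as a sanity check on the call before swapping in real data.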