Getting data from a .csv file: answers

【Question title】: Getting data from .csv file
【Posted】: 2013-05-12 22:57:19
【Question description】:

I'm working on a Python project and I have a .csv file like this:

freq,ae,cl,ota
825,1,2,3
835,4,5,6
850,10,11,12
880,22,23,24
910,46,47,48
960,94,95,96
1575,190,191,192
1710,382,383,384
1750,766,767,768

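I need to fetch some data from the file quickly at runtime.
For example:

I'm sampling at a frequency of 880 MHz, and I want to do some calculations on the samples using the data in the 880 row of the .csv file.

I did this by using the frequency column as the index and then looking up the data with the sampling frequency. The tricky part is that if I sample at 900 MHz, I get an error. I would like it to take the nearest rows below and above, in this case 880 and 910, and use the data from these two rows to make a linear estimate for 900 MHz.

My main question is how to search the data quickly and, if there is no exact match, how to get the two nearest rows.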

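【Question comments】:

related/a workaround... but probably not what you want to do.
No, it isn't. I don't want to add more data; the file serves as a reference for the sampling and the calculations.

Tags: python csv numpy pandas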

【Solution 1】:

The bisect module performs a binary search (bisection) on a sorted sequence.
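
As an illustration of how bisect could locate the two neighbouring rows, here is a minimal sketch. The helper name `neighbours` and the hard-coded `freqs` list are assumptions made for this example (they are not from the answer); the frequencies are the ones in the question's file and must already be sorted in ascending order.

```python
import bisect

# Frequencies from the question's CSV, already sorted ascending.
freqs = [825, 835, 850, 880, 910, 960, 1575, 1710, 1750]

def neighbours(sorted_freqs, target):
    """Return the (lower, upper) frequencies that bracket target.

    Both values equal target on an exact match.
    Raises ValueError when target lies outside the table.
    """
    if not sorted_freqs[0] <= target <= sorted_freqs[-1]:
        raise ValueError("target is outside the range of the table")
    # Index of the first entry that is >= target.
    pos = bisect.bisect_left(sorted_freqs, target)
    if sorted_freqs[pos] == target:
        return target, target
    return sorted_freqs[pos - 1], sorted_freqs[pos]

print(neighbours(freqs, 900))  # (880, 910)
print(neighbours(freqs, 880))  # (880, 880)
```

Each lookup is O(log n), so this stays fast even if the reference table grows.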

【Discussion】:

【Solution 2】:

Take the row (Series) just before and the row just after:

In [11]: before, after = df1.loc[:900].iloc[-1], df1.loc[900:].iloc[0]

In [12]: before
Out[12]:
ae     22
cl     23
ota    24
Name: 880, dtype: int64

In [13]: after
Out[13]:
ae     46
cl     47
ota    48
Name: 910, dtype: int64

Put an empty row in between and interpolate (edit: the default interpolation just takes the average of the two, so we need to set method='values'):

In [14]: sandwich = pd.DataFrame([before, pd.Series(name=900), after])

In [15]: sandwich
Out[15]:
     ae  cl  ota
880  22  23   24
900 NaN NaN  NaN
910  46  47   48

In [16]: sandwich.apply(lambda col: col.interpolate(method='values'))
Out[16]:
     ae  cl  ota
880  22  23   24
900  38  39   40
910  46  47   48

In [17]: sandwich.apply(lambda col: col.interpolate(method='values')).loc[900]
Out[17]:
ae     38
cl     39
ota    40
Name: 900, dtype: float64

Note:

df1 = pd.read_csv(csv_location).set_index('freq')

You could wrap this up in a function:

def interpolate_for_me(df, n):
    # exact match: return the stored row as-is
    if n in df.index:
        return df.loc[n]
    # nearest rows below and above n (the index must be sorted)
    before, after = df.loc[:n].iloc[-1], df.loc[n:].iloc[0]
    sandwich = pd.DataFrame([before, pd.Series(name=n), after])
    return sandwich.apply(lambda col: col.interpolate(method='values')).loc[n]
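
For comparison, the same linear estimate can be sketched with `numpy.interp`, which interpolates directly against the sorted frequency index without building an intermediate frame. This is a different technique from the one in the answer, not the author's method. The function name `interpolate_row` and the inline `CSV_TEXT` are assumptions made so the example is self-contained.

```python
import io

import numpy as np
import pandas as pd

# Inline copy of the question's file so the sketch runs on its own.
CSV_TEXT = """freq,ae,cl,ota
825,1,2,3
835,4,5,6
850,10,11,12
880,22,23,24
910,46,47,48
960,94,95,96
1575,190,191,192
1710,382,383,384
1750,766,767,768
"""

df = pd.read_csv(io.StringIO(CSV_TEXT)).set_index("freq")

def interpolate_row(df, freq):
    """Linearly interpolate every column of df at freq (hypothetical helper)."""
    xp = df.index.to_numpy(dtype=float)  # must be sorted ascending
    if not xp[0] <= freq <= xp[-1]:
        # np.interp would silently clamp to the end values, so reject instead.
        raise ValueError("freq is outside the range of the table")
    values = {
        col: float(np.interp(freq, xp, df[col].to_numpy(dtype=float)))
        for col in df.columns
    }
    return pd.Series(values, name=freq)

row = interpolate_row(df, 900)
print(row)  # ae, cl, ota come out close to 38, 39, 40
```

An exact match such as 880 needs no special case here, because interpolating at a tabulated point returns the tabulated value.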

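【Discussion】:

This is exactly what I was looking for, thanks. As a side comment: I don't know how the interpolate function works, but the way you used it assumes that 900 MHz lies midway between the other two frequencies, so the result is incorrect.
@Laplace Thanks for letting me know! I assumed the default was linear (since it is labelled "linear"), but apparently "values" is what we want (for linear interpolation).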
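
The difference discussed in these comments can be reproduced with a small sketch, assuming a three-entry Series built from the `ae` values at 880 and 910. `method='linear'` treats the rows as equally spaced and ignores the index, whereas `method='index'` (which the pandas documentation lists together with `'values'`) uses the numeric index values.

```python
import numpy as np
import pandas as pd

# 'ae' at 880 and 910 MHz, with a gap at 900 MHz to fill in.
s = pd.Series([22.0, np.nan, 46.0], index=[880, 900, 910])

by_position = s.interpolate(method="linear")  # ignores the index spacing
by_index = s.interpolate(method="index")      # weights by the index values

print(by_position.loc[900])  # 34.0, the plain midpoint of 22 and 46
print(by_index.loc[900])     # close to 38.0, since 900 is 2/3 of the way from 880 to 910
```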

【Solution 3】:

import csv
import bisect

def interpolate_data(data, value):
    # Only interpolate when value is within the range of the data;
    # otherwise fall through and return None.
    if data[0][0] <= value <= data[-1][0]:
        # bisect_left gives the index of the first row whose frequency is >= value.
        pos = bisect.bisect_left([x[0] for x in data], value)
        if data[pos][0] == value:
            return data[pos]  # exact match: return the row itself
        else:
            prev = data[pos-1]
            curr = data[pos]
            # fraction of the way from prev to curr (between 0 and 1)
            factor = (value-prev[0])/(curr[0]-prev[0])
            return [value]+[p+factor*(c-p) for p, c in zip(prev[1:], curr[1:])]

with open("data.csv", newline="") as csvfile:
    f = csv.reader(csvfile)
    next(f) # skip the header
    data = [[float(x) for x in row] for row in f] # convert all to float

# test value 1200:
interpolate_data(data, 1200)
# approximately [1200, 131.46, 132.46, 133.46]

Works for me, and it is fairly easy to understand.

【Discussion】: