Why am I getting NaN instead of variable? Answers

【Question title】: Why am I getting NaN instead of variable?
【Posted】: 2020-06-17 20:00:08
【Question】:

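Good morning,

I would like to ask why I am getting NaN from this code. I have a data frame with only 4 columns: flightID, timestamp, X and Y.

For each flight I have several rows with different timestamps and x,y positions. What I want is to compute the time flown from each x,y coordinate. Then I want to compare the flight times for each x,y coordinate and keep only the smallest time per x,y. I hope the code is written correctly, but in the final min_time array I have a bunch of NaN values for x or y. Can you tell me why?

I added some code that creates a data frame similar to mine, so the example is reproducible.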

import numpy as np
import pandas as pd

data = {'flightID':['11111', '11111', '11111', '11111','2222','2222','2222','3333','3333','3333','3333'], 'timestamp':[1519669804, 1519669844,  1519669884, 1519669924,1519669976,1519679614,1519679615,1519679616,1519679800,1519679876,1519679999],'X':[1,1,1,1,2,3,4,4,4,5,6],'Y':[7,7,7,7,7,7,7,8,8,8,9]}

Grid_frame2 = pd.DataFrame(data)




# find the cells that contain something
flight = []
min_time=[]

for j in range(len(Grid_frame2)-1):
    if Grid_frame2.flightID[j] == Grid_frame2.flightID[j+1]:        # find all the rows from the same flight
        arr = [Grid_frame2.timestamp[j]]
        arr.append(Grid_frame2.X[j])
        arr.append(Grid_frame2.Y[j])
        flight = np.reshape(flight,(-1,3))
        flight = np.vstack((flight,arr))
        arr = []
    else:                                                         # if you have the last one, compute time flown
        time = flight[-1][0] - flight[0][0]
        time = abs(time)
        x = flight[0][1]
        y = flight[0][2]
        if len(min_time) == 0:                                    # if min_time array is empty, insert values
            arr = [time]
            arr.append(x)
            arr.append(y)
            min_time.append(arr)
            arr = []
            flight = []
        else:                                                     # is it is not empty, check if there is the same cell and if it is not smaller value
            for k in range(len(min_time)):
                if min_time[k][1] == x and min_time[k][2] == y and min_time[k][0] > int(time):
                    min_time[k][0] = time
                    flight= []
                elif min_time[k][1] == x and min_time[k][2] == y and min_time[k][0] < int(time):
                    flight = []
                    pass
                else:                                             # if there is no same cell or the value isn't higher, insert values
                    arr = [time]
                    arr.append(x)
                    arr.append(y)
                    min_time = np.vstack((min_time,arr))
                    #min_time = np.reshape(min_time,(-1,3))
                    arr = []
                    flight = []

The problem is that when I look at what is in the min_time array after this loop, it looks like this:

> array([[691.,   1.,   7.],
       [812.,  nan,   7.],
       [898.,   6.,  nan],
       ...,
       [769.,  nan,   9.],
       [769.,  nan,   9.],
       [769.,  nan,   9.]])

Also, the length should be 150, because I have a grid of x,y coordinates (10x15), but the actual length is over 1000.

【Comments】:

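Hi Lucy. You are using Pandas, which is great, but note that iterating over a pandas data frame should be your last resort. Data frames are designed so that you can do a great deal without looping over them. That said, I just ran your code and did not get the same output for min_time; I got: array([[8.000e+01, 1.000e+00, 7.000e+00], [9.638e+03, 2.000e+00, 7.000e+00]])
You should try editing your question to describe more clearly what you are trying to do, including an example of what your output should look like. That makes it easier to help you.
Thank you very much for the advice. I am a complete beginner and started coding two weeks ago. I would like to do this in Pandas, but this way was easier for me to implement. Actually, I figured it out: there was a problem with the original data frame I was using. The only remaining issue is that it adds a new row with the same x,y to min_time even when that x,y is already in there.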
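
The asker does not say what exactly was wrong with the original data frame. One plausible cause, and it is only an assumption here, is missing X or Y values in the source data: any NaN in those columns is copied straight into min_time by the loop. A minimal sketch for checking this before looping, using a small hypothetical frame (`df` and its values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in X and Y, standing in for the real data
df = pd.DataFrame({
    'flightID': ['a', 'a', 'b', 'b'],
    'timestamp': [0, 10, 20, 50],
    'X': [1, 1, np.nan, 2],
    'Y': [7, 7, 8, np.nan],
})

# Count missing coordinates per column
missing = df[['X', 'Y']].isna().sum()

# Keep only the rows where both coordinates are present
clean = df.dropna(subset=['X', 'Y'])
```

If `missing` reports non-zero counts, the NaN values were already in the input and not produced by the loop itself.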

Tags: arrays dataframe for-loop nan

【Solution 1】:

I read through your code and found a problem here:

if min_time[k][1] == x and min_time[k][2] == y and min_time[k][0] > int(time):
    ...
elif min_time[k][1] == x and min_time[k][2] == y and min_time[k][0] < int(time):
    ...

What if min_time[k][0] == int(time)? The extra row may come from here.

However, even after adjusting this, the code still behaves strangely (I'm not sure what you are trying to do; more information is needed).
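
One likely reason for the remaining odd behaviour, as far as I can tell, is that the final else branch runs once for every k whose cell differs from (x, y), so a row gets appended for each non-matching entry. A minimal sketch of the "keep the smallest time per cell" step, assuming a dict keyed by (x, y) is acceptable instead of an array (`update_min_time` is a hypothetical helper, and the sample values are illustrative):

```python
def update_min_time(min_time, time, x, y):
    """Store time for cell (x, y) only if it is new or smaller than the stored one."""
    cell = (x, y)
    if cell not in min_time or time < min_time[cell]:
        min_time[cell] = time
    return min_time


min_time = {}
# (time, x, y) triples; the equal-time and larger-time cases add no extra entries
for time, x, y in [(80, 1, 7), (120, 1, 7), (80, 1, 7), (9638, 2, 7)]:
    update_min_time(min_time, time, x, y)
```

Because the dict has exactly one entry per cell, the equal-time case needs no special handling and duplicates cannot appear.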

I hope I understood your code and what you are doing here. Below are some suggestions and tips.

Make your code more elegant and readable.

# Yes
min_time.append([time,x,y])

# No
arr = [time]
arr.append(x)
arr.append(y)
min_time.append(arr)
arr = []

# Yes
flight.append(Grid_frame2.loc[j,['timestamp','X','Y']])

# No
arr = [Grid_frame2.timestamp[j]]
arr.append(Grid_frame2.X[j])
arr.append(Grid_frame2.Y[j])
flight = np.reshape(flight,(-1,3))
flight = np.vstack((flight,arr))
arr = []
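
The two tips above come down to the same pattern: collect rows in a plain Python list and convert to an array once at the end, instead of reshaping and stacking inside the loop. A small sketch with illustrative values (the triples are taken from the output quoted in the comments, not computed here):

```python
import numpy as np

rows = []
# Each triple is (time, x, y)
for time, x, y in [(80, 1, 7), (9638, 2, 7)]:
    rows.append([time, x, y])

# Convert once, after the loop
min_time = np.array(rows)
```

This keeps min_time a list while it is being built, so it never switches type halfway through the loop.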

After running this you will discover the charm and magic of pandas (I guess this is what you want to do):

def get_flight_info(x):
    # time flown: last timestamp minus first timestamp of the flight
    time = x['timestamp'].iloc[-1]-x['timestamp'].iloc[0]
    location = x[['X','Y']].iloc[0]
    # Series.append was removed in pandas 2.0; pd.concat does the same job
    return pd.concat([location, pd.Series([time],index=['min_time'])])

flight_info = Grid_frame2.groupby(['flightID']).apply(get_flight_info)
min_time_in_location = flight_info.groupby(['X','Y'])['min_time'].min()
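
For reference, the same result can probably be reached without apply. The following is a sketch using named aggregation on the sample data from the question; the `start` and `end` column names are introduced here for illustration only:

```python
import pandas as pd

data = {
    'flightID': ['11111', '11111', '11111', '11111', '2222', '2222', '2222',
                 '3333', '3333', '3333', '3333'],
    'timestamp': [1519669804, 1519669844, 1519669884, 1519669924, 1519669976,
                  1519679614, 1519679615, 1519679616, 1519679800, 1519679876,
                  1519679999],
    'X': [1, 1, 1, 1, 2, 3, 4, 4, 4, 5, 6],
    'Y': [7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 9],
}
Grid_frame2 = pd.DataFrame(data)

# One row per flight: starting cell plus first and last timestamp
flight_info = Grid_frame2.groupby('flightID').agg(
    X=('X', 'first'),
    Y=('Y', 'first'),
    start=('timestamp', 'first'),
    end=('timestamp', 'last'),
)
flight_info['min_time'] = flight_info['end'] - flight_info['start']

# Smallest time flown per starting cell
min_time_in_location = flight_info.groupby(['X', 'Y'])['min_time'].min()
```

On this sample each flight starts in a different cell, so the result has three entries: 120 for (1, 7), 9639 for (2, 7) and 383 for (4, 8). These differ from the 80 and 9638 quoted in the comments because the original loop compares row j with row j+1 and so never includes the last row of a flight.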

【Comments】: