【Question title】: Python find index of nan in nan-list yields error only sometimes?
【Posted】: 2021-11-16 23:44:36
【Question】:

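For an all-nan list a = [np.nan, np.nan], a.index(np.nan) returns 0, whereas for the np.nan returned by b = np.nanmax(a), a.index(b) raises a ValueError. The object ids of np.nan and b differ. However, if a is [2, 3.1] and c = np.array(a).tolist(), then id(a[1]) and id(c[1]) also differ, yet a.index(c[1]) raises no ValueError?

How does list.index() work under the hood? Does it compare for value equality (I guess not, otherwise a.index(np.nan) should raise an error, since np.nan != np.nan)? For object id (I guess not, otherwise a.index(c[1]) should raise an error)? Why does a.index(np.nanmax(a)) fail for a = [np.nan, np.nan] while a.index(np.nan) works?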

import numpy as np

a = [np.nan, np.nan]
b = np.nanmax(a)

print(id(np.nan), id(a[0]), id(a[1]), id(b))

a.index(np.nan)
a.index(b)

# Output:
# 47021195940144 47021195940144 47021195940144 47021566155984
#   ...
#   File "<ipython-input-2-fb7cc8fa88c0>", line 9, in <module>
#     a.index(b)
# ValueError: nan is not in list

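【Comments】:

If you check type(np.nan) == type(b), you get False: type(np.nan) is <class 'float'> while type(b) is <class 'numpy.float64'>. Why this happens: np.nanmax converts the array-like a to an ndarray via a = np.asanyarray(a), and that conversion produces an ndarray of dtype np.float64, which is preserved in the output.
@Timus Good point about the different types, but I guess the author is asking about the implementation of index, i.e. "How does list.index() work under the hood?"
@0dminnimda Maybe, maybe not. :)) I suspect the question stems from not realizing that np.nan and b are simply different objects, so b is not an element of a, which makes the result of a.index(b) obvious. But who knows... In any case, I didn't post it as an answer, only as a comment; I think that's what comments are for.
@0dminnimda: Yes, I just wanted to understand why the float(3.1) example works despite being a different object after the list-array-list conversion, and why the np.nan example works despite np.nan not comparing equal to itself. Your answer below clarified this. I would not have expected list.index to test both x is y (reference) and x == y, but given the code output it is logical.

Tags: python list numpy nan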

【Answer 1】:

Implementation of `list.index`

If you want to see how index is implemented (in C), you can look here.
To make it easier to follow, I rewrote it in Python:

import sys


def index(self, value, start=0, stop=sys.maxsize, /):
    # clamp start and stop so that they lie within the list boundaries
    if start < 0:
        start += len(self)
        if start < 0:
            start = 0
    if stop < 0:
        stop += len(self)
        if stop < 0:
            stop = 0

    # iterate over the requested slice and try to find the value;
    # enumerate from start so the returned index refers to the whole list
    for i, obj in enumerate(self[start:stop], start):
        if obj is value or obj == value:
            return i

    raise ValueError("%r is not in list" % value)

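Details of why it is implemented this way

To understand this part, I recommend reading the implementation I referred to above.

All the magic happens in PyObject_RichCompareBool:
when it is called from index, it behaves like x is y or x == y.

This is also stated in the docs (index uses Py_EQ):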

int PyObject_RichCompareBool(PyObject *o1, PyObject *o2, int opid)

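Compare the values of o1 and o2 using the operation specified by opid, which must be one of Py_LT, Py_LE, Py_EQ, Py_NE, Py_GT, or Py_GE, corresponding to <, <=, ==, !=, >, or >= respectively. Returns -1 on error, 0 if the result is false, 1 otherwise. This is the equivalent of the Python expression o1 op o2, where op is the operator corresponding to opid.

Note: If o1 and o2 are the same object, PyObject_RichCompareBool() will always return 1 for Py_EQ and 0 for Py_NE.

The -1 case is handled by Python, so we don't need to worry about it (Python raises an exception and stops running our code).

So how does it work?

Finally, applying what we have learned, we can see the reason for this behavior: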
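The identity shortcut described in the note is not specific to index. The Python language reference documents list membership (x in y) as equivalent to any(x is e or x == e for e in y), and list.count goes through the same kind of comparison. A minimal sketch using only the standard library (the variable names are made up for illustration):

```python
nan = float("nan")        # one specific NaN object
other_nan = float("nan")  # a second, distinct NaN object

items = [nan]

# Plain equality is always False for NaN, even against itself.
equal_to_itself = nan == nan

# Container operations check identity first, so the stored object is found.
found_same_object = nan in items
count_same_object = items.count(nan)
index_same_object = items.index(nan)

# A different NaN object fails both the identity and the equality test.
found_other_object = other_nan in items

print(equal_to_itself, found_same_object, count_same_object,
      index_same_object, found_other_object)
```

So a NaN can only be located in a list by these methods if you pass the very same object that the list holds.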

import numpy as np

instance1 = np.nan

l = [instance1]
instance2 = np.nanmax(l)  # RuntimeWarning: All-NaN axis encountered

print(instance1 is instance2 or instance1 == instance2)
# False therefore ValueError

import numpy as np

instance1 = 3.1

l = [instance1]
instance2 = np.array(l).tolist()[0]

print(instance1 is instance2 or instance1 == instance2)
# True (instance1 == instance2) therefore no ValueError
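The type difference pointed out in the comments under the question can be checked directly. The following is only a sketch, assuming NumPy is installed; the RuntimeWarning that np.nanmax emits for all-NaN input is silenced on purpose, and the variable names are illustrative:

```python
import warnings

import numpy as np

a = [np.nan, np.nan]
with warnings.catch_warnings():
    # np.nanmax warns "All-NaN axis encountered" for this input
    warnings.simplefilter("ignore", RuntimeWarning)
    b = np.nanmax(a)

nanmax_type = type(b)    # a NumPy scalar type, not the builtin float
same_object = b is a[0]  # b is a new object, not the one stored in the list
equal = bool(b == a[0])  # NaN never compares equal

x = [2, 3.1]
c = np.array(x).tolist()
roundtrip_type = type(c[1])      # tolist() gives back builtin floats
roundtrip_equal = c[1] == x[1]   # a different object, but an equal value
roundtrip_index = x.index(c[1])  # found through the == branch

print(nanmax_type, same_object, equal)
print(roundtrip_type, roundtrip_equal, roundtrip_index)
```

In the first case both the identity test and the equality test fail, so index raises; in the second case identity fails but equality succeeds.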

Additionally

Here are your examples in generalized form:

import numpy as np

instance1 = np.nan

l = [instance1]
instance2 = np.nanmax(l)  # RuntimeWarning: All-NaN axis encountered

assert instance1 is l[0]
assert instance1 is not instance2

assert not l.index(instance1)
assert not l.index(instance2)  # ValueError: nan is not in list

and

import numpy as np

instance1 = 3.1

l = [instance1]
instance2 = np.array(l).tolist()[0]

assert instance1 is l[0]
assert instance1 is not instance2

assert not l.index(instance1)
assert not l.index(instance2)  # no ValueError

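【Comments】:

@bproxauf If this answer helped you or you liked it, please don't forget to vote up and mark the answer as a solution; I would appreciate it.
I always do that, I just hadn't checked my StackOverflow posts since yesterday.

【Answer 2】:

In Python, you can create a nan value object with: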

In [80]: mynan=float('nan')
In [81]: id(mynan)
Out[81]: 139640449759024

Make another one and you get a different id:

In [82]: mynan=float('nan')
In [83]: id(mynan)
Out[83]: 139640449757264

numpy has its own version:

In [84]: id(np.nan)
Out[84]: 139640952170000

which I think always gives the same id (within a given session).

Make a list:

In [85]: a = [.1, np.nan, .3, mynan]

np.isnan can test for nan values, even where comparing by id or by value does not work:

In [86]: np.isnan(a)
Out[86]: array([False,  True, False,  True])

As far as I know, list index tests id first, then ==. Keep in mind that lists store their elements by reference.

In [87]: a.index(np.nan)
Out[87]: 1
In [88]: a.index(mynan)
Out[88]: 3
In [89]: a.index(float('nan'))
Traceback (most recent call last):
  File "<ipython-input-89-33bf9e0279e3>", line 1, in <module>
    a.index(float('nan'))
ValueError: nan is not in list
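If the goal is to find the position of a NaN regardless of which object it is, one option is to test each element with math.isnan instead of relying on index. This is only a sketch: index_of_nan is a hypothetical helper, not part of Python or NumPy, and it assumes the list holds only real numbers.

```python
import math


def index_of_nan(values):
    """Return the index of the first NaN in values (hypothetical helper)."""
    for i, v in enumerate(values):
        # math.isnan tests the value itself, so object identity is irrelevant
        if math.isnan(v):
            return i
    raise ValueError("no NaN in list")


a = [0.1, float("nan"), 0.3]
first_nan = index_of_nan(a)
print(first_nan)
```

Unlike a.index(float('nan')), this finds the element even though the NaN in the list is a different object from any NaN created later.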

【Comments】:
