Merging two dataframes with pd.NA in the merge column yields 'TypeError: boolean value of NA is ambiguous' (answers)

【Question title】: Merging two dataframes with pd.NA in merge column yields 'TypeError: boolean value of NA is ambiguous'
【Posted】: 2020-02-18 12:01:55
【Question description】:

With Pandas 1.0.1, running

df = df.merge(df2, on=some_column)

yields

File /home/torstein/code/fintechdb/Sheets/sheets/gild.py, line 42, in gild
    df = df.merge(df2, on=some_column)
File /home/torstein/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py, line 7297, in merge
    validate=validate,
File /home/torstein/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/merge.py, line 88, in merge
    return op.get_result()
File /home/torstein/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/merge.py, line 643, in get_result
    join_index, left_indexer, right_indexer = self._get_join_info()
File /home/torstein/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/merge.py, line 862, in _get_join_info
    (left_indexer, right_indexer) = self._get_join_indexers()
File /home/torstein/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/merge.py, line 841, in _get_join_indexers
    self.left_join_keys, self.right_join_keys, sort=self.sort, how=self.how
File /home/torstein/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/merge.py, line 1311, in _get_join_indexers
    zipped = zip(*mapped)
File /home/torstein/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/merge.py, line 1309, in <genexpr>
    for n in range(len(left_keys))
File /home/torstein/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/merge.py, line 1918, in _factorize_keys
    rlab = rizer.factorize(rk)
File pandas/_libs/hashtable.pyx, line 77, in pandas._libs.hashtable.Factorizer.factorize
File pandas/_libs/hashtable_class_helper.pxi, line 1817, in pandas._libs.hashtable.PyObjectHashTable.get_labels
File pandas/_libs/hashtable_class_helper.pxi, line 1732, in pandas._libs.hashtable.PyObjectHashTable._unique
File pandas/_libs/missing.pyx, line 360, in pandas._libs.missing.NAType.__bool__

TypeError: boolean value of NA is ambiguous

While this works:

df[some_column].fillna(np.nan, inplace=True)
df2[some_column].fillna(np.nan, inplace=True)
df = df.merge(df2, on=some_column)
# Works

If I instead do

df[some_column].fillna(pd.NA, inplace=True)

then the error comes back.
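The workaround above can be sketched as a small self-contained example. The frames, the column name `key` (standing in for `some_column`) and the helper `na_to_nan` are made up for illustration. The sketch uses `Series.where` with plain assignment instead of `fillna(..., inplace=True)` on a selected column, on the assumption that assigning the result back is less fragile across pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical frames; "key" stands in for some_column from the question.
left = pd.DataFrame(
    {"key": pd.Series(["a", "b", pd.NA], dtype=object), "x": [1, 2, 3]}
)
right = pd.DataFrame(
    {"key": pd.Series(["a", pd.NA, "c"], dtype=object), "y": [10, 20, 30]}
)


def na_to_nan(series):
    """Return a copy of `series` with every missing value replaced by np.nan."""
    # Keep values where notna() is True, substitute np.nan everywhere else.
    return series.where(series.notna(), np.nan)


# Assign the result back instead of calling fillna(..., inplace=True).
left["key"] = na_to_nan(left["key"])
right["key"] = na_to_nan(right["key"])

# The merge keys now contain np.nan rather than pd.NA.
merged = left.merge(right, on="key")
print(merged)
```

Once the key columns hold `np.nan` instead of `pd.NA`, hashing the keys no longer has to evaluate `pd.NA` in a boolean context.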

【Comments】:

Hello, were you able to solve this? I am facing the same problem.
@ElenaGT Call fillna(np.nan) on the columns that contain pd.NA before merging.

Tags: python python-3.x pandas

【Solution 1】:

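This has to do with pd.NA being introduced in pandas 1.0.0 and with how the pandas team decided it should behave in a boolean context. Also keep in mind that it is an experimental feature, so it should not be used for anything other than experimenting: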

Warning: Experimental: the behaviour of pd.NA can still change without warning.

I believe the reason, and the answer you are looking for, can be found in another part of the pandas documentation, the page on working with missing values:

NA in a boolean context: Since the actual value of an NA is unknown, it is ambiguous to convert NA to a boolean value. The following raises an error: TypeError: boolean value of NA is ambiguous
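A minimal sketch of the behaviour the documentation describes. The variable names are illustrative only. It shows that converting pd.NA to a boolean raises, and that a comparison involving pd.NA returns pd.NA again rather than True or False.

```python
import pandas as pd

value = pd.NA

# Converting pd.NA to a boolean raises TypeError.
try:
    bool(value)
except TypeError as exc:
    error_message = str(exc)
else:
    error_message = None
print(error_message)

# A comparison with pd.NA propagates the missing value:
# the result is pd.NA itself, not True or False.
comparison = value == 1
print(comparison is pd.NA)
```

This is why any code path that ends up calling `bool()` on such a value, directly or indirectly, fails with the error from the question.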

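It also offers a valuable piece of advice:

"This also means that pd.NA cannot be used in a context where it is evaluated to a boolean, such as if condition: ... where condition can potentially be pd.NA. In such cases, isna() can be used to check for pd.NA or condition being pd.NA can be avoided, for example by filling missing values beforehand."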

【Comments】:

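【Solution 2】:

I decided that the pd.NA instances in my data were valid, so I needed to handle them rather than fill them in, as fillna() would. If you are in the same situation, just use pd.isna(val) to convert pd.NA to True or False. Only you can decide whether a null should count as True or False, but here is a simple example: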

import pandas as pd

val = pd.NA
# pd.isna() returns a plain bool for a scalar, so it is safe inside an if
if pd.isna(val):
    print('it is null')
else:
    print('it is not null')

Returns: it is null

Then,

val = 7
if pd.isna(val):
    print('it is null')
else:
    print('it is not null')

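Returns: it is not null

I hope this helps others who are looking for a clear course of action (Celius' answer is accurate, but I wanted to provide actionable code for those struggling with this).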
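The same check also works column-wise, which may be closer to the merge in the question. The following is only a sketch under assumptions: the frames and the column name `key` are hypothetical, and setting aside the rows with a missing key is one possible way of handling pd.NA without filling it, not something the answer prescribes.

```python
import pandas as pd

# Hypothetical frames; "key" stands in for the merge column.
left = pd.DataFrame(
    {"key": pd.Series(["a", "b", pd.NA], dtype=object), "x": [1, 2, 3]}
)
right = pd.DataFrame(
    {"key": pd.Series(["a", "c"], dtype=object), "y": [10, 30]}
)

# Series.isna() gives an ordinary boolean mask, one entry per row.
missing = left["key"].isna()

# Merge only the rows whose key is present ...
matched = left[~missing].merge(right, on="key", how="left")

# ... and keep the rows with a missing key aside, pd.NA untouched.
unmatched = left[missing]

print(matched)
print(unmatched)
```

The two parts can be recombined afterwards, for example with `pd.concat`, if a single frame is needed.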

【Comments】: