Handling NaN values while inserting Pandas dataframes into BigQuery tables

【Question title】: Handling NaN values while inserting Pandas dataframes into BigQuery tables
【Posted】: 2019-03-28 02:52:03
【Question】:

I am using the following code to insert a Pandas dataframe that contains many NaN values into a BigQuery table. The dataframe is prepared in Cloud Datalab.

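import google.datalab.bigquery as bq

# (project, dataset, table) identifying the destination table
bqtable = ('project_name', 'dataset_name', 'table_name')
table = bq.Table(bqtable)

# Infer the schema from the dataframe and (re)create the table
table_schema = bq.Schema.from_data(df)
table.create(schema=table_schema, overwrite=True)

# Insert the dataframe rows
table.insert(df)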

Because of the NaN values in the dataframe, I get the following error:

RequestException: HTTP request failed: Invalid JSON payload received. 
Unexpected token. : "user_id": NaN,
                               ^

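I know that JSON does not understand NaN, but I can't simply use fillna to convert these NaN values into something else, because I need these fields to be inserted as null in the BigQuery table. Does anyone have a workaround?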

【Comments】:

Tags: python-3.x pandas dataframe google-bigquery google-cloud-datalab

【Solution 1】:

Replace all np.nan values with Python's None, then re-run your code (or try df.to_gbq):

df = df.where(pd.notnull(df), None)

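I have no experience with Google BigQuery, and I don't see anything wrong with your existing code, but it may be worth installing the pandas-gbq package. Then try writing the DataFrame to GBQ with df.to_gbq, as described in the documentation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_gbq.html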
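
To check the replacement locally before sending anything to BigQuery, a small sketch like the following may help. The sample dataframe and the column names are made up for illustration. The cast to object is an assumption added here: on some pandas versions, float columns may keep NaN after where(..., None) unless they are converted to object dtype first. Serializing with allow_nan=False makes json.dumps raise if any NaN is left over.

```python
import json

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's df.
df = pd.DataFrame({"user_id": [1.0, np.nan, 3.0], "name": ["a", None, "c"]})

# Cast to object so float columns can hold None, then swap every
# missing value for None (assumption: the cast is needed on newer pandas).
clean = df.astype(object).where(pd.notnull(df), None)

# Convert to row dictionaries, roughly the shape a JSON insert would use.
records = clean.to_dict(orient="records")

# allow_nan=False raises ValueError if a NaN survived the replacement.
payload = json.dumps(records, allow_nan=False)
print(payload)
```

If the call to json.dumps succeeds, the missing values are serialized as JSON null rather than as the bare token NaN that caused the original error.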

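【Comments】:

I tried this approach as well. It still inserts the NaN values as the string "NaN" rather than null.
Try replacing all np.nan values with Python's None. I'll edit my answer; I think df = df.where(pd.notnull(df), None) works.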

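【Solution 2】:

If you mean a NULL column like this:

If possible, could you try changing the column type to FLOAT?

Although this does add a .0 to your user_id values, queries should not be affected by it unless your user_id is set to a string type.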
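
The .0 mentioned above comes from pandas itself: a numeric column that contains NaN is stored as float64, so integer-looking IDs become floats. A minimal sketch with made-up IDs:

```python
import numpy as np
import pandas as pd

# Hypothetical IDs; the second one is missing.
df = pd.DataFrame({"user_id": [101, np.nan, 103]})

# The presence of NaN forces the column to a floating-point dtype.
dtype_name = str(df["user_id"].dtype)
print(dtype_name)

# The surviving IDs are now floats such as 101.0.
values = df["user_id"].tolist()
print(values)
```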

【Comments】: