Why does the pandas.to_gbq function not respect the column order in my table schema?

【Question title】: Why is my column order in my table schema not respected by the pandas.to_gbq function?
【Posted】: 2021-09-21 13:27:14
【Question】:

I want to upload a pandas DataFrame to BigQuery using the Dataframe.to_gbq() function.

I pass a table_schema argument to force a specific column order in BigQuery (one that may differ from the DataFrame's).

So I use, for example:

table_schema = [{'name': 'col1', 'type': 'INT64'}, 
{'name': 'col2', 'type': 'STRING'}, 
{'name': 'col3', 'type': 'STRING'}, 
{'name': 'col4', 'type': 'STRING'}, 
{'name': 'col5', 'type': 'STRING'}, 
{'name': 'col6', 'type': 'FLOAT64'}, 
{'name': 'col7', 'type': 'INT64'}, 
{'name': 'col8', 'type': 'FLOAT64'}]

Dataframe.to_gbq(destination_table, if_exists='replace', table_schema=table_schema)

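The column order in the DataFrame is: Col1, Col3, Col4, Col5, Col2, Col6, Col7, Col8

The job completes without errors.

But when I check the schema of the destination_table that was created (or replaced) in BigQuery, the column order is: Col1, Col3, Col4, Col5, Col2, Col6, Col7, Col8

(the DataFrame's order, not the table_schema order)

Shouldn't the order specified in the table schema be respected?

If not, is there a way to force it?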

【Question comments】:

Someone has already answered your question here
@KaBoom Column ordering is not the same thing as row ordering.

Tags: python pandas dataframe google-bigquery

【Solution 1】:

Reorder the DataFrame by indexing its columns in the order you want, then upload the reordered frame:

ordered_columns = [c['name'] for c in table_schema]

Dataframe[ordered_columns].to_gbq(destination_table, if_exists='replace', table_schema=table_schema)
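
Indexing with a list of names fails with a KeyError if any schema name is absent from the DataFrame, so it can help to compare the two lists first. The following is only a minimal sketch of that check in plain Python: the helper name columns_in_schema_order is hypothetical (it is not part of pandas or pandas-gbq), and frame_columns stands in for list(Dataframe.columns).

```python
def columns_in_schema_order(table_schema, frame_columns):
    """Return the schema's column names in schema order.

    Hypothetical helper: raises ValueError when the schema and the frame
    do not contain exactly the same column names.
    """
    schema_names = [field["name"] for field in table_schema]
    # Names listed in the schema but absent from the frame.
    missing = [name for name in schema_names if name not in frame_columns]
    # Names present in the frame but not listed in the schema.
    extra = [name for name in frame_columns if name not in schema_names]
    if missing or extra:
        raise ValueError(
            f"schema/frame mismatch: missing={missing}, extra={extra}"
        )
    return schema_names


# Shortened version of the schema from the question.
table_schema = [
    {"name": "col1", "type": "INT64"},
    {"name": "col2", "type": "STRING"},
    {"name": "col3", "type": "STRING"},
]

# Assumed stand-in for list(Dataframe.columns): same names, different order.
frame_columns = ["col1", "col3", "col2"]

ordered_columns = columns_in_schema_order(table_schema, frame_columns)
print(ordered_columns)
```

With a real frame, the returned list would be used exactly as in the answer: Dataframe[ordered_columns].to_gbq(destination_table, if_exists='replace', table_schema=table_schema). Note that the comparison is case-sensitive, so 'Col1' and 'col1' count as different names.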

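【Comments】:

Great! I really thought the order of the per-column dictionaries in the table schema mattered, but it doesn't: the column order of the DataFrame is what gets kept. Thanks!!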