Python and Snowflake error on appending into existing table on the cloud: answers

【Question title】: Python and Snowflake error on appending into existing table on the cloud
【Posted】: 2021-03-07 08:16:07
【Question】:

I am trying to upload a DataFrame into an existing table in the Snowflake cloud. This is the DataFrame:

columns_df.head()

Now, when I use to_sql() from pandas to append the data to the existing table:

columns_df.to_sql('survey_metadata_column_names', index=False,  index_label=None, con=conn, schema='PUBLIC', if_exists='append', chunksize=300)

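I get the following error:

DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': not all arguments converted during string formatting

TypeError: not all arguments converted during string formatting

Some of the column names contain dashes and underscores.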

【Comments】:

This should help: stackoverflow.com/questions/63675368/…

Tags: python snowflake-cloud-data-platform pandas-to-sql

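【Solution 1】:

From the Snowflake documentation:

To write data from a Pandas DataFrame to a Snowflake database, do one of the following:

Call the write_pandas() function.

Call the pandas.DataFrame.to_sql() method, and specify pd_writer as the method to use to insert the data into the database.

Note the second bullet point (pd_writer) alongside write_pandas. I have still noticed several issues with both methods, but these are the official solutions.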

from snowflake.connector.pandas_tools import pd_writer
columns_df.to_sql('survey_metadata_column_names', 
                 index = False,  
                 index_label = None, 
                 con = Engine, #Engine should be an SQLAlchemy engine 
                 schema = 'PUBLIC', 
                 if_exists = 'append', 
                 chunksize = 300,
                 method = pd_writer)
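
The Engine passed to to_sql() above is not defined in the answer. A minimal sketch of one way to build it, assuming the snowflake-sqlalchemy package is installed (it provides the snowflake:// dialect) and using hypothetical placeholder credentials; the helper names snowflake_url and make_engine are illustrative only:

```python
from urllib.parse import quote_plus


def snowflake_url(user, password, account, database, schema, warehouse):
    """Build a snowflake:// connection URL; credentials are URL-escaped."""
    return (
        f"snowflake://{quote_plus(user)}:{quote_plus(password)}@{account}"
        f"/{database}/{schema}?warehouse={warehouse}"
    )


def make_engine(**settings):
    """Create an SQLAlchemy engine from the same keyword settings."""
    # Imported lazily so the URL helper works even without SQLAlchemy.
    from sqlalchemy import create_engine

    return create_engine(snowflake_url(**settings))


# Placeholder values only; substitute your own account details.
print(snowflake_url("my_user", "p@ss", "my_account", "MY_DB", "PUBLIC", "MY_WH"))
```

Including the database, schema, and warehouse in the URL means the session already has a current schema when to_sql() runs.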

Or:

import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas
con = snowflake.connector.connect(...)
success, nchunks, nrows, _ = write_pandas(con, 
                                          columns_df, 
                                          'survey_metadata_column_names', 
                                          chunk_size = 300, 
                                          schema = 'PUBLIC')

Note that the first method requires an SQLAlchemy engine, while the second can use a regular connection.
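
The sqlite_master query in the question's error hints at why the engine matters: when con is not an SQLAlchemy connectable, pandas falls back to its SQLite code path and issues a query with a ? placeholder. A driver that fills parameters with Python % formatting (assumed here to be what the Snowflake connector does with its default pyformat style) then fails with exactly that TypeError. A minimal standard-library sketch, with sqlite3 standing in for a driver that does understand ?:

```python
import sqlite3

# The table-existence check that appears in the error message.
QUERY = "SELECT name FROM sqlite_master WHERE type='table' AND name=?;"


def run_with_qmark(table_name):
    """sqlite3 understands '?' placeholders, so the check succeeds."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(f'CREATE TABLE "{table_name}" (x INTEGER)')
        return con.execute(QUERY, (table_name,)).fetchall()
    finally:
        con.close()


def run_with_percent_formatting(table_name):
    """Mimic a driver that substitutes parameters via '%' formatting."""
    try:
        return QUERY % (table_name,)
    except TypeError as exc:
        # The query has no %s markers, so the argument cannot be consumed.
        return f"TypeError: {exc}"


print(run_with_qmark("survey_metadata_column_names"))
print(run_with_percent_formatting("survey_metadata_column_names"))
```

The second call reports "not all arguments converted during string formatting", which matches the message in the question.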

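【Comments】:

You could test with a large df in a temporary table: con.execute('create temporary table ...') and timeit. I would expect the two to be nearly equivalent, and that they use the same approach: split the DataFrame, save it to temporary files on disk, upload the files to a temporary stage with PUT @file, and then use COPY INTO. If that is the case, write_pandas may be slightly faster, since the documentation hints that pd_writer has no chunksize parameter (pandas probably splits the frame before calling the function).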
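With the second method I get this error: The following error occured: 090106 (22000): Cannot perform CREATE STAGE. This session does not have a current schema. Call 'USE SCHEMA', or use a qualified name.
If that is your error, I suggest adding the schema to your connection when you open it: .connect(..., schema = ..., database = ..., warehouse = ...). Without seeing the exact code I cannot say specifically what went wrong in your case. Also, if you have access to the Snowflake help center, asking the question there will make sure any bug gets investigated (possibly at some point in the distant future). Our needs are more specific than what the built-in functions cover, so we wrote our own in-house version of write_pandas (unfortunately I cannot provide a link).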
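
The suggestion above can be sketched as a small guard that refuses to connect unless the session context is fully specified. The helper names missing_context and connect_with_context are hypothetical; only snowflake.connector.connect() and its database, schema, and warehouse keywords come from the connector itself:

```python
REQUIRED_CONTEXT = ("database", "schema", "warehouse")


def missing_context(settings):
    """Return the session-context keys that are absent or empty."""
    return [key for key in REQUIRED_CONTEXT if not settings.get(key)]


def connect_with_context(**settings):
    """Open a Snowflake connection only if the session context is complete."""
    missing = missing_context(settings)
    if missing:
        raise ValueError(f"missing connection settings: {', '.join(missing)}")
    # Imported lazily so the check itself needs no Snowflake packages.
    import snowflake.connector

    return snowflake.connector.connect(**settings)


# Placeholder settings: the schema and warehouse are missing here.
print(missing_context({"user": "my_user", "database": "MY_DB"}))
```

With a current schema set on the session, write_pandas should be able to create its temporary stage without a qualified name.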
OK, now I get this error: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'. pyarrow or fastparquet is required for parquet support
I tried to install it, but files are missing (lots of missing dependencies).
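
The parquet error above means pandas could not import either engine it needs to write the intermediate files. A minimal sketch for checking which engines are importable in the current environment before calling write_pandas; the helper name available_parquet_engines is hypothetical:

```python
from importlib.util import find_spec


def available_parquet_engines(candidates=("pyarrow", "fastparquet")):
    """Return the candidate parquet engines importable in this environment."""
    return [name for name in candidates if find_spec(name) is not None]


engines = available_parquet_engines()
if engines:
    print("parquet engines found:", ", ".join(engines))
else:
    print("no parquet engine found; install pyarrow or fastparquet")
```

If the list is empty, installing the connector with its pandas extra, pip install "snowflake-connector-python[pandas]", is the route the Snowflake documentation describes for pulling in a compatible pyarrow; whether that resolves the missing dependencies in this particular environment is not confirmed by the thread.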

【Solution 2】:

See the solution I posted here, which supports both writing (create and replace) and appending: write_pandas snowflake connector function is not able to operate on table

【Comments】: