Pandas to_sql function doesn't put correct dtypes into SQLite DB

【Question title】: Pandas to_sql function doesn't put correct dtypes into SQLite DB
【Posted】: 2020-08-09 07:51:13
【Question description】:

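I have a problem: the pandas to_sql function doesn't put the correct dtypes into an SQLite 3 database. It detects the types automatically and ignores the types specified in the dictionary I provide. I tried many variants of type names, such as 'int', 'integer', 'float', 'real', 'floating', passing them either directly as strings or via sqlalchemy.types.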

I have also attached a screenshot showing the column types in the SQLite DB and the types in the CSV file used for the import. The SQLite column types are always the same, no matter which data types I specify.

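import math
import os
import sqlite3

import click
import pandas as pd
import sqlalchemy

# load_csv and LoadCsvError are the asker's own helpers and are not shown here.
def generate_dtypes_for_sql(filename, separator, decimal, skip_errors, quoting, engine, shape):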
    df2 = pd.DataFrame()
    if os.path.isfile(filename):
        try:
            df = load_csv(filename, separator, decimal, skip_errors, quoting, engine, shape)
            params_to_import = {}
            cols = df.columns
            i_arc = 7; i_name = 6; i_type = 3; i_factor = 5
            params_types = ['boolean', 'integer', 'float', 'text']
            if (i_arc==cols.get_loc('Архивация') and 
               i_name==cols.get_loc('Символьный тэг') and 
               i_type==cols.get_loc('Тип')):
                 for index, row in df.iterrows():
                    if row[i_arc] == 1:
                        if math.isnan(row[i_type]):
                           params_to_import[row[i_name]] = params_types[3]
                        elif row[i_type] in range(6):
                            if row[i_factor] == 1:
                                params_to_import[row[i_name]] = params_types[1]
                            else:
                                params_to_import[row[i_name]] = params_types[2]
                        elif row[i_type] == 6:
                            params_to_import[row[i_name]] = params_types[2]
                        else:
                            params_to_import[row[i_name]] = params_types[3]
            df2 = pd.DataFrame([params_to_import])
            df2.T.to_csv("params_to_import.csv", sep=";", index_label="Name", header=['Type'])
        except LoadCsvError as e:
            click.echo("Could not load {}: {}".format(filename, e), err=True)
    return df2

def sqlcol(dfparam):    
    dtypedict = {}
    for index, values in dfparam.items():
        for value in values:
            if value == "boolean":
                dtypedict.update({index: sqlalchemy.types.Boolean()})
            elif value == "integer":
                dtypedict.update({index: sqlalchemy.types.Integer()})
            elif value == "float":
                dtypedict.update({index: sqlalchemy.types.Float()})
            elif value == "text":
                dtypedict.update({index: sqlalchemy.types.Text()})        
    return dtypedict    

df_for_sql = generate_dtypes_for_sql(types_file, separator, decimal, skip_errors, quoting, engine, shape)
df_dtypes = sqlcol(df_for_sql)

conn = sqlite3.connect(dbname, detect_types=sqlite3.PARSE_DECLTYPES)

df.to_sql(df.name, conn, if_exists="append", index=False, dtype=df_dtypes_str)
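
One detail worth checking in code like the above: when to_sql is given a plain sqlite3 connection rather than a SQLAlchemy engine, pandas expects the dtype values to be SQL type-name strings, not sqlalchemy.types objects. The following is a minimal sketch under that assumption. The mapping SQLITE_TYPE_NAMES, the helper sqlite_dtypes, and the table and column names are hypothetical and not part of the original code.

```python
import sqlite3

import pandas as pd

# Hypothetical mapping from the question's type labels to SQLite type names.
SQLITE_TYPE_NAMES = {
    "boolean": "INTEGER",
    "integer": "INTEGER",
    "float": "REAL",
    "text": "TEXT",
}


def sqlite_dtypes(param_types):
    """Turn {column: label} into the {column: type-name string} form for to_sql."""
    return {col: SQLITE_TYPE_NAMES[label] for col, label in param_types.items()}


conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"n_items": [1, 2], "ratio": [0.5, 1.5]})
dtypes = sqlite_dtypes({"n_items": "float", "ratio": "float"})

# The table "fresh" does not exist yet, so to_sql creates it with the requested types.
df.to_sql("fresh", conn, index=False, dtype=dtypes)

# Read back the declared column types: {column name: declared type}.
declared = {row[1]: row[2] for row in conn.execute('PRAGMA table_info("fresh")')}
print(declared)
```

A dictionary lookup is used here instead of an if/elif chain so that an unknown label fails loudly with a KeyError rather than being silently skipped.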

Solution: I don't know why, but pandas to_sql ignores dtype only when I call it with if_exists="append". With if_exists="replace" it works as expected.
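
The append-versus-replace difference described above can be reproduced with an in-memory database. This is a small sketch, not the asker's code: the table name demo, the sample DataFrame, and the helper column_types are made up for illustration, and it uses a plain sqlite3 connection, so the dtype values are type-name strings.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"a": [1, 2], "b": [1.1, 2.2]})


def column_types(conn, table):
    """Return {column name: declared type} as reported by SQLite."""
    return {row[1]: row[2] for row in conn.execute(f'PRAGMA table_info("{table}")')}


# First write: the table does not exist, so pandas creates it with inferred types.
df.to_sql("demo", conn, if_exists="append", index=False)
types_created = column_types(conn, "demo")

# Second write: the table already exists, so rows are appended and the
# existing column definitions stay as they are; dtype cannot change them.
df.to_sql("demo", conn, if_exists="append", index=False, dtype={"a": "REAL"})
types_after_append = column_types(conn, "demo")

# Replace drops and recreates the table, so dtype is applied.
df.to_sql("demo", conn, if_exists="replace", index=False, dtype={"a": "REAL"})
types_after_replace = column_types(conn, "demo")

print(types_created, types_after_append, types_after_replace)
```

Note that if_exists="replace" discards the rows already in the table, so it only suits the case where the data can be reloaded in full.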

【Question discussion】:

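A reminder: SQLite uses dynamic typing. The type you declare for a column only defines its affinity (how it prefers to store data), so you can still insert text into an integer column, and so on. It also has its own way of mapping common SQL type names to the storage classes it actually supports (there is no real DATE type, for example). Please provide a minimal reproducible example, with the emphasis on minimal and reproducible: sample data, expected output, and actual output. Reduce the code to the absolute minimum needed to produce those outputs.
I can reproduce your issue, but the dtype= fix works for me.
@gord-thompson, the sa.Table line causes this error: AttributeError: 'sqlite3.Connection' object has no attribute 'run_callable'. I couldn't find how to fix it on Google.
In my test code, engine is a SQLAlchemy Engine object created with create_engine.
Here is a more complete example.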
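
The point about type affinity in the first comment can be seen with the standard-library sqlite3 module alone. The sketch below uses a made-up table named demo: a declared column type steers how values are stored but does not reject values of another type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (n INTEGER, r REAL)")

# Text that looks like a number is converted by INTEGER affinity;
# text that does not look like a number is stored unchanged as text.
conn.execute("INSERT INTO demo VALUES (?, ?)", ("42", "not a number"))
# An integer placed in a REAL column is stored as a floating-point value.
conn.execute("INSERT INTO demo VALUES (?, ?)", ("hello", 3))

rows = conn.execute(
    "SELECT n, typeof(n), r, typeof(r) FROM demo ORDER BY rowid"
).fetchall()
for row in rows:
    print(row)
# (42, 'integer', 'not a number', 'text')
# ('hello', 'text', 3.0, 'real')
```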

Tags: python pandas sqlite sqlalchemy

【Solution 1】:

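The issue here is not that pandas ignores the dtype= argument. Rather, to_sql was told if_exists="append" and the table already exists, so the column types (really "affinities" in SQLite) are already defined in the database. This test code shows that the dtype= argument does produce the desired result when the table does not exist yet: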

import pandas as pd
import sqlalchemy as sa

connection_uri = "sqlite:///C:/__tmp/SQLite/walmart.sqlite"
engine = sa.create_engine(connection_uri)

def drop_table(table_name, engine):
    # engine.begin() commits on exit, so the DROP also takes effect on SQLAlchemy 2.x
    with engine.begin() as conn:
        conn.execute(sa.text(f'DROP TABLE IF EXISTS "{table_name}"'))

df = pd.read_csv(r"C:\Users\Gord\Desktop\test.csv")
print(df)
"""
   All_HY_SP1  All_HY_SP2
0           1         1.1
1           2         2.2
"""
# default behaviour
drop_table("from_csv", engine)
df.to_sql("from_csv", engine, if_exists="append", index=False)
tbl = sa.Table("from_csv", sa.MetaData(), autoload_with=engine)
print(", ".join([f'"{col.name}": {col.type}' for col in tbl.columns]))
# "All_HY_SP1": BIGINT, "All_HY_SP2": FLOAT
#               ^^^^^^
# fix with dtype:
dtype_dict = {"All_HY_SP1": sa.Float, "All_HY_SP2": sa.Float}
drop_table("from_csv", engine)
df.to_sql("from_csv", engine, if_exists="append", index=False, dtype=dtype_dict)
tbl = sa.Table("from_csv", sa.MetaData(), autoload_with=engine)
print(", ".join([f'"{col.name}": {col.type}' for col in tbl.columns]))
# "All_HY_SP1": FLOAT, "All_HY_SP2": FLOAT
#               ^^^^^

【Discussion】: