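The function below takes a dataframe (df), the schema name of the table (schemaname), the table name (tablename), the column to use as the conflict target (conflictcolumn), and an engine created with sqlalchemy's create_engine. It inserts the rows into the table and updates existing rows that collide on the conflict column. It is an extension of @Ionut Ticus's solution.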
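Do not combine it with pandas.to_sql(). pandas.to_sql() destroys the primary key setting, and in that case the primary key has to be set again with an ALTER query, which is what the function below suggests. Pandas does not necessarily destroy the primary key; it may simply never have been set. Either way, you will get this error:
there is no unique constraint matching given keys for referenced table? The function will suggest that you execute the following: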
engine.execute('ALTER TABLE {schemaname}.{tablename} ADD PRIMARY KEY ({conflictcolumn});')
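
To see why the conflict column needs a primary key (or a unique constraint), here is a minimal sketch that uses Python's built-in sqlite3 module instead of PostgreSQL. That swap is an assumption made only so the example runs without a database server: SQLite 3.24+ accepts a similar ON CONFLICT ... DO UPDATE clause, but its error text differs from PostgreSQL's. The table names no_key and with_key are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: one without any key, one with a primary key on "city".
conn.execute("CREATE TABLE no_key (city TEXT, temperature REAL)")
conn.execute("CREATE TABLE with_key (city TEXT PRIMARY KEY, temperature REAL)")

upsert = (
    "INSERT INTO {table} (city, temperature) VALUES (?, ?) "
    "ON CONFLICT (city) DO UPDATE SET temperature = excluded.temperature"
)

# Without a primary key or unique constraint on "city" the upsert is rejected.
try:
    conn.execute(upsert.format(table="no_key"), ("sanfrancisco", 18.5))
    failed_without_key = False
except sqlite3.Error as exc:
    failed_without_key = True
    print("no_key:", exc)

# With the primary key in place, the second statement updates the existing row.
conn.execute(upsert.format(table="with_key"), ("sanfrancisco", 18.5))
conn.execute(upsert.format(table="with_key"), ("sanfrancisco", 21.0))
rows = conn.execute("SELECT city, temperature FROM with_key").fetchall()
print(rows)
conn.close()
```

The same idea applies to the PostgreSQL table: once the conflict column carries a primary key, the second insert for an existing key turns into an update instead of an error.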
The function:
def update_query(df, schemaname, tablename, conflictcolumn, engine):
    """
    Insert the rows of a dataframe into a table, updating rows that already exist.

    df             : dataframe holding the rows to insert or update.
    schemaname     : name of the schema.
    tablename      : name of the table to append/add/insert into.
    conflictcolumn : column used as the conflict target; if a row with the same value
                     already exists, only the other columns of that row are changed.
    engine         : database engine.
    Example of engine : engine_portfolio_pg = create_engine('postgresql://pythonuser:vmqJRZ#dPW24d@145.239.121.143/cetrm_portfolio')
    Example of schemaname, tablename : weatherofcities.sanfrancisco , schemaname = weatherofcities, tablename = sanfrancisco.
    """
    columns = df.columns.tolist()
    # Every column except the conflict column is overwritten on conflict.
    deleteprimary = columns.copy()
    deleteprimary.remove(conflictcolumn)
    # One %s placeholder per dataframe column.
    replacestring = ','.join(['%s'] * len(columns))
    excluded = ','.join("EXCLUDED.{}".format(column) for column in deleteprimary)
    allcolumns = ','.join(columns)
    deleteprimary = ','.join(deleteprimary)
    insert_sql = """ INSERT INTO {schemaname}.{tablename} ({allcolumns})
        VALUES ({replacestring})
        ON CONFLICT ({conflictcolumn}) DO UPDATE SET
        ({deleteprimary}) = ({excluded})""".format(
        tablename=tablename, schemaname=schemaname, allcolumns=allcolumns,
        replacestring=replacestring, conflictcolumn=conflictcolumn,
        deleteprimary=deleteprimary, excluded=excluded)
    conn = engine.raw_connection()
    conn.autocommit = True
    cursor = conn.cursor()
    print("------------------------" * 5)
    print("If below error happens:")
    print("there is no unique constraint matching given keys for referenced table?")
    print("Primary key is not set, you can execute:")
    print("engine.execute('ALTER TABLE {}.{} ADD PRIMARY KEY ({});')".format(schemaname, tablename, conflictcolumn))
    print("------------------------" * 5)
    try:
        for i, (index, row) in enumerate(df.iterrows()):
            # Values are passed as parameters so the driver handles quoting.
            cursor.execute(insert_sql, tuple(row.values))
            conn.commit()
            if i == 0:
                # Show the statement once, for the first row only.
                print("Order of Columns in Operated SQL Query for Rows")
                print(insert_sql % tuple(columns))
                print("----")
                print("Example of Operated SQL Query for Rows")
                print(insert_sql % tuple(row.values))
                print("---")
    finally:
        # Release the connection even if one of the inserts fails.
        conn.close()
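
To check what statement is generated without touching a database, the string-building part can be sketched as a small standalone helper. This is not part of the answer above: build_upsert_sql is a hypothetical name, and it writes the update as one `column = EXCLUDED.column` assignment per column rather than the row form `(a,b) = (EXCLUDED.a,EXCLUDED.b)`. Both forms are meant to be equivalent for several columns; the per-column form may also be the safer choice when only one non-key column is left, since as far as I know newer PostgreSQL versions can reject a single-column row assignment.

```python
def build_upsert_sql(columns, schemaname, tablename, conflictcolumn):
    """Build an INSERT ... ON CONFLICT statement with %s placeholders (hypothetical helper)."""
    if conflictcolumn not in columns:
        raise ValueError("conflictcolumn must be one of the columns")
    # Columns that get overwritten when the conflict column already exists.
    update_columns = [c for c in columns if c != conflictcolumn]
    placeholders = ", ".join(["%s"] * len(columns))
    if update_columns:
        assignments = ", ".join("{0} = EXCLUDED.{0}".format(c) for c in update_columns)
        action = "DO UPDATE SET " + assignments
    else:
        # Nothing to update when the conflict column is the only column.
        action = "DO NOTHING"
    return "INSERT INTO {}.{} ({}) VALUES ({}) ON CONFLICT ({}) {}".format(
        schemaname, tablename, ", ".join(columns), placeholders, conflictcolumn, action
    )


sql = build_upsert_sql(
    ["city", "temperature", "humidity"], "weatherofcities", "sanfrancisco", "city"
)
print(sql)
```

Schema, table and column names are still interpolated directly into the string here, as in the function above, so they should come from trusted code and never from user input; only the row values go through the %s parameters.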