【Question title】: I am unable to parse and load a nested JSON file into Postgres with Python 3
【Posted】: 2020-05-12 19:04:36
【Question description】:

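For the past week I have tried everything I can to load a JSON file into a Postgres table with Python 3. It is nested JSON, and although I got some basic examples working, I cannot get the code right for this one. My ado.json file is shown below. Ultimately I just want to flatten it.

I keep getting this error:

execute_sql() error: syntax error at or near "{"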
LINE 2: VALUES ('System.LinkTypes.Hierarchy', {'linkType': 'System.L...

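I am very new to Python and am learning as I go. It seems that the way I read the list only picks up two columns (rel and attributes). I have read dozens of blog posts, but unfortunately I still cannot get it right.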

[
    {
        "rel":"System.LinkTypes.Hierarchy","attributes":{"linkType":"System.LinkTypes.Hierarchy-Forward","sourceId":13,"targetId":23,"isActive":true,"changedDate":"2019-01-18T18:45:53.013Z","changedBy":{"id":"3209f8e3-95a2-6448-a146-13e374bd03bc","displayName":"Stacey Clark","uniqueName":"Clark_Stacey@gaaf.com","descriptor":"aad.MzIwOWY4ZTMtOTVhMi03NDQ4LWExNDYtMTNlMzc0YmQwM2Jj"},"comment":null,"changedOperation":"create","sourceProjectId":"7dc32e0c-84d4-46a4-aec6-1f0a22b60ef8","targetProjectId":"7dc32e0c-84d4-46a4-aec6-1f0a22b60ef8"}
    },
    {
        "rel":"System.LinkTypes.Hierarchy","attributes":{"linkType":"System.LinkTypes.Hierarchy-Forward","sourceId":9,"targetId":24,"isActive":true,"changedDate":"2019-01-18T18:46:08.64Z","changedBy":{"id":"3209f8e3-95a2-6448-a146-13e374bd03bc","displayName":"Stacey Clark","uniqueName":"Clark_Stacey@gaaf.com","descriptor":"aad.MzIwOWY4ZTMtOTVhMi03NDQ4LWExNDYtMTNlMzc0YmQwM2Jj"},"comment":null,"changedOperation":"create","sourceProjectId":"7dc32e0c-84d4-46a4-aec6-1f0a22b60ef8","targetProjectId":"7dc32e0c-84d4-46a4-aec6-1f0a22b60ef8"}
    },
    {
        "rel":"System.LinkTypes.Hierarchy","attributes":{"linkType":"System.LinkTypes.Hierarchy-Forward","sourceId":9,"targetId":25,"isActive":true,"changedDate":"2019-01-18T18:46:26.64Z","changedBy":{"id":"3209f8e3-95a2-6448-a146-13e374bd03bc","displayName":"Stacey Clark","uniqueName":"Clark_Stacey@gaaf.com","descriptor":"aad.MzIwOWY4ZTMtOTVhMi03NDQ4LWExNDYtMTNlMzc0YmQwM2Jj"},"comment":null,"changedOperation":"create","sourceProjectId":"7dc32e0c-84d4-46a4-aec6-1f0a22b60ef8","targetProjectId":"7dc32e0c-84d4-46a4-aec6-1f0a22b60ef8"}
    }
]

The Python script I am using is below.

import json, sys  # Import Python's built-in JSON Library
import pandas as pd
import flatten_json
from pandas.io.json import json_normalize
from psycopg2 import connect, Error # import the psycopg2 database adapter for PostgreSQL

# use Python's open() function to load the JSON data
with open('ado.json', encoding='utf-8') as json_data:
    record_list = json.loads(json_data.read())
    print(record_list)

if type(record_list) == list:
    first_record = record_list[0]

    # I am unable to get the correct column names. only rel and attributes show up

    columns = list(first_record.keys())
    print ("\ncolumn names:", columns)

table_name = "json_data_ado"
sql_string = 'INSERT INTO {} '.format( table_name )
sql_string += "(" + ', '.join(columns) + ")\nVALUES "

for i, record_dict in enumerate(record_list):

    values = []
    for col_names, val in record_dict.items():

        # Postgres strings must be enclosed with single quotes
        if type(val) == str:
            val = val.replace("'", "''")
            val = "'" + val + "'"

        values += [ str(val) ]
    # join the list of values and enclose record in parenthesis
    sql_string += "(" + ', '.join(values) + "),\n"

# remove the last comma and end statement with a semicolon
sql_string = sql_string[:-2] + ";"

##insert json data into postgres sql  -- Simply output to screen
print ("\nSQL statement:")
print (sql_string)

# Connect to postgres
try:
    # declare a new PostgreSQL connection object
    conn = connect(
        dbname = "postgres",
        user = "postgres",
        host = "test.us-east-1.rds.amazonaws.com",
        password = "postgres",
        # attempt to connect for 10 seconds then raise exception
        connect_timeout = 10
    )

    cur = conn.cursor()
    print ("\ncreated cursor object:", cur)

except (Exception, Error) as err:
    print ("\npsycopg2 connect error:", err)
    conn = None
    cur = None

if cur != None:

    try:
        cur.execute( sql_string )
        conn.commit()

        print ('\nfinished INSERT INTO execution')

    except (Exception, Error) as error:
        print("\nexecute_sql() error:", error)
        conn.rollback()

    cur.close()
    conn.close()

【Question comments】:

  • Use bind parameters, rather than string concatenation, to put values into the SQL query. Then all of the escaping code goes away.

Tags: json python-3.x postgresql nested


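【Solution 1】:

Do not use string concatenation to add values to an SQL query. As you have discovered, escaping values correctly is complicated, and getting it wrong leads to bugs and security holes.

Instead, use bind parameters to pass your values into the SQL query like you would to a function. No escaping is needed.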

cur.execute("INSERT INTO test (num, data) VALUES (%s, %s)", (100, "abc'def"))

You do need to set up the placeholders, similar to how you set up the columns.

# insert into json_data_ado (foo, bar) values (%s, %s)
sql_string = 'insert into json_data_ado'
sql_string += "(" + ', '.join(columns) + ")\n"
sql_string += "values (" + ", ".join(["%s" for x in columns]) + ")\n"

Then simply pass the list of values to execute. They are substituted for the %s placeholders.

cur.execute(sql_string, values)
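
Putting the pieces above together, a minimal sketch might look like the following. It assumes a cursor that follows the psycopg2 interface (`executemany(sql, seq_of_params)`); the helper names `build_insert_sql`, `to_param`, and `insert_records` are made up for illustration. It also assumes that any nested value, such as `attributes`, is stored as JSON text in a column that accepts it (for example `json`, `jsonb`, or `text`). Table and column names are still interpolated into the SQL text, so they should come from a trusted source.

```python
import json


def build_insert_sql(table, columns):
    """Return an INSERT statement with one %s placeholder per column."""
    column_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"


def to_param(value):
    """Serialize nested dicts/lists to JSON text; pass scalars through unchanged."""
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    return value


def insert_records(cur, table, records):
    """Insert a list of dicts that all share the keys of the first record."""
    columns = list(records[0].keys())
    sql_string = build_insert_sql(table, columns)
    rows = [tuple(to_param(rec[col]) for col in columns) for rec in records]
    # The driver fills in the %s placeholders and handles the quoting itself.
    cur.executemany(sql_string, rows)
    return sql_string, rows
```

With a real connection this would be called as `insert_records(cur, "json_data_ado", record_list)` followed by `conn.commit()`.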

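【Comments】:

  • Thanks for sharing this. I modified the script as follows:
  • I can't follow this. I have tried both ways of executing the cursor, as you mentioned. I get this error: execute_sql() error: syntax error at or near "(" LINE 3: ('System.LinkTypes.Hierarchy', {'linkType': 'System.LinkType...
  • @piffer It is hard to tell what is wrong from code snippets alone, but note that the error is at (, which means the query itself is malformed. Also note {'linkType': ..., which means you are trying to pass in a stringified Python dict rather than JSON. This happens because you insert the values of each dictionary, but the value of attributes is itself another dictionary. You have to break the original nested structure down into a simple list of strings and numbers.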
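
One way to break the nested structure down, as the last comment suggests, is a small recursive helper. This is only a sketch: the function name `flatten` and the `_` separator are choices made here, and the derived column names (such as `attributes_sourceId`) are assumptions that would have to match the columns of the actual table. Postgres also folds unquoted identifiers to lowercase, so the table columns would end up as e.g. `attributes_sourceid`. The record is a shortened version of one entry from the question.

```python
def flatten(record, parent_key="", sep="_"):
    """Recursively flatten nested dicts into a single-level dict."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Descend into the nested dict, prefixing its keys with the parent key.
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


record = {
    "rel": "System.LinkTypes.Hierarchy",
    "attributes": {
        "sourceId": 13,
        "targetId": 23,
        "isActive": True,
        "changedBy": {"displayName": "Stacey Clark"},
        "comment": None,
    },
}

flat = flatten(record)
columns = list(flat.keys())   # one name per leaf value
values = list(flat.values())  # plain strings, numbers, booleans, and None
print(columns)
print(values)
```

The resulting `values` list contains no dictionaries, so it can be passed as the second argument to `cur.execute()`.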
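【Solution 2】:

Thanks for the feedback. I modified the following two parts based on it, but now I get a syntax error about %s.

The SQL statement I generate looks like the one below. My JSON has more columns, though, so I am not sure why only two columns, "rel" and "attributes", show up in the column names. I tried parsing the columns like this.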

with open('ado.json', encoding='utf-8') as json_data:
    record_list = json.loads(json_data.read())
    print(record_list)

if type(record_list) == list:
    first_record = record_list[0]

    columns = list(first_record.keys())
    print ("\ncolumn names:", columns)
column names: ['rel', 'attributes']

SQL statement:
insert into json_data_ado(rel, attributes)
values (%s, %s)
('System.LinkTypes.Hierarchy', {'linkType': 'System.LinkTypes.Hierarchy-Forward', 'sourceId': 13, 'targetId': 23, 'isActive': True, 'changedDate': '2019-01-18T18:45:53.013Z', 'changedBy': {'id': '3209f8e3-95a2-6448-a146-13e374bd03bc', 'displayName': 'Stacey Clark', 'uniqueName': 'Clark_Stacey@test.com', 'descriptor': 'aad.MzIwOWY4ZTMtOTVhMi03NDQ4LWExNDYtMTNlMzc0YmQwM2Jj'}, 'comment': None, 'changedOperation': 'create', 'sourceProjectId': '7dc32e0c-84d4-46a4-aec6-1f0a22b60ef8', 'targetProjectId': '7dc32e0c-84d4-46a4-aec6-1f0a22b60ef8'}),
('System.LinkTypes.Hierarchy', {'linkType': 'System.LinkTypes.Hierarchy-Forward', 'sourceId': 9, 'targetId': 24, 'isActive': True, 'changedDate': '2019-01-18T18:46:08.64Z', 'changedBy': {'id': '3209f8e3-95a2-6448-a146-13e374bd03bc', 'displayName': 'Stacey Clark', 'uniqueName': 'Clark_Stacey@test.com', 'descriptor': 'aad.MzIwOWY4ZTMtOTVhMi03NDQ4LWExNDYtMTNlMzc0YmQwM2Jj'}, 'comment': None, 'changedOperation': 'create', 'sourceProjectId': '7dc32e0c-84d4-46a4-aec6-1f0a22b60ef8', 'targetProjectId': '7dc32e0c-84d4-46a4-aec6-1f0a22b60ef8'}),

try:
        cur.execute("""Insert into json_data_ado (rel, attributes ,linktype , source1d , targetid , isactive , changeddate , changedby ,id ,displayname ,uniquename ,descriptor ,comment , changedoperation ,sourceprojectid , targetprojectid  )
        values (%s, %s, %s, %s,  %s, %s, %s, %s,  %s, %s, %s, %s,  %s, %s, %s, %s )""")
        conn.commit()

# insert into json_data_ado (foo, bar) values (%s, %s)
sql_string = 'insert into json_data_ado'
sql_string += "(" + ', '.join(columns) + ")\n"
sql_string += "values (" + ", ".join(["%s" for x in columns]) + ")\n"

I get an error about the syntax:

execute_sql() error: syntax error at or near "%"
LINE 2:         values (%s, %s, %s, %s,  %s, %s, %s, %s,  %s, %s, %s...

【Comments】:

  • You have to pass the values to execute, which fills in all the %s placeholders.
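
As a rough illustration of that comment, the sketch below builds both halves of the call: the SQL text with its %s placeholders and the matching tuple of values pulled out of one nested record. The `COLUMN_PATHS` mapping, the helper names, and the column names are hypothetical; only a few of the columns are shown, and the names would need to match the real table.

```python
from functools import reduce

# Hypothetical mapping: table column -> path of keys inside one JSON record.
COLUMN_PATHS = {
    "rel": ("rel",),
    "linktype": ("attributes", "linkType"),
    "sourceid": ("attributes", "sourceId"),
    "targetid": ("attributes", "targetId"),
    "displayname": ("attributes", "changedBy", "displayName"),
}


def get_path(record, path):
    """Follow a sequence of keys into nested dicts and return the leaf value."""
    return reduce(lambda node, key: node[key], path, record)


def build_statement(table, record):
    """Return (sql, params) with exactly one parameter per %s placeholder."""
    columns = list(COLUMN_PATHS)
    sql_string = "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(columns), ", ".join(["%s"] * len(columns))
    )
    params = tuple(get_path(record, COLUMN_PATHS[col]) for col in columns)
    return sql_string, params
```

The two return values go into the call together: `sql_string, params = build_statement("json_data_ado", record)` and then `cur.execute(sql_string, params)`. Calling `cur.execute(sql_string)` without the second argument sends the literal %s text to the server, which is what produces the syntax error at "%".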