【Question title】: Convert JSON data into a SQL table using Python
【Posted】: 2020-02-14 05:00:09
【Question description】:

My JSON data looks like this:

[
    {
        "fields": {
            "bkdate": null,
            "business_credit_card_total_balances": null,
            "business_credit_card_total_limits": null,
            "business_total_monthly_debt_payments": null,
            "business_total_mortgage_monthly_payments": null,
            "created_at": "2016-08-04T00:29:03.067Z",
            "detail_results": null,
            "error_reason": "no reasons",
            "fico_v2": "695",
            "fico_v3": null,
            "loanapp_id": 194,
            "personal_credit_card_total_balances": null,
            "personal_credit_card_total_limits": null,
            "personal_total_monthly_payments": null,
            "report_type": "CreditProfile",
            "result": true,
            "role": "applicant",
            "total_mortgage_monthly_payments": null,
            "username": "cho",
            "version": "CF Microloan",
            "xml_data": "<?xml version=\"1.0\" standalone=\"no\"?><NetConnectResponse xmlns=\"http://www.experian.com/NetConnectResponse\"><CompletionCode>0000</CompletionCode>"
        },
        "model": "common.prequalresult",
        "pk": 1
    }
]

I want to convert it into a SQL table. I tried this:

v = pd.DataFrame(data['fields'])
t = pd.io.json.json_normalize(data['fields'], ['model'], ['pk'], meta_prefix='parent_')

v.to_sql('fields', engine)
t.to_sql('fields', engine)

But it does not work. Can anyone get this working and produce the SQL tables?

The error I get looks like this:

TypeError                                 Traceback (most recent call last)
<ipython-input-86-a186308b321b> in <module>()
      1 
----> 2 v = pd.DataFrame(data['fields'])
      3 t = pd.io.json.json_normalize(data['fields'], ['model'], ['pk'], meta_prefix='parent_')
      4 
      5 v.to_sql('fields', engine)

TypeError: list indices must be integers or slices, not str
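
The traceback follows from the shape of the data: the top-level JSON value is a list, so it has to be indexed with an integer (or iterated) before the "fields" key can be reached. A minimal sketch using only the standard library, with a shortened copy of the record from the question (the names raw, first and all_fields are illustrative, not from the original post):

```python
import json

# Shortened copy of the JSON from the question: a list holding one record.
raw = '[{"fields": {"fico_v2": "695", "loanapp_id": 194}, "model": "common.prequalresult", "pk": 1}]'
data = json.loads(raw)

# Indexing the list with a string reproduces the error from the traceback.
try:
    data["fields"]
except TypeError as exc:
    error_message = str(exc)

# Select a record by position first, then read its keys.
first = data[0]
fields = first["fields"]

# Or collect the "fields" dict of every record in the list.
all_fields = [record["fields"] for record in data]
```

The same applies to the pandas call in the question: it needs the inner dicts (or a list of them), not the outer list indexed by a string.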

I want to create two tables. One contains "fields", "model", and "pk". The other contains all the values inside "fields".

The schema looks like this: (image not included)
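
One possible reading of that two-table layout can be sketched with the standard-library sqlite3 module, used here in place of the asker's engine only so the example runs without a database server. The table names record and record_fields are hypothetical, and only three of the keys under "fields" are included:

```python
import json
import sqlite3

# Shortened copy of the JSON from the question.
raw = """[{"fields": {"created_at": "2016-08-04T00:29:03.067Z",
                      "fico_v2": "695", "loanapp_id": 194},
           "model": "common.prequalresult", "pk": 1}]"""
records = json.loads(raw)

conn = sqlite3.connect(":memory:")
# Parent table: the top-level keys of each record.
conn.execute("CREATE TABLE record (pk INTEGER PRIMARY KEY, model TEXT)")
# Child table: the values under "fields", linked back by record_pk.
conn.execute(
    """CREATE TABLE record_fields (
           record_pk INTEGER REFERENCES record(pk),
           created_at TEXT, fico_v2 TEXT, loanapp_id INTEGER)"""
)

for rec in records:
    conn.execute("INSERT INTO record (pk, model) VALUES (?, ?)",
                 (rec["pk"], rec["model"]))
    f = rec["fields"]
    # dict.get returns None for a missing key, which sqlite3 stores as NULL.
    conn.execute(
        "INSERT INTO record_fields VALUES (?, ?, ?, ?)",
        (rec["pk"], f.get("created_at"), f.get("fico_v2"), f.get("loanapp_id")),
    )
conn.commit()

parent_rows = conn.execute("SELECT pk, model FROM record").fetchall()
child_rows = conn.execute(
    "SELECT record_pk, fico_v2, loanapp_id FROM record_fields"
).fetchall()
```

Using pk as the parent's primary key assumes it is unique across records; if it is not, a separate auto-incrementing key is the safer choice.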

【Question comments】:

  • What error is shown?
  • Please add more specific details about your problem (for example, what does not work? What does the error log say?)
  • I want to create two tables. One contains "fields", "model", and "pk". The other contains all the values in "fields". The error is: TypeError Traceback (most recent call last) <ipython-input-86-a186308b321b> in <module>() 1 ----> 2 v = pd.DataFrame(data['fields']) 3 t = pd.io.json.json_normalize(data['fields'], ['model'], ['pk'], meta_prefix='parent_') 4 5 v.to_sql('fields', engine) TypeError: list indices must be integers or slices, not str
  • Is data['fields'] a string? If so, v = pd.DataFrame(data['fields']) will raise an error.
  • The question is not explained properly... Which fields do you want to insert into the table?

Tags: python sql json


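【Solution 1】:

Consider the data you want to store in a MySQL database. You can store it as two tables joined by a one-to-one relationship. The SQLAlchemy ORM handles this more cleanly than a pandas DataFrame. The following code illustrates the approach. For now I have included only a few of the keys under "fields".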

from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, sessionmaker

total_data = [
    {
        "fields": {
            "bkdate": None,
            "business_credit_card_total_balances": None,
            "created_at": "2016-08-04T00:29:03.067Z",
        },
        "model": "common.prequalresult",
        "pk": 1
    },
    {
        "fields": {
            "bkdate": "2016-08-04T00:29:03.067Z",
            "business_credit_card_total_balances": 23,
            "created_at": "2016-08-04T00:29:03.067Z",
        },
        "model": "common.prequalresult",
        "pk": 2
    },
    {
        "fields": {
            "bkdate": "asdfas",
            "business_credit_card_total_balances": 1111,
            "created_at": "2016-08-04T00:29:03.067Z",
        },
        "model": "common.prequalresult",
        "pk": 3
    }
]

# The plain mysql:// scheme uses the MySQLdb (mysqlclient) driver; 3306 is MySQL's default port.
engine = create_engine('mysql://user:password@localhost:3306/my_data', echo=False)

Base = declarative_base()


class Article(Base):
    # Parent table: one row per top-level JSON record.
    __tablename__ = 'article'
    id = Column(Integer, primary_key=True, autoincrement=True)
    pk = Column(Integer, primary_key=False)
    model = Column(String(100), nullable=True)
    # uselist=False makes this a one-to-one relationship.
    child = relationship('Comment', backref='article', uselist=False)


class Comment(Base):
    # Child table: the values stored under the record's "fields" key.
    __tablename__ = 'comment'
    id = Column(Integer, primary_key=True, autoincrement=True)
    bkdate = Column(String(100), nullable=True)
    business_credit_card_total_balances = Column(Integer, nullable=True)
    created_at = Column(String(100), nullable=True)
    article_id = Column(Integer, ForeignKey('article.id'))


# Create both tables if they do not exist yet (create_all returns None).
Base.metadata.create_all(engine)

Session = sessionmaker(bind=engine)
session = Session()

for temp_data in total_data:
    fields = temp_data['fields']
    # Parent row holds the top-level keys.
    parent = Article(pk=temp_data['pk'], model=temp_data['model'])
    # Child row holds the values from "fields"; article=parent links the two.
    child = Comment(bkdate=fields['bkdate'],
                    business_credit_card_total_balances=fields['business_credit_card_total_balances'],
                    created_at=fields['created_at'],
                    article=parent)

    session.add(parent)
    session.add(child)

# Write all rows in a single transaction.
session.commit()

Because the pk value given in each dict may repeat, I created id as the primary key.
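
The effect of that choice can be sketched with the standard-library sqlite3 module, swapped in for MySQL here so the example runs without a server. With an auto-incrementing id as the primary key, two rows may carry the same pk value. The duplicate rows and the strict_article table are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Surrogate key: id is the primary key, pk is an ordinary column.
conn.execute(
    "CREATE TABLE article ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, pk INTEGER, model TEXT)"
)
rows = [(1, "common.prequalresult"), (1, "common.prequalresult")]
conn.executemany("INSERT INTO article (pk, model) VALUES (?, ?)", rows)
conn.commit()
stored = conn.execute("SELECT id, pk FROM article ORDER BY id").fetchall()

# For contrast: if pk itself were the primary key, the repeat is rejected.
conn.execute("CREATE TABLE strict_article (pk INTEGER PRIMARY KEY, model TEXT)")
conn.execute("INSERT INTO strict_article VALUES (1, 'common.prequalresult')")
try:
    conn.execute("INSERT INTO strict_article VALUES (1, 'common.prequalresult')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```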

Output:

mysql> select * from article;
+----+------+----------------------+
| id | pk   | model                |
+----+------+----------------------+
|  1 |    1 | common.prequalresult |
|  2 |    2 | common.prequalresult |
|  3 |    3 | common.prequalresult |
+----+------+----------------------+

mysql> select * from comment;
+----+--------------------------+-------------------------------------+--------------------------+------------+
| id | bkdate                   | business_credit_card_total_balances | created_at               | article_id |
+----+--------------------------+-------------------------------------+--------------------------+------------+
|  1 | NULL                     |                                NULL | 2016-08-04T00:29:03.067Z |          1 |
|  2 | 2016-08-04T00:29:03.067Z |                                  23 | 2016-08-04T00:29:03.067Z |          2 |
|  3 | asdfas                   |                                1111 | 2016-08-04T00:29:03.067Z |          3 |
+----+--------------------------+-------------------------------------+--------------------------+------------+

【Discussion】:

  • I ran the code, but it shows the error: ModuleNotFoundError: No module named 'MySQLdb'. I tried pip install pymysql. I am using Google Colab.
  • Install pymysql with pip. Also install sqlalchemy for Python.
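
For context on the error in this thread: with a plain mysql:// URL, SQLAlchemy loads the MySQLdb (mysqlclient) driver, so installing PyMySQL alone is unlikely to help unless the URL names it as mysql+pymysql://. A minimal sketch of building such a URL with the standard library; the helper name build_mysql_url and the credentials are hypothetical, and 3306 is MySQL's default port:

```python
from urllib.parse import quote_plus


def build_mysql_url(user, password, host, database, port=3306, driver="pymysql"):
    """Return a SQLAlchemy-style URL: mysql+<driver>://user:password@host:port/db."""
    # quote_plus escapes characters such as '@' or '/' in the credentials,
    # which would otherwise be misread as URL delimiters.
    return (
        f"mysql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


url = build_mysql_url("user", "p@ssword", "localhost", "my_data")

# With sqlalchemy and pymysql installed, the URL would be used as:
# from sqlalchemy import create_engine
# engine = create_engine(url, echo=False)
```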