【Question title】:Get INSERT INTO statement from SQL statements string in Python
【Posted】:2021-12-27 20:12:40
【Question description】:

I have a string like the following:

sql = """DROP TABLE IF EXISTS table1;

ALTER TABLE table1 DROP PRIMARY KEY;

INSERT INTO table1 (id, created, name, telefonnummer, erPatient_id) VALUES
    (1, '2015-08-06 12;09:08', ' ', ' ', 16528),
    (2, '2015-08-06 12:43:11', ' ', ' ', 16529)
;

INSERT INTO table2 (comment, id) VALUES
('hello this is a semicolon ;', 2);"""

I want to extract the INSERT INTO table1 statement:

INSERT INTO table1 (id, created, name, telefonnummer, erPatient_id) VALUES
        (1, '2015-08-06 12;09:08', ' ', ' ', 16528),
        (2, '2015-08-06 12:43:11', ' ', ' ', 16529)
    ;

I can't split the string with sql.split(';') because the VALUES being inserted contain semicolons.

I tried a regular expression, without success:

import re
pattern_string = r"INSERT INTO table1[(]*[^)]+\)[^)]"
q = re.findall(pattern_string, sql, re.MULTILINE | re.DOTALL)

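In the real string, thousands of values are inserted across dozens of tables.

【Question comments】:

  • If your data is irregular, a regular expression is the wrong tool. You need a parsing engine. This problem is not new; CSV and countless other formats have the same issue.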
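
To illustrate the comment's point, here is a minimal sketch of a quote-aware statement splitter, which is about the smallest "parser" that gets past the semicolon problem. The names `split_sql_statements` and `inserts_for_table` are hypothetical. The sketch assumes string literals use only single quotes with `''` as the escape; backslash escapes (as MySQL allows), SQL comments, and quoted identifiers are not handled.

```python
import re


def split_sql_statements(script):
    """Split a SQL script on semicolons that lie outside single-quoted strings."""
    statements = []
    current = []
    in_string = False
    for ch in script:
        if ch == "'":
            # Toggle on every quote. A doubled quote ('') toggles twice,
            # so the state is correct again after the pair.
            in_string = not in_string
        if ch == ";" and not in_string:
            statement = "".join(current).strip()
            if statement:
                statements.append(statement + ";")
            current = []
        else:
            current.append(ch)
    # Keep any trailing text that was not terminated by a semicolon.
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


def inserts_for_table(script, table):
    """Return the INSERT INTO statements that target an unquoted table name."""
    pattern = re.compile(r"INSERT\s+INTO\s+" + re.escape(table) + r"\b", re.IGNORECASE)
    return [s for s in split_sql_statements(script) if pattern.match(s)]
```

Calling `inserts_for_table(sql, "table1")` on the string from the question should return only the table1 statement, since the semicolon in `'2015-08-06 12;09:08'` sits inside a string literal and is skipped. For real dumps with other quoting rules, a proper SQL parser is the safer choice.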

Tags: python sql regex


【Solution 1】:

Use

import re
pattern_string = r"\bINSERT INTO \w+\s.*?;\s*$"
q = re.findall(pattern_string, sql, re.MULTILINE | re.DOTALL)


Explanation

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  INSERT INTO              'INSERT INTO '
--------------------------------------------------------------------------------
  \w+                      word characters (1 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  .*?                      any character (0 or more times
                           (matching the least amount possible))
--------------------------------------------------------------------------------
  ;                        ';'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  $                        end of a line
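
As a quick check of the pattern, here is a small sketch that applies it to a shortened version of the question's string. Replacing `\w+` with the literal name `table1` is my own adjustment so that only that table's statement is returned; the shortened `sql` value is illustrative.

```python
import re

# Shortened version of the string from the question (illustrative data).
sql = """DROP TABLE IF EXISTS table1;

INSERT INTO table1 (id, created) VALUES
    (1, '2015-08-06 12;09:08'),
    (2, '2015-08-06 12:43:11')
;

INSERT INTO table2 (comment, id) VALUES
('hello this is a semicolon ;', 2);"""

# Same pattern as above, with \w+ replaced by the literal table name.
pattern = re.compile(r"\bINSERT INTO table1\s.*?;\s*$", re.MULTILINE | re.DOTALL)
matches = pattern.findall(sql)

for statement in matches:
    print(statement.strip())
```

The match ends at the first semicolon that is followed only by whitespace up to the end of a line. That skips the semicolon in `'2015-08-06 12;09:08'`, but a string value that happens to end a line with `;` would still cut the match short, so this relies on the dump's layout rather than on real SQL parsing.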

【Comments】:

    【Solution 2】:

    You can use my library, SQLGlot:

    import sqlglot
    import sqlglot.expressions as exp
    
    sql = ...
    
    for expression in sqlglot.parse(sql):
        if isinstance(expression, exp.Insert):
            print(expression)
    

    【Comments】:
