Python regex to find a string inside multi-line curly braces

【Question title】: python regex to find string from multi line curly braces
【Posted】: 2020-06-19 01:59:24
【Question】:

I have a string like this. How can I create a dictionary with First-tags as the key and everything after the : as the value?

test_string = """###Some Comment 
First-tags : 
{
  "tag1": {
    "tagKey1": "tagValue1",
    "tagKey2": "tagValue2"
  },
  "tag2": {
    "tagKey1": "tagValue1",
    "tagKey2": "tagValue2"
  }
  so on .....
} 
"""

Example: the key would be First-tags and the value would be

{
  "tag1": {
    "tagKey1": "tagValue1",
    "tagKey2": "tagValue2"
  },
  "tag2": {
    "tagKey1": "tagValue1",
    "tagKey2": "tagValue2"
  }
  so on .....
}

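[Edit: the string data is in a file. The task is to read from the file and create a dictionary where the key is the comment and the value is the JSON data]

For example, the file will contain: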

###Some Comment 
    First-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
      so on .....
    } 


###2nd Comment 
    Second-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
      so on .....
    } 

###Some other Comment 
    someother-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
      so on .....
    }

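【Question comments】:

Are you asking how to parse a JSON string that contains comments into a dictionary? It seems a bit more complicated than that; does the string always contain First-tags?
"Are you asking how to parse a JSON string that contains comments into a dictionary?" Yes, but the file structure will be: comment, JSON data, another comment, JSON data, and so on... First-tags is just an example. It could be anything.

Tags: python json python-3.x dictionary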

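【Solution 1】:

You can use this regex. It captures the last run of word characters (including -) before the : in group 1, then everything else up to the next comment (###) or the end of the string in group 2: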

([\w-]+)\s*:\s*(.*?)(?=\s*###|$)

You can then build the dictionary by iterating over the two groups of each match in the string:

import re

test_string = """
###Some Comment 
    First-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
      so on .....
    } 


###2nd Comment 
    Second-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
      so on .....
    } 

###Some other Comment 
    someother-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
      so on .....
    }
"""
res = {}
for match in re.finditer(r'([\w-]+)\s*:\s*(.*?)(?=\s*###|$)', test_string, re.S):
    res[match.group(1)] = match.group(2)

print(res)

Output:

{
 'First-tags': '{\n      "tag1": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      },\n      "tag2": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      }\n      so on .....\n    }',
 'Second-tags': '{\n      "tag1": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      },\n      "tag2": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      }\n      so on .....\n    }',
 'someother-tags': '{\n      "tag1": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      },\n      "tag2": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      }\n      so on .....\n    }'
}
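
Note that the values above are still plain strings, not parsed dictionaries. The following is a minimal sketch of combining the same regex with json.loads, under the assumption that the real file holds valid JSON in each block (the "so on ....." placeholder in the question's sample is not valid JSON and would raise an error). The names sample, pattern and parsed are illustrative only.

```python
import json
import re

# Hypothetical sample: unlike the question's text it has no "so on ....."
# placeholder, so every block is valid JSON.
sample = """
###Some Comment
    First-tags :
    {
      "tag1": {"tagKey1": "tagValue1", "tagKey2": "tagValue2"}
    }

###2nd Comment
    Second-tags :
    {
      "tag1": {"tagKey1": "tagValue1"}
    }
"""

# Same pattern as in the answer: group 1 is the tag name, group 2 the block text.
pattern = re.compile(r'([\w-]+)\s*:\s*(.*?)(?=\s*###|$)', re.S)

parsed = {}
for match in pattern.finditer(sample):
    # json.loads raises json.JSONDecodeError if the captured text is not valid JSON.
    parsed[match.group(1)] = json.loads(match.group(2))

print(parsed["First-tags"]["tag1"]["tagKey2"])
```

With parsed values, nested keys can be reached directly instead of re-parsing the string later.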

Update

If you also want to capture the comments, you can use this code:

res = {}
for match in re.finditer(r'###([^\n]+)\s*([\w-]+)\s*:\s*(.*?)(?=\s*###|$)', test_string, re.S):
    res[match.group(1)] = { match.group(2) : match.group(3) }

print(res)

Output:

{
 'Some Comment ': {
   'First-tags': '{\n      "tag1": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      },\n      "tag2": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      }\n      so on .....\n    }'
 },
 '2nd Comment ': {
   'Second-tags': '{\n      "tag1": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      },\n      "tag2": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      }\n      so on .....\n    }'
 },
 'Some other Comment ': {
  'someother-tags': '{\n      "tag1": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      },\n      "tag2": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      }\n      so on .....\n    }'
 }
}
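
The comment keys in that output keep their trailing space ('Some Comment '). As a small sketch, assuming stripped keys are preferred, the loop can be wrapped in a helper that strips the comment and fills the nested dictionary with setdefault. The helper name parse_sections and the sample text are hypothetical; for a real file the text could come from something like open(path).read().

```python
import re

# Same pattern as in the update: comment, tag name, block text.
SECTION = re.compile(r'###([^\n]+)\s*([\w-]+)\s*:\s*(.*?)(?=\s*###|$)', re.S)

def parse_sections(text):
    """Map each stripped comment to {tag name: raw block text}."""
    res = {}
    for match in SECTION.finditer(text):
        comment = match.group(1).strip()  # drop the trailing space after the comment
        res.setdefault(comment, {})[match.group(2)] = match.group(3)
    return res

# Hypothetical input with trailing spaces, mimicking the question's layout.
sample = (
    "###Some Comment \n"
    "    First-tags : \n"
    '    {"a": 1} \n'
    "\n"
    "###2nd Comment \n"
    "    Second-tags : \n"
    '    {"b": 2}\n'
)

result = parse_sections(sample)
print(result)
```

Using setdefault means two sections that share the same comment text end up in one inner dictionary rather than overwriting each other.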

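【Discussion】:

Thank you. Will it work if I have a file with multiple such occurrences? I have edited the question statement.
@HarshKumar Please see my edit. It uses re.finditer to find all matches in the string.
Is it also possible to get the comment, i.e. print ###Some Comment as well?
@HarshKumar You keep moving the goalposts... How do you want the comment returned?
Sorry about that. I should have been clearer. Something like this: res[comment][match.group(1)] = match.group(2)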

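【Solution 2】:

Here I am trying to parse the string as JSON.

For that to work, the string has to contain only JSON and nothing else.

So I find the first { and take the string from there on.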

import json

my_str = '''
First-tags : 
{
  "tag1": {
    "tagKey1": "tagValue1",
    "tagKey2": "tagValue2"
  },
  "tag2": {
    "tagKey1": "tagValue1",
    "tagKey2": "tagValue2"
  }
  }
  '''
# find the first {
i = my_str.index('{')
my_str = my_str[i:] # trim the string so that only dict is left
my_dict = dict(json.loads(my_str)) # create JSON and then convert that to dict
print(my_dict) # n'joy

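If you like, you can also find the end of the JSON and trim the string there (look for the last }).

Updated solution, based on the update to your question: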
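
A minimal sketch of that trimming step, assuming the string holds exactly one JSON object and that any text after it contains no } character (the variable raw and its contents are made up for illustration):

```python
import json

# Hypothetical input: one JSON object with non-JSON text before and after it.
raw = '''
First-tags :
{
  "tag1": {"tagKey1": "tagValue1"}
}
trailing text that is not JSON
'''

start = raw.index('{')   # position of the first opening brace
end = raw.rindex('}')    # position of the last closing brace
my_dict = json.loads(raw[start:end + 1])  # parse only the braces and what lies between
print(my_dict)
```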

import json

my_str = '''
###Some Comment 
    First-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
    } 


###2nd Comment 
    Second-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
    } 

###Some other Comment 
    someother-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
    } 
'''
data = []
bal = 0
start = end = 0
for i,v in enumerate(my_str):
    if v == '{': 
        if bal == 0:
            start = i
        bal+=1
    elif v=='}': 
        bal-=1
        end = i
    if start!=end and bal ==0: # just looking for data in {....}
        new_str = my_str[start:end+1]
        print(new_str)
        my_dict = dict(json.loads(new_str))
        data .append(my_dict)
        start = end = i+1
print(data) # n'joy

[{'tag1': {'tagKey1': 'tagValue1', 'tagKey2': 'tagValue2'}, 'tag2': {'tagKey1': 'tagValue1', 'tagKey2': 'tagValue2'}}, {'tag1': {'tagKey1': 'tagValue1', 'tagKey2': 'tagValue2'}, 'tag2': {'tagKey1': 'tagValue1', 'tagKey2': 'tagValue2'}}, {'tag1': {'tagKey1': 'tagValue1', 'tagKey2': 'tagValue2'}, 'tag2': {'tagKey1': 'tagValue1', 'tagKey2': 'tagValue2'}}]
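
The brace counter above counts every { and }, so a brace inside a JSON string value would throw it off. As an alternative sketch (not the answer's method), json.JSONDecoder.raw_decode from the standard library can parse one object starting at a given index and report where it ended. The helper name extract_json_objects and the sample text are hypothetical.

```python
import json

def extract_json_objects(text):
    """Return every JSON object found in text, in order of appearance."""
    decoder = json.JSONDecoder()
    objects = []
    pos = text.find('{')
    while pos != -1:
        try:
            # raw_decode returns the parsed value and the index just past it.
            obj, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # No valid JSON starts at this brace; try the next one.
            pos = text.find('{', pos + 1)
            continue
        objects.append(obj)
        pos = text.find('{', end)
    return objects

# Hypothetical input: the first object has a brace inside a string value.
sample = (
    '###A\n  First-tags :\n  {"k": "a } inside a string"}\n'
    '###B\n  Second-tags :\n  {"n": {"m": 1}}\n'
)

data = extract_json_objects(sample)
print(data)
```

Because the decoder itself finds the end of each object, nesting and braces inside strings need no separate bookkeeping.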

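【Discussion】:

Thank you. This solution works for my earlier question, but I have since edited the question statement. Thanks for your help.
Hi Kuldeep, thanks for your help. I have accepted the other solution, which uses a regex. Your approach is good too, but for my use case I found the regex a better fit. Thanks for your help.
No problem at all, glad your issue is solved!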