【Question title】: Python regex to find a string inside multi-line curly braces
【Posted】: 2020-06-19 01:59:24
【Question】:

I have a string like this. How can I create a dictionary with First-tags as the key and everything after the colon as the value?

test_string = """###Some Comment 
First-tags : 
{
  "tag1": {
    "tagKey1": "tagValue1",
    "tagKey2": "tagValue2"
  },
  "tag2": {
    "tagKey1": "tagValue1",
    "tagKey2": "tagValue2"
  }
  so on .....
} 
"""

Example: the key would be First-tags and the value would be

{
  "tag1": {
    "tagKey1": "tagValue1",
    "tagKey2": "tagValue2"
  },
  "tag2": {
    "tagKey1": "tagValue1",
    "tagKey2": "tagValue2"
  }
  so on .....
} 

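[Edit: the string data is in a file. The problem is to read from the file and create a dictionary whose keys are the comments and whose values are the JSON data.]

For example, the file would contain: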

###Some Comment 
    First-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
      so on .....
    } 


###2nd Comment 
    Second-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
      so on .....
    } 

###Some other Comment 
    someother-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
      so on .....
    } 

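【Comments】:

  • Are you asking how to parse a JSON string that contains comments into a dictionary? This looks a bit more complicated than that; will the string always contain First-tags?
  • "Are you asking how to parse a JSON string that contains comments into a dictionary?" Yes, but the file structure will be: a comment, JSON data, another comment, JSON data, and so on... First-tags is just an example. It could be anything.

Tags: python json python-3.x dictionary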


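【Solution 1】:

You can use this regular expression. It captures the last run of word characters (including -) before the : into group 1, then captures everything else, up to the next comment (###) or the end of the string, into group 2: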

([\w-]+)\s*:\s*(.*?)(?=\s*###|$)
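
As a reading aid, the same pattern can be written with re.VERBOSE so that each part carries its own comment. This is only a sketch: the name SECTION_RE and the sample text are made up for illustration, and the literal # characters have to be escaped because verbose mode treats an unescaped # as the start of a comment.

```python
import re

SECTION_RE = re.compile(
    r"""
    ([\w-]+)          # group 1: word characters and hyphens directly before the colon
    \s*:\s*           # the colon, with optional whitespace on both sides
    (.*?)             # group 2: as little as possible (re.S lets . match newlines)
    (?=\s*\#\#\#|$)   # stop just before the next comment marker or at the end of the string
    """,
    re.S | re.X,
)

# Hypothetical sample input with two commented sections.
sample = '###A comment\n  first-key : \n  {"a": 1}\n\n###Another\n  second-key : {"b": 2}\n'
pairs = {m.group(1): m.group(2) for m in SECTION_RE.finditer(sample)}
print(pairs)
```

The words in the ### lines are skipped because none of them is followed by a colon, so the first successful match starts at the label.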

You can then build the dictionary by iterating over every match in the string and reading the two groups:

import re

test_string = """
###Some Comment 
    First-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
      so on .....
    } 


###2nd Comment 
    Second-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
      so on .....
    } 

###Some other Comment 
    someother-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
      so on .....
    }
"""
res = {}
for match in re.finditer(r'([\w-]+)\s*:\s*(.*?)(?=\s*###|$)', test_string, re.S):
    res[match.group(1)] = match.group(2)

print(res)

Output (line breaks added for readability):

{
 'First-tags': '{\n      "tag1": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      },\n      "tag2": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      }\n      so on .....\n    }',
 'Second-tags': '{\n      "tag1": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      },\n      "tag2": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      }\n      so on .....\n    }',
 'someother-tags': '{\n      "tag1": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      },\n      "tag2": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      }\n      so on .....\n    }'
}
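
The values above are still plain strings. If real dictionaries are wanted, one option is to pass each captured value to json.loads. A minimal sketch, assuming a hypothetical helper parse_values that receives a dict shaped like res; values that are not valid JSON (for instance a block that still contains the placeholder line "so on .....") are kept as raw strings instead of raising.

```python
import json


def parse_values(raw):
    """Return a copy of raw with every value parsed as JSON where possible."""
    parsed = {}
    for key, value in raw.items():
        try:
            parsed[key] = json.loads(value)
        except json.JSONDecodeError:
            # Not valid JSON: keep the original text so nothing is lost.
            parsed[key] = value
    return parsed


# Hypothetical input: one valid JSON value and one that cannot be parsed.
example = {
    'First-tags': '{"tag1": {"tagKey1": "tagValue1"}}',
    'Second-tags': '{ so on ..... }',
}
converted = parse_values(example)
print(converted)
```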

Update

If you also want to capture the comments, you can use the following code:

res = {}
for match in re.finditer(r'###([^\n]+)\s*([\w-]+)\s*:\s*(.*?)(?=\s*###|$)', test_string, re.S):
    res[match.group(1)] = { match.group(2) : match.group(3) }

print(res)

Output (line breaks added for readability):

{
 'Some Comment ': {
   'First-tags': '{\n      "tag1": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      },\n      "tag2": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      }\n      so on .....\n    }'
 },
 '2nd Comment ': {
   'Second-tags': '{\n      "tag1": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      },\n      "tag2": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      }\n      so on .....\n    }'
 },
 'Some other Comment ': {
  'someother-tags': '{\n      "tag1": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      },\n      "tag2": {\n        "tagKey1": "tagValue1",\n        "tagKey2": "tagValue2"\n      }\n      so on .....\n    }'
 }
}
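
The question states that the data lives in a file and asks for the shape res[comment][key] = value. A sketch along those lines, assuming UTF-8 input and two hypothetical helpers, parse_sections and parse_file. It reuses the pattern from the update, strips the trailing space from each comment, and uses setdefault so that sections with an identical comment are merged rather than overwritten.

```python
import re
from pathlib import Path

SECTION_RE = re.compile(r'###([^\n]+)\s*([\w-]+)\s*:\s*(.*?)(?=\s*###|$)', re.S)


def parse_sections(text):
    """Map each comment to a {key: raw value string} dict."""
    res = {}
    for match in SECTION_RE.finditer(text):
        comment = match.group(1).strip()  # drop the trailing space after the comment
        res.setdefault(comment, {})[match.group(2)] = match.group(3)
    return res


def parse_file(path):
    """Read the whole file and parse it (the encoding is an assumption)."""
    return parse_sections(Path(path).read_text(encoding='utf-8'))


# Hypothetical sample standing in for the file contents.
sample = '###A comment \n  k1 : \n  {"a": 1}\n\n###B\n  k2 : {"b": 2}\n'
sections = parse_sections(sample)
print(sections)
```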

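【Comments】:

  • Thanks. Will it work if I have a file containing multiple such occurrences? I have edited the question.
  • @HarshKumar Please see my edit. It uses re.finditer to find all matches in the string.
  • Is it also possible to get the comment, i.e. print ###Some Comment as well?
  • @HarshKumar You keep moving the goalposts... How do you want the comment returned?
  • Sorry about that. I should have been clearer. Like this: res[comment][match.group(1)] = match.group(2)
【Solution 2】: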

Here I am trying to convert the string to JSON.

For that to work, the string must contain only JSON and nothing else.

So I find the first { and take the string from there.

import json

my_str = '''
First-tags : 
{
  "tag1": {
    "tagKey1": "tagValue1",
    "tagKey2": "tagValue2"
  },
  "tag2": {
    "tagKey1": "tagValue1",
    "tagKey2": "tagValue2"
  }
  }
  '''
# find the first {
i = my_str.index('{')
my_str = my_str[i:]  # trim the string so that only the JSON object is left
my_dict = json.loads(my_str)  # a JSON object already parses to a dict, so no dict() wrapper is needed
print(my_dict)

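If you like, you can also find the end of the JSON and trim the string there (look for the closing }).

Updated solution, following the update to your question: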
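
Instead of searching for the closing } by hand, the standard library's json.JSONDecoder.raw_decode can parse one JSON value starting at a given index and report where it ended. A small sketch, with first_json_object as a hypothetical helper name. Note that raw_decode does not skip leading whitespace, so the index passed in must point at the { itself.

```python
import json


def first_json_object(text):
    """Parse the first {...} object in text; return (object, end index)."""
    start = text.index('{')  # raises ValueError if there is no opening brace
    obj, end = json.JSONDecoder().raw_decode(text, start)
    return obj, end


# Hypothetical input: a label, one JSON object, then unrelated trailing text.
text = 'First-tags : \n{"tag1": {"tagKey1": "tagValue1"}}\n trailing text'
obj, end = first_json_object(text)
print(obj)
print(repr(text[end:]))
```

Anything after the returned end index is left untouched, so trailing text no longer has to be trimmed before parsing.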

import json

my_str = '''
###Some Comment 
    First-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
    } 


###2nd Comment 
    Second-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
    } 

###Some other Comment 
    someother-tags : 
    {
      "tag1": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      },
      "tag2": {
        "tagKey1": "tagValue1",
        "tagKey2": "tagValue2"
      }
    } 
'''
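data = []
bal = 0            # current brace nesting depth
start = end = 0    # bounds of the top-level {...} block being scanned
for i, v in enumerate(my_str):
    if v == '{':
        if bal == 0:
            start = i  # a new top-level object starts here
        bal += 1
    elif v == '}':
        bal -= 1
        end = i
    if start != end and bal == 0:  # a complete top-level {...} block has been seen
        new_str = my_str[start:end + 1]
        print(new_str)
        # Note: braces inside JSON string values would confuse this simple counter.
        my_dict = json.loads(new_str)
        data.append(my_dict)
        start = end = i + 1
print(data)

Output (only the final print(data) line is shown; the print(new_str) output is omitted):
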
[{'tag1': {'tagKey1': 'tagValue1', 'tagKey2': 'tagValue2'}, 'tag2': {'tagKey1': 'tagValue1', 'tagKey2': 'tagValue2'}}, {'tag1': {'tagKey1': 'tagValue1', 'tagKey2': 'tagValue2'}, 'tag2': {'tagKey1': 'tagValue1', 'tagKey2': 'tagValue2'}}, {'tag1': {'tagKey1': 'tagValue1', 'tagKey2': 'tagValue2'}, 'tag2': {'tagKey1': 'tagValue1', 'tagKey2': 'tagValue2'}}]
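
The list above loses the First-tags, Second-tags, ... labels. One way to keep them, sketched here under the assumption that every block is valid JSON and is preceded by a label and a colon, is to locate each label with a small regex and let json.JSONDecoder.raw_decode consume the object that follows. The names LABEL_RE and parse_labelled_json are hypothetical.

```python
import json
import re

# A label made of word characters or hyphens, a colon, then an opening brace.
LABEL_RE = re.compile(r'([\w-]+)\s*:\s*(?=\{)')


def parse_labelled_json(text):
    """Return {label: parsed JSON object} for every 'label : {...}' block."""
    decoder = json.JSONDecoder()
    result = {}
    pos = 0
    while True:
        match = LABEL_RE.search(text, pos)
        if match is None:
            break
        # match.end() points at the '{'; raw_decode returns the index after the object.
        obj, pos = decoder.raw_decode(text, match.end())
        result[match.group(1)] = obj
    return result


# Hypothetical sample; the second value contains a brace inside a string.
sample = (
    '###Some Comment \n    First-tags : \n    {"tag1": {"k": "v"}}\n\n'
    '###2nd Comment \n    Second-tags : \n    {"tag2": {"k": "{"}}\n'
)
labelled = parse_labelled_json(sample)
print(labelled)
```

Because the JSON decoder does the scanning, braces inside string values do not disturb the result, which the manual brace counter cannot guarantee.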

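【Comments】:

  • Thanks. This solution works for my earlier question, but I have since edited the question statement. Thanks for your help.
  • Hi Kuldeep, thanks for your help. I have accepted the other solution, which uses regex. Your approach is good too, but for my use case I found regex to be a better fit. Thanks for your help.
  • No problem at all, glad your problem is solved!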