【Question title】: How can I convert text to a JSON file? [closed]
【Posted】: 2021-10-01 06:49:53
【Question】:

I need to create a JSON file with this structure

[{"image_id": 0873, "caption": "clock tower with a clock on top of it"}, {"image_id": 1083, "caption": "two zebras are standing in the grass in the grass"} , .....

from a file that contains

image_id 0873  caption clock tower with a clock on top of it 
image_id 1083  caption two zebras are standing in the grass in the grass 
image_id 1270  caption baseball player is swinging a bat at the ball  
image_id 1436  caption man is sitting on the bed with laptop 

How should I get started?

【Question comments】:

    Tags: python json file


    【Answer 1】:

    Assuming every line looks like: image_id {image_id} caption {caption}, you can use the str method split(maxsplit=number) to split the line into four parts.

    line = "image_id 0873  caption clock tower with a clock on top of it"
    _, image_id, _, caption = line.split(maxsplit=3)
    # Now image_id = "0873", caption = "clock tower with a clock on top of it"
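
    A quick check of how split(maxsplit=3) behaves on one of the sample lines from the question. The variable names raw_fields and fields are only illustrative. The point to notice is that the last field keeps any trailing whitespace, so the sketch strips the line before splitting:

```python
# Sample line copied from the question, including its trailing space and a newline.
line = "image_id 1083  caption two zebras are standing in the grass in the grass \n"

# Without an explicit separator, split() treats any run of whitespace as one
# delimiter; maxsplit=3 stops after three splits, so the caption stays whole.
raw_fields = line.split(maxsplit=3)

# The remainder is returned untouched, trailing whitespace included,
# so strip the line before splitting.
fields = line.strip().split(maxsplit=3)
_, image_id, _, caption = fields

print(raw_fields[3].endswith("\n"))
print(image_id, "|", caption)
```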
    

    To iterate over all the lines of the file:

    images = []
    with open(path) as f:
        for line in f:
            # strip first: split(maxsplit=3) leaves trailing whitespace on the last field
            _, image_id, _, caption = line.strip().split(maxsplit=3)
            images.append({"image_id": int(image_id), "caption": caption})
    

    To save the variable to a JSON file, you can use the json module:

    import json
    with open(path_to_save, "w") as f:
        json.dump(images, f)
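
    Putting the steps together, here is a minimal end-to-end sketch. The helper parse_captions, the sample list, and the temporary output path are hypothetical names chosen for illustration, and skipping blank or malformed lines is an assumption about how such input should be handled. Note also that JSON numbers cannot have leading zeros, so 0873 from the question can only appear as the number 873 or as the string "0873":

```python
import json
import os
import tempfile


def parse_captions(lines):
    """Turn 'image_id <id> caption <text>' lines into a list of dicts."""
    records = []
    for line in lines:
        parts = line.strip().split(maxsplit=3)
        # Skip blank or malformed lines instead of raising ValueError.
        if len(parts) != 4 or parts[0] != "image_id" or parts[2] != "caption":
            continue
        _, image_id, _, caption = parts
        # int("0873") gives 873; keep image_id as a string instead
        # if the leading zero matters.
        records.append({"image_id": int(image_id), "caption": caption})
    return records


sample = [
    "image_id 0873  caption clock tower with a clock on top of it \n",
    "\n",
    "image_id 1436  caption man is sitting on the bed with laptop \n",
]
records = parse_captions(sample)

# Write to a temporary file and read it back to confirm the round trip.
out_path = os.path.join(tempfile.mkdtemp(), "captions.json")
with open(out_path, "w") as f:
    json.dump(records, f)
with open(out_path) as f:
    loaded = json.load(f)

print(loaded)
```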
    

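    【Comments】:

      【Answer 2】:

      Try using a regular expression, which makes it easier to handle more complex patterns. Here is an extended version of @Kozubi's answer: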

          import json
          import re
          
          json_data = []
          with open("test.txt") as f:
              pattern = re.compile(r"""image_id\s+(?P<image_id>[0-9]+)\s+
                                       caption\s+(?P<caption>.*)$
                                       """, re.X)
              for line in f:
                  m = pattern.match(line.strip())
                  if m:
                      json_data.append({
                          "image_id": int(m.group('image_id')),
                          "caption": m.group('caption')
                          })
                      
          print(json.dumps(json_data, indent=4))
          # write the result with a context manager so the file is closed
          with open("json_dump.json", "w") as out:
              json.dump(json_data, out, indent=4)
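
      To see what the named groups capture, here is a small sketch that runs an equivalent one-line pattern against in-memory lines instead of a file. The lines list and the second, non-matching entry are made up for illustration:

```python
import re

# Same idea as the pattern above, written on one line; named groups
# make the captured fields self-describing.
pattern = re.compile(r"image_id\s+(?P<image_id>[0-9]+)\s+caption\s+(?P<caption>.*)$")

lines = [
    "image_id 1270  caption baseball player is swinging a bat at the ball  ",
    "this line does not follow the format",
]

parsed = []
for line in lines:
    m = pattern.match(line.strip())
    if m is None:
        # Lines that do not match are skipped rather than raising an error.
        continue
    record = m.groupdict()  # {"image_id": "...", "caption": "..."}
    record["image_id"] = int(record["image_id"])
    parsed.append(record)

print(parsed)
```

      Using groupdict() keeps the dictionary keys tied to the group names, so adding another field to the pattern would not require touching the loop body.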
      

      【Comments】:

        【Answer 3】:

        This should do the trick:

        import json

        json_data = []
        # read the input data line by line
        with open("file_with_data.txt") as f:
            for line in f:
                # remove the trailing newline character
                line = line.rstrip("\n")
                # split on single spaces; the input has two spaces after the id,
                # so index 2 is an empty string and "caption" lands at index 3
                splt_line = line.split(" ")
                # build a single dict from the line data
                small_json = {splt_line[0]: splt_line[1], splt_line[3]: " ".join(splt_line[4:]).strip()}
                # add the dict to the list
                json_data.append(small_json)
        # now dump the List[Dict] to a .json file
        with open("json_dump.json", "w") as f:
            json.dump(json_data, f)
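
        The indices used above depend on the exact spacing of the input. The sketch below compares split(" ") with split() on a sample line from the question. The names by_space and by_whitespace are illustrative, and the final dict is one possible spacing-independent variant rather than the answer's own code:

```python
line = "image_id 0873  caption clock tower with a clock on top of it"

# split(" ") cuts at every single space, so the double space after the id
# produces an empty string and shifts the later indices by one.
by_space = line.split(" ")

# split() with no argument collapses runs of whitespace, so the indices
# do not depend on how many spaces separate the fields.
by_whitespace = line.split()

print(by_space[:5])
print(by_whitespace[:4])

# Spacing-independent version of the dict (key names come from the line itself).
record = {by_whitespace[0]: by_whitespace[1], by_whitespace[2]: " ".join(by_whitespace[3:])}
print(record)
```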
        

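        【Comments】:

        • Thanks a lot. I tried the code, but instead of the word "caption" I got the first word of the sentence as the key, like {"image_id": "42", "man": "man is standing in front of the luggage"}, ..
        • @Lei Yes, my bad. It's fixed now.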