【Question title】: Need a script that extracts content from a yaml file and outputs it as a csv file
【Posted】: 2018-03-22 15:08:42
【Question】:

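I'm very new to Python, but I hope you can help me write a simple script that reads a bunch of .yaml files (about 300 files, all in the same directory), extracts one section (the electives only) from each .yaml file, and converts it to CSV.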

Example of the content of a .yaml file:

code: 9313
degrees:
- name: Design
  coreCourses:
  - ABCD1
  - ABCD2
  - ABCD3
  electiveGroups: #this is the section i need to extract
    - label: Electives
      options:
        - Studio1
        - Studio2
        - Studio3
    - label: OtherElectives
      options:
        - Class1
        - Development2
        - lateclass1
   specialisations:
    - label: Honours

How I would like the output to look in the CSV:

.yaml file name | Electives   | Studio1
.yaml file name | Electives   | Studio2
.yaml file name | Electives   | Studio3
.yaml file name | OtherElectives   | Class1
.yaml file name | OtherElectives   | Development2
.yaml file name | OtherElectives   | lateclass1

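I assume this is a relatively simple script to write, but I'm looking for some help writing it. I'm quite new to this, so please bear with me. I have written a few VBA macros, so I hope I can pick it up fairly quickly.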

Ideally I'd like a complete solution, with some guidance on how the code works.

Thanks in advance for all your help. I hope my question is clear.

Here is my first attempt (it didn't take me long, though):

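import yaml

with open('program_4803', 'r') as f:
    doc = yaml.safe_load(f)

# electiveGroups lives inside each entry of the "degrees" list
with open("test.txt", "w") as out:
    for degree in doc["degrees"]:
        for group in degree["electiveGroups"]:
            for option in group["options"]:
                out.write(group["label"] + " " + option + "\n")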

As you can probably tell, this is still far from complete, but I'm doing my best!
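The part that usually trips people up is the shape of the parsed data. For the sample file above, yaml.safe_load should return a dict whose "degrees" value is a list of dicts, and each of those holds an "electiveGroups" list. The sketch below uses a hand-written dict literal as an assumed stand-in for the parsed file (so it runs without PyYAML or any files) to show how each level is reached, and why a top-level lookup of "electiveGroups" fails.

```python
# Hand-written stand-in for what yaml.safe_load returns on the sample file
# (assumption: coreCourses and specialisations are left out for brevity).
doc = {
    "code": 9313,
    "degrees": [
        {
            "name": "Design",
            "electiveGroups": [
                {"label": "Electives", "options": ["Studio1", "Studio2", "Studio3"]},
                {"label": "OtherElectives", "options": ["Class1", "Development2", "lateclass1"]},
            ],
        }
    ],
}

# "degrees" is a list, so it is indexed (or looped over) by position, not by name
print(type(doc["degrees"]).__name__)                        # list
print(type(doc["degrees"][0]["electiveGroups"]).__name__)   # list
print(doc["degrees"][0]["electiveGroups"][1]["label"])      # OtherElectives

# electiveGroups is not a top-level key, so looking it up there fails
try:
    doc["electiveGroups"]
except KeyError as err:
    print("KeyError:", err)                                 # KeyError: 'electiveGroups'
```

Each "- item" in YAML becomes a list element and each "key: value" becomes a dict entry, which is why the extraction needs one loop per list level.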

【Discussion】:

    Tags: python


    【Solution 1】:

    To parse the yaml files, use the Python yaml library (PyYAML).

    Example here: Parsing a YAML file in Python, and accessing the data?

    To write the output file, you don't strictly need the csv library; plain file writes are enough:

    with open("testfile.txt", "w") as file:
        file.write("Hello World")
    

    The code above writes to a file. You can iterate over the result of parsing the yaml and write each output line to the file in the same way.
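Plain writes work as long as no value contains the separator. The csv module adds quoting when one does, which keeps the column count intact. A minimal sketch, writing to an in-memory io.StringIO buffer instead of a real file; the course name "Studio|Advanced" is made up purely to show the quoting.

```python
import csv
import io

rows = [
    ["data.yaml", "Electives", "Studio1"],
    ["data.yaml", "Electives", "Studio|Advanced"],  # made-up value containing the delimiter
]

# Plain string joining: the second row silently gains a fourth column
naive = "\n".join("|".join(row) for row in rows)
print(naive)

# csv.writer quotes any field that contains the delimiter
buf = io.StringIO()
writer = csv.writer(buf, delimiter="|", lineterminator="\n")
writer.writerows(rows)
print(buf.getvalue(), end="")
# data.yaml|Electives|Studio1
# data.yaml|Electives|"Studio|Advanced"

# Reading it back with the same delimiter recovers three columns per row
parsed = list(csv.reader(io.StringIO(buf.getvalue()), delimiter="|"))
print([len(row) for row in parsed])  # [3, 3]
```

If course names are known never to contain "|", the plain-write approach gives the same result.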

    【Comments】:

    • Thanks. I've made a first attempt at this, but it didn't work well. I'll keep trying!
    【Solution 2】:

    This might help:

    import yaml
    import csv
    
    yaml_file_names = ['data.yaml', 'data2.yaml']
    
    rows_to_write = []
    
    for idx, each_yaml_file in enumerate(yaml_file_names):
        print("Processing file", idx + 1, "of", len(yaml_file_names), "file name:", each_yaml_file)
        with open(each_yaml_file) as f:
            # safe_load: plain yaml.load(f) without a Loader fails on PyYAML 6+
            data = yaml.safe_load(f)
    
            for each_dict in data['degrees']:
                for each_nested_dict in each_dict['electiveGroups']:
                    for each_option in each_nested_dict['options']:
                        # one CSV row per option: yaml file name, group label, option
                        rows_to_write.append([each_yaml_file, each_nested_dict['label'], each_option])
    
    
    # newline='' is what the csv docs recommend (avoids blank rows on Windows)
    with open('output_csv_file.csv', 'w', newline='') as out:
        csv_writer = csv.writer(out, delimiter='|')
        csv_writer.writerows(rows_to_write)
        print("Output file output_csv_file.csv created")
    

    I tested this code with two mock input yaml files, data.yaml and data2.yaml, whose contents are:

    data.yaml:

    code: 9313
    degrees:
    - name: Design
      coreCourses:
      - ABCD1
      - ABCD2
      - ABCD3
      electiveGroups: #this is the section i need to extract
        - label: Electives
          options:
            - Studio1
            - Studio2
            - Studio3
        - label: OtherElectives
          options:
            - Class1
            - Development2
            - lateclass1
      specialisations:
      - label: Honours
    

    data2.yaml:

    code: 9313
    degrees:
    - name: Design
      coreCourses:
      - ABCD1
      - ABCD2
      - ABCD3
      electiveGroups: #this is the section i need to extract
        - label: Electives
          options:
            - Studio1
        - label: E2
          options:
            - Class1
      specialisations:
      - label: Honours
    

    The resulting output csv file looks like this:

    data.yaml|Electives|Studio1
    data.yaml|Electives|Studio2
    data.yaml|Electives|Studio3
    data.yaml|OtherElectives|Class1
    data.yaml|OtherElectives|Development2
    data.yaml|OtherElectives|lateclass1
    data2.yaml|Electives|Studio1
    data2.yaml|E2|Class1
    

    By the way, the last two lines of the yaml input you provided in the question are not indented correctly.

    Since you need to parse 300 yaml files in one directory, you can use Python's glob module, like this:

    import yaml
    import csv
    import glob
    
    
    yaml_file_names = glob.glob('./*.yaml')
    # yaml_file_names = ['data.yaml', 'data2.yaml']
    
    rows_to_write = []
    
    for idx, each_yaml_file in enumerate(yaml_file_names):
        print("Processing file", idx + 1, "of", len(yaml_file_names), "file name:", each_yaml_file)
        with open(each_yaml_file) as f:
            # safe_load: plain yaml.load(f) without a Loader fails on PyYAML 6+
            data = yaml.safe_load(f)
    
            for each_dict in data['degrees']:
                for each_nested_dict in each_dict['electiveGroups']:
                    for each_option in each_nested_dict['options']:
                        # one CSV row per option: yaml file name, group label, option
                        rows_to_write.append([each_yaml_file, each_nested_dict['label'], each_option])
    
    
    # newline='' is what the csv docs recommend (avoids blank rows on Windows);
    # the default quotechar is used so values containing spaces are left untouched
    with open('output_csv_file.csv', 'w', newline='') as out:
        csv_writer = csv.writer(out, delimiter='|')
        csv_writer.writerows(rows_to_write)
        print("Output file output_csv_file.csv created")
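Two small details about glob may matter here: glob.glob makes no promise about the order of its results, and it returns paths, so with './*.yaml' the first CSV column reads './data.yaml' rather than 'data.yaml'. A sketch, assuming sorted order and bare file names are wanted; it builds a throwaway directory with tempfile so it runs anywhere, and the file names in it are made up.

```python
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    # Made-up files: two .yaml files (created out of order) and one to ignore
    for name in ["b.yaml", "a.yaml", "notes.txt"]:
        Path(folder, name).write_text("code: 1\n")

    # glob.glob makes no ordering promise; sorted() gives a stable order
    yaml_file_names = sorted(glob.glob(os.path.join(folder, "*.yaml")))

    # glob returns paths, so strip the directory for the CSV's first column
    names = [os.path.basename(p) for p in yaml_file_names]
    print(names)  # ['a.yaml', 'b.yaml']

    # pathlib equivalent: Path.name is the bare file name
    path_names = [p.name for p in sorted(Path(folder).glob("*.yaml"))]
    print(path_names)  # ['a.yaml', 'b.yaml']
```

In the script above, that would mean wrapping the glob call in sorted() and writing os.path.basename(each_yaml_file) as the first column; files ending in .yml would need a second pattern.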
    

    Edit: since you asked in the comments about skipping yaml files that have no electiveGroups section, here is the updated program:

    import yaml
    import csv
    import glob
    
    
    yaml_file_names = glob.glob('./*.yaml')
    # yaml_file_names = ['data.yaml', 'data2.yaml']
    
    rows_to_write = []
    
    for idx, each_yaml_file in enumerate(yaml_file_names):
        print("Processing file", idx + 1, "of", len(yaml_file_names), "file name:", each_yaml_file)
        with open(each_yaml_file) as f:
            # safe_load: plain yaml.load(f) without a Loader fails on PyYAML 6+
            data = yaml.safe_load(f)
    
            for each_dict in data['degrees']:
                try:
                    for each_nested_dict in each_dict['electiveGroups']:
                        for each_option in each_nested_dict['options']:
                            # one CSV row per option: yaml file name, group label, option
                            rows_to_write.append([each_yaml_file, each_nested_dict['label'], each_option])
                except KeyError:
                    # this degree has no electiveGroups (or a group has no options)
                    print("No electiveGroups or options key found in", each_yaml_file)
    
    
    # newline='' is what the csv docs recommend (avoids blank rows on Windows);
    # the default quotechar is used so values containing spaces are left untouched
    with open('output_csv_file.csv', 'w', newline='') as out:
        csv_writer = csv.writer(out, delimiter='|')
        csv_writer.writerows(rows_to_write)
        print("Output file output_csv_file.csv created")
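One side effect of wrapping the whole group loop in a single try/except KeyError is that if one group lacks "options", the remaining groups of that degree are skipped as well. An alternative sketch uses dict.get with empty defaults, so each missing key is skipped on its own. The helper name rows_from_doc and the sample dicts are hypothetical stand-ins for parsed YAML, not part of the answer above.

```python
def rows_from_doc(file_name, doc):
    """Hypothetical helper: build CSV rows, skipping any missing section."""
    rows = []
    # `or {}` / `or []` also cover an empty file, for which safe_load returns None
    for degree in (doc or {}).get("degrees") or []:
        for group in degree.get("electiveGroups") or []:
            for option in group.get("options") or []:
                rows.append([file_name, group.get("label", ""), option])
    return rows


# Stand-ins for parsed files (assumptions, not real data)
no_electives = {"code": 1, "degrees": [{"name": "Design", "coreCourses": ["ABCD1"]}]}
partial = {
    "degrees": [
        {
            "electiveGroups": [
                {"label": "Broken"},  # no "options" key
                {"label": "E2", "options": ["Class1"]},
            ]
        }
    ]
}

print(rows_from_doc("none.yaml", no_electives))  # []
print(rows_from_doc("partial.yaml", partial))    # [['partial.yaml', 'E2', 'Class1']]
print(rows_from_doc("empty.yaml", None))         # []
```

With the try/except version, the "partial" case would produce no rows at all, because the KeyError from the first group ends the loop before "E2" is reached.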
    

    【Comments】:

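    • Wow, this is great! I found it easy to follow and learn from. Thanks so much! Is there any way to reward you for this answer?
    • Is there a way to skip the yaml files that don't have options or electiveGroups? I searched online and found a suggestion to add "except: pass". Would that be appropriate?
    • I tried adding "try: [code] except: pass" inside the for loop, but that didn't work; it just produced an empty .csv.
    • @BobSha, I've updated my answer to handle yaml files that have no electiveGroups section. If my answer helped you, please upvote and accept it (which you've already done) :)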