This might help:
import csv
import yaml

yaml_file_names = ['data.yaml', 'data2.yaml']
rows_to_write = []
for idx, each_yaml_file in enumerate(yaml_file_names):
    print("Processing file", idx + 1, "of", len(yaml_file_names), "file name:", each_yaml_file)
    with open(each_yaml_file) as f:
        # safe_load parses plain data without constructing arbitrary Python objects
        data = yaml.safe_load(f)
    for each_dict in data['degrees']:
        for each_nested_dict in each_dict['electiveGroups']:
            for each_option in each_nested_dict['options']:
                # one CSV row per option: file name, group label, option
                rows_to_write.append([each_yaml_file, each_nested_dict['label'], each_option])
# newline='' lets the csv module control the line endings itself
with open('output_csv_file.csv', 'w', newline='') as out:
    csv_writer = csv.writer(out, delimiter='|')
    csv_writer.writerows(rows_to_write)
print("Output file output_csv_file.csv created")
I tested this code with two mock input YAML files, data.yaml and data2.yaml, whose contents are:
data.yaml:
code: 9313
degrees:
  - name: Design
    coreCourses:
      - ABCD1
      - ABCD2
      - ABCD3
    electiveGroups: # this is the section I need to extract
      - label: Electives
        options:
          - Studio1
          - Studio2
          - Studio3
      - label: OtherElectives
        options:
          - Class1
          - Development2
          - lateclass1
    specialisations:
      - label: Honours
and data2.yaml:
code: 9313
degrees:
  - name: Design
    coreCourses:
      - ABCD1
      - ABCD2
      - ABCD3
    electiveGroups: # this is the section I need to extract
      - label: Electives
        options:
          - Studio1
      - label: E2
        options:
          - Class1
    specialisations:
      - label: Honours
The generated output CSV file looks like this:
data.yaml|Electives|Studio1
data.yaml|Electives|Studio2
data.yaml|Electives|Studio3
data.yaml|OtherElectives|Class1
data.yaml|OtherElectives|Development2
data.yaml|OtherElectives|lateclass1
data2.yaml|Electives|Studio1
data2.yaml|E2|Class1
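If you want to check the pipe-delimited output programmatically, it can be read back with csv.reader using the same delimiter. This is only a minimal sketch: the in-memory buffer stands in for output_csv_file.csv, and the two rows are copied from the output above.

```python
import csv
import io

# Two rows taken from the sample output above.
rows = [
    ['data.yaml', 'Electives', 'Studio1'],
    ['data2.yaml', 'E2', 'Class1'],
]

# In-memory buffer standing in for output_csv_file.csv (an assumption
# made so the sketch runs without touching the disk).
buffer = io.StringIO(newline='')
csv.writer(buffer, delimiter='|').writerows(rows)

# Read the same content back, using the same delimiter as the writer.
buffer.seek(0)
read_back = list(csv.reader(buffer, delimiter='|'))
print(read_back)
```

With a real file you would open it with newline='' and pass the file object to csv.reader in the same way.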
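By the way, in the YAML input you provided in the question, the last two lines are not indented correctly.
Since you said you need to parse 300 YAML files in a directory, you can use Python's glob module, like this: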
import csv
import glob
import yaml

# collect every .yaml file in the current directory
yaml_file_names = glob.glob('./*.yaml')
# yaml_file_names = ['data.yaml', 'data2.yaml']
rows_to_write = []
for idx, each_yaml_file in enumerate(yaml_file_names):
    print("Processing file", idx + 1, "of", len(yaml_file_names), "file name:", each_yaml_file)
    with open(each_yaml_file) as f:
        # safe_load parses plain data without constructing arbitrary Python objects
        data = yaml.safe_load(f)
    for each_dict in data['degrees']:
        for each_nested_dict in each_dict['electiveGroups']:
            for each_option in each_nested_dict['options']:
                # one CSV row per option: file name, group label, option
                rows_to_write.append([each_yaml_file, each_nested_dict['label'], each_option])
# newline='' lets the csv module control the line endings itself
with open('output_csv_file.csv', 'w', newline='') as out:
    csv_writer = csv.writer(out, delimiter='|')
    csv_writer.writerows(rows_to_write)
print("Output file output_csv_file.csv created")
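Two things to be aware of with glob: the order of the returned paths is not guaranteed, and each path keeps the './' prefix, so the first CSV column will contain './data.yaml' rather than 'data.yaml'. A small sketch of one way to handle both, assuming you want sorted paths and bare file names; list_yaml_files is a hypothetical helper, and the temporary directory only exists to make the example self-contained.

```python
import glob
import os
import tempfile


def list_yaml_files(directory):
    """Return the .yaml paths in `directory`, sorted for a stable order."""
    return sorted(glob.glob(os.path.join(directory, '*.yaml')))


# Throwaway directory with a few empty files, used only for demonstration.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('data2.yaml', 'data.yaml', 'notes.txt'):
        open(os.path.join(tmp, name), 'w').close()
    # basename strips the directory part, leaving just the file name.
    names = [os.path.basename(path) for path in list_yaml_files(tmp)]

print(names)
```

In the program above you would open the full path but append os.path.basename(each_yaml_file) to the row.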
Edit: since you asked in the comments to skip the YAML files that have no electiveGroups section, here is the updated program:
import csv
import glob
import yaml

# collect every .yaml file in the current directory
yaml_file_names = glob.glob('./*.yaml')
# yaml_file_names = ['data.yaml', 'data2.yaml']
rows_to_write = []
for idx, each_yaml_file in enumerate(yaml_file_names):
    print("Processing file", idx + 1, "of", len(yaml_file_names), "file name:", each_yaml_file)
    with open(each_yaml_file) as f:
        # safe_load parses plain data without constructing arbitrary Python objects
        data = yaml.safe_load(f)
    for each_dict in data['degrees']:
        try:
            for each_nested_dict in each_dict['electiveGroups']:
                for each_option in each_nested_dict['options']:
                    # one CSV row per option: file name, group label, option
                    rows_to_write.append([each_yaml_file, each_nested_dict['label'], each_option])
        except KeyError:
            # a missing key skips the rest of this degree entry
            print("No electiveGroups or options key found in", each_yaml_file)
# newline='' lets the csv module control the line endings itself
with open('output_csv_file.csv', 'w', newline='') as out:
    csv_writer = csv.writer(out, delimiter='|')
    csv_writer.writerows(rows_to_write)
print("Output file output_csv_file.csv created")
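One side effect of the try/except above is that a single group without an options key stops processing of the remaining groups in that degree. If that matters, a possible alternative is to use dict.get() so that only the missing piece is skipped. This is just a sketch: extract_elective_rows is a hypothetical helper, and the sample dict is made up to stand in for what yaml.safe_load() would return for one file.

```python
def extract_elective_rows(data, source_name):
    """Return [source_name, label, option] rows from already-parsed YAML data."""
    rows = []
    # `or []` also covers keys that are present but empty (parsed as None).
    for degree in (data or {}).get('degrees') or []:
        for group in degree.get('electiveGroups') or []:
            label = group.get('label', '')
            for option in group.get('options') or []:
                rows.append([source_name, label, option])
    return rows


# Made-up stand-in for the result of yaml.safe_load() on one file.
sample = {
    'code': 9313,
    'degrees': [
        {'name': 'Design',
         'electiveGroups': [
             {'label': 'Electives', 'options': ['Studio1', 'Studio2']},
             {'label': 'NoOptionsHere'},  # no options: skipped silently
         ]},
        {'name': 'NoElectives', 'coreCourses': ['ABCD1']},  # no electiveGroups
    ],
}
rows = extract_elective_rows(sample, 'data.yaml')
print(rows)
```

In the loop over files you would then call rows_to_write.extend(extract_elective_rows(data, each_yaml_file)) instead of the nested for loops.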