As I suggested in the comments, I found a better solution than the break statement:
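You can create a result list and store each block's data in a separate element of that list (for example, in a dictionary). When you read a non-header line, the line you just read is guaranteed to belong to the current data block. The current data block is the last element of result, so you can modify it in place. When you read a header line, you simply append a new element to result and start writing the new block's data into it.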
If the size of the content is constant, you can use an itertools.cycle iterator to "encode" your parsing process:
from itertools import cycle
text1 = """Header1
number of Samples1
Content1
a1, aa1, aaa1
b1, bb1, bbb1
Header2
number of Samples2
Content2
a2, aa2, aaa2
b2, bb2, bbb2"""
size = 5
iterator = cycle(range(size))
result = []
for line in text1.split('\n'):
    i = next(iterator)  # Position of the line inside its block: 0..size-1
    if i == 0:
        result.append({'header': line})
    elif i == 1:
        result[-1]['num_of_samples'] = line
    elif i == 2:
        result[-1]['content_header'] = line
    elif i == 3:
        result[-1]['content'] = [line.split(', ')]
    else:
        result[-1]['content'].append(line.split(', '))
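To check what this fixed-size approach produces, the same idea can be wrapped in a function and run on a small input. This is only a minimal sketch: the name parse_fixed_blocks, the size parameter, and the sample text are assumptions made for illustration. It also creates the content list together with the header, so a single branch covers every content line.

```python
from itertools import cycle


def parse_fixed_blocks(text, size=5):
    """Parse blocks that always span exactly `size` lines (hypothetical helper)."""
    positions = cycle(range(size))  # yields 0, 1, ..., size - 1, 0, 1, ...
    result = []
    for line in text.split('\n'):
        i = next(positions)
        if i == 0:
            # First line of a block: start a new dictionary with an empty content list.
            result.append({'header': line, 'content': []})
        elif i == 1:
            result[-1]['num_of_samples'] = line
        elif i == 2:
            result[-1]['content_header'] = line
        else:
            # Every remaining line of the block is a content row.
            result[-1]['content'].append(line.split(', '))
    return result


sample = """Header1
number of Samples1
Content1
a1, aa1, aaa1
b1, bb1, bbb1
Header2
number of Samples2
Content2
a2, aa2, aaa2
b2, bb2, bbb2"""

blocks = parse_fixed_blocks(sample)
print(blocks[0]['header'])   # Header1
print(blocks[1]['content'])  # [['a2', 'aa2', 'aaa2'], ['b2', 'bb2', 'bbb2']]
```

Note that this only works while every block really has the same number of lines. A single extra or missing line shifts all the blocks that follow it.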
If you don't know the size of the content, you should parse each line, check its type, and build your data manually:
text2 = """Header1
number of Samples1
Content1
a1, aa1, aaa1
b1, bb1, bbb1
b1, bb1, bbb1
Header2
number of Samples2
Content2
b2, bb2, bbb2
Header3
number of Samples3
Content3
a3, aa3, aaa3
b3, bb3, bbb3"""
result = []
for line in text2.split('\n'):
    if line.startswith('Header'):  # Your condition for headers
        result.append({'header': line})
    elif line.startswith('number'):  # Your condition for number of samples
        result[-1]['num_of_samples'] = line
    elif line.startswith('Content'):  # Your condition for content headers
        result[-1]['content_header'] = line
    else:
        if 'content' not in result[-1]:  # The content list may not exist yet
            result[-1]['content'] = [line.split(', ')]
        else:
            result[-1]['content'].append(line.split(', '))
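If the input comes from a file rather than a string, the same line-type checks can be applied to any iterable of lines. The following is a rough sketch under a few assumptions of mine: the function name parse_blocks is hypothetical, blank lines are skipped, and a non-header line that appears before the first header is treated as an error. It uses dict.setdefault so the content list is created on first use.

```python
def parse_blocks(lines):
    """Group lines into one dict per 'Header...' line (hypothetical helper)."""
    result = []
    for raw in lines:
        line = raw.rstrip('\n')
        if not line:
            continue  # assumption: blank lines carry no data
        if line.startswith('Header'):  # Your condition for headers
            result.append({'header': line})
            continue
        if not result:
            # There is no current block to attach this line to.
            raise ValueError(f'data before the first header: {line!r}')
        block = result[-1]  # the current block is always the last element
        if line.startswith('number'):  # Your condition for number of samples
            block['num_of_samples'] = line
        elif line.startswith('Content'):  # Your condition for content headers
            block['content_header'] = line
        else:
            # setdefault creates the list the first time and reuses it afterwards.
            block.setdefault('content', []).append(line.split(', '))
    return result


sample = [
    "Header1\n", "number of Samples1\n", "Content1\n", "a1, aa1, aaa1\n",
    "Header2\n", "number of Samples2\n", "Content2\n",
    "a2, aa2, aaa2\n", "b2, bb2, bbb2\n",
]
blocks = parse_blocks(sample)
print(len(blocks))           # 2
print(blocks[1]['content'])  # [['a2', 'aa2', 'aaa2'], ['b2', 'bb2', 'bbb2']]
```

Because the function only iterates over its argument, an open file object can be passed directly, so the whole file never has to be held in memory as one string.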