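【Posted】: 2020-11-02 16:34:07
【Question】:
I have NER data stored as a list of sentences, where each sentence is a list of (token, POS tag, chunk placeholder, NER tag) tuples.
Sample data: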
[[('Silica', 'NN', '_', 'B-Material'),
('nanoparticles', 'NNS', '_', 'I-Material'),
('possessing', 'VBG', '_', 'O'),
('three', 'CD', '_', 'B-Data'),
('different', 'JJ', '_', 'I-Data'),
('diameters', 'NNS', '_', 'I-Data'),
('(', '(', '_', 'I-Data'),
('23', 'CD', '_', 'I-Data'),
(',', ',', '_', 'I-Data'),
('74', 'CD', '_', 'I-Data'),
('and', 'CC', '_', 'I-Data'),
('170', 'CD', '_', 'I-Data'),
('nm', 'NN', '_', 'I-Data'),
(')', ')', '_', 'I-Data'),
('were', 'VBD', '_', 'O'),
('used', 'VBN', '_', 'O'),
('to', 'TO', '_', 'O'),
('modify', 'NN', '_', 'B-Process'),
('a', 'DT', '_', 'B-Material'),
('piperidine', 'NN', '_', 'I-Material'),
('-', ':', '_', 'I-Material'),
('cured', 'VBN', '_', 'I-Material'),
('epoxy', 'NN', '_', 'I-Material'),
('polymer', 'NN', '_', 'I-Material'),
('.', '.', '_', 'O')],
[('Fracture', 'NN', '_', 'B-Process'),
('tests', 'NNS', '_', 'I-Process'),
('were', 'VBD', '_', 'O'),
('performed', 'VBN', '_', 'B-Process'),
('and', 'CC', '_', 'O'),
('values', 'NNS', '_', 'B-Data'),
('of', 'IN', '_', 'I-Data'),
('the', 'DT', '_', 'I-Data'),
('toughness', 'NN', '_', 'I-Data'),
('increased', 'VBN', '_', 'B-Process'),
('steadily', 'RB', '_', 'I-Process'),
('as', 'IN', '_', 'O'),
('the', 'DT', '_', 'B-Data'),
('concentration', 'NN', '_', 'I-Data'),
('of', 'IN', '_', 'O'),
('silica', 'NN', '_', 'B-Material'),
('nanoparticles', 'NNS', '_', 'I-Material'),
('was', 'VBD', '_', 'O'),
('increased', 'VBN', '_', 'B-Process'),
('.', '.', '_', 'O')]]
I need to convert it to the CoNLL-2003 NER data format and save it to a text file. The code I wrote does not work as expected. My implementation:
name = 'coll2003_train_com.txt'

def data_format(name, seq):
    test = []
    for i in seq:
        for j in i:
            test.append(j)
    with open(name, 'w', encoding="utf-8") as f1:
        for i in test:
            ii = '\t'.join(i)
            f1.writelines(ii + '/n')
            #f1.writelines('/n')
    return test

m = data_format(name, cc1)
The result is saved to the text file as one long line instead of one token per line.
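The most likely cause is that the code writes the two literal characters '/n' (forward slash, then n) rather than the newline escape '\n', so no line break ever reaches the file. Flattening all sentences into one list also discards the sentence boundaries, which CoNLL-2003 files mark with an empty line. Below is a minimal sketch of one way to write the data, assuming the desired layout is one token per line with space-separated columns and a blank line after each sentence. The function name write_conll, the sep parameter, and the output filename are hypothetical and not part of the original code.

```python
def write_conll(path, sentences, sep=" "):
    """Write one token per line and a blank line after every sentence.

    `sentences` is a list of sentences; each sentence is a list of
    tuples of strings such as (token, pos, chunk, ner_tag).
    `sep` is the column separator (assumed here to be a single space;
    pass "\t" to get tab-separated columns instead).
    """
    with open(path, "w", encoding="utf-8") as f:
        for sentence in sentences:
            for token in sentence:
                # "\n" (backslash) is the newline escape; "/n" is just text.
                f.write(sep.join(token) + "\n")
            # An empty line marks the sentence boundary.
            f.write("\n")


# Small excerpt in the same shape as the sample data above.
sample = [
    [("Fracture", "NN", "_", "B-Process"), ("tests", "NNS", "_", "I-Process")],
    [("values", "NNS", "_", "B-Data")],
]
write_conll("conll2003_sample.txt", sample)
```

With the full data from the question, the call would be write_conll(name, cc1). Iterating sentence by sentence, instead of flattening first, is what makes it possible to emit the blank separator line.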
【Comments】:
-
What is the expected output? Please add an example.
Tags: python tagging named-entity-recognition