【Posted】: 2020-12-07 04:22:04
【Question】:
I'm cleaning my data to get text pairs for machine translation from language X to language Y:
[['\ufeffMensahe di Pasco di Gobernador di Aruba 2019',
'Governor’s Christmas speech 2019'],
['Gobernador di Aruba Sr. Alfonso Boekhoudt a duna su mensahe di Pasco riba 24 december ultimo',
'On Christams eve, December 24, the Governor of Aruba Mr. Alfonso Boekhoudt gave his traditional Christmas speech'],
['Por a wak e discurso di Pasco di Gobernador via e canalnan di television local',
"The governor's Christmas speech was shown at the local television stations"],......
The data above is what the following code processes:
from unicodedata import normalize

def clean_pairs(lines):
    cleaned = list()
    for pair in lines:
        clean_pair = list()
        for line in pair:
            # normalize unicode characters
            line = normalize('NFD', line).encode('ascii', 'ignore')
            line = line.decode('UTF-8')
            # tokenize on white space
            line = line.split()
            # ... (further cleaning steps omitted in the question)
            clean_pair.append(' '.join(line))
        cleaned.append(clean_pair)

for i in range(10):
    print('[%s]->[%s]' % (cleaned[i,0], cleaned[i,1]))
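As a minimal sketch of only the steps visible above (the elided steps are not reproduced), the function below applies NFD normalization, drops every non-ASCII code point, and re-joins the whitespace tokens. The name `clean_pairs_visible_steps` is hypothetical. It returns a plain list of lists, so each pair is consumed by unpacking rather than by `[i, 0]`.

```python
from unicodedata import normalize


def clean_pairs_visible_steps(lines):
    """Apply only the cleaning steps shown in the question to each (source, target) pair."""
    cleaned = []
    for pair in lines:
        clean_pair = []
        for line in pair:
            # NFD splits accented letters into base letter + combining mark;
            # encoding to ASCII with 'ignore' then drops every non-ASCII code point
            # (combining marks, the BOM '\ufeff', curly quotes, ...).
            line = normalize('NFD', line).encode('ascii', 'ignore').decode('UTF-8')
            # Tokenize on whitespace and re-join with single spaces.
            clean_pair.append(' '.join(line.split()))
        cleaned.append(clean_pair)
    return cleaned


# Two pairs copied from the sample data in the question.
pairs = [
    ['\ufeffMensahe di Pasco di Gobernador di Aruba 2019',
     'Governor\u2019s Christmas speech 2019'],
    ['Por a wak e discurso di Pasco di Gobernador via e canalnan di television local',
     "The governor's Christmas speech was shown at the local television stations"],
]

cleaned = clean_pairs_visible_steps(pairs)
# cleaned is a list of lists: unpack each pair instead of indexing with [i, 0].
for src, tgt in cleaned[:10]:
    print('[%s]->[%s]' % (src, tgt))
```

Slicing with `cleaned[:10]` also avoids a separate `IndexError: list index out of range`, which a fixed `range(10)` would raise when there are fewer than ten pairs.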
I expect output like the following:
[hi]->[hallo]
[hi]->[gru gott]
[run]->[lauf]
[wow]->[potzdonner]
[wow]->[donnerwetter]
However, I get the following error:
IndexError                                Traceback (most recent call last)
     49
     50 for i in range(10):
---> 51     print('[%s]->[%s]' % (clean_pairs[i,0], clean_pairs[i,1]))

IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
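One plausible cause, offered as an assumption because the code that builds `clean_pairs` is not shown: if the cleaned pairs are converted to a NumPy array and at least one row does not have exactly two fields, NumPy cannot form a 2-D array. With `dtype=object` it instead builds a 1-D array whose elements are the Python lists, and `arr[i, 0]` then raises this `IndexError`. The rows below are hypothetical (the three-field row is made up to trigger the problem).

```python
import numpy as np

# Hypothetical cleaned pairs; the last row has three fields instead of two.
rows = [
    ['hi', 'hallo'],
    ['run', 'lauf'],
    ['wow', 'potzdonner', 'donnerwetter'],
]

# Ragged rows cannot form a 2-D array, so with dtype=object NumPy
# stores each list as a single object in a 1-D array.
ragged = np.array(rows, dtype=object)
print(ragged.shape)  # (3,)

try:
    ragged[0, 0]  # two indices on a 1-D array
except IndexError as exc:
    error_message = str(exc)
    print('IndexError:', exc)

# Indexing twice works, because ragged[0] is the Python list ['hi', 'hallo'].
print(ragged[0][0])  # hi

# When every row has exactly two fields, the array is 2-D and [i, 0] works.
square = np.array(rows[:2])
print(square.shape)  # (2, 2)
print(square[1, 0])  # run
```

Printing `len(row)` for every pair, or checking `arr.shape`, is a quick way to find the row that breaks the 2-D shape.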
Can anyone help me fix this? Thanks!
【Discussion】:
-
Apparently the list is one-dimensional. Could you print the whole list as-is? Maybe try clean_pairs[i][0], or check whether it is a tuple.
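The commenter's suggestion comes down to how subscripts work on plain Python sequences: a nested list needs one index per level (`clean_pairs[i][0]`), while `clean_pairs[i, 0]` passes the single tuple `(i, 0)` as the index. Lists and tuples reject that with a `TypeError`; only array types such as NumPy's `ndarray` treat it as a row/column position, and only when the array really is 2-D. A small pure-Python check, using pairs taken from the expected output:

```python
pairs = [['hi', 'hallo'], ['run', 'lauf'], ['wow', 'potzdonner']]

# Checking the type tells you which indexing syntax applies.
print(type(pairs).__name__)  # list

# Plain lists take one index per nesting level: pick the pair, then the element.
print(pairs[1][0])  # run

# The comma form passes the single tuple (1, 0) as the index, which lists reject.
try:
    pairs[1, 0]
except TypeError as exc:
    list_error = str(exc)
    print('TypeError:', exc)
```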
Tags: python list data-structures