How to find start and end indices in a Python list for all rows: answers

【Question title】: How do I find start and end indices in a Python list for all the rows?
【Posted】: 2021-11-02 18:26:34
【Question】:

My code:

import re
import pandas as pd

df=pd.read_csv("file")
l1=[]
l2=[]
for i in range(0,len(df['unions']),len(df['district'])):
    l1.append(' '.join((df['unions'][i], df['district'][i])))
    l2.append(({"entities": [[(ele.start(), ele.end() - 1) for ele in re.finditer(r'\S+', df['unions'][i])] ,df['subdistrict'][i]],}))

TRAIN_DATA=list(zip(l1,l2))
print(TRAIN_DATA)

Result: [('Dhansagar Bagerhat', {'entities': [[(0, 8)], 'Sarankhola']})]

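My expected output: [('Dhansagar Bagerhat', {'entities': [[(0, 8)], 'Sarankhola'],[[(10, 17)], 'AnyLabel']})]. How can I get this output for all rows? I only get the result for one row; it seems my loop isn't working. Can anyone point out my mistake?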

My CSV file looks like this. "AnyLabel" is another column. I have about 500 rows:

unions        subdistrict   district 
Dhansagar     Sarankhola    Bagerhat 
Daibagnyahati Morrelganj    Bagerhat 
Ramchandrapur Morrelganj    Bagerhat 
Kodalia       Mollahat      Bagerhat
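A minimal sketch of the expected output for every row, under some assumptions: the four sample rows are inlined with io.StringIO instead of reading "file", end offsets are inclusive (matching the question's ele.end() - 1), and because the "AnyLabel" column is not in the sample, the column names "unions" and "district" are used as hypothetical placeholder labels. The helper part_spans is a hypothetical name; it computes offsets from the length of each column value rather than with re.finditer, so a union or district name containing a space would still count as one span.

```python
import io

import pandas as pd

# The four sample rows from the question; the real file has ~500 rows.
CSV_TEXT = """unions subdistrict district
Dhansagar Sarankhola Bagerhat
Daibagnyahati Morrelganj Bagerhat
Ramchandrapur Morrelganj Bagerhat
Kodalia Mollahat Bagerhat
"""
df = pd.read_csv(io.StringIO(CSV_TEXT), sep=r"\s+")


def part_spans(parts):
    """Inclusive (start, end) offsets of each part inside ' '.join(parts)."""
    spans, pos = [], 0
    for part in parts:
        spans.append((pos, pos + len(part) - 1))
        pos += len(part) + 1  # step over the part and the joining space
    return spans


TRAIN_DATA = []
for union, district in zip(df["unions"], df["district"]):
    parts = (union, district)
    text = " ".join(parts)
    # Hypothetical labels: the column names stand in for the real label column.
    labels = ("unions", "district")
    entities = [[[span], label] for span, label in zip(part_spans(parts), labels)]
    TRAIN_DATA.append((text, {"entities": entities}))

print(TRAIN_DATA[0])
# ('Dhansagar Bagerhat', {'entities': [[[(0, 8)], 'unions'], [[(10, 17)], 'district']]})
```

Swap the labels tuple for the values of the real label column once it is available. If the data is headed for spaCy (the question doesn't say), note that spaCy's "entities" tuples are (start, end, label) with an exclusive end, so the "- 1" would have to go.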

【Question discussion】:

Show us the original DataFrame.
Can you copy and paste it as text?
Yes, I've added it as text.

Tags: python pandas list dataframe loops

【Answer 1】:

Try iterating with df.iterrows() and using str.join:

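import re
import pandas as pd

df = pd.read_csv("file")
l1 = []
l2 = []

for idx, row in df.iterrows():
    # Training text: union name followed by district name.
    l1.append(' '.join((row['unions'], row['district'])))
    # Offsets come from re.finditer over "unions subdistrict" (not the l1 text),
    # each stored as [start, inclusive end] plus the matched word.
    l2.append({"entities": [[[ele.start(), ele.end() - 1], ele.group(0)]
                            for ele in re.finditer(r'\S+', ' '.join([row['unions'], row['subdistrict']]))]})

TRAIN_DATA = list(zip(l1, l2))
print(TRAIN_DATA)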

Output:

[('Dhansagar Bagerhat', {'entities': [[[0, 8], 'Dhansagar'], [[10, 19], 'Sarankhola']]}), ('Daibagnyahati Bagerhat', {'entities': [[[0, 12], 'Daibagnyahati'], [[14, 23], 'Morrelganj']]}), ('Ramchandrapur Bagerhat', {'entities': [[[0, 12], 'Ramchandrapur'], [[14, 23], 'Morrelganj']]}), ('Kodalia Bagerhat', {'entities': [[[0, 6], 'Kodalia'], [[8, 15], 'Mollahat']]})]
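In the output above, 'Sarankhola' is reported at [10, 19] next to the text 'Dhansagar Bagerhat'. That happens because the offsets are computed on "unions subdistrict" while l1 stores "unions district", so the spans don't point into the stored text. A small helper (check_spans is a hypothetical name, and it assumes inclusive end offsets as in this answer) that slices each span back out of the stored text makes such mismatches easy to spot:

```python
def check_spans(text, entities):
    """Return (span, recorded word, actual slice) for spans that don't match."""
    mismatches = []
    for (start, end), word in entities:
        actual = text[start:end + 1]  # the end offsets here are inclusive
        if actual != word:
            mismatches.append(((start, end), word, actual))
    return mismatches


# First item of the output above.
text, annotations = ('Dhansagar Bagerhat',
                     {'entities': [[[0, 8], 'Dhansagar'], [[10, 19], 'Sarankhola']]})
print(check_spans(text, annotations['entities']))
# [((10, 19), 'Sarankhola', 'Bagerhat')]
```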

【Discussion】:

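Getting the same error here: l1.append(' '.join((row['unions'], row['district']))) TypeError: tuple indices must be integers or slices, not str
Actually, I'm getting an error again: l1.append(' '.join((row['unions'], row['district']))) TypeError: sequence item 0: expected str instance, float found
Also, could you look at my expected output? I want indices for both words.
It works with another CSV file, so the problem is in the unions column. Thanks.
But my expected output is different.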

【Answer 2】:

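You are using range incorrectly. range(0, len(df['unions']), len(df['district'])) counts from 0 up to (but not including) len(df['unions']) in steps of len(df['district']). Both columns have the same length, so the step is as large as the whole range and the loop only visits index 0, the first row. You can see this by printing the loop variable: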

for i in range(0,len(df['unions']),len(df['district'])):
    print(i)
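The same effect with plain numbers, assuming 4 rows as in the sample CSV:

```python
# Both columns come from the same DataFrame, so they have the same length.
n_rows = 4

# range(start, stop, step): the step equals the stop value, so only 0 is produced.
print(list(range(0, n_rows, n_rows)))  # [0]

# Visiting every row position needs the default step of 1.
print(list(range(n_rows)))  # [0, 1, 2, 3]
```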

Also, you shouldn't iterate over rows like that; use df.iterrows() instead:

import re
import pandas as pd

df = pd.read_csv("file")
l1 = []
l2 = []

# iterrows() yields (index, row) pairs, so unpack both.
for i, row in df.iterrows():
    l1.append(' '.join((row['unions'], row['district'])))
    l2.append({"entities": [[(ele.start(), ele.end() - 1)
                             for ele in re.finditer(r'\S+', ' '.join([row['unions'], row['subdistrict']]))]]})
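The "tuple indices must be integers or slices, not str" error reported in the comments comes from looping with for row in df.iterrows() without unpacking: each item is an (index, row) tuple. A small demonstration, using a hypothetical two-row DataFrame built in memory:

```python
import pandas as pd

df = pd.DataFrame({"unions": ["Dhansagar", "Kodalia"],
                   "district": ["Bagerhat", "Bagerhat"]})

# Without unpacking, each item is an (index, row) tuple, so indexing it
# with a column name raises TypeError.
first = next(iter(df.iterrows()))
try:
    _ = first["unions"]
    error = None
except TypeError as exc:
    error = str(exc)
print(error)  # tuple indices must be integers or slices, not str

# Unpacking the pair gives the row as a Series that accepts column labels.
texts = [" ".join((row["unions"], row["district"])) for _, row in df.iterrows()]
print(texts)  # ['Dhansagar Bagerhat', 'Kodalia Bagerhat']
```

For a few hundred rows iterrows() is fine; zip(df["unions"], df["district"]) gives the same pairs without building a Series for every row.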

【Discussion】:

@U12-Forward Can you point out where? I'll fix it.
The curly braces.
But I get this error: l1.append(' '.join((row['unions'], row['district']))) TypeError: tuple indices must be integers or slices, not str
@bellatrix Sorry, I forgot the i; try again with the edited version.
But it's a float: l1.append(' '.join((row['unions'], row['district']))) TypeError: sequence item 0: expected str instance, float found
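The "expected str instance, float found" error most likely means the unions column has empty cells: pandas reads a missing value as NaN, which is a float, and str.join refuses non-string items. A minimal sketch, assuming a hypothetical CSV whose second row has an empty unions cell, showing the error and dropping such rows with DataFrame.dropna (fillna with an empty string is an alternative if the rows must be kept):

```python
import io

import pandas as pd

# Hypothetical sample: the second row has an empty 'unions' cell.
CSV_TEXT = ("unions,subdistrict,district\n"
            "Dhansagar,Sarankhola,Bagerhat\n"
            ",Morrelganj,Bagerhat\n")
df = pd.read_csv(io.StringIO(CSV_TEXT))

missing = df.loc[1, "unions"]
print(type(missing))  # <class 'float'> (the value is NaN)

try:
    " ".join((missing, df.loc[1, "district"]))
    msg = None
except TypeError as exc:
    msg = str(exc)
print(msg)  # sequence item 0: expected str instance, float found

# Drop rows whose text columns are missing before building the training data.
clean = df.dropna(subset=["unions", "district"])
texts = [" ".join((u, d)) for u, d in zip(clean["unions"], clean["district"])]
print(texts)  # ['Dhansagar Bagerhat']
```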