【Posted】: 2019-02-24 08:58:52
【Question】:
I have a generator object that essentially consists of nested lists. It contains about 20,000 lists and is structured like this:
cases = [[0,36,12],[64,28,1],....
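Each list in this object holds the row positions that belong to one process. I now want to assign a ProcessID to the corresponding rows of a DataFrame. At the moment I do this with a for loop: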
moc = df.iloc
processID = 0
for process in cases:
    # every row position in this process gets the same ID
    for step in process:
        moc[step,-1] = processID
    processID += 1
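Although this works, running through the for loop takes a long time, so I am interested in a more efficient way to assign the processID.
Because I need to iterate over the cases object, and because the nested lists differ in length, I don't know how to implement something more efficient, for example with df.apply() or np.where().
Any help is appreciated.
Example: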
import pandas as pd
import numpy as np
cases = [[1,4,2],[3,5,0],[9,6],[7,8]]
d = {'col1': ["some_information", "some_information","some_information",
              "some_information","some_information","some_information",
              "some_information","some_information","some_information",
              "some_information"],
     'processID':np.empty}
df = pd.DataFrame(data=d)
print(df)
               col1                  processID
0  some_information  <built-in function empty>
1  some_information  <built-in function empty>
2  some_information  <built-in function empty>
3  some_information  <built-in function empty>
4  some_information  <built-in function empty>
5  some_information  <built-in function empty>
6  some_information  <built-in function empty>
7  some_information  <built-in function empty>
8  some_information  <built-in function empty>
9  some_information  <built-in function empty>
moc = df.iloc
processID = 1
for case in cases:
    # every row position in this case gets the same ID
    for idx in case:
        moc[idx,-1] = processID
    processID += 1
print(df)
               col1  processID
0  some_information          2
1  some_information          1
2  some_information          1
3  some_information          2
4  some_information          1
5  some_information          2
6  some_information          3
7  some_information          4
8  some_information          4
9  some_information          3
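For comparison, the same result can probably be produced without a Python-level loop over individual rows. The sketch below is one possible approach, not a confirmed answer from this thread: it flattens all row positions into one array, repeats each process number once per row of its case with np.repeat, and writes everything in a single assignment. It assumes the entries in cases are positional row numbers (as used with df.iloc above) and that a sentinel of -1 is acceptable for rows that belong to no case; both are assumptions.

```python
from itertools import chain

import numpy as np
import pandas as pd

cases = [[1, 4, 2], [3, 5, 0], [9, 6], [7, 8]]
df = pd.DataFrame({"col1": ["some_information"] * 10})

# A generator can only be consumed once, so materialize it first.
cases = list(cases)

# All row positions, in case order: [1, 4, 2, 3, 5, 0, 9, 6, 7, 8]
rows = np.fromiter(chain.from_iterable(cases), dtype=np.intp)

# One process number per case, repeated once for each row in that case.
lengths = [len(case) for case in cases]
ids = np.repeat(np.arange(1, len(cases) + 1), lengths)

# Assumption: rows not covered by any case keep the sentinel value -1.
process_id = np.full(len(df), -1, dtype=np.int64)
process_id[rows] = ids  # scatter the IDs to their row positions

df["processID"] = process_id
print(df)
```

For the four example cases this fills the processID column with the same values as the loop version. If a row position appears in more than one case, which of the process numbers ends up in that row should not be relied on.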
【Comments】:
- Could you provide a minimal reproducible example?
Tags: pandas for-loop dataframe processing-efficiency