Loop optimization in Python using the loc command

【Question title】: Loop optimization in Python using loc command
【Posted】: 2016-10-24 17:34:32
【Question】:

I have a piece of Python code, shown below:

# Main loop: takes the values attributed row by row and sorts them into the
# corresponding columns by matching 'Name' against the newly generated
# column names.
listed_names = list(df_cv)  # List of column names to reference later.
variable = listed_names[3:]  # Columns from index 3 onward; the earlier columns are irrelevant here.
for i in df_cv.index:  # For each index label in the DataFrame (DF)
    for m in variable:  # For each name in the list of variable column names
        if df_cv.loc[i, 'Name'] == m:  # If the 'Name' in this row equals the column name...
            df_cv.loc[i, m] = df_cv.loc[i, 'Value']  # ...copy this row's 'Value' into that column.

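Basically, it takes a 3×n list of Time/Name/Value records and sorts it into a pandas df of size n by unique(Name), that is, one row per record and one column per distinct name. It turns this: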

Time   Name    Value
1      Color   Red
2      Age     6
3      Temp    25
4      Age     1

into this:

Time   Color   Age    Temp
1      Red     
2              6
3                     25
4              1

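My code takes a long time to run, and I am wondering whether there is a better way to set up my loops. I come from a MATLAB background, so the Python style (i.e., not using rows/columns for everything) is still foreign to me.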

How can I make this part of the code run faster?

【Comments】:

Tags: python for-loop pandas optimization

【Solution 1】:

Rather than looping, treat this as a pivot operation. Assuming Time is a column and not the index (if it is the index, call reset_index first):

In [96]: df
Out[96]: 
   Time   Name Value
0     1  Color   Red
1     2    Age     6
2     3   Temp    25
3     4    Age     1

In [97]: df.pivot(index="Time", columns="Name", values="Value")
Out[97]: 
Name   Age Color  Temp
Time                  
1     None   Red  None
2        6  None  None
3     None  None    25
4        1  None  None

In [98]: df.pivot(index="Time", columns="Name", values="Value").fillna("")
Out[98]: 
Name Age Color Temp
Time               
1          Red     
2      6           
3                25
4      1

This should be much faster on real datasets, and it is simpler to boot.
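For anyone who wants to try this outside an interactive session, here is a minimal self-contained sketch of the same pivot. The sample frame is typed in by hand from the question, and the name wide and the column reordering step are illustrative additions rather than part of the answer above.

```python
import pandas as pd

# Sample data typed in from the question (assumed to be a small mixed-type frame).
df = pd.DataFrame({
    "Time": [1, 2, 3, 4],
    "Name": ["Color", "Age", "Temp", "Age"],
    "Value": ["Red", 6, 25, 1],
})

# One row per Time, one column per distinct Name; cells with no match are missing.
wide = df.pivot(index="Time", columns="Name", values="Value")

# Optional (illustrative): restore the order in which the names first appear,
# then replace the missing cells with empty strings for display.
wide = wide[df["Name"].unique()].fillna("")

print(wide)
```

Note that pivot raises a ValueError when the same (Time, Name) pair occurs more than once. If the real data can contain such duplicates, pivot_table with an explicit aggfunc (for example aggfunc="first") may be a workable alternative, depending on which duplicate should win.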

【Comments】: