【Question title】: Python for loops and comprehension for loops
【Posted】: 2017-09-11 14:30:00
【Question description】:

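Can someone explain to me why these two pieces of code (a for loop and a list comprehension) return two different results? I thought they were equivalent, just written in different ways.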

Data used:

Top152['% Renewable'] 
Country
China                 19.754910
United States         11.570980
Japan                 10.232820
United Kingdom        10.600470
Russian Federation    17.288680
Canada                61.945430
Germany               17.901530
India                 14.969080
France                17.020280
South Korea            2.279353
Italy                 33.667230
Spain                 37.968590
Iran                   5.707721
Australia             11.810810
Brazil                69.648030

for loop:

def answer_ten():
    Top15 = answer_one()
    Top152 = Top15.copy()

    for x in Top152['% Renewable']:
        if x >= Top152['% Renewable'].median():
            Top152['HighRenew'] = 1
        else:
            Top152['HighRenew'] = 0
    return Top152['HighRenew']

answer_ten()

Output:

    Country
    China                 1
    United States         1
    Japan                 1
    United Kingdom        1
    Russian Federation    1
    Canada                1
    Germany               1
    India                 1
    France                1
    South Korea           1
    Italy                 1
    Spain                 1
    Iran                  1
    Australia             1
    Brazil                1     

List comprehension:

def answer_ten():
    Top15 = answer_one()
    Top152 = Top15.copy()

    Top152['HighRenew'] = [1 if x >= Top152['% Renewable'].median() else 0 for x in Top152['% Renewable']]

    return Top152['HighRenew']

answer_ten()

Output:

Country
China                 1
United States         0
Japan                 0
United Kingdom        0
Russian Federation    1
Canada                1
Germany               1
India                 0
France                1
South Korea           0
Italy                 1
Spain                 1
Iran                  0
Australia             0
Brazil                1

【Comments】:

    Tags: python pandas numpy for-loop list-comprehension


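    【Solution 1】:

    You are setting the entire column (vector) on every iteration step, so only the assignment from the last iteration survives: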

    Top152['HighRenew'] = 1
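
To see the effect in isolation, here is a minimal sketch on a hypothetical three-row frame. The helper `flag_with_loop` and the column names `value` and `flag` are made up for illustration and do not come from the question. It suggests that the result depends only on the last value visited, which would explain why the question's loop returns all 1s: Brazil is the last row and lies above the median.

```python
import pandas as pd

def flag_with_loop(values):
    # Hypothetical helper that reproduces the loop pattern from the question.
    df = pd.DataFrame({'value': values})
    median = df['value'].median()
    for x in df['value']:
        # A scalar on the right-hand side is broadcast to every row,
        # so each pass overwrites the whole column.
        df['flag'] = 1 if x >= median else 0
    return df['flag'].tolist()

ends_high = flag_with_loop([1.0, 5.0, 3.0])  # last value 3.0 >= median 3.0
ends_low = flag_with_loop([5.0, 3.0, 1.0])   # last value 1.0 <  median 3.0
print(ends_high)
print(ends_low)
```

Both calls use the same three numbers; only their order differs, yet the first returns all 1s and the second all 0s.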
    

    Use this vectorized approach instead:

    Top152['HighRenew'] = (Top152['% Renewable'] >= Top152['% Renewable'].median()).astype(int)
    

    So your function can be implemented as follows:

    def answer_ten():
        return (Top15['% Renewable'] >= Top15['% Renewable'].median()).astype(int)
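
For readers who want to run this without `answer_one()`, the following is a self-contained sketch that rebuilds the column from the values printed in the question. The dictionary name `renewable` and the trailing `.rename('HighRenew')` are additions for illustration, not part of the answer above.

```python
import pandas as pd

# Values copied from the question; `renewable` is an illustrative name.
renewable = {
    'China': 19.754910, 'United States': 11.570980, 'Japan': 10.232820,
    'United Kingdom': 10.600470, 'Russian Federation': 17.288680,
    'Canada': 61.945430, 'Germany': 17.901530, 'India': 14.969080,
    'France': 17.020280, 'South Korea': 2.279353, 'Italy': 33.667230,
    'Spain': 37.968590, 'Iran': 5.707721, 'Australia': 11.810810,
    'Brazil': 69.648030,
}
Top15 = pd.DataFrame({'% Renewable': pd.Series(renewable)})
Top15.index.name = 'Country'

def answer_ten():
    col = Top15['% Renewable']
    # The comparison yields a boolean Series; astype(int) maps True/False to 1/0.
    return (col >= col.median()).astype(int).rename('HighRenew')

result = answer_ten()
print(result)
```

With 15 rows the median is the middle value itself (France), so `>=` marks France as 1, matching the list-comprehension output in the question.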
    

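    【Discussion】:

      【Solution 2】:

      It is best to convert the boolean mask to int, because pandas' vectorized operations are very fast. Note that the code below uses > rather than the >= from the question, so France, whose value equals the median, is flagged False/0 here: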

      print (Top152['% Renewable']> Top152['% Renewable'].median())
      China                  True
      United States         False
      Japan                 False
      United Kingdom        False
      Russian Federation     True
      Canada                 True
      Germany                True
      India                 False
      France                False
      South Korea           False
      Italy                  True
      Spain                  True
      Iran                  False
      Australia             False
      Brazil                 True
      Name: % Renewable, dtype: bool
      

      def answer_ten():
          # Outer parentheses let the expression continue on the next line.
          return ((Top152['% Renewable'] > Top152['% Renewable'].median())
                  .astype(int).rename('HighRenew'))
      
      
      print (answer_ten())
      China                 1
      United States         0
      Japan                 0
      United Kingdom        0
      Russian Federation    1
      Canada                1
      Germany               1
      India                 0
      France                0
      South Korea           0
      Italy                 1
      Spain                 1
      Iran                  0
      Australia             0
      Brazil                1
      Name: HighRenew, dtype: int32
      

      For a loop, a solution with iterrows is possible, but it is very slow; the first solution is faster:

      def answer_ten():
          for idx, x in Top152.iterrows():
              if Top152.loc[idx, '% Renewable'] >= Top152['% Renewable'].median():
                  Top152.loc[idx, 'HighRenew'] = 1
              else:
                  Top152.loc[idx, 'HighRenew'] = 0
          return Top152['HighRenew'].astype(int)
      
      print (answer_ten())
      China                 1
      United States         0
      Japan                 0
      United Kingdom        0
      Russian Federation    1
      Canada                1
      Germany               1
      India                 0
      France                1
      South Korea           0
      Italy                 1
      Spain                 1
      Iran                  0
      Australia             0
      Brazil                1
      Name: HighRenew, dtype: int32
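
If an explicit loop is preferred, it can also be written without `iterrows` by collecting one value per row in a list and assigning the column once at the end. This is a sketch of the loop form that matches the list comprehension from the question; the five-row frame and the name `flags` are hypothetical and used only to keep the example short.

```python
import pandas as pd

# Hypothetical small frame; the column names follow the question.
Top152 = pd.DataFrame(
    {'% Renewable': [19.75, 11.57, 61.95, 2.28, 17.02]},
    index=['China', 'United States', 'Canada', 'South Korea', 'France'],
)

median = Top152['% Renewable'].median()  # computed once, outside the loop

flags = []
for x in Top152['% Renewable']:
    # Append one result per row instead of assigning the whole column.
    flags.append(1 if x >= median else 0)

# A list with one entry per row is aligned row by row on assignment.
Top152['HighRenew'] = flags
print(Top152['HighRenew'].tolist())
```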
      

      Timings

      #[15000 rows x 1 columns]
      Top152 = pd.concat([Top152]*1000).reset_index(drop=True)  
      
      def answer_ten1():
          return (Top152['% Renewable']> Top152['% Renewable'].median()).astype(int).rename('HighRenew')
      
      def answer_ten2():
          for idx, x in Top152.iterrows():
              if Top152.loc[idx, '% Renewable'] >= Top152['% Renewable'].median():
                  Top152.loc[idx, 'HighRenew'] = 1
              else:
                  Top152.loc[idx, 'HighRenew'] = 0
          return Top152['HighRenew'].astype(int)
      
      
      def answer_ten3():
          Top152['HighRenew'] = [1 if x >= Top152['% Renewable'].median() else 0 for x in Top152['% Renewable']]
          return Top152['HighRenew']
      
      print (answer_ten1())   
      print (answer_ten2())
      print (answer_ten3())  
      

      In [169]: %timeit (answer_ten1())
      1000 loops, best of 3: 528 µs per loop
      
      In [170]: %timeit answer_ten2()
      1 loop, best of 3: 16 s per loop
      
      In [171]: %timeit (answer_ten3())
      1 loop, best of 3: 2.67 s per loop
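
One likely contributor to the slow comprehension in `answer_ten3` is that `Top152['% Renewable'].median()` is re-evaluated for every element. The sketch below computes the median once and checks that the result agrees with the vectorized version. The synthetic data and the function names `comprehension_hoisted` and `vectorized` are assumptions for illustration, and no timings are claimed here.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data standing in for the 15000-row frame above.
rng = np.random.default_rng(0)
df = pd.DataFrame({'% Renewable': rng.uniform(0, 100, size=15000)})

def comprehension_hoisted(frame):
    col = frame['% Renewable']
    median = col.median()  # evaluated once, not once per element
    return pd.Series([1 if x >= median else 0 for x in col],
                     index=frame.index, name='HighRenew')

def vectorized(frame):
    col = frame['% Renewable']
    return (col >= col.median()).astype(int).rename('HighRenew')

a = comprehension_hoisted(df)
b = vectorized(df)
same = bool((a.to_numpy() == b.to_numpy()).all())
print(same)
```

Running `%timeit` on both functions on your own machine would show how much of the gap the hoisting closes.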
      

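      【Discussion】:

        【Solution 3】:

        In the second approach you are editing the vector, whereas the for loop holds on to it (behind the scenes) to avoid unnecessary edits!

        【Discussion】: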
