【问题标题】:Math operations in multiindiex dataframe with .loc具有 .loc 的多索引数据帧中的数学运算
【发布时间】:2021-09-29 12:43:00
【问题描述】:

我有一个多索引df

                    created_at          2020-06-29                                       2020-07-06
                                        sales   orders  last_sales      differennce      sales  orders  last_sales      differennce

group               category            10      10      10              0                10      10      20             50  
A                   a1
                    a2
                    a3
B                   a1                  ...     ...     ...            ...               ...     ...     ...
                    a2
                    a3
all                 Total               100     100     100            0                 150     150     150            0

我正在尝试计算all & Total 索引中的difference ((sales / last_sales) -1) * 100 其中created_at2020-06-29 to 2021-07-19

我试过了:

df.loc[('all','Total'),(slice(None),'difference')] = 
((df.loc[('all','Total'),(slice(None),'sales')] / 
df.loc[('all','Total'),:'2021-07-19','last_sales']) - 1) * 100

但我得到一个错误:

IndexError: 列表索引超出范围

我似乎无法选择我想要的 created_at 范围,但这有效:

df.loc[:,:'2021-07-19'] # returning the dates I want, from first to selected

但是当我尝试选择一行时,它会中断:

df_out.loc[:,(:'2021-07-19','sales')] # SyntaxError: invalid syntax

&

df_out.loc[:,:'2021-07-19','sales'] # IndexError: list index out of range

如何选择group = allcategory = Total 和日期来自2020-06-29 to 2020-07-19 的所有行?

我也试过了:

df.loc[('all','Total'),(slice(None),'difference')] = 
((df.loc[('all','Total'),(slice(None),'sales')] / 
df.loc[('all','Total'),(slice(None),'last_sales')]) - 1) * 100 # slice(None) instead of date

但这让我很困惑ZeroDivisionError: float division by zero,因为last_sales = 0 上面的日期有2021-07-19。也许还有其他处理0s的方法?我尝试像这样添加1 df.loc[('all','Total'),(slice(None),'last_sales')]+1) - 1) * 100 但我仍然得到同样的错误。

例如df:

df = pd.DataFrame.from_dict({('group', ''): {0: 'A',
  1: 'A',
  2: 'A',
  3: 'A',
  4: 'A',
  5: 'A',
  6: 'A',
  7: 'A',
  8: 'A',
  9: 'B',
  10: 'B',
  11: 'B',
  12: 'B',
  13: 'B',
  14: 'B',
  15: 'B',
  16: 'B',
  17: 'B',
  18: 'all',
  19: 'all'},
 ('category', ''): {0: 'Amazon',
  1: 'Apple',
  2: 'Facebook',
  3: 'Google',
  4: 'Netflix',
  5: 'Tesla',
  6: 'Total',
  7: 'Uber',
  8: 'total',
  9: 'Amazon',
  10: 'Apple',
  11: 'Facebook',
  12: 'Google',
  13: 'Netflix',
  14: 'Tesla',
  15: 'Total',
  16: 'Uber',
  17: 'total',
  18: 'Total',
  19: 'total'},
 (pd.Timestamp('2020-06-29 00:00:00'), 'last_sales'): {0: 195.0,
  1: 61.0,
  2: 106.0,
  3: 61.0,
  4: 37.0,
  5: 13.0,
  6: 954.0,
  7: 4.0,
  8: 477.0,
  9: 50.0,
  10: 50.0,
  11: 75.0,
  12: 43.0,
  13: 17.0,
  14: 14.0,
  15: 504.0,
  16: 3.0,
  17: 252.0,
  18: 2916.0,
  19: 2916.0},
 (pd.Timestamp('2020-06-29 00:00:00'), 'sales'): {0: 1268.85,
  1: 18274.385000000002,
  2: 19722.65,
  3: 55547.255,
  4: 15323.800000000001,
  5: 1688.6749999999997,
  6: 227463.23,
  7: 1906.0,
  8: 113731.615,
  9: 3219.6499999999996,
  10: 15852.060000000001,
  11: 17743.7,
  12: 37795.15,
  13: 5918.5,
  14: 1708.75,
  15: 166349.64,
  16: 937.01,
  17: 83174.82,
  18: 787625.7400000001,
  19: 787625.7400000001},
 (pd.Timestamp('2020-06-29 00:00:00'), 'difference'): {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  5: 0.0,
  6: 0.0,
  7: 0.0,
  8: 0.0,
  9: 0.0,
  10: 0.0,
  11: 0.0,
  12: 0.0,
  13: 0.0,
  14: 0.0,
  15: 0.0,
  16: 0.0,
  17: 0.0,
  18: 0.0,
  19: 0.0},
 (pd.Timestamp('2020-07-06 00:00:00'), 'last_sales'): {0: 26.0,
  1: 39.0,
  2: 79.0,
  3: 49.0,
  4: 10.0,
  5: 10.0,
  6: 436.0,
  7: 5.0,
  8: 218.0,
  9: 89.0,
  10: 34.0,
  11: 133.0,
  12: 66.0,
  13: 21.0,
  14: 20.0,
  15: 732.0,
  16: 3.0,
  17: 366.0,
  18: 2336.0,
  19: 2336.0},
 (pd.Timestamp('2020-07-06 00:00:00'), 'sales'): {0: 3978.15,
  1: 12138.96,
  2: 19084.175,
  3: 40033.46000000001,
  4: 4280.15,
  5: 1495.1,
  6: 165548.29,
  7: 1764.15,
  8: 82774.145,
  9: 8314.92,
  10: 12776.649999999996,
  11: 28048.075,
  12: 55104.21000000002,
  13: 6962.844999999999,
  14: 3053.2000000000003,
  15: 231049.11000000002,
  16: 1264.655,
  17: 115524.55500000001,
  18: 793194.8000000002,
  19: 793194.8000000002},
 (pd.Timestamp('2020-07-06 00:00:00'), 'difference'): {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  5: 0.0,
  6: 0.0,
  7: 0.0,
  8: 0.0,
  9: 0.0,
  10: 0.0,
  11: 0.0,
  12: 0.0,
  13: 0.0,
  14: 0.0,
  15: 0.0,
  16: 0.0,
  17: 0.0,
  18: 0.0,
  19: 0.0},
 (pd.Timestamp('2021-06-28 00:00:00'), 'last_sales'): {0: 96.0,
  1: 56.0,
  2: 106.0,
  3: 44.0,
  4: 34.0,
  5: 13.0,
  6: 716.0,
  7: 9.0,
  8: 358.0,
  9: 101.0,
  10: 22.0,
  11: 120.0,
  12: 40.0,
  13: 13.0,
  14: 8.0,
  15: 610.0,
  16: 1.0,
  17: 305.0,
  18: 2652.0,
  19: 2652.0},
 (pd.Timestamp('2021-06-28 00:00:00'), 'sales'): {0: 5194.95,
  1: 19102.219999999994,
  2: 22796.420000000002,
  3: 30853.115,
  4: 11461.25,
  5: 992.6,
  6: 188143.41,
  7: 3671.15,
  8: 94071.705,
  9: 6022.299999999998,
  10: 7373.6,
  11: 33514.0,
  12: 35943.45,
  13: 4749.000000000001,
  14: 902.01,
  15: 177707.32,
  16: 349.3,
  17: 88853.66,
  18: 731701.46,
  19: 731701.46},
 (pd.Timestamp('2021-06-28 00:00:00'), 'difference'): {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  5: 0.0,
  6: 0.0,
  7: 0.0,
  8: 0.0,
  9: 0.0,
  10: 0.0,
  11: 0.0,
  12: 0.0,
  13: 0.0,
  14: 0.0,
  15: 0.0,
  16: 0.0,
  17: 0.0,
  18: 0.0,
  19: 0.0},
 (pd.Timestamp('2021-07-07 00:00:00'), 'last_sales'): {0: 45.0,
  1: 47.0,
  2: 87.0,
  3: 45.0,
  4: 13.0,
  5: 8.0,
  6: 494.0,
  7: 2.0,
  8: 247.0,
  9: 81.0,
  10: 36.0,
  11: 143.0,
  12: 56.0,
  13: 9.0,
  14: 9.0,
  15: 670.0,
  16: 1.0,
  17: 335.0,
  18: 2328.0,
  19: 2328.0},
 (pd.Timestamp('2021-07-07 00:00:00'), 'sales'): {0: 7556.414999999998,
  1: 14985.05,
  2: 16790.899999999998,
  3: 36202.729999999996,
  4: 4024.97,
  5: 1034.45,
  6: 163960.32999999996,
  7: 1385.65,
  8: 81980.16499999998,
  9: 5600.544999999999,
  10: 11209.92,
  11: 32832.61,
  12: 42137.44500000001,
  13: 3885.1499999999996,
  14: 1191.5,
  15: 194912.34000000003,
  16: 599.0,
  17: 97456.17000000001,
  18: 717745.3400000001,
  19: 717745.3400000001},
 (pd.Timestamp('2021-07-07 00:00:00'), 'difference'): {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  5: 0.0,
  6: 0.0,
  7: 0.0,
  8: 0.0,
  9: 0.0,
  10: 0.0,
  11: 0.0,
  12: 0.0,
  13: 0.0,
  14: 0.0,
  15: 0.0,
  16: 0.0,
  17: 0.0,
  18: 0.0,
  19: 0.0}}).set_index(['group','category'])

【问题讨论】:

  • 您能否提供一个简单的复制和粘贴数据框,尤其是使用 MultiIndex 时,最好将数据提供为pd.DataFrame..
  • differennce 中的错字是否也出现在您的数据中?
  • 添加了一个具有相同结构的示例df

标签: python pandas indexing


【解决方案1】:

我们可以使用pd.IndexSlice,并且我们需要对pd.Timestamp 进行切片,因为您的第一级列是日期时间类型:

idx = pd.IndexSlice
sales = df.loc[('all','Total'),(slice(None),'sales')].droplevel(level=1)
last_sales = df.loc[('all','Total'), :pd.Timestamp("2021-07-07")].loc[idx[:, "last_sales"]]

df.loc[('all','Total'),(slice(None),'difference')] = sales.div(last_sales).sub(1).mul(100).to_numpy()
               2020-06-29 00:00:00                      2020-07-06 00:00:00                      2021-06-28 00:00:00                      2021-07-07 00:00:00                     
                        last_sales     sales difference          last_sales     sales difference          last_sales     sales difference          last_sales     sales difference
group category                                                                                                                                                                    
A     Amazon                195.00   1268.85       0.00               26.00   3978.15       0.00               96.00   5194.95       0.00               45.00   7556.41       0.00
      Apple                  61.00  18274.39       0.00               39.00  12138.96       0.00               56.00  19102.22       0.00               47.00  14985.05       0.00
      Facebook              106.00  19722.65       0.00               79.00  19084.17       0.00              106.00  22796.42       0.00               87.00  16790.90       0.00
      Google                 61.00  55547.25       0.00               49.00  40033.46       0.00               44.00  30853.12       0.00               45.00  36202.73       0.00
      Netflix                37.00  15323.80       0.00               10.00   4280.15       0.00               34.00  11461.25       0.00               13.00   4024.97       0.00
      Tesla                  13.00   1688.67       0.00               10.00   1495.10       0.00               13.00    992.60       0.00                8.00   1034.45       0.00
      Total                 954.00 227463.23       0.00              436.00 165548.29       0.00              716.00 188143.41       0.00              494.00 163960.33       0.00
      Uber                    4.00   1906.00       0.00                5.00   1764.15       0.00                9.00   3671.15       0.00                2.00   1385.65       0.00
      total                 477.00 113731.62       0.00              218.00  82774.15       0.00              358.00  94071.71       0.00              247.00  81980.16       0.00
B     Amazon                 50.00   3219.65       0.00               89.00   8314.92       0.00              101.00   6022.30       0.00               81.00   5600.54       0.00
      Apple                  50.00  15852.06       0.00               34.00  12776.65       0.00               22.00   7373.60       0.00               36.00  11209.92       0.00
      Facebook               75.00  17743.70       0.00              133.00  28048.08       0.00              120.00  33514.00       0.00              143.00  32832.61       0.00
      Google                 43.00  37795.15       0.00               66.00  55104.21       0.00               40.00  35943.45       0.00               56.00  42137.45       0.00
      Netflix                17.00   5918.50       0.00               21.00   6962.84       0.00               13.00   4749.00       0.00                9.00   3885.15       0.00
      Tesla                  14.00   1708.75       0.00               20.00   3053.20       0.00                8.00    902.01       0.00                9.00   1191.50       0.00
      Total                 504.00 166349.64       0.00              732.00 231049.11       0.00              610.00 177707.32       0.00              670.00 194912.34       0.00
      Uber                    3.00    937.01       0.00                3.00   1264.65       0.00                1.00    349.30       0.00                1.00    599.00       0.00
      total                 252.00  83174.82       0.00              366.00 115524.56       0.00              305.00  88853.66       0.00              335.00  97456.17       0.00
all   Total                2916.00 787625.74   26910.48             2336.00 793194.80   33855.26             2652.00 731701.46   27490.55             2328.00 717745.34   30730.99
      total                2916.00 787625.74       0.00             2336.00 793194.80       0.00             2652.00 731701.46       0.00             2328.00 717745.34       0.00

【讨论】:

  • 不得不用sales.div(last_sales.add(1).sub(1).mul(100).to_numpy()来处理0,谢谢!
猜你喜欢
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 2020-08-03
  • 2018-05-14
  • 2018-05-24
  • 2021-12-13
  • 2019-11-02
相关资源
最近更新 更多