pandas multiindex (hierarchical index) subtract columns and append result: answers

【Question title】: pandas multiindex (hierarchical index) subtract columns and append result
【Posted】: 2021-04-14 08:08:04
【Question description】:

I have this code, which works fine:

from pandas_datareader import data as pdr
import pandas as pd

myStartDate = '2020-12-15'   # (Format: Year-Month-Day)
myEndDate = '2020-12-31'   # (Format: Year-Month-Day)
myTickers = 'TSLA'
myData = pdr.get_data_yahoo(myTickers, myStartDate, myEndDate)

pd.set_option('display.max_columns', None)
pd.set_option('display.width', 200)

print(myData.head())

The result is:

           High     Low    Open    Close   Volume    Adj Close
Date                                                                             
20-12-15  646.90  623.79  643.28  633.25  45223600  633.250000
20-12-16  632.50  605.00  628.22  622.77  42095800  622.770020
20-12-17  658.82  619.50  628.19  655.90  56270100  655.900024
20-12-18  695.00  628.53  668.90  695.00  22212620  695.000000
20-12-21  668.50  646.07  666.23  649.85  58045300  649.859985

Then I have this:

myData['Some New Column Name'] = myData['High'] - myData['Low']
print(myData.head())

That also works fine, and the result looks like this:

           High     Low    Open    Close   Volume   Adj Close   Some New Column Name
Date                                                                                                   
20-12-15  646.90  623.79  643.28  633.25   45223600  633.25            23.10
20-12-16  632.50  605.00  628.22  622.77   42095800  622.77            27.50
20-12-17  658.82  619.50  628.19  655.90   56270100  655.90            39.32
20-12-18  695.00  628.53  668.90  695.00   22212620  695.00            66.46
20-12-21  668.50  646.07  666.23  649.85   58045300  649.85            22.42

I would like to modify the code to handle multiple tickers in "myTickers", like this:

from pandas_datareader import data as pdr
import pandas as pd

myStartDate = '2020-12-15'   # (Format: Year-Month-Day)
myEndDate = '2020-12-31'   # (Format: Year-Month-Day)
myTickers = ['TSLA', 'SQ']
myData = pdr.get_data_yahoo(myTickers, myStartDate, myEndDate)

pd.set_option('display.max_columns', None)
pd.set_option('display.width', 200)

print(myData.head())

This works fine, and the result is:

Attributes  Adj Close         Close            High            Low             Open           Volume    
Symbols    TSLA     SQ     TSLA     SQ     TSLA     SQ     TSLA     SQ     TSLA     SQ      TSLA    SQ
Date                                                
12/15/20  633.25  219.99  633.25  219.99  646.90  221.72  623.80  216.73  643.28  218.41  45223600  5713600
12/16/20  622.77  227.08  622.77  227.08  632.50  227.96  605.00  220.03  628.23  223.86  42095800  8216500
12/17/20  655.90  230.74  655.90  230.74  658.82  237.09  619.50  227.70  628.19  230.00  56270100  10440100
12/18/20  695.00  235.45  695.00  235.45  695.00  236.37  628.54  231.10  668.90  235.00  22212620  8099500
12/21/20  649.86  233.50  649.86  233.50  668.50  241.85  646.07  232.26  666.24  236.09  58045300  1115540

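Similar to the first example, I want to compute (TSLA High) - (TSLA Low) = (some new column name) and (SQ High) - (SQ Low) = (some new column name), but I am not very experienced with pandas and do not know how to do this. My understanding is that this is a MultiIndex (hierarchical index) pandas DataFrame. After reading some of the documentation, I thought something like this might work: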

myData['Some New Column Name']['TSLA', 'SQ'] = myData['High'] - myData['Low']

or

myData[['Some New Column Name'],['TSLA', 'SQ']] = myData['High'] - myData['Low']

But neither of these works... I feel like I might be close?
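A minimal sketch of why the first attempt fails, using a small synthetic frame in place of the Yahoo download (the two-row `myData` built here is an assumption for illustration; its values are copied from the output above). Selecting `myData['Some New Column Name']` looks up a first-level label that does not exist yet, so pandas raises `KeyError` before any assignment can happen. A single `(first level, second level)` tuple, by contrast, is accepted as the key for a new column.

```python
import pandas as pd

# Synthetic stand-in for the downloaded data (assumption: only High/Low, two dates).
columns = pd.MultiIndex.from_product(
    [["High", "Low"], ["TSLA", "SQ"]], names=["Attributes", "Symbols"]
)
myData = pd.DataFrame(
    [[646.90, 221.72, 623.80, 216.73],
     [632.50, 227.96, 605.00, 220.03]],
    index=pd.to_datetime(["2020-12-15", "2020-12-16"]),
    columns=columns,
)

# The chained form first tries to *read* a first-level label that is not there yet.
try:
    myData["Some New Column Name"]["TSLA", "SQ"] = myData["High"] - myData["Low"]
    chained_failed = False
except KeyError:
    chained_failed = True

# One tuple per new column works as an assignment key.
myData[("Some New Column Name", "TSLA")] = (
    myData[("High", "TSLA")] - myData[("Low", "TSLA")]
)

print(chained_failed)
print(myData["Some New Column Name"])
```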

To be clear, I want something like this:

Attributes  Adj Close         Close       ...       Open            Volume      Some New Col
Symbols    TSLA     SQ     TSLA     SQ    ...   TSLA     SQ      TSLA     SQ     TSLA    SQ
Date                                      ...           
12/15/20  633.25  219.99  633.25  219.99  ...  643.28  218.41  45223600 5713600  23.10  4.99
12/16/20  622.77  227.08  622.77  227.08  ...  628.23  223.86  42095800 8216500  27.50  7.93
12/17/20  655.90  230.74  655.90  230.74  ...  628.19  230.00  56270100 1044010  39.32  9.39
12/18/20  695.00  235.45  695.00  235.45  ...  668.90  235.00  22212620 8099500  66.46  5.27
12/21/20  649.86  233.50  649.86  233.50  ...  666.24  236.09  58045300 1115540  22.43  9.59

I just tried:

myNewData = myData['High'] - myData['Low']

The result is this:

Symbols          TSLA        SQ
Date                           
2020-12-15  23.100037  4.990005
2020-12-16  27.500000  7.930008
2020-12-17  39.320007  9.389999
2020-12-18  66.460022  5.269989
2020-12-21  22.429993  9.590012

I could probably use this... but it would be nice to append it to myData in some clean way.
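One possible way to attach such a difference frame, sketched on synthetic data rather than the real download (the small `myData` below and the `labelled` and `combined` names are assumptions for illustration). Wrapping the frame in a dict and passing it to `pd.concat` with `axis=1` turns the dict key into a new first column level, after which a second `pd.concat` places the block next to the existing columns.

```python
import pandas as pd

# Synthetic stand-in for the downloaded data (assumption: only High/Low, two dates).
columns = pd.MultiIndex.from_product(
    [["High", "Low"], ["TSLA", "SQ"]], names=["Attributes", "Symbols"]
)
myData = pd.DataFrame(
    [[646.90, 221.72, 623.80, 216.73],
     [632.50, 227.96, 605.00, 220.03]],
    index=pd.to_datetime(["2020-12-15", "2020-12-16"]),
    columns=columns,
)

# Single-level frame with one column per ticker, as in the question.
myNewData = myData["High"] - myData["Low"]

# The dict key becomes the first column level; `names` labels that new level.
labelled = pd.concat(
    {"Some New Column Name": myNewData}, axis=1, names=["Attributes"]
)

# Rows are aligned on the date index; the new block lands to the right.
combined = pd.concat([myData, labelled], axis=1)
print(combined)
```

This leaves `myData` itself untouched, which may or may not be what is wanted; assigning `combined` back to `myData` would replace it.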

【Question comments】:

Tags: python pandas append multi-index hierarchical

【Solution 1】:

I think you are close...

myData[[('Some new column', c) for c in myData['High'].columns]] = myData['High'] - myData['Low']
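To try the line above without a network connection, here is a small sketch on synthetic data (the two-row `myData` is an assumption for illustration, with values copied from the question's output). The list comprehension reads the ticker names from `myData['High'].columns`, so they do not have to be typed out by hand.

```python
import pandas as pd

# Synthetic stand-in for the downloaded data (assumption: only High/Low, two dates).
columns = pd.MultiIndex.from_product(
    [["High", "Low"], ["TSLA", "SQ"]], names=["Attributes", "Symbols"]
)
myData = pd.DataFrame(
    [[646.90, 221.72, 623.80, 216.73],
     [632.50, 227.96, 605.00, 220.03]],
    index=pd.to_datetime(["2020-12-15", "2020-12-16"]),
    columns=columns,
)

# One (first level, ticker) tuple per ticker found under 'High';
# the right-hand side supplies one column per ticker in the same order.
myData[[("Some new column", c) for c in myData["High"].columns]] = (
    myData["High"] - myData["Low"]
)

print(myData["Some new column"])
```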

【Comments】:

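Er... is it possible to fill in 'SQ' and 'TSLA' automatically, since they are already in the 'myData' DataFrame? Or to use the 'myTickers' list somehow? I only used these 2 ('SQ' and 'TSLA') to keep things simple; in reality I might have 30, or more. Typing them out individually over and over would be painful, so surely there is a better way? Still, thank you for the answer, it will certainly do for now.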
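You can create the names of the columns to add dynamically (i.e. [('Some New Column Name', 'SQ'), ('Some New Column Name', 'TSLA')]) by iterating over all available columns, which you can get from the second level of the High columns (if I understood your question correctly).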
I think you understood correctly. Could you post an example of such a loop?
Absolutely beautiful!! Thank you very much!!
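For readers who prefer the explicit loop asked about in the comments, a minimal sketch follows. It assumes a synthetic two-row `myData` in place of the real download, and the loop variable `symbol` is a name chosen here for illustration. Iterating over the `myTickers` list instead of `myData['High'].columns` should behave the same way, provided every ticker in the list was actually returned.

```python
import pandas as pd

# Synthetic stand-in for the downloaded data (assumption: only High/Low, two dates).
columns = pd.MultiIndex.from_product(
    [["High", "Low"], ["TSLA", "SQ"]], names=["Attributes", "Symbols"]
)
myData = pd.DataFrame(
    [[646.90, 221.72, 623.80, 216.73],
     [632.50, 227.96, 605.00, 220.03]],
    index=pd.to_datetime(["2020-12-15", "2020-12-16"]),
    columns=columns,
)

# The tickers are the second-level labels found under 'High'.
for symbol in myData["High"].columns:
    # Each new column is addressed by a (first level, second level) tuple.
    myData[("Some New Column Name", symbol)] = (
        myData[("High", symbol)] - myData[("Low", symbol)]
    )

print(myData["Some New Column Name"])
```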