Perform calculations on dataframe column based on index value: answers

【Question title】: Perform calculations on dataframe column based on index value
【Posted】: 2019-11-07 06:15:14
【Question】:

I have to normalize the values of the dataframe column Allocation on a monthly basis.

data=
                     Allocation  Temperature  Precipitation  Radiation
Date_From                                                             
2018-11-01 00:00:00    0.001905         9.55            0.0        0.0
2018-11-01 00:15:00    0.001794         9.55            0.0        0.0
2018-11-01 00:30:00    0.001700         9.55            0.0        0.0
2018-11-01 00:45:00    0.001607         9.55            0.0        0.0

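This means that for 2018-11, Allocation is divided by 11.116, for 2018-12, Allocation is divided by 2473.65, and so on... (these values come from a list Volume, where Volume[0] corresponds to 2018-11, up to Volume[7], which corresponds to 2019-06).

Date_From is the index and holds timestamps.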

data_normalized=
                     Allocation  Temperature  Precipitation  Radiation
Date_From                                                             
2018-11-01 00:00:00    0.000171         9.55            0.0        0.0
2018-11-01 00:15:00    0.000097         9.55            0.0        0.0
...

My approach was to use itertuples:

for row in data.itertuples(index=True,name='index'):
    if row.index =='2018-11':
        data['Allocation']/Volume[0]

Here, the if statement is never true...

Another approach was if ((row.index >='2018-11-01 00:00:00') & (row.index<='2018-11-31 23:45:00')): but here I get the error TypeError: '>=' not supported between instances of 'builtin_function_or_method' and 'str'

Can I solve my problem with this approach, or should I use a different one? I am happy about any help.

Cheers!

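【Comments】:

Do you mind sharing your expected output? It is not clear to me what you want to achieve. You could also read up on mcve.
In particular, it is not clear what Volume is.
I edited the question. Volume is a list.
Do you already know that Volume has at least as many elements as there are distinct months?
How are we supposed to guess whether the index holds strings or timestamps? How are we supposed to guess which month corresponds to which index in the Volume list? What is it about minimal reproducible example that you don't understand?

Tags: python pandas dataframe datetime iteration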

【Solution 1】:

Maybe you could put your list Volume in a dataframe whose date (or index) is the first day of each month.

import pandas as pd
import numpy as np

N = 16
date = pd.date_range(start='2018-01-01', periods=N, freq="15d")
df = pd.DataFrame({"date":date, "Allocation":np.random.randn(N)})

# A dataframe that associates a volume with the first day of each month
df_vol = pd.DataFrame({"month":pd.date_range(start="2018-01-01", periods=8, freq="MS"),
                       "Volume": np.arange(8)+1})

# Replace every date with the first day of its month
# (astype("datetime64[M]") raises a TypeError on pandas 2.x)
df["month"] = df["date"].dt.to_period("M").dt.to_timestamp()

# merge
df1 = pd.merge(df,df_vol, on="month", how="left")

# Divide Allocation by Volume.
# This is now vectorized, because the merge attached the right volume to every date.
df1["norm"] = df1["Allocation"]/df1["Volume"]
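
The same idea can be sketched directly on a frame indexed by timestamps, as in the question, without a merge. This is only an illustration under stated assumptions: the December rows are made up, only the two divisors quoted in the question are used, and the names months, volume_by_month and row_volume are hypothetical helpers, not part of the original answer.

```python
import pandas as pd

# Illustrative frame shaped like the question's data.
# The two December rows are invented for this sketch.
idx = pd.to_datetime([
    "2018-11-01 00:00:00",
    "2018-11-01 00:15:00",
    "2018-12-01 00:00:00",
    "2018-12-01 00:15:00",
])
data = pd.DataFrame({"Allocation": [0.001905, 0.001794, 0.5, 0.25]}, index=idx)
data.index.name = "Date_From"

# Only the first two divisors are given in the question.
Volume = [11.116, 2473.65]

# One monthly Period per entry of Volume, assuming the list starts at 2018-11.
months = pd.period_range(start="2018-11", periods=len(Volume), freq="M")
volume_by_month = pd.Series(Volume, index=months)

# Look up the divisor for each row's month, then divide without a Python loop.
row_volume = volume_by_month.reindex(data.index.to_period("M")).to_numpy()
data_normalized = data.copy()
data_normalized["Allocation"] = data["Allocation"].to_numpy() / row_volume
```

A month that has no entry in Volume would end up with NaN in Allocation, which makes a missing divisor easy to spot.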

【Discussion】:

@jonasa In your case you have to convert Date_From to datetime first. You can use df["Date_From"]=df["Date_From"].astype("M8[us]")
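
A minimal sketch of that conversion, assuming Date_From is the index (as stated in the question) and still holds strings; pd.to_datetime is used here in place of astype. The sketch also shows that itertuples exposes the index as row.Index, while row.index is the tuple's built-in index method, which is what the TypeError in the question points at. The variable names are illustrative.

```python
import pandas as pd

# Two rows from the question, with the index still stored as strings.
data = pd.DataFrame(
    {"Allocation": [0.001905, 0.001794]},
    index=pd.Index(["2018-11-01 00:00:00", "2018-11-01 00:15:00"], name="Date_From"),
)

# Convert the string index to a DatetimeIndex.
data.index = pd.to_datetime(data.index)

# itertuples stores the index value under the field name "Index".
first_row = next(data.itertuples(index=True, name="index"))
index_is_method = callable(first_row.index)  # tuple.index, a built-in method
first_timestamp = first_row.Index            # the actual timestamp

# A month can be selected with a boolean mask instead of string comparison.
in_november = (data.index.year == 2018) & (data.index.month == 11)
```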