【Question title】: How to expand the DataFrame below in pandas or numpy based on interval_start and interval_end
【Posted】: 2021-05-14 03:08:15
【Question description】:

How can I expand the DataFrame below in pandas or numpy based on interval_start and interval_end?

I have tried a few examples, but none of them work for the last days of a month.

Input DF:

+--------------+------------+----+---+
|interval_start|interval_end|name|val|
+--------------+------------+----+---+
|2018-10-31    |2020-09-05  | abc|1  |
|2020-09-05    |2020-10-05  | abc|1  |
|2020-01-31    |2020-04-30  | def|2  |
+--------------+------------+----+---+

Split each row of the input DF, based on the two columns interval_start and interval_end, into the sequence of dates between them, as shown in the output DF.

Output DF:

+--------------+------------+----+---+
|interval_start|interval_end|name|val|
+--------------+------------+----+---+
|2018-10-31    |2018-11-30  | abc|1  |
|2018-11-30    |2018-12-31  | abc|1  |
|2018-12-31    |2019-01-31  | abc|1  |
|2019-01-31    |2019-02-28  | abc|1  |
|2019-02-28    |2019-03-31  | abc|1  |
|2019-03-31    |2019-04-30  | abc|1  |
|2019-04-30    |2019-05-31  | abc|1  |
|2019-05-31    |2019-06-30  | abc|1  |
|2019-06-30    |2019-07-31  | abc|1  |
|2019-07-31    |2019-08-31  | abc|1  |
|2019-08-31    |2019-09-30  | abc|1  |
|2019-09-30    |2019-10-31  | abc|1  |
|2019-10-31    |2019-11-30  | abc|1  |
|2019-11-30    |2019-12-31  | abc|1  |
|2019-12-31    |2020-01-31  | abc|1  |
|2020-01-31    |2020-02-29  | abc|1  |
|2020-02-29    |2020-03-31  | abc|1  |
|2020-03-31    |2020-04-30  | abc|1  |
|2020-04-30    |2020-05-31  | abc|1  |
|2020-05-31    |2020-06-30  | abc|1  |
|2020-06-30    |2020-07-31  | abc|1  |
|2020-07-31    |2020-08-31  | abc|1  |
|2020-08-31    |2020-09-05  | abc|1  |
|2020-09-05    |2020-10-05  | abc|1  |
|2020-01-31    |2020-02-29  | def|2  |
|2020-02-29    |2020-03-31  | def|2  |
|2020-03-31    |2020-04-30  | def|2  |
+--------------+------------+----+---+

【Question discussion】:

Tags: python pandas python-2.7 dataframe numpy


【Solution 1】:

Well, that makes sense. This code gives you a result close to the expected one:

import datetime

import pandas as pd
from dateutil.relativedelta import relativedelta

df1 = pd.DataFrame(
    {
        "interval_start": ["2018-10-31", "2020-09-05", "2020-01-31"],
        "interval_end": ["2020-09-05", "2020-10-05", "2020-04-30"],
        "name": ["abc", "abc", "def"],
        "val": ["1", "1", "2"],
    },
    index=[0, 1, 2],
)

start_list = []
end_list = []
name_list = []
val_list = []
for i in range(len(df1)):
    # Monthly periods touched by this row's interval
    interval = pd.period_range(df1["interval_start"][i], df1["interval_end"][i], freq="M")
    # First end date: one month after the row's start date
    date_plus1 = datetime.datetime.strptime(df1["interval_start"][i], "%Y-%m-%d") + relativedelta(months=+1)

    for k in range(len(interval)):
        # The start is built as "<year>-<month>-30", which only approximates the month end
        start_list.append(str(interval[k]) + "-30")
        end_list.append(str(date_plus1)[0:10])
        name_list.append(df1["name"][i])
        val_list.append(df1["val"][i])

        # Step the end date forward by one more month
        date_plus1 = date_plus1 + relativedelta(months=+1)

df2 = pd.DataFrame()
df2["interval_start"] = start_list
df2["interval_end"] = end_list
df2["name"] = name_list
df2["val"] = val_list

print(df2)

If you want the exact last day of each month, you only need to adjust the dates slightly.
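
One possible way to make that adjustment is sketched below. It is not the code from the answer above but a separate approach using pandas offsets. The helper names `nth_boundary` and `split_intervals` are made up for illustration, and the stepping rule is an assumption read off the expected output in the question: a start that falls on a month end steps from month end to month end, and any other start steps by whole calendar months on the same day of the month.

```python
import pandas as pd


def nth_boundary(start, k):
    """Return the k-th cut point after `start` (hypothetical helper)."""
    # Assumption: month-end starts snap to later month ends; other starts
    # keep their day of month (for example the 5th stays the 5th).
    if start.is_month_end:
        return start + pd.offsets.MonthEnd(k)
    return start + pd.DateOffset(months=k)


def split_intervals(df):
    """Split every row into pieces bounded by monthly cut points."""
    pieces = []
    for rec in df.to_dict("records"):
        start = pd.Timestamp(rec["interval_start"])
        end = pd.Timestamp(rec["interval_end"])
        bounds = [start]
        k = 1
        # Each cut is computed from the original start, not from the previous
        # cut, so a clamped day such as Feb 28 does not carry into March.
        while nth_boundary(start, k) < end:
            bounds.append(nth_boundary(start, k))
            k += 1
        bounds.append(end)
        for left, right in zip(bounds[:-1], bounds[1:]):
            pieces.append(
                dict(
                    rec,
                    interval_start=left.strftime("%Y-%m-%d"),
                    interval_end=right.strftime("%Y-%m-%d"),
                )
            )
    return pd.DataFrame(pieces, columns=list(df.columns))


df1 = pd.DataFrame(
    {
        "interval_start": ["2018-10-31", "2020-09-05", "2020-01-31"],
        "interval_end": ["2020-09-05", "2020-10-05", "2020-04-30"],
        "name": ["abc", "abc", "def"],
        "val": [1, 1, 2],
    }
)

out = split_intervals(df1)
print(out)
```

For the three input rows from the question this should yield 27 rows, with the first interval ending in the short piece 2020-08-31 to 2020-09-05 and the second interval left as a single row. If your data follows a different rule (for example, always cutting at calendar month ends regardless of the start day), the condition in `nth_boundary` would need to change.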

【Discussion】:
