【Question Title】: Pandas: extract the first year from a DataFrame
【Posted】: 2023-04-02 03:16:02
【Question Description】:

I have a large DataFrame, and I need to find out when certain resources first became available. Let me explain with my code.

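import pandas as pd

df1 = df[df['Resource_ID'] == 1348]
# .copy() so the column assignment below does not trigger SettingWithCopyWarning
df1 = df1[['Format', 'Range_Start', 'Number']].copy()
df1["Range_Start"] = df1["Range_Start"].str[:7]  # keep only 'YYYY-MM'
df1 = df1.groupby(['Format', 'Range_Start'], as_index=True).last()
pd.options.display.float_format = '{:,.0f}'.format
df1 = df1.unstack()
df1.columns = df1.columns.droplevel()

df2 = df1[1:4].sum(axis=0)

df2.name = 'sum'
# DataFrame.append was removed in pandas 2.0; pd.concat adds the 'sum' row instead
df2 = pd.concat([df1, df2.to_frame().T.rename_axis(df1.index.name)])

df3 = df2.T[['entry', 'sum']].copy()
df3.index = pd.to_datetime(df3.index)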

Now print(df3.first('1D')) gives the following output:

Format     entry  sum
Range_Start            
2011-07-01      97   72

I can now see that Resource_ID 1348 first appeared on 2011-07-01. How can I extract only the year from that information?

Here is my sample input CSV data:

Access_Stat_ID,Resource_ID,Range_Start,Range_End,Name,Format,Number,Matched_URL
1,15,"2009-03-01 00:00:00","2009-03-31 23:59:59","Mar 2009","entry",3,""
203,13,"2009-04-01 00:00:00","2009-04-30 23:59:59","Apr 2009","entry",18,""
204,13,"2009-04-01 00:00:00","2009-04-30 23:59:59","Apr 2009","pdf",7,""

【Comments】:

Tags: python pandas datetime dataframe


【Solution 1】:

It looks like you need:

first_year = df3.index.year[0]
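
As a minimal sketch of the same idea, the year can also be read straight from the raw data without building the pivoted frame first. The example below uses only the three sample rows from the question (so Resource_ID 13 and 15 rather than 1348). The helper name first_year_of is hypothetical, and the sketch assumes Range_Start parses cleanly as a datetime. It also shows that index.year[0] relies on the index being sorted ascending, whereas index.min().year does not.

```python
import io

import pandas as pd

# Sample rows copied from the question; real data would be read from a file.
csv_text = '''Access_Stat_ID,Resource_ID,Range_Start,Range_End,Name,Format,Number,Matched_URL
1,15,"2009-03-01 00:00:00","2009-03-31 23:59:59","Mar 2009","entry",3,""
203,13,"2009-04-01 00:00:00","2009-04-30 23:59:59","Apr 2009","entry",18,""
204,13,"2009-04-01 00:00:00","2009-04-30 23:59:59","Apr 2009","pdf",7,""
'''
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Range_Start"])


def first_year_of(frame, resource_id):
    """Hypothetical helper: year of the earliest Range_Start for one resource."""
    starts = frame.loc[frame["Resource_ID"] == resource_id, "Range_Start"]
    return starts.min().year


year_13 = first_year_of(df, 13)
year_15 = first_year_of(df, 15)

# The same extraction on a DatetimeIndex, as in the answer above.
idx = pd.DatetimeIndex(["2011-07-01", "2011-08-01"])
year_from_index = idx.year[0]    # first element; assumes ascending order
year_from_min = idx.min().year   # earliest date regardless of order

print(year_13, year_15, year_from_index, year_from_min)
```

In the question's pipeline the index comes out of groupby/unstack already sorted, so both variants should agree there; the min() form is simply the safer choice if the order is ever in doubt.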

【Comments】:
