【Question title】: How to calculate number of dates within a year of a date in pandas
【Posted】: 2021-07-26 17:17:30
【Question description】:

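I have the following dataframe, and for a given subject I need to count the ER visit dates with a Score of 1 that fall within one year after the PheneDate. So basically PheneVisit phchp003v1 has 2 dates with a score of 1 within a year of 11/23/05, namely 5/5/06 and 8/5/06, so it would get a value of 2, and so on for the other PheneVisits.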

PheneVisit  PheneDate   Score   ER Date    SubjectID
                N/A     0       10/25/05   phchp003
phchp003v1  11/23/05    0                  phchp003
                N/A     1       5/5/06     phchp003
phchp003v2  5/10/06     0                  phchp003
                N/A     0       6/22/06    phchp003
                N/A     1       8/5/06     phchp003
phchp003v4  2/7/14      0                  phchp003
                N/A     1       10/13/14   phchp003
                N/A     0       2/15/15    phchp003
                N/A     1       8/14/15    phchp003
phchp004v2  4/27/12     0                  phchp004
phchp004v3  8/15/12     0                  phchp004
                N/A     1       5/18/13    phchp004
                N/A     0       6/21/13    phchp004
phchp004v4  6/3/15      0                  phchp004
                N/A     0       8/27/15    phchp004
                N/A     1       9/3/15     phchp004
                N/A     1       8/22/16    phchp004
                N/A     1       11/19/16   phchp004
phchp005v1  2/8/06      0                  phchp005
                N/A     1       3/24/06    phchp005
                N/A     1       4/16/06    phchp005
                N/A     1       4/25/06    phchp005
                N/A     1       5/18/06    phchp005
                N/A     0       5/25/06    phchp005
                N/A     0       6/2/06     phchp005

I would like to get this column for the given PheneVisits within each subject:

PheneVisit  First Year Hosp
            0
phchp003v1  2
            0
phchp003v2  2
            0
            0
phchp003v4  1
            0
            0
            0
phchp004v2  0
phchp004v3  2
            0
            0
phchp004v4  2
            0
            0
            0
            0
phchp005v1  4

Please let me know if there is anything I can clarify, thanks.

【Question discussion】:

    Tags: python pandas


    【Solution 1】:

    Try this:

    import pandas as pd
    import numpy as np
    from io import StringIO
    
    inputtxt = StringIO("""
    PheneVisit  PheneDate   Score   ER Date    SubjectID
    N/A             N/A     0       10/25/05   phchp003
    phchp003v1  11/23/05    0       N/A         phchp003
    N/A             N/A     1       5/5/06     phchp003
    phchp003v2  5/10/06     0       N/A        phchp003
    N/A             N/A     0       6/22/06    phchp003
    N/A             N/A     1       8/5/06     phchp003
    phchp003v4  2/7/14      0       N/A        phchp003
    N/A             N/A     1       10/13/14   phchp003
    N/A             N/A     0       2/15/15    phchp003
    N/A             N/A     1       8/14/15    phchp003
    phchp004v2  4/27/12     0       N/A        phchp004
    phchp004v3  8/15/12     0       N/A        phchp004
    N/A             N/A     1       5/18/13    phchp004
    N/A             N/A     0       6/21/13    phchp004
    phchp004v4  6/3/15      0       N/A        phchp004
    N/A             N/A     0       8/27/15    phchp004
    N/A             N/A     1       9/3/15     phchp004
    N/A             N/A     1       8/22/16    phchp004
    N/A             N/A     1       11/19/16   phchp004
    phchp005v1  2/8/06      0       N/A        phchp005
    N/A             N/A     1       3/24/06    phchp005
    N/A             N/A     1       4/16/06    phchp005
    N/A             N/A     1       4/25/06    phchp005
    N/A             N/A     1       5/18/06    phchp005
    N/A             N/A     0       5/25/06    phchp005
    N/A             N/A     0       6/2/06     phchp005
    """)
    
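    # Columns are separated by two or more spaces; 'N/A' is parsed as NaN by default.
    # A raw string avoids an invalid-escape warning for the regex separator.
    df = pd.read_csv(inputtxt, sep=r'\s\s+', engine='python')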
    
    df['PheneDate'] = pd.to_datetime(df['PheneDate'], format='%m/%d/%y')
    
    df['ER Date'] = pd.to_datetime(df['ER Date'], format='%m/%d/%y')
    
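    # One-year window starting at each PheneDate (rows without a PheneDate get a missing interval)
    df['pi'] = pd.IntervalIndex.from_arrays(df['PheneDate'], df['PheneDate'] + pd.DateOffset(years=1))

    def f(x):
        # Within one subject: index the rows by their one-year window
        x = x.set_index('pi')
        # For every ER date with Score == 1, flag the windows that contain it, then sum the flags per window
        x['Number of First Year'] = np.sum(np.vstack([x.index.contains(i) for i in x.loc[x['Score'] == 1, 'ER Date']]), 0)
        return x.reset_index(drop=True)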
    
    df.groupby('SubjectID').apply(f).groupby('PheneVisit')['Number of First Year'].transform('sum')
    

    Output:

    SubjectID   
    phchp003   0    NaN
               1    2.0
               2    NaN
               3    1.0
               4    NaN
               5    NaN
               6    1.0
               7    NaN
               8    NaN
               9    NaN
    phchp004   0    0.0
               1    1.0
               2    NaN
               3    NaN
               4    1.0
               5    NaN
               6    NaN
               7    NaN
               8    NaN
    phchp005   0    4.0
               1    NaN
               2    NaN
               3    NaN
               4    NaN
               5    NaN
               6    NaN
    Name: Number of First Year, dtype: float64
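
As an alternative to the interval-based approach above, here is a minimal sketch that pairs visits with ER dates through a merge. It is not the answerer's method. The `visits` and `er` frames and the `count_first_year` helper are hypothetical, built from a few rows of the question's data, and the window is assumed to be (PheneDate, PheneDate + 1 year], which matches the default `closed='right'` of `pd.IntervalIndex.from_arrays`.

```python
import pandas as pd

# Hypothetical miniature of the question's data, with visits and ER events kept separate
visits = pd.DataFrame({
    "SubjectID": ["phchp003", "phchp003", "phchp004", "phchp005"],
    "PheneVisit": ["phchp003v1", "phchp003v2", "phchp004v2", "phchp005v1"],
    "PheneDate": pd.to_datetime(["11/23/05", "5/10/06", "4/27/12", "2/8/06"], format="%m/%d/%y"),
})
er = pd.DataFrame({
    "SubjectID": ["phchp003"] * 4 + ["phchp004"] + ["phchp005"] * 2,
    "ER Date": pd.to_datetime(
        ["10/25/05", "5/5/06", "6/22/06", "8/5/06", "5/18/13", "3/24/06", "5/25/06"],
        format="%m/%d/%y",
    ),
    "Score": [0, 1, 0, 1, 1, 1, 0],
})

def count_first_year(visits, er):
    # Pair every visit with every Score == 1 ER date of the same subject;
    # a left merge keeps visits that have no such ER date (their ER Date is NaT)
    pairs = visits.merge(er[er["Score"] == 1], on="SubjectID", how="left")
    window_end = pairs["PheneDate"] + pd.DateOffset(years=1)
    # Assumed window: after PheneDate, up to and including PheneDate + 1 year
    in_window = (pairs["ER Date"] > pairs["PheneDate"]) & (pairs["ER Date"] <= window_end)
    # Count the matching ER dates per visit, in the original visit order
    counts = in_window.groupby(pairs["PheneVisit"]).sum()
    return counts.reindex(visits["PheneVisit"]).astype(int)

result = count_first_year(visits, er)
print(result)
```

Comparisons against `NaT` evaluate to `False`, so a visit whose subject has no qualifying ER date simply gets a count of 0 here.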
    

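    【Discussion】:

    • I get this error when trying to run it: ValueError: need at least one array to concatenate, tracing back to this: df.groupby('SubjectID').apply(f).groupby('PheneVisit')['Number of First Year'].transform('sum')
    • Hrm.. it seems you may have some missing data, or a subject with no matching data. See if you can create a dataset that reproduces the problem, using my code to test it.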
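
One plausible cause of the error discussed above is a subject with no ER dates scored 1: the list comprehension is then empty, and `np.vstack` cannot stack an empty list. The sketch below reproduces that and shows one possible guard. The `count_in_windows` helper and the sample dates are hypothetical, not part of the original answer.

```python
import numpy as np
import pandas as pd

def count_in_windows(windows, er_dates):
    """Count, for each interval in `windows`, how many of `er_dates` it contains."""
    if len(er_dates) == 0:
        # np.vstack([]) would raise "need at least one array to concatenate"
        return np.zeros(len(windows), dtype=int)
    # One boolean row per ER date, summed column-wise to get a count per window
    return np.sum(np.vstack([windows.contains(d) for d in er_dates]), axis=0)

# Two one-year windows (right-closed by default), starting at hypothetical visit dates
start = pd.to_datetime(["2012-04-27", "2012-08-15"])
windows = pd.IntervalIndex.from_arrays(start, start + pd.DateOffset(years=1))

# The unguarded call on an empty list reproduces the commenter's error
try:
    np.vstack([])
    raised = False
except ValueError:
    raised = True

empty_counts = count_in_windows(windows, pd.to_datetime([]))
some_counts = count_in_windows(windows, pd.to_datetime(["2013-05-18"]))
print(raised, empty_counts, some_counts)
```

The same check could be placed inside `f` before the `np.vstack` call, so that subjects without qualifying ER dates get zeros instead of raising.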