【Question Title】: Duplicate rows with some changes in Python
【Posted】: 2020-06-13 03:32:19
【Question】:

I'm stuck on how to duplicate rows in Python using a loop. I have a dataset like this (it is a pandas DataFrame):

    userId  period  Date    
0   41851   4       1/4/2015    
1   13575   1       1/4/2015    

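I want to duplicate the first row 3 times. Each time the original row is copied, its period should decrease by 1, until the original's period reaches 1, so that every resulting row has a period of 1. Each copy should also add 1 month to the date. The result would look like this: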

    userId  period  Date    
0   41851   1       1/4/2015
1   41851   1       2/4/2015
2   41851   1       3/4/2015
3   41851   1       4/4/2015    
4   13575   1       1/4/2015    

Does anyone know how to do this? Thanks!

【Question Discussion】:

    Tags: python pandas loops datetime


    【Solution 1】:

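    The idea is to repeat the rows with Index.repeat and DataFrame.loc, then add months using a per-copy counter from GroupBy.cumcount together with this solution, and finally, if necessary, change the datetime format with Series.dt.strftime: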

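    import numpy as np
    import pandas as pd

    # Sample data from the question; dates are month/day/year
    df = pd.DataFrame({'userId': [41851, 13575],
                       'period': [4, 1],
                       'Date': ['1/4/2015', '1/4/2015']})

    def combine64(years, months=1, days=1, weeks=None, hours=None, minutes=None,
                  seconds=None, milliseconds=None, microseconds=None, nanoseconds=None):
        """Build datetime64 values from (array-like) date components.

        Months and days are offsets from January 1 of the given year, so
        out-of-range values roll over (month 13 becomes January of the next year).
        """
        years = np.asarray(years) - 1970
        months = np.asarray(months) - 1
        days = np.asarray(days) - 1
        types = ('<M8[Y]', '<m8[M]', '<m8[D]', '<m8[W]', '<m8[h]',
                 '<m8[m]', '<m8[s]', '<m8[ms]', '<m8[us]', '<m8[ns]')
        vals = (years, months, days, weeks, hours, minutes, seconds,
                milliseconds, microseconds, nanoseconds)
        # Convert each given component to its datetime64/timedelta64 unit and add them up
        return sum(np.asarray(v, dtype=t) for t, v in zip(types, vals)
                   if v is not None)

    def year(dates):
        "Return an array of the years given an array of datetime64s"
        return dates.astype('M8[Y]').astype('i8') + 1970

    def month(dates):
        "Return an array of the months given an array of datetime64s"
        return dates.astype('M8[M]').astype('i8') % 12 + 1

    def day(dates):
        "Return an array of the days of the month given an array of datetime64s"
        return (dates - dates.astype('M8[M]')) / np.timedelta64(1, 'D') + 1

    df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

    # Repeat each row `period` times; .copy() avoids SettingWithCopyWarning below
    df1 = df.loc[df.index.repeat(df['period'])].copy()
    # 0, 1, 2, ... within each original row: the number of months to add
    g = df1.groupby(level=0).cumcount().to_numpy()

    start = df1['Date'].values
    # day() returns floats, so cast to whole days before combining
    df1['Date'] = combine64(year(start), months=month(start) + g,
                            days=day(start).astype(int))
    df1['period'] = 1
    df1 = df1.reset_index(drop=True)

    # Optional: convert back to month/day/year strings
    df1['Date'] = df1['Date'].dt.strftime('%m/%d/%Y')
    print(df1)
    #    userId  period        Date
    # 0   41851       1  01/04/2015
    # 1   41851       1  02/04/2015
    # 2   41851       1  03/04/2015
    # 3   41851       1  04/04/2015
    # 4   13575       1  01/04/2015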
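To make the first two steps of the idea concrete, the sketch below prints the repeated index and the per-copy counter on their own, assuming the sample frame from the question (the `Date` column is left as strings because only the index logic matters here, and `repeated_index` is an illustrative name). The counter `g` is what the answer turns into the month offset.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'userId': [41851, 13575],
                   'period': [4, 1],
                   'Date': ['1/4/2015', '1/4/2015']})

# Index.repeat repeats each index label `period` times
repeated_index = df.index.repeat(df['period'])
df1 = df.loc[repeated_index]

# cumcount numbers the copies within each original label, starting at 0
g = df1.groupby(level=0).cumcount()

print(repeated_index.tolist())  # [0, 0, 0, 0, 1]
print(g.tolist())               # [0, 1, 2, 3, 0]
```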
    
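If the NumPy helper functions feel heavy, a plain-pandas variant is possible by swapping `combine64` for `pd.DateOffset`. This is a substitute technique, not part of the answer above; it is a minimal sketch assuming the same sample data. It shifts each date in a Python-level loop, so it is likely slower on large frames, but `DateOffset` clamps to the end of the month instead of overflowing.

```python
import pandas as pd

df = pd.DataFrame({'userId': [41851, 13575],
                   'period': [4, 1],
                   'Date': ['1/4/2015', '1/4/2015']})
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

# Repeat rows and number the copies, as in the answer
df1 = df.loc[df.index.repeat(df['period'])].copy()
g = df1.groupby(level=0).cumcount()

# Shift each date by its copy number in months; DateOffset clamps to month end
df1['Date'] = [d + pd.DateOffset(months=int(m)) for d, m in zip(df1['Date'], g)]
df1['period'] = 1
df1 = df1.reset_index(drop=True)
df1['Date'] = df1['Date'].dt.strftime('%m/%d/%Y')
print(df1)
```

For example, `pd.Timestamp('2015-01-31') + pd.DateOffset(months=1)` gives `2015-02-28`, which matters if any start date falls late in the month.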

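Since the question frames this as a loop, here is a literal loop-based sketch that follows the rule as described: keep emitting copies while the period counts down to 1, advancing the date by one month per copy. It assumes the same sample data, uses `pd.DateOffset` for the month step, and `rows`/`out` are illustrative names. The vectorized `Index.repeat` approach will typically be faster on large frames.

```python
import pandas as pd

df = pd.DataFrame({'userId': [41851, 13575],
                   'period': [4, 1],
                   'Date': ['1/4/2015', '1/4/2015']})
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

rows = []
for row in df.itertuples(index=False):
    remaining = row.period  # counts down to 1, one copy per step
    offset = 0              # months added to the original date
    while remaining >= 1:
        rows.append({'userId': row.userId,
                     'period': 1,
                     'Date': row.Date + pd.DateOffset(months=offset)})
        remaining -= 1
        offset += 1

out = pd.DataFrame(rows, columns=['userId', 'period', 'Date'])
out['Date'] = out['Date'].dt.strftime('%m/%d/%Y')
print(out)
```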
    【Discussion】:

    • It worked!! Thank you so much! This is really impressive.
    • @Stella - Great! If my answer helped, please don't forget to accept it. Thanks.