How do I dynamically create a new dataframe by iterating over multiple values? (Answers)

【Question title】: How do I dynamically create a new dataframe from iterating over multiple values?
【Posted】: 2019-01-01 13:33:03
【Question description】:

New to Python.

I have this data:

import numpy as np
import pandas as pd

sample = pd.DataFrame({'CustomerID': ['1', '2', '3', '4', '5', '6'],
                       'Date': np.random.choice(pd.Series(pd.date_range('2018-01-01',
                                                freq='D', periods=180)), 6),
                       'Period': np.random.uniform(50, 200, 6),
                       }, columns=['CustomerID', 'Date', 'Period'])
sample

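I want to add the 'Period' column to the 'Date' column and record each new date in a separate dataframe with CustomerID and New Date columns. However, I want to record every new date (iterating from the previous new date) until the new date is > 2020.

I wrote a function: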

def proj(ids=None):
    end = pd.to_datetime('2020-01-01')
    for x in ids:
        date = projection.loc[projection['CustomerID'] == x, 'Date']
        period = projection.loc[projection['CustomerID'] == x, 'Period']
        time_left = end - date
        ratio = float(round(time_left.dt.days / period)) # how many times the period fits in time_left
        itera = np.arange(1, ratio, 1)
        for i in itera:
            deltas = [i * period]
            df = pd.Series(deltas).map(float).map(dt.timedelta)
            pdates = pd.Series((date + df))
            pdates = pdates.map(pd.to_datetime)
            print(dates)

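Clearly, not only have I not figured out how to create a new dataframe for my output, but this function also only works for one of my CustomerIDs and not for the others.

I'd really like to know what I can do next.

Thanks for your help.

Edit: For reference, I'd like the output to look like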

output = pd.DataFrame({'CustomerID': ['1', '1', '1', '1', '2', '2', '2'],
                  'New Date': ['2018-09-28', '2019-01-21', '2019-05-16','2019-09-08',
                              '2018-09-26', '2019-02-27', '2019-07-31']})
output

【Question discussion】:

Can you post the desired output?
output = pd.DataFrame({'CustomerID': ['1', '1', '1', '1', '2', '2', '2'], 'New Date': ['2018-09-28', '2019-01-21', '2019-05-16','2019-09-08', '2018-09-26', '2019-02-27', '2019-07-31']}) output
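Do you care about the fractional part of Period, or is it enough to shift by a rounded number of days? Both are possible, but rounding is a bit cleaner.
Ideally we'd be as accurate as possible, but in practice that's fine.
But there are no dates > 2020 in your example output...??

Tags: python pandas numpy datetime for-loop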

【Solution 1】:

The sample looks like this:

  CustomerID       Date  Period
0          1 2018-01-16     152
1          2 2018-06-28     109
2          3 2018-03-07      59
3          4 2018-03-30     172
4          5 2018-01-07      92
5          6 2018-05-22     164

First, let's specify the end date and convert Date to datetime objects.

from datetime import timedelta
from datetime import datetime
end_date = datetime.strptime('2020-01-01', '%Y-%m-%d')
sample['Date'] = pd.to_datetime(sample['Date'])

Now, let's create a list of dates for each row.

sample['dates'] = sample.apply(lambda x: pd.date_range(start=x['Date'], end=end_date, freq='D')[::x['Period']], axis=1)
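To see what the slice in that line does, here is a minimal sketch for a single customer, using the first row of the sample table above (start 2018-01-16, Period 152). The variable names are only for illustration. It builds a daily range up to the end date and keeps every 152nd entry, so the start date itself is included as the first element.

```python
import pandas as pd

start = pd.Timestamp("2018-01-16")
end_date = pd.Timestamp("2020-01-01")
period = 152  # whole days; the slice step has to be an integer

# One entry per day from start to end_date, then keep every `period`-th day.
daily = pd.date_range(start=start, end=end_date, freq="D")
dates = daily[::period]

print(dates.strftime("%Y-%m-%d").tolist())
# ['2018-01-16', '2018-06-17', '2018-11-16', '2019-04-17', '2019-09-16']
```

If the starting date should not appear in the result, dropping the first element (`dates[1:]`) would leave only the projected dates.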

Simply flatten the dates, keeping CustomerID.

output = sample[['CustomerID', 'dates']].set_index('CustomerID')['dates'].apply(pd.Series).stack().reset_index(name='New Date').drop(columns='level_1')
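As an alternative sketch of the same flattening step, `DataFrame.explode` can turn a column of lists into one row per element. This assumes pandas 0.25 or newer, and the small `demo` frame below is made up from two rows of the sample table; it is not the answerer's code.

```python
import pandas as pd

end_date = pd.Timestamp("2020-01-01")
demo = pd.DataFrame({
    "CustomerID": ["1", "4"],
    "Date": pd.to_datetime(["2018-01-16", "2018-03-30"]),
    "Period": [152, 172],  # integer days
})

# One Python list of timestamps per row.
demo["dates"] = pd.Series(
    [list(pd.date_range(d, end_date, freq="D")[::p])
     for d, p in zip(demo["Date"], demo["Period"])],
    index=demo.index,
)

# explode() repeats CustomerID for every element of the list.
flat = (
    demo[["CustomerID", "dates"]]
    .explode("dates")
    .rename(columns={"dates": "New Date"})
    .reset_index(drop=True)
)
flat["New Date"] = pd.to_datetime(flat["New Date"])  # restore datetime dtype
print(flat)
```

The result has the same two columns as the output shown in this answer, without the intermediate wide frame that `apply(pd.Series)` creates.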

Output:

   CustomerID   New Date
0           1 2018-01-16
1           1 2018-06-17
2           1 2018-11-16
3           1 2019-04-17
4           1 2019-09-16
5           2 2018-06-28
6           2 2018-10-15
7           2 2019-02-01
8           2 2019-05-21
9           2 2019-09-07
10          2 2019-12-25
11          3 2018-03-07
12          3 2018-05-05
13          3 2018-07-03
14          3 2018-08-31
15          3 2018-10-29
16          3 2018-12-27
17          3 2019-02-24
18          3 2019-04-24
19          3 2019-06-22
20          3 2019-08-20
21          3 2019-10-18
22          3 2019-12-16
23          4 2018-03-30
24          4 2018-09-18
25          4 2019-03-09
26          4 2019-08-28
27          5 2018-01-07
28          5 2018-04-09
29          5 2018-07-10
30          5 2018-10-10
31          5 2019-01-10
32          5 2019-04-12
33          5 2019-07-13
34          5 2019-10-13
35          6 2018-05-22
36          6 2018-11-02
37          6 2019-04-15
38          6 2019-09-26

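【Discussion】:

This only outputs the first iteration of Period + Date. I need every iteration until the new date is after 2020.
It's usually better to build a list dynamically and create the dataframe from that list. pandas doesn't handle what you're trying to do very well. Why use a dataframe in the first place?
@Chad, see the edit. Hope that answers your question.
@HarvIpan Looks great, except that on the sample['dates'] line I get ValueError: ('n argument must be an integer, got 84.97301157674099', 'occurred at index 0')
@Chad, you have to convert your Period to int; see my input dataframe. This is because pandas.date_range expects periods to be an int.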
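If the fractional part of Period matters, one possible way around the integer requirement is to skip the daily range and add multiples of the period as timedeltas. The sketch below is an assumption-laden illustration, not the answerer's method: `project_dates` and the `demo` frame are hypothetical names, and the starting date is left out so that only Date + k * Period (k = 1, 2, ...) is recorded, as in the question's desired output.

```python
import numpy as np
import pandas as pd

def project_dates(df, end):
    """Return one row per projected date: Date + k * Period, k = 1, 2, ..., up to `end`."""
    end = pd.Timestamp(end)
    frames = []
    for row in df.itertuples(index=False):
        step = pd.Timedelta(days=row.Period)    # fractional days are kept
        n_steps = int((end - row.Date) / step)  # whole periods that fit before `end`
        offsets = pd.to_timedelta(np.arange(1, n_steps + 1) * row.Period, unit="D")
        frames.append(pd.DataFrame({"CustomerID": row.CustomerID,
                                    "New Date": row.Date + offsets}))
    return pd.concat(frames, ignore_index=True)

# Hypothetical input with one fractional and one whole-day period.
demo = pd.DataFrame({
    "CustomerID": ["1", "2"],
    "Date": pd.to_datetime(["2018-01-16", "2018-06-28"]),
    "Period": [152.5, 109.0],
})
projected = project_dates(demo, "2020-01-01")
print(projected)
```

With a fractional Period the projected values carry a time of day (for example 12:00 after adding 152.5 days); calling `.dt.normalize()` on the column would drop it if only the calendar date is needed.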