(Pandas) Different way of combining two dataframes: answers

【Question title】: (Pandas) Different way of combining two dataframes
【Posted】: 2018-09-04 05:02:47
【Question】:

I am wondering if there is a better way of combining two dataframes than what I did below.

import numpy as np
import pandas as pd

# create random data sets
N = 50
# 'H' = hourly frequency (pandas >= 2.2 prefers the lowercase alias 'h')
df = pd.DataFrame({'date': pd.date_range('2000-1-1', periods=N, freq='H'),
                   'value': np.random.random(N)})

index = pd.DatetimeIndex(df['date'])
peak_time = df.iloc[index.indexer_between_time('7:00','9:00')]
lunch_time = df.iloc[index.indexer_between_time('12:00','14:00')]

comb_data = pd.concat([peak_time, lunch_time], ignore_index=True)

Is there a way to combine the two ranges when using between_time together with logical operators?

I need this to create a new column in df called "isPeak", where 1 is written for rows between 7:00 ~ 9:00 and 12:00 ~ 14:00, and 0 otherwise.

【Question comments】:

Tags: python pandas datetime dataframe python-datetime

【Solution 1】:

np.union1d works for me:

import numpy as np

idx = np.union1d(index.indexer_between_time('7:00','9:00'), 
                 index.indexer_between_time('12:00','14:00'))

comb_data = df.iloc[idx]
print (comb_data)
                  date     value
7  2000-01-01 07:00:00  0.760627
8  2000-01-01 08:00:00  0.236474
9  2000-01-01 09:00:00  0.626146
12 2000-01-01 12:00:00  0.625335
13 2000-01-01 13:00:00  0.793105
14 2000-01-01 14:00:00  0.706873
31 2000-01-02 07:00:00  0.113688
32 2000-01-02 08:00:00  0.035565
33 2000-01-02 09:00:00  0.230603
36 2000-01-02 12:00:00  0.423155
37 2000-01-02 13:00:00  0.947584
38 2000-01-02 14:00:00  0.226181
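
To make it clearer why the rows above come back in chronological order, here is a minimal sketch of what np.union1d does with two position arrays. The arrays `morning` and `lunch` are hand-written stand-ins (an assumption for illustration) for the two indexer_between_time results.

```python
import numpy as np

# Hypothetical stand-ins for the two indexer_between_time results
morning = np.array([7, 8, 9, 31, 32, 33])
lunch = np.array([12, 13, 14, 36, 37, 38])

# union1d returns the sorted, de-duplicated union of both arrays,
# so positions from the two ranges end up interleaved by row order.
idx = np.union1d(morning, lunch)
print(idx.tolist())
```

Because the result is sorted, df.iloc[idx] keeps the original row order of df.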

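Alternatively, with numpy.r_ (it concatenates the two position arrays without sorting, so the rows come out grouped by time range):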

idx = np.r_[index.indexer_between_time('7:00','9:00'), 
            index.indexer_between_time('12:00','14:00')]

comb_data = df.iloc[idx]
print (comb_data)
                  date     value
7  2000-01-01 07:00:00  0.760627
8  2000-01-01 08:00:00  0.236474
9  2000-01-01 09:00:00  0.626146
31 2000-01-02 07:00:00  0.113688
32 2000-01-02 08:00:00  0.035565
33 2000-01-02 09:00:00  0.230603
12 2000-01-01 12:00:00  0.625335
13 2000-01-01 13:00:00  0.793105
14 2000-01-01 14:00:00  0.706873
36 2000-01-02 12:00:00  0.423155
37 2000-01-02 13:00:00  0.947584
38 2000-01-02 14:00:00  0.226181
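
One practical difference between the two approaches only shows up when the time windows overlap. The sketch below uses two overlapping windows (8:00 ~ 10:00 is a made-up range chosen for illustration, not part of the question) to show that np.r_ keeps duplicate positions while np.union1d drops them.

```python
import numpy as np
import pandas as pd

# One day of hourly timestamps; position i corresponds to hour i
index = pd.date_range('2000-01-01', periods=24, freq='h')

a = index.indexer_between_time('7:00', '9:00')    # positions 7, 8, 9
b = index.indexer_between_time('8:00', '10:00')   # positions 8, 9, 10 (overlaps a)

stacked = np.r_[a, b]       # plain concatenation: duplicates and input order kept
merged = np.union1d(a, b)   # sorted union: each position appears once

print(stacked.tolist())
print(merged.tolist())
```

With non-overlapping windows, as in the question, both give the same set of rows and differ only in order.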

A pure pandas solution with Index.union, converting the arrays to an Index:

idx = (pd.Index(index.indexer_between_time('7:00','9:00'))
         .union(pd.Index(index.indexer_between_time('12:00','14:00'))))

comb_data = df.iloc[idx]
print (comb_data)
                  date     value
7  2000-01-01 07:00:00  0.760627
8  2000-01-01 08:00:00  0.236474
9  2000-01-01 09:00:00  0.626146
12 2000-01-01 12:00:00  0.625335
13 2000-01-01 13:00:00  0.793105
14 2000-01-01 14:00:00  0.706873
31 2000-01-02 07:00:00  0.113688
32 2000-01-02 08:00:00  0.035565
33 2000-01-02 09:00:00  0.230603
36 2000-01-02 12:00:00  0.423155
37 2000-01-02 13:00:00  0.947584
38 2000-01-02 14:00:00  0.226181
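
The question also asks for an "isPeak" column, which the snippets above do not build. One possible way to do it, sketched under the assumption that df has the same shape as in the question, is to start from a column of zeros and set the positions in idx to 1. The helper name `is_peak` is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question (values are random)
N = 50
df = pd.DataFrame({'date': pd.date_range('2000-01-01', periods=N, freq='h'),
                   'value': np.random.random(N)})

index = pd.DatetimeIndex(df['date'])
idx = np.union1d(index.indexer_between_time('7:00', '9:00'),
                 index.indexer_between_time('12:00', '14:00'))

# idx holds integer positions, so fill a NumPy array by position
# and attach it as the new column: 1 inside either window, 0 elsewhere.
is_peak = np.zeros(len(df), dtype=int)
is_peak[idx] = 1
df['isPeak'] = is_peak

print(df[df['isPeak'] == 1])
```

Filling a NumPy array by position avoids any dependence on the DataFrame's index labels, which matters if df does not use the default RangeIndex.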

【Discussion】:

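If we can't use numpy, what I did above should be fine, right?
If you are using pandas and np is a problem, change np.union1d to pd.np.union1d, because pandas is built on numpy :) As for your question: yes, that is a correct solution.
@SeoiMin - I added a pure pandas version.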