I put together something like this; maybe it will help you (or maybe not).
import pandas as pd
df = pd.DataFrame( [
['2011-01-01 01:00', 1, 2, 3],
['2011-01-01 02:00', 10, 20, 30],
['2011-01-01 03:00', 100, 200, 300],
['2011-01-02 01:00', 4, 5, 6],
['2011-01-02 02:00', 40, 50, 60],
['2011-01-02 03:00', 400, 500, 600],
], columns=['datetime','a','b','c'])
# convert the datetime strings to Timestamp objects
df['datetime'] = pd.to_datetime(df['datetime'])
# now I have an example dataframe to work with
# create a column with the date only
df['date'] = df['datetime'].dt.date
# group by date; select the numeric columns explicitly so `datetime` stays out of the result
g = df.groupby('date')[['a', 'b', 'c']].mean()
# change `date` from index to normal column
g2 = g.reset_index()
# merge by `date` columns
new_df = pd.merge(left=df, right=g2, on='date', suffixes=('_df','_group') )
print(df)
print(g)
print(g2)
print(new_df)
df:
datetime a b c date
0 2011-01-01 01:00:00 1 2 3 2011-01-01
1 2011-01-01 02:00:00 10 20 30 2011-01-01
2 2011-01-01 03:00:00 100 200 300 2011-01-01
3 2011-01-02 01:00:00 4 5 6 2011-01-02
4 2011-01-02 02:00:00 40 50 60 2011-01-02
5 2011-01-02 03:00:00 400 500 600 2011-01-02
g:
a b c
date
2011-01-01 37 74 111
2011-01-02 148 185 222
g2:
date a b c
0 2011-01-01 37 74 111
1 2011-01-02 148 185 222
new_df:
datetime a_df b_df c_df date a_group b_group c_group
0 2011-01-01 01:00:00 1 2 3 2011-01-01 37 74 111
1 2011-01-01 02:00:00 10 20 30 2011-01-01 37 74 111
2 2011-01-01 03:00:00 100 200 300 2011-01-01 37 74 111
3 2011-01-02 01:00:00 4 5 6 2011-01-02 148 185 222
4 2011-01-02 02:00:00 40 50 60 2011-01-02 148 185 222
5 2011-01-02 03:00:00 400 500 600 2011-01-02 148 185 222
EDIT:
If you use left_on='date', right_index=True, you don't need reset_index().
# change `date` from index to normal column
#g2 = g.reset_index()
# merge by `date` columns
#new_df = pd.merge(left=df, right=g2, on='date', suffixes=('_df','_group') )
# merge against `g` (its index is `date`), not `g2` (its index is 0, 1, ...)
new_df = pd.merge(left=df, right=g, left_on='date', right_index=True, suffixes=('_df','_group') )
print(new_df)
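For reference, here is the index-based variant as one self-contained script. It is only a sketch: it assumes Python 3 and a reasonably recent pandas, and it picks the columns `a`, `b`, `c` explicitly before averaging (my addition, so that the `datetime` column is not part of the grouped result). The data and names are the same as in the example above.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['2011-01-01 01:00', 1, 2, 3],
        ['2011-01-01 02:00', 10, 20, 30],
        ['2011-01-01 03:00', 100, 200, 300],
        ['2011-01-02 01:00', 4, 5, 6],
        ['2011-01-02 02:00', 40, 50, 60],
        ['2011-01-02 03:00', 400, 500, 600],
    ],
    columns=['datetime', 'a', 'b', 'c'],
)

# parse the strings and derive a date-only column to group on
df['datetime'] = pd.to_datetime(df['datetime'])
df['date'] = df['datetime'].dt.date

# per-day means; the result is indexed by `date`
g = df.groupby('date')[['a', 'b', 'c']].mean()

# match the `date` column of df against the index of g, so no reset_index() is needed
new_df = pd.merge(
    left=df,
    right=g,
    left_on='date',
    right_index=True,
    suffixes=('_df', '_group'),
)

print(new_df)
```

Every row keeps its own `a_df`, `b_df`, `c_df` values and gets the mean of its day in `a_group`, `b_group`, `c_group`.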