IIUC, you can first convert the columns with to_datetime, take abs of the difference, and then convert the resulting timedelta to days:
print(df)
  id  value       date1       date2  sum
0  A    150  2014-04-08  2014-03-08  NaN
1  B    100  2014-05-08  2014-02-08  NaN
2  B    200  2014-01-08  2014-07-08  100
3  A    200  2014-04-08  2014-03-08  NaN
4  A    300  2014-06-08  2014-04-08  350
df['date1'] = pd.to_datetime(df['date1'])
df['date2'] = pd.to_datetime(df['date2'])
df['diff'] = (df['date1'] - df['date2']).abs() / np.timedelta64(1, 'D')
print(df)
  id  value       date1       date2  sum  diff
0  A    150  2014-04-08  2014-03-08  NaN    31
1  B    100  2014-05-08  2014-02-08  NaN    89
2  B    200  2014-01-08  2014-07-08  100   181
3  A    200  2014-04-08  2014-03-08  NaN    31
4  A    300  2014-06-08  2014-04-08  350    61
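For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame construction is my own reconstruction of the sample shown above (the `sum` column is left out because it plays no part in the calculation), so treat it as an assumption rather than the asker's real data.

```python
import numpy as np
import pandas as pd

# Assumed sample data, rebuilt from the printed frame above (without 'sum')
df = pd.DataFrame({
    "id": ["A", "B", "B", "A", "A"],
    "value": [150, 100, 200, 200, 300],
    "date1": ["2014-04-08", "2014-05-08", "2014-01-08", "2014-04-08", "2014-06-08"],
    "date2": ["2014-03-08", "2014-02-08", "2014-07-08", "2014-03-08", "2014-04-08"],
})

# Parse the string columns into datetime64 values
df["date1"] = pd.to_datetime(df["date1"])
df["date2"] = pd.to_datetime(df["date2"])

# Absolute difference, expressed as a float number of days
df["diff"] = (df["date1"] - df["date2"]).abs() / np.timedelta64(1, "D")

print(df)
```

Note that dividing by np.timedelta64(1, 'D') gives a float column, so cast with astype(int) if you need integers.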
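EDIT:
For larger DataFrames, I think it is better to convert to days by dividing by np.timedelta64(1, 'D') rather than using .dt.days, because it is faster.
I use EdChum's sample, just repeated so that len(df) = 4k: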
import io
import pandas as pd
import numpy as np
t=u"""Test Date,Test Type,First Use Date
2011-02-05,A,2010-01-05
2012-02-05,A,2010-03-05
2013-02-05,A,2010-06-05
2014-02-05,A,2010-08-05"""
df = pd.read_csv(io.StringIO(t))
df = pd.concat([df]*1000).reset_index(drop=True)
df['Test Date'] = pd.to_datetime(df['Test Date'])
df['First Use Date'] = pd.to_datetime(df['First Use Date'])
print((df['Test Date'] - df['First Use Date']).abs().dt.days)
print((df['Test Date'] - df['First Use Date']).abs() / np.timedelta64(1, 'D'))
Timings:
In [174]: %timeit (df['Test Date'] - df['First Use Date']).abs().dt.days
10 loops, best of 3: 38.8 ms per loop
In [175]: %timeit (df['Test Date'] - df['First Use Date']).abs() / np.timedelta64(1, 'D')
1000 loops, best of 3: 1.62 ms per loop
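One thing to keep in mind when choosing between the two: they are not strictly equivalent. As far as I know, .dt.days returns only the whole-day component as integers, while dividing by np.timedelta64(1, 'D') returns floats and keeps any fractional part. A small sketch with made-up timestamps (the `start` and `end` names are only for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical timestamps; the second pair differs by 30 days and 12 hours
start = pd.Series(pd.to_datetime(["2014-03-08 00:00", "2014-03-08 12:00"]))
end = pd.Series(pd.to_datetime(["2014-04-08 00:00", "2014-04-08 00:00"]))

delta = (end - start).abs()

# Whole-day component only (integers)
whole_days = delta.dt.days

# Total duration in days, including the fractional part (floats)
float_days = delta / np.timedelta64(1, "D")

print(whole_days.tolist())
print(float_days.tolist())
```

For date-only columns like the ones in this question the two give the same numbers, so the choice comes down to speed and the dtype you want. The timings above are from my machine and will likely differ with other pandas versions.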