Posted: 2021-10-31 07:20:47
Question:
I have the following code:
import numpy as np
import pandas as pd
from datetime import datetime

# tmp_df1 is my existing DataFrame: 'datetime1' is object dtype, 'value1' is float64
datetime_const = datetime(2021, 3, 31)
# Parse the object-dtype date strings into datetime64
tmp_df1['datetime2'] = pd.to_datetime(tmp_df1['datetime1'], format='%Y-%m-%d')

# Debug columns: each condition and each choice evaluated on its own
tmp_df1['test_col_1'] = (tmp_df1['value1'] < 0.0002) & (tmp_df1['datetime2'] < (datetime_const + pd.DateOffset(months=12)))
tmp_df1['test_col_2'] = (tmp_df1['value1'] >= 0.0002) & ((((tmp_df1['datetime2'] - datetime_const).dt.days/365)*tmp_df1['value1']) < 0.0002)
tmp_df1['test_col_3'] = datetime_const + pd.DateOffset(months=12)
tmp_df1['test_col_4'] = datetime_const + pd.to_timedelta(((0.0002/tmp_df1['value1'])*365).round(), unit='D')
tmp_df1['test_col_5'] = tmp_df1['datetime2']

# Combine the same conditions and choices with np.select
tmp_df1['datetime3'] = np.select(
    [
        (tmp_df1['value1'] < 0.0002) & (tmp_df1['datetime2'] < (datetime_const + pd.DateOffset(months=12))),
        (tmp_df1['value1'] >= 0.0002) & ((((tmp_df1['datetime2'] - datetime_const).dt.days/365)*tmp_df1['value1']) < 0.0002)
    ],
    [
        datetime_const + pd.DateOffset(months=12),
        datetime_const + pd.to_timedelta(((0.0002/tmp_df1['value1'])*365).round(), unit='D')
    ],
    default=tmp_df1['datetime2']
)
datetime1 is an object-dtype column, so I convert it to datetime64 and assign the result to datetime2.
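For reference, a minimal sketch of that conversion step on its own. It assumes datetime1 holds strings in the `%Y-%m-%d` layout passed as `format`; the three strings are hypothetical, borrowed from the datetime2 values in the sample rows.

```python
import pandas as pd

# Hypothetical object-dtype column of date strings (values borrowed from the sample rows)
datetime1 = pd.Series(['2021-06-30', '2022-03-31', '2024-06-07'], dtype=object)

# Parse the strings into a datetime64 column
datetime2 = pd.to_datetime(datetime1, format='%Y-%m-%d')

print(datetime1.dtype)  # object
print(datetime2.dtype)  # a datetime64 dtype (datetime64[ns] in the question's environment)
```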
value1 is a float64 column holding many small decimal numbers, and it does contain NaN values.
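Since value1 contains NaN, it helps to see how NaN flows through the conditions and choices. A small sketch with hypothetical values: any comparison against NaN is False, so a NaN row matches neither condition and falls through to the default, and `0.0002 / NaN` becomes NaT in the second choice. This is consistent with test_col_4 having the same non-null count (25438) as value1 in the .info() output.

```python
import numpy as np
import pandas as pd
from datetime import datetime

datetime_const = datetime(2021, 3, 31)
# Hypothetical values; the NaN mimics the missing entries in value1
value1 = pd.Series([0.0001, 0.0005, np.nan])

# Comparisons against NaN are always False, so the NaN row satisfies neither condition
below = value1 < 0.0002
at_or_above = value1 >= 0.0002
print(below.tolist())        # [True, False, False]
print(at_or_above.tolist())  # [False, True, False]

# 0.0002 / NaN is NaN, and pd.to_timedelta turns NaN into NaT,
# so the second choice is NaT wherever value1 is missing
days = ((0.0002 / value1) * 365).round()
choice2 = datetime_const + pd.to_timedelta(days, unit='D')
print(choice2.isna().tolist())  # [False, False, True]
```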
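I created test_col_1 through test_col_5 to check the individual conditions and choices in my np.select call, and when each one is assigned as a separate DataFrame column, they all look correct.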
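A complementary check is to run np.select with integer labels instead of dates, which shows which branch each row actually takes without any datetime conversion involved. In this sketch, `cond1` and `cond2` are hypothetical boolean Series standing in for test_col_1 and test_col_2; np.select uses the first condition that is True for each row.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two boolean condition columns (test_col_1, test_col_2)
cond1 = pd.Series([False, False, True, False])
cond2 = pd.Series([True, False, False, False])

# Label each row with the branch np.select would pick:
# 1 = first choice, 2 = second choice, 0 = default
branch = np.select([cond1, cond2], [1, 2], default=0)
print(branch)  # [2 0 1 0]
```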
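However, the datetime3 column assigned from np.select comes back as object dtype, filled with odd large numbers such as 1628121600000000000. I expected each value to be a datetime64 taken from one of the two choices, or otherwise the default value from the datetime2 column.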
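One possible explanation, which can be checked at the NumPy level, is how np.select picks its output dtype: roughly, it converts each array-like choice and the default with np.asarray and uses their common result type. `datetime_const + pd.DateOffset(months=12)` is a single pd.Timestamp, which becomes a 0-d object array, so the common dtype is object; casting datetime64[ns] values to object then yields plain integers (nanoseconds since 1970-01-01) rather than datetime objects. This sketch assumes NumPy's usual casting rules; the two dates are taken from the sample output.

```python
import numpy as np
import pandas as pd

# A pd.Timestamp scalar (like datetime_const + pd.DateOffset(months=12))
# becomes a 0-d object array under np.asarray
scalar_choice = pd.Timestamp('2022-03-31')
print(np.asarray(scalar_choice).dtype)  # object

# A datetime64[ns] column stays datetime64[ns]
series_choice = pd.Series(np.array(['2021-08-05', '2022-03-31'], dtype='datetime64[ns]'))
print(np.asarray(series_choice).dtype)  # datetime64[ns]

# The common dtype of object and datetime64[ns] is object ...
print(np.result_type(np.asarray(scalar_choice), np.asarray(series_choice)))  # object

# ... and casting datetime64[ns] values to object yields plain Python ints
# (nanoseconds since the epoch), not datetime objects
as_object = np.asarray(series_choice).astype(object)
print(as_object.tolist())
```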
Please see the sample .info() output and DataFrame rows below:
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   datetime2   26558 non-null  datetime64[ns]
 1   value1      25438 non-null  float64
 2   test_col_1  26558 non-null  bool
 3   test_col_2  26558 non-null  bool
 4   test_col_3  26558 non-null  datetime64[ns]
 5   test_col_4  25438 non-null  datetime64[ns]
 6   test_col_5  26558 non-null  datetime64[ns]
 7   datetime3   26558 non-null  object
dtypes: bool(2), datetime64[ns](4), float64(1), object(1)
memory usage: 1.5+ MB
    datetime2   value1  test_col_1  test_col_2  test_col_3  test_col_4  test_col_5            datetime3
0  2021-06-30  0.00058       False        True  2022-03-31  2021-08-05  2021-06-30  1628121600000000000
1  2022-03-31  0.00044       False       False  2022-03-31  2021-09-13  2022-03-31  1648684800000000000
2  2024-06-07  0.00860       False       False  2022-03-31  2021-04-08  2024-06-07  1717718400000000000
3  2021-09-30  0.00867       False       False  2022-03-31  2021-04-08  2021-09-30  1632960000000000000
4  2021-08-31  0.00144       False       False  2022-03-31  2021-05-21  2021-08-31  1630368000000000000
5  2021-08-31  0.00144       False       False  2022-03-31  2021-05-21  2021-08-31  1630368000000000000
6  2021-04-08  0.00474       False        True  2022-03-31  2021-04-15  2021-04-08  1618444800000000000
7  2023-10-01  0.11506       False       False  2022-03-31  2021-04-01  2023-10-01  1696118400000000000
8  2023-09-29  0.12067       False       False  2022-03-31  2021-04-01  2023-09-29  1695945600000000000
9  2021-05-31  0.02508       False       False  2022-03-31  2021-04-03  2021-05-31  1622419200000000000
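The integers in datetime3 look like nanosecond epoch timestamps: decoding rows 0 and 1 gives 2021-08-05 (row 0's test_col_4, since its test_col_2 is True) and 2022-03-31 (row 1's datetime2 default). A minimal sketch of one possible fix, assuming the object dtype comes from the pd.Timestamp scalar choice, is to convert every choice and the default to datetime64[ns] before calling np.select. The mini-frame `df` and the names `cond1`, `cond2`, `choice2` are hypothetical, filled with values from rows 0 and 1 above.

```python
import numpy as np
import pandas as pd

# Two datetime3 values copied from the sample output, decoded as nanoseconds since the epoch
raw = pd.Series([1628121600000000000, 1648684800000000000])
decoded = pd.to_datetime(raw, unit='ns')
print(decoded.dt.strftime('%Y-%m-%d').tolist())  # ['2021-08-05', '2022-03-31']

# Hypothetical mini-frame standing in for tmp_df1 (rows 0 and 1 of the sample)
df = pd.DataFrame({
    'datetime2': np.array(['2021-06-30', '2022-03-31'], dtype='datetime64[ns]'),
    'choice2': np.array(['2021-08-05', '2021-09-13'], dtype='datetime64[ns]'),
})
cond1 = pd.Series([False, False])  # test_col_1 for rows 0 and 1
cond2 = pd.Series([True, False])   # test_col_2 for rows 0 and 1
scalar = pd.Timestamp('2022-03-31')  # what datetime_const + pd.DateOffset(months=12) returns

# Give every choice and the default the same datetime64[ns] dtype,
# so np.select has no reason to fall back to object
df['datetime3'] = np.select(
    [cond1, cond2],
    [scalar.to_datetime64().astype('datetime64[ns]'),
     df['choice2'].to_numpy(dtype='datetime64[ns]')],
    default=df['datetime2'].to_numpy(dtype='datetime64[ns]'),
)
print(df['datetime3'].dtype)  # datetime64[ns]
```

This only changes the dtypes of the inputs; the conditions and choices themselves are the same as in the question.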
I'm completely baffled by this behavior; please enlighten me!
Thanks in advance, everyone!
Comments:
- @Ben.T Good point, I've added some examples of what I'm seeing. Thanks.
Tags: python pandas dataframe numpy conditional-statements