as.POSIXct behaving inconsistently: answers

【Question title】: as.POSIXct behaving inconsistently
【Posted】: 2021-07-09 22:17:12
【Question】:

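This may sound like a duplicate question, but I have gone through many POSIXct-related questions and have not come across this problem. If you do find one, I would appreciate being pointed to it. as.POSIXct is behaving very oddly in my case. See the example below: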

options(digits.secs = 3)
test_time <- "2017-01-26 23:00:00.010"
test_time <- as.POSIXct(test_time, format = "%Y-%m-%d %H:%M:%OS")

This returns:

"2017-01-26 23:00:00.00"

Next I tried the following option, and it returns NA. I don't understand this behaviour, when what I need is a conversion to "2017-01-26 23:00:00.010".

test_time <- "2017-01-26 23:00:00.010"
test_time <- as.POSIXct(test_time, format = "%Y-%m-%d %H:%M:%OS3")

It works fine when I do this instead:

as.POSIXlt(strptime(test_time,format = "%Y-%m-%d %H:%M:%OS"), format = "%Y-%m-%d %H:%M:%OS")

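But for my purposes I need it as a POSIXct object, because some of the libraries I am using only accept POSIXct objects. Converting the POSIXlt back to POSIXct leads to the same problem as before. Is something wrong with my system settings? The date is not one of those daylight-saving dates that trigger errors, either. Why does it work for one format but not the other? Any leads or suggestions are welcome!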

Running on 64-bit Windows 10

【Comments on the question】:

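As the help page says, this is OS-specific; perhaps your OS does not support sub-second precision. (Mine doesn't.)
That may be true. But then why would 00.020 become 00.010?
If you posted code that showed that, we would be happy to look at it. You should also post your OS details, since this is a question about something documented as OS-specific.
%OSn is only for formatting output. It does not work for parsing input. The extra data is there, it just gets rounded off by default. Because of floating-point math, the value cannot be represented exactly: strftime(as.POSIXct(test_time, format = "%Y-%m-%d %H:%M:%OS"), "%Y-%m-%d %H:%M:%OS3")
Have you tried using test_time (test_time <- as.POSIXct(test_time, format = "%Y-%m-%d %H:%M:%OS")) in the function/library you are working with?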

Tags: r datetime posixct

【Solution 1】:

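The issue here has to do with the maximum precision that POSIXct can handle. Under the hood it is backed by a double holding the number of seconds since midnight UTC on January 1, 1970. Fractional seconds are stored as the fractional part of that double, i.e. 63.02 represents 1970-01-01 00:01:03.02 UTC.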

options(digits = 22, digits.secs = 3)

.POSIXct(63.02, tz = "UTC")
#> [1] "1970-01-01 00:01:03.02 UTC"

63.02
#> [1] 63.02000000000000312639

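Now, doubles are limited in the precision they can represent exactly. You can see that in the example above: typing 63.02 in the console does not return exactly the same number, but a nearby one with some extra digits at the end.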
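
To put a number on that limit for the timestamp in the question, here is a minimal sketch. It assumes IEEE 754 doubles with a 53-bit significand (the usual case for R builds), and the variable names `whole`, `secs`, `spacing` and `frac` are just illustrative. It shows that around 1.5e9 seconds neighbouring doubles are about 2.4e-7 apart, so 0.010 has to snap to the nearest representable step.

```r
# Assumption: IEEE 754 doubles with a 53-bit significand
whole <- 1485489600        # 2017-01-26 23:00:00 EST as seconds since the epoch
secs  <- whole + 0.010     # the value a POSIXct would have to store

# Between 2^30 and 2^31, neighbouring doubles are 2^(30 - 52) apart
spacing <- 2^(floor(log2(whole)) - 52)
spacing

# The stored fraction is a whole multiple of that spacing, just below 0.010
frac <- secs - whole
sprintf("%.12f", frac)
frac / spacing
```

The same arithmetic explains why 63.02 looks much closer to its intended value: near 63 the spacing between doubles is far smaller than it is near 1.5e9.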

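Now let's look at your example. Starting from as low a level as possible, the first thing as.POSIXct() does is call strptime(), which returns a POSIXlt object. That keeps each "field" of the date-time as a separate element (i.e. the year is stored separately from the month, day, seconds, and so on). We can see that it parsed correctly and that the sec field contains 0.01.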

# `digits.secs` to print 3 fractional digits (has no effect on parsing)
# `digits` to print up to 22 significant digits for double values
options(digits.secs = 3, digits = 22)

x <- "2017-01-26 23:00:00.010"

# looks good
lt <- strptime(x, format = "%Y-%m-%d %H:%M:%OS", tz = "America/New_York")
lt
#> [1] "2017-01-26 23:00:00.01 EST"

# This is a POSIXlt, which is a list holding fields like year,month,day,...
class(lt)
#> [1] "POSIXlt" "POSIXt"

# sure enough...
lt$sec
#> [1] 0.01000000000000000020817
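
If it helps to see those separate fields directly, here is a small sketch. The `tz = "UTC"` argument is an arbitrary choice so that the example does not depend on a local time zone; it is not taken from the question.

```r
x <- "2017-01-26 23:00:00.010"

# tz = "UTC" is an assumption made only to keep the sketch zone-independent
lt <- strptime(x, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# POSIXlt is a list underneath; these are the fields it carries
names(unclass(lt))

# year is stored as years since 1900 and mon is zero-based
c(year = lt$year + 1900, month = lt$mon + 1, day = lt$mday,
  hour = lt$hour, minute = lt$min)

# whole and fractional seconds share the sec field
lt$sec
```

Because the seconds sit in their own small number here, 0.01 is stored about as accurately as a double allows.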

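But now convert it to POSIXct. At this point the individual fields are collapsed into a single double, and that is where precision problems can appear.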

# now convert to POSIXct (i.e. a single double holding all the info)
# looks like we lost the fractional seconds?
ct <- as.POSIXct(lt)
ct
#> [1] "2017-01-26 23:00:00.00 EST"

# no, they are still there, but the precision in the `double` data type
# isn't enough to be able to represent this exactly as `1485489600.010`
unclass(ct)
#> [1] 1485489600.009999990463
#> attr(,"tzone")
#> [1] "America/New_York"

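So the fractional part of the double behind ct is close to .010, but it cannot be represented exactly. The stored value is slightly less than .010, and it gets (I presume) rounded down when the POSIXct is printed, which makes it look as though you lost the fractional seconds.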
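
As a rough illustration of that last step (my sketch, not something from the question): ?strptime documents %OSn as giving the seconds truncated to n decimal places on output, so a value just under .010 comes out one millisecond short unless you round it yourself. The names `ms_truncated` and `ms_rounded` are made up for this example, and the half-millisecond nudge at the end is a commonly used workaround rather than an official API.

```r
# Build the same instant from its epoch seconds; tz = "UTC" is an arbitrary choice
ct <- .POSIXct(1485489600 + 0.010, tz = "UTC")

# Fractional part actually stored in the double
frac <- as.numeric(ct) %% 1

# Truncating to milliseconds drops one; rounding recovers the intended value
ms_truncated <- floor(frac * 1000)
ms_rounded   <- round(frac * 1000)
c(truncated = ms_truncated, rounded = ms_rounded)

# Workaround when formatting: add half a millisecond before truncation happens
format(ct + 0.0005, "%Y-%m-%d %H:%M:%OS3")
```

This only changes what is displayed. The underlying double still cannot hold .010 exactly at this magnitude.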

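Because these issues are rather painful, I recommend using the low-level API of the clock package (note that I wrote this package). It supports fractional seconds down to nanoseconds without loss of precision (by using a different data structure than POSIXct). https://clock.r-lib.org/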

library(clock)

x <- "2017-01-26 23:00:00.010"

nt <- naive_time_parse(x, format = "%Y-%m-%d %H:%M:%S", precision = "millisecond")
nt
#> <time_point<naive><millisecond>[1]>
#> [1] "2017-01-26 23:00:00.010"

# If you need it in a time zone
as_zoned_time(nt, zone = "America/New_York")
#> <zoned_time<millisecond><America/New_York>[1]>
#> [1] "2017-01-26 23:00:00.010-05:00"

【Comments】: