Looping through a data frame of datetimes: answers

【Question title】: Looping through a data frame of datetimes
【Posted】: 2018-09-03 20:02:37
【Question】:

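I am trying to create GPS schedules for the satellite transmitters I use to track the migration of a bird species I study. The function below, called "sched_gps_fixes", takes a vector of datetimes and writes them to a .ASF file, which is uploaded to the satellite transmitter. This tells the transmitter the dates and times at which to take a GPS fix. Using R and the sched_gps_fixes function, I can quickly create a GPS schedule that starts on any day of the year. The software that ships with the transmitter can do this too, but I would have to painstakingly select every time and date at which I want the transmitter to take a GPS location.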

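So I would like to: 1) create a data frame that contains every day of 2018 along with the time at which I want the transmitter to collect a GPS location, 2) use each row of the data frame as the start date for a sequence of datetimes (for example, starting from 2018-03-25 12:00:00, I want a GPS schedule that takes a GPS point every other day after that, so 2018-03-25 12:00:00, 2018-03-27 12:00:00, and so on), and 3) create a .ASF file for each GPS schedule. Here is a simplified version of what I am trying to accomplish: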

library(lubridate)

# set the beginning time
start_date <- ymd_hms('2018-01-01 12:00:00')

# create a sequence of datetimes starting January 1
days_df <- seq(ymd_hms(start_date), ymd_hms(start_date+days(10)), by='1 days')
tz(days_df) <- "America/Chicago"
days_df <- as.data.frame(days_df)
days_df

# to reproduce the example
days_df <- structure(list(days_df = structure(c(1514829600, 1514916000, 
1515002400, 1515088800, 1515175200, 1515261600, 1515348000, 1515434400, 
1515520800, 1515607200, 1515693600), class = c("POSIXct", "POSIXt"
), tzone = "America/Chicago")), .Names = "days_df", row.names = c(NA, 
-11L), class = "data.frame")

# the data frame looks like this:

days_df
1  2018-01-01 12:00:00
2  2018-01-02 12:00:00
3  2018-01-03 12:00:00
4  2018-01-04 12:00:00
5  2018-01-05 12:00:00
6  2018-01-06 12:00:00
7  2018-01-07 12:00:00
8  2018-01-08 12:00:00
9  2018-01-09 12:00:00
10 2018-01-10 12:00:00
11 2018-01-11 12:00:00

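I would like to loop through each datetime in the data frame and create a vector for each row of the data frame. Each vector would use that row's datetime as the start date of a GPS schedule that takes a point every 2 days (something like this):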

[1] "2018-01-01 12:00:00 UTC" "2018-01-03 12:00:00 UTC" "2018-01-05 12:00:00 UTC" "2018-01-07 12:00:00 UTC"
[5] "2018-01-09 12:00:00 UTC" "2018-01-11 12:00:00 UTC"

Each vector (or GPS schedule) would then be run through the following function as "gps_schedule" to create a .ASF file for the transmitter:

sched_gps_fixes(gps_schedule, tz = "America/Chicago", out_file = "./gps_fixes")

So I would like to know how to create a for loop that produces a vector of datetimes for each day of 2018. Here is pseudocode for what I am trying to do:

# create a loop called 'create_schedules' to make the GPS schedules and produce a .ASF file for each day of 2018

create_schedules <- function(days_df) {

  for(row in 1:nrow(days_df)) {

    seq(ymd_hms(days_df[[i]]), ymd_hms(days_df[[i]]+days(10)), by='2 days')

  }
}

# run the function
create_schedules(days_df)

I am guessing I need an output object that stores each vector and names it by its start date?

Thanks,

Jay

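【Comments】:

Thanks for adding the details of the data.frame. I am not sure what output you expect at the end. The part about "create a vector for each row of the data frame" is the most confusing. Could you explain?
Yes, sorry, I did not explain that well. I think what I need is a vector for every day of the year: one GPS schedule that starts on 2018-01-01, one that starts on 2018-01-02, and so on, named like this: 2018-01-01_schedule
Will the time be fixed for all days, or do you want a different schedule time for each day? Actually, I do not understand why you need so many parallel schedules.
@Jason How many dates should each vector contain? For the first one you have shown "2018-01-01 12:00:00 UTC" "2018-01-03 12:00:00 UTC" "2018-01-05 12:00:00 UTC" "2018-01-07 12:00:00 UTC" "2018-01-09 12:00:00 UTC" "2018-01-11 12:00:00 UTC", so will each vector contain 6 dates?
@MKR I want a schedule for every day because we might capture a bird and attach a satellite transmitter to it on any day of the year. That way, we can easily pick the GPS schedule that starts on the day we caught the bird from the list of schedules and upload it.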

Tags: r for-loop tidyverse lubridate

【Solution 1】:

One option is to use mapply to generate a schedule for each row, following the schedule definition the OP provided:

library(lubridate)

# For the sample data max_date needs to be calculated. Otherwise to generate
# schedule for whole 2018 max_date can be taken as 31-Dec-2018.
max_date <- max(days_df$days_df)

mapply(function(x) seq(x, max_date, by = "2 days"), days_df$days_df)

# Result: only the first 3 and the last 2 items of the generated list are shown.
# [[1]]
# [1] "2018-01-01 12:00:00 CST" "2018-01-03 12:00:00 CST" "2018-01-05 12:00:00 CST"
# [4] "2018-01-07 12:00:00 CST" "2018-01-09 12:00:00 CST" "2018-01-11 12:00:00 CST"
# 
# [[2]]
# [1] "2018-01-02 12:00:00 CST" "2018-01-04 12:00:00 CST" "2018-01-06 12:00:00 CST"
# [4] "2018-01-08 12:00:00 CST" "2018-01-10 12:00:00 CST"
# 
# [[3]]
# [1] "2018-01-03 12:00:00 CST" "2018-01-05 12:00:00 CST" "2018-01-07 12:00:00 CST"
# [4] "2018-01-09 12:00:00 CST" "2018-01-11 12:00:00 CST"
# ....
# ....
# ....
# [[10]]
# [1] "2018-01-10 12:00:00 CST"
# 
# [[11]]
# [1] "2018-01-11 12:00:00 CST"
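
The code comment above notes that max_date can be taken as 31-Dec-2018 to cover the whole year. A minimal sketch of that variant in base R might look like the following. It assumes noon America/Chicago as in the question, and it swaps in by = "DSTday" (not used in the answer) so that fixes stay at 12:00 local time across daylight-saving changes. The names tz_name, start_dates and schedules are illustrative only.

```r
# Assumption: one start per day of 2018 at 12:00 local time (America/Chicago).
tz_name <- "America/Chicago"
start_dates <- seq(as.POSIXct("2018-01-01 12:00:00", tz = tz_name),
                   as.POSIXct("2018-12-31 12:00:00", tz = tz_name),
                   by = "DSTday")  # "DSTday" keeps the same clock time each day

max_date <- max(start_dates)

# One schedule per start date: a fix every second day until max_date.
schedules <- lapply(seq_along(start_dates), function(i) {
  seq(start_dates[i], max_date, by = "2 DSTday")
})
names(schedules) <- format(start_dates, "%Y-%m-%d")

length(schedules)          # one schedule per day of 2018
range(lengths(schedules))  # later start dates give shorter schedules
```

With a fixed end date, schedules that start late in the year contain fewer fixes than those that start early, so this variant only fits if unequal schedule lengths are acceptable.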

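Update: per the OP's request, each schedule now runs from its start date to start + 10 days. 10 days is equivalent to 10*24*3600 seconds.

If the OP prefers names for the items in the resulting list, mapply can be used as: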

mapply(function(x, y)seq(y, y+10*24*3600, by="2 days"),
    as.character(days_df$days_df), days_df$days_df, 
    SIMPLIFY = FALSE,USE.NAMES = TRUE) 

#Result
# $`2018-01-01 12:00:00`
# [1] "2018-01-01 12:00:00 CST" "2018-01-03 12:00:00 CST" "2018-01-05 12:00:00 CST"
# [4] "2018-01-07 12:00:00 CST" "2018-01-09 12:00:00 CST" "2018-01-11 12:00:00 CST"
#.......
#.......
#.......so on
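
Since the question asked for a for loop, the same named result could also be built as sketched below. This is an illustrative base R equivalent, not the answerer's code: create_schedules is rewritten from the question's pseudocode, span_days and step are hypothetical parameters, and the start dates are regenerated here so the block runs on its own. The call to sched_gps_fixes is left commented out because that function's definition does not appear in the thread, and the out_file naming is an assumption based on the "2018-01-01_schedule" example in the comments.

```r
# Recreate the question's 11 start dates (noon, America/Chicago).
start_dates <- seq(as.POSIXct("2018-01-01 12:00:00", tz = "America/Chicago"),
                   by = "day", length.out = 11)

# Build one schedule per start date; every schedule spans the same window.
create_schedules <- function(start_dates, span_days = 10, step = "2 days") {
  schedules <- vector("list", length(start_dates))    # pre-allocate the output
  names(schedules) <- format(start_dates, "%Y-%m-%d") # name by start date
  for (i in seq_along(start_dates)) {
    first_fix <- start_dates[i]
    last_fix  <- first_fix + span_days * 24 * 3600    # POSIXct arithmetic is in seconds
    schedules[[i]] <- seq(first_fix, last_fix, by = step)
  }
  schedules
}

schedules <- create_schedules(start_dates)
lengths(schedules)  # number of fixes in each schedule

# Hypothetical follow-up: write one file per schedule with the asker's function.
# for (nm in names(schedules)) {
#   sched_gps_fixes(schedules[[nm]], tz = "America/Chicago",
#                   out_file = file.path(".", paste0(nm, "_schedule")))
# }
```

Indexing with start_dates[i] keeps the POSIXct class and time zone, which is why the loop runs over seq_along(start_dates) rather than over the values themselves.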

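【Comments】:

This is close to what I am after, but I want every schedule to cover the same number of days. In the list above, [[11]] contains only one date. I tried modifying your code so that the maximum date equals the start date plus 10 days, but that did not seem to do what I wanted either...
@Jason That is simple. I can modify my answer so that it does the job as you require. Let me write a separate function to make it clear.
@Jason Have a look.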