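[Posted]: 2016-10-13 11:14:43
[Question]:
I have a large dataset with 5 variables: subject, date, date+hour, concentration measurement, and feeding.
For each subject, we took measurements from date+hour(1) to date+hour(n), which gives n measurements per subject. What I want to do is compute the recording time of each row as date+hour[i] - date+hour[1] for each subject. To do this, I wrote a loop. It worked well until I realized that I have several days of recordings for each subject. This means I have to compute the recording time for each subject and each date.
Here is my script: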
getwd()
setwd("H:/OptiMIR LMD files/week1")
Week1<-read.csv("week1.csv", header=T)
head(Week1)
colnames(Week1)<-c("CowID","Date", "DateHour","Measure","Feeding")
head(Week1)
# Assign a class to each column
Week1$CowID<-as.factor(Week1$CowID)
Week1$Date<-as.Date(Week1$Date, format = "%d/%m/%Y")
Week1$DateHour<-strptime(Week1$DateHour, format = "%Y/%m/%d/%H:%M:%S")
Week1$Measure<-as.numeric(as.vector(Week1$Measure))
Week1$Feeding<-as.factor(Week1$Feeding)
str(Week1)
summary(Week1)
unique(Week1$CowID)
#Calculate Time of measure
library(lubridate)
library(foreach)
Time<-c()
#nrow(LMD)
for (i in 1:nrow(Week1)) {
  for (j in unique(Week1$CowID)) {
    for (k in unique(Week1$Date)) {
      if (Week1$CowID[i]==j & Week1$Date[i]==k) {
        foreach(unique(Week1$CowID) & unique(Week1$Date))
        Time[i]<-c(difftime(Week1[i,3], Week1[match(k,Week1$Date),3], units="secs"))
      }
    }
  }
}
Week1<-cbind(Week1,Time)
Here are the head and the summary:
> head(Week1)
  CowID       Date            DateHour Measure Feeding
1  1990 2014-01-13 2014-01-13 16:21:02     119    hoko
2  1990 2014-01-13 2014-01-13 16:21:02     116    hoko
3  1990 2014-01-13 2014-01-13 16:21:03     111    hoko
4  1990 2014-01-13 2014-01-13 16:21:03      77    hoko
5  1990 2014-01-13 2014-01-13 16:21:04      60    hoko
6  1990 2014-01-13 2014-01-13 16:21:04      65    hoko
> summary(Week1)
     CowID            Date               DateHour
 2239   : 1841   Min.   :2014-01-13   Min.   :2014-01-13 14:33:05
 2067   : 1816   1st Qu.:2014-01-13   1st Qu.:2014-01-13 16:10:14
 2246   : 1797   Median :2014-01-14   Median :2014-01-14 15:10:51
 2062   : 1792   Mean   :2014-01-13   Mean   :2014-01-14 14:55:45
 2248   : 1757   3rd Qu.:2014-01-15   3rd Qu.:2014-01-15 14:32:59
 2171   : 1738   Max.   :2014-01-15   Max.   :2014-01-15 15:55:09
 (Other):14259
    Measure          Feeding
 Min.   :   4.0   hoko :16857
 1st Qu.:  65.0   strap: 8143
 Median : 108.0
 Mean   : 147.4
 3rd Qu.: 185.0
 Max.   :1521.0
So cow 1990 will have other recording dates as well. That is my problem, because this loop:
Time<-c()
for (i in 1:nrow(Week1)) {
  for (j in unique(Week1$CowID)) {
    for (k in min(Week1$Date):max(Week1$Date)) {
      if ((Week1$CowID[i]==j) & (Week1$Date[i]==k)) {
        Time[i]<-c(difftime(Week1[i,3], Week1[match(k, Week1$Date),3], units="secs"))
      }
    }
  }
}
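works when I have one day of measurements per subject. But now that I have several days of recordings, it works for one subject, and when it gets to another subject I get negative recording times...
I think I know where the problem is: in the loop, at "for k...". I have to tell R to look at the dates of each unique subject, but I don't know how to do that.
Thanks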
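A note on the loop above: match(k, Week1$Date) returns the first row with that date for any cow, so every other cow is measured against the first cow's starting time, which can produce negative values. Below is a minimal sketch of a loop that looks up the dates cow by cow instead. The toy data frame, its values, and the helper names cow_rows, days, and rows are assumptions made for illustration; they are not taken from the real week1.csv.

```r
# Toy data (hypothetical values): two cows, two recording days each
Week1 <- data.frame(
  CowID = factor(c(1990, 1990, 1990, 1990, 2239, 2239, 2239, 2239)),
  DateHour = as.POSIXct(
    c("2014-01-13 16:21:02", "2014-01-13 16:21:04",
      "2014-01-14 15:00:00", "2014-01-14 15:00:03",
      "2014-01-13 14:33:05", "2014-01-13 14:33:06",
      "2014-01-14 14:00:00", "2014-01-14 14:00:10"),
    tz = "UTC"
  )
)
Week1$Date <- as.Date(Week1$DateHour)

Time <- numeric(nrow(Week1))
for (cow in as.character(unique(Week1$CowID))) {
  # Rows belonging to this cow only
  cow_rows <- which(Week1$CowID == cow)
  # Dates recorded for this cow (not for the whole data set)
  days <- unique(Week1$Date[cow_rows])
  for (k in seq_along(days)) {
    # Rows for this cow on this date
    rows <- cow_rows[Week1$Date[cow_rows] == days[k]]
    # Reference time: first record of this cow on this date
    start <- min(Week1$DateHour[rows])
    Time[rows] <- as.numeric(difftime(Week1$DateHour[rows], start, units = "secs"))
  }
}
Week1$Time <- Time
print(Week1)
```

Because the reference time is taken inside each cow/date group, the elapsed time restarts at 0 for every cow on every day and cannot go negative.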
[Comments]:
-
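It is hard to do this with loops like these. The easiest way is dplyr or data.table. With dplyr, I think what you want is group_by(Week1, CowID, Date) %>% mutate(Time = DateHour - min(DateHour)), but it is hard to be sure. Could you show the output you want for the head of the data shown?
-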
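OK, I'll look into it. The first values of the Time vector I get are: [1] 0 0 1 1 2 2 3 4 4 5 5 6 [13] 6 7 7 8 8 9 9 10 10 11 11 12 [25] 12 13 13 14 15 15 16 16 17 17 18 18 [37] 19 19 20 20 21 21 22 22 23 23 24 24. For a new subject (CowID) it gives wrong results, as if it did not take into account that it is another CowID.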
-
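If you want to fix your loop code, I think the biggest problem is that your outermost loop goes over all the rows. You are using match to work around that, but the more natural way to use loops is to make the groups the outer loops and have the innermost loop go over each row in a group.
-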
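Don't put output in comments, it is hard to read. Edit it into your question (preferably added to the data frame) so that we can see it. You may also want to look at tips for making reproducible examples. This is a good question, but it would be better if your data were shared reproducibly, e.g. dput(droplevels(head(Week1, 10))), or some other small subset with a few cows and a few days, enough to illustrate the problem. dput() output looks ugly, but it can be copy/pasted into R to recreate the data.
-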
OK, thanks, I will update it. I tried group_by and got an error: "Error in eval(expr, envir, enclos): column 'DateHour' has unsupported class: POSIXlt, POSIXt"
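The error above likely comes from strptime(), which returns a POSIXlt object, a class that dplyr does not accept as a data frame column. A common workaround is to convert the column with as.POSIXct() before grouping. The sketch below assumes dplyr is installed and uses a small made-up data frame; the timestamps and the format string are illustrative and may differ from the real file.

```r
library(dplyr)

# Toy data (hypothetical values): two cows, two recording days each
Week1 <- data.frame(
  CowID = factor(c(1990, 1990, 1990, 2239, 2239, 2239)),
  DateHour = c("2014-01-13 16:21:02", "2014-01-13 16:21:04",
               "2014-01-14 15:00:00", "2014-01-13 14:33:05",
               "2014-01-13 14:33:09", "2014-01-14 14:00:00"),
  stringsAsFactors = FALSE
)

# strptime() gives POSIXlt; wrapping it in as.POSIXct() yields a column dplyr can handle
Week1$DateHour <- as.POSIXct(
  strptime(Week1$DateHour, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
)
Week1$Date <- as.Date(Week1$DateHour)

# Elapsed seconds since the first record of each cow on each date
Week1 <- Week1 %>%
  group_by(CowID, Date) %>%
  mutate(Time = as.numeric(difftime(DateHour, min(DateHour), units = "secs"))) %>%
  ungroup()

print(Week1)
```

Using difftime() with units = "secs" keeps the unit fixed; a plain DateHour - min(DateHour) lets R choose the unit automatically, which may vary between groups.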