Extract data in every chunk of a data.frame depending on a variable: answers

【Question title】: Extract data in every chunk of data.frame depending on variable
【Posted】: 2015-04-24 08:39:36
【Question description】:

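I am trying to extract the first record for each chunk (layer) of my data. I want to extract the first occurrence of a negative value (Mag) in each chunk, together with the corresponding time. Then I want to compare those times across the chunks and find the minimum and the maximum. (That is the first part.)

I have gotten somewhere but I am stuck. Any help, including shortening the code, would be much appreciated. Thanks!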

# to make sample data
data_neg<-seq(-0.98,-1,length=300)
data_pos<-seq(0.98,1,length=300)
time<-seq(1,54,length=600)

# binding those neg and pos numbers together
tot_num<- data.frame(c(rep(time, times=4)),c(rep(cbind(data_pos,data_neg),times=4)))    
colnames(tot_num)=c("time","Mag")

# split data into chunks
n <- 1:4  
dfchunk<- split(tot_num, factor(sort(rank(row.names(tot_num))%%n)))
ext_fsw<-lapply(dfchunk[],function(x)with(x,x[Mag<0,,drop=TRUE])) 
# here I want to extract the first appearance of a negative value of Mag in each chunk, together with the corresponding time.

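As the second part of my question: following @zx8754's suggestion, I tried to read my real data in a loop, select the first occurrence of a negative value in each chunk, and plot the result. But I realized that my real data produces NA values like these (I read 11 data files from my folder; the code is shown further down):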

     X1      X2
1 27.45 -0.0111
2 43.29 -0.9746
3 32.49 -0.9807
4 28.08 -0.0538
5 28.44 -0.0669
     X1      X2
1 28.71 -0.0834
2 43.29 -0.9736
3 32.49 -0.9521
4 29.16 -0.0032
5 29.70 -0.0469
     X1      X2
1 30.06 -0.0112
2 43.29 -0.9724
3 35.37 -0.0448
4 33.03 -0.0308
5 31.59 -0.0055
     X1      X2
1 35.19 -0.0476
2 43.29 -0.9712
3 39.42 -0.0171
4 40.50 -0.0143
5 36.18 -0.0395
     X1      X2
1    NA      NA
2    NA      NA
3    NA      NA
4 50.85 -0.0371
5    NA      NA
     X1      X2
1    NA      NA
2    NA      NA
3    NA      NA
4    NA      NA
5    NA      NA
     X1      X2
1    NA      NA
2    NA      NA
3    NA      NA
4    NA      NA
5    NA      NA
     X1     X2
1    NA     NA
2    NA     NA
3 49.77 -3e-04
4    NA     NA
5    NA     NA
     X1      X2
1    NA      NA
2    NA      NA
3    NA      NA
4 43.02 -0.0465
5 45.99 -0.9793
     X1      X2
1    NA      NA
2 37.98 -0.0005
3 45.18 -0.9784
4    NA      NA
5 45.09 -0.0551
     X1      X2
1    NA      NA
2    NA      NA
3 36.90 -0.0148
4 46.17 -0.9813
5    NA      NA

Here is the for loop that reads my data:

data.list <- dir(pattern = "*.avgm",full.names = FALSE) # creates the list of all the .avgm files in the directory

a<-1:length(data.list)
for(k in 1:length(data.list)){
data1_stt<- read.table(data.list[k],colClasses="numeric",skip=0,   fill=FALSE, sep = "", quote="\"'", dec=".", as.is = TRUE, strip.white=FALSE)
StrL1<-data1_stt[,10]
time<-data1_stt[,1]*10^-3
tot_num<- data.frame(time,StrL1)
colnames(tot_num)=c("time","Mag")
n <- 5  # split data into chunks
dfchunk<- split(tot_num, factor(sort(rank(row.names(tot_num))%%n)))
ext_fsw<-lapply(dfchunk,function(x)x[which(x$Mag<0)[1],])#which - gives the index where the conditions is TRUE, then take the 1st value [1], pass it to x as index for rownumber.
x.n <- data.frame(matrix(unlist(ext_fsw),nrow=5, byrow=T))
print(x.n)
curr<-rep(c(8,7,6,5,4,3.6,3.8,4.2,4.4,4.6,4.8),each=5)
plot(curr,x.n,pch = 20) 
}

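In short, the second step of my task is to read all my data and plot it against each current value, but I have not managed to do that. Sorry that I cannot provide a reproducible example here. Because of the NA values in the data, the number of negative-value points differs from file to file.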
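A minimal sketch of one way this second step could be organised, under stated assumptions: results are collected into one table first and plotted once after the loop, so NA rows can be dropped in a single place. The helper `first_negatives`, the two made-up data frames standing in for the `.avgm` files, and the two-element `curr` vector are all hypothetical and only illustrate the shape of the approach.

```r
# Hypothetical helper: first negative Mag (with its time) in each of n_chunks
# consecutive chunks; a chunk without any negative value yields a row of NA.
first_negatives <- function(df, n_chunks) {
  chunk_id <- ceiling(seq_len(nrow(df)) * n_chunks / nrow(df))
  chunks <- split(df, chunk_id)
  do.call(rbind, lapply(chunks, function(x) x[which(x$Mag < 0)[1], ]))
}

# Stand-in for the files read in the loop: two made-up data frames
# (assumption: the real ones would come from read.table on each file).
files <- list(
  data.frame(time = 1:10, Mag = c(1, -1, 1, 1, 1, 1, 1, 1, -2, 1)),
  data.frame(time = 1:10, Mag = c(1, 1, 1, 1, 1, 1, -3, 1, 1, 1))
)
curr <- c(8, 7)  # one current value per file (assumed pairing)

# Build one table with the current value attached to every chunk result
res <- do.call(rbind, lapply(seq_along(files), function(k) {
  data.frame(curr = curr[k], first_negatives(files[[k]], n_chunks = 5))
}))

# Drop the chunks that never went negative
res <- res[!is.na(res$time), ]
```

With the table in this form, a single call such as `plot(res$curr, res$time, pch = 20)` after the loop would plot time against current, and the x and y vectors always have the same length because both come from the same rows.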

【Question discussion】:

Tags: r dataframe row extract

【Solution 1】:

Try this:

ext_fsw<-lapply(dfchunk,function(x)
  x[which(x$Mag<0)[1],]
  )

which - gives the indices where the condition is TRUE; the first of them, [1], is then passed to x as the row index.
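As a sketch of how this could be carried through to the minimum and maximum time asked about in the question, the following rebuilds the sample data and applies the same `which(...)[1]` indexing. It assumes four equal, consecutive chunks of 600 rows (the `chunk_id` vector is a simplification introduced here, not the original split expression). Note that when a chunk contains no negative value, `which(...)[1]` is `NA` and the extracted row is all `NA`, which would explain the NA rows in the real data.

```r
# Sample data from the question: 4 repeats of 300 positive then 300 negative values
data_neg <- seq(-0.98, -1, length.out = 300)
data_pos <- seq(0.98, 1, length.out = 300)
time <- seq(1, 54, length.out = 600)
tot_num <- data.frame(time = rep(time, times = 4),
                      Mag  = rep(c(data_pos, data_neg), times = 4))

# Assumption: four equal, consecutive chunks of 600 rows each
n_chunks <- 4
chunk_id <- rep(seq_len(n_chunks), each = nrow(tot_num) / n_chunks)
dfchunk <- split(tot_num, chunk_id)

# First row with a negative Mag in each chunk (a row of NA if there is none)
ext_fsw <- lapply(dfchunk, function(x) x[which(x$Mag < 0)[1], ])

# Stack the one-row results and compare the times across chunks
first_neg <- do.call(rbind, ext_fsw)
time_range <- range(first_neg$time, na.rm = TRUE)  # c(minimum, maximum)
```

`na.rm = TRUE` keeps the range usable even when some chunks contribute only an NA row.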

【Discussion】: