【Question title】: Look up and extract values exceeding a threshold in R
【Posted】: 2018-08-25 05:12:34
【Question】:

I have two data frames:

#df1
df1 = data.frame(id = c("A","B","C","D","E"), 
                 dev = c(213.5, 225.1, 198.9, 201.0, 266.8))
df1
   id   dev
1  A 213.5
2  B 225.1
3  C 198.9
4  D 201.0
5  E 266.8   

#df2
df2 = data.frame(DateTime = seq(
  from = as.POSIXct("1986-1-1 0:00"),
  to = as.POSIXct("1986-1-2 23:00"), 
  by = "hour"), 
  cum_dd = seq(from = 185, to = 295, by = 2.3)) 
head(df2) 
             DateTime cum_dd
1 1986-01-01 00:00:00  185.0
2 1986-01-01 01:00:00  187.3
3 1986-01-01 02:00:00  189.6
4 1986-01-01 03:00:00  191.9
5 1986-01-01 04:00:00  194.2
6 1986-01-01 05:00:00  196.5

I want to create a new column in df1 that lists the earliest df2$DateTime at which df2$cum_dd exceeds df1$dev.

This is my desired result:

  id   dev             desired
1  A 213.5 1986-01-01 13:00:00
2  B 225.1 1986-01-01 18:00:00
3  C 198.9 1986-01-01 07:00:00
4  D 201.0 1986-01-01 07:00:00
5  E 266.8 1986-01-02 12:00:00

I'm familiar with the min(which()) function in dplyr, which, when written as below, returns the first row number in df2 where cum_dd is greater than 200:

library(dplyr)
min(which (df2$cum_dd > 200))

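In effect, I want to run this function for every row of df1 (replacing "200" with df1$dev) and look up/extract the corresponding df2$DateTime value rather than the row number.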

I thought I was close with the attempt below, but it isn't quite right, and I couldn't find a similar question on Stack Overflow:

desired <- apply(df1, 1, 
           function (x) {ddply(df2, .(DateTime), summarize, 
           min(which (df2$cum_dd > df1$dev)))}) 

If you have a solution, thank you very much!
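
As a side note on the idea described above: which() and min() are base R functions rather than dplyr ones, and the attempt compares df2$cum_dd against the whole df1$dev column instead of one threshold at a time. The following is a minimal base R sketch of the per-row lookup, not part of the original post. The helper name first_exceed_row and the explicit tz = "UTC" are assumptions added for illustration.

```r
# Example data copied from the question (tz = "UTC" is an added assumption
# so the timestamps do not depend on the local time zone)
df1 <- data.frame(id = c("A", "B", "C", "D", "E"),
                  dev = c(213.5, 225.1, 198.9, 201.0, 266.8))
df2 <- data.frame(
  DateTime = seq(from = as.POSIXct("1986-01-01 00:00", tz = "UTC"),
                 to   = as.POSIXct("1986-01-02 23:00", tz = "UTC"),
                 by   = "hour"),
  cum_dd = seq(from = 185, to = 295, by = 2.3))

# first_exceed_row() is a hypothetical helper: it returns the row number of
# the first value above `threshold`, or NA if no value exceeds it
first_exceed_row <- function(threshold, values) {
  hits <- which(values > threshold)   # which() returns indices in increasing order
  if (length(hits) == 0) NA_integer_ else hits[1]
}

# One row number per df1$dev value, then use them to index df2$DateTime
rows <- vapply(df1$dev, first_exceed_row, integer(1), values = df2$cum_dd)
df1$desired <- df2$DateTime[rows]
df1
```

Because the lookup returns NA when a threshold is never exceeded, the new column simply holds NA for such rows instead of raising a warning.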

【Question discussion】:

    Tags: r dplyr lookup threshold


    【Solution 1】:
    # example datasets
    df1 = data.frame(id = c("A","B","C","D","E"), 
                     dev = c(213.5, 225.1, 198.9, 201.0, 266.8))
    
    df2 = data.frame(DateTime = seq(
      from = as.POSIXct("1986-1-1 0:00"),
      to = as.POSIXct("1986-1-2 23:00"), 
      by = "hour"), 
      cum_dd = seq(from = 185, to = 295, by = 2.3)) 
    
    library(tidyverse)
    
    df1 %>% 
      crossing(df2) %>%         # get all combinations of rows
      group_by(id, dev) %>%     # for each id and dev
      summarise(desired = min(DateTime[cum_dd > dev])) %>%  # get minimum date when cum_dd exceeds dev
      ungroup()                 # forget the grouping
    
    # # A tibble: 5 x 3
    #   id      dev desired            
    #   <fct> <dbl> <dttm>             
    # 1 A      214. 1986-01-01 13:00:00
    # 2 B      225. 1986-01-01 18:00:00
    # 3 C      199. 1986-01-01 07:00:00
    # 4 D      201  1986-01-01 07:00:00
    # 5 E      267. 1986-01-02 12:00:00
    

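    【Discussion】:

    • Awesome, thanks @AntoniosK! Two questions for you: #1) How do I bind the newly created "desired" column to df1? #2) What exactly is crossing(df2) doing?
    • crossing creates all combinations of the rows of df1 and df2. More info here: ?crossing. The new dataset is the same as df1, plus the new column. So you can simply do df1 = df1 %>% crossing(df2) %>% ... and the new dataset will be saved as df1.
    • Got it, thanks. I noticed you made some edits to the code, and now I can no longer reproduce what you posted. The code now returns only a single row and a single column. I'm trying to retrace my steps to see what went wrong...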
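    • My last edit was 56 minutes ago! Maybe you updated your df1 and then tried to run the process from scratch using the (updated) df1? Try starting a new R session and running the example above to make sure it runs correctly.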
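
To illustrate the reply about crossing, here is a small sketch with two made-up tables. The names left, right and pairs are hypothetical and not from the thread. It only shows that every row of the first table is paired with every row of the second.

```r
library(tidyr)

# Two tiny, made-up tables (not from the original post)
left  <- data.frame(id = c("A", "B"))
right <- data.frame(cum_dd = c(1, 2, 3))

# Every row of `left` paired with every row of `right`: 2 x 3 = 6 rows
pairs <- crossing(left, right)
pairs
```

With the data from the question, the same call yields 5 x 48 = 240 rows, which the pipeline then reduces to one row per id.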
    【Solution 2】:
    library(tidyverse)
    df1 = data.frame("id" = c("A","B","C","D","E"), "dev" = c(213.5, 225.1, 198.9, 201.0, 266.8))
    
    df2 = data.frame("DateTime" = seq(
      from = as.POSIXct("1986-1-1 0:00"),
      to = as.POSIXct("1986-1-2 23:00"), 
      by = "hour"), 
      "cum_dd" = seq(from = 185, to = 295, by = 2.3)) 
    
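    df2 %>% 
      crossing(df1) %>%                      # all combinations of df2 and df1 rows
      filter(cum_dd > dev) %>%               # keep rows where cum_dd exceeds dev
      arrange(DateTime, desc(cum_dd)) %>%    # earliest DateTime first
      group_by(id) %>% 
      distinct(id, .keep_all = TRUE)         # keep the first (earliest) row per id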
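
Both solutions above build every df1 x df2 combination before filtering. If cum_dd is sorted in increasing order, as it is in the example data, base R's findInterval() can find the same rows directly. This is a sketch of an alternative technique rather than anything proposed in the thread, and it relies on that sorting assumption. The tz = "UTC" argument is also an added assumption.

```r
# Example data copied from the question (tz = "UTC" is an added assumption)
df1 <- data.frame(id = c("A", "B", "C", "D", "E"),
                  dev = c(213.5, 225.1, 198.9, 201.0, 266.8))
df2 <- data.frame(
  DateTime = seq(from = as.POSIXct("1986-01-01 00:00", tz = "UTC"),
                 to   = as.POSIXct("1986-01-02 23:00", tz = "UTC"),
                 by   = "hour"),
  cum_dd = seq(from = 185, to = 295, by = 2.3))

# Assumption: df2$cum_dd is non-decreasing; findInterval() requires this
stopifnot(!is.unsorted(df2$cum_dd))

# findInterval() counts, for each dev, how many cum_dd values are <= dev;
# adding 1 gives the row of the first value strictly greater than dev
idx <- findInterval(df1$dev, df2$cum_dd) + 1
idx[idx > nrow(df2)] <- NA   # dev is never exceeded: no matching row

df1$desired <- df2$DateTime[idx]
df1
```

If cum_dd were not sorted, this shortcut would not apply and one of the approaches above would be the safer choice.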
    
