【Question title】: Speeding up a nested ifelse statement - R
【Posted】: 2016-10-02 18:16:53
【Question】:

A sample of my data at this point in the code:

    time_elapsed                     network_name             daypart       day
 1:         4705                          Laff TV 2016-09-09 03:11:35    Friday
 2:         1800                              CNN 2016-09-10 08:00:00  Saturday
 3:           23                             INSP 2016-09-02 18:00:00    Friday
 4:          148                              NBC 2016-09-02 16:01:26    Friday
 5:          957                  History Channel 2016-09-07 14:44:03 Wednesday
 6:         1138         Nickelodeon/Nick-at-Nite 2016-09-09 16:00:00    Friday
 7:          120                       Starz Edge 2016-09-07 15:28:59 Wednesday
 8:          268            Starz Encore Westerns 2016-09-07 17:13:05 Wednesday
 9:            6                              CBS 2016-09-10 04:00:00  Saturday
10:           69                      Independent 2016-09-07 12:48:11 Wednesday
11:         4151                              NBC 2016-09-09 04:32:37    Friday
12:          570 PBS: Public Broadcasting Service 2016-09-07 16:17:58 Wednesday
13:         1421                            NBCSN 2016-09-03 15:22:23  Saturday
14:          466          Estrella TV (Broadcast) 2016-09-04 19:00:00    Sunday

(usually more than 200 million rows)

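A few months ago, when I was running the whole script on a few million rows, I wrote the nested ifelse statement below. Now that I'm running it at a much larger scale, I'd really like to find a way to make it faster.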

    targets_random$daypart <- ifelse((wday(targets_random$daypart) == 1 |
                                      wday(targets_random$daypart) == 7), "W: Weekend",
                              ifelse(hour(targets_random$daypart) <= 2, "LP: Late Prime",
                              ifelse((hour(targets_random$daypart) >= 3 &
                                      hour(targets_random$daypart) <= 5), "O: Overnight",
                              ifelse((hour(targets_random$daypart) >= 6 &
                                      hour(targets_random$daypart) <= 9), "EM: Early Morning",
                              ifelse((hour(targets_random$daypart) >= 10 &
                                      hour(targets_random$daypart) <= 16), "D: Day",
                              ifelse((hour(targets_random$daypart) >= 17 &
                                      hour(targets_random$daypart) <= 20), "F: Fringe",
                              ifelse(hour(targets_random$daypart) >= 21, "P: Prime", NA)))))))
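For comparison, here is a minimal sketch (not the asker's code) of the same rules written so that the hour and weekday are extracted only once and each label is filled by one subset assignment, instead of calling hour() again inside every ifelse branch. It assumes daypart is a POSIXct column and uses base R's as.POSIXlt() in place of the wday()/hour() helpers; the function name label_daypart is hypothetical.

```r
# Hypothetical helper: same daypart rules as the nested ifelse, base R only.
# Assumes `ts` is POSIXct; as.POSIXlt() stands in for wday()/hour().
label_daypart <- function(ts) {
  lt <- as.POSIXlt(ts)
  h <- lt$hour                          # hour 0-23, extracted once
  out <- rep(NA_character_, length(h))  # NA timestamps stay NA
  # which() drops NA comparisons so they never hit the assignment
  out[which(h <= 2)]            <- "LP: Late Prime"
  out[which(h >= 3  & h <= 5)]  <- "O: Overnight"
  out[which(h >= 6  & h <= 9)]  <- "EM: Early Morning"
  out[which(h >= 10 & h <= 16)] <- "D: Day"
  out[which(h >= 17 & h <= 20)] <- "F: Fringe"
  out[which(h >= 21)]           <- "P: Prime"
  # Weekend overrides the hour rules; POSIXlt wday is 0 = Sunday, 6 = Saturday
  out[lt$wday %in% c(0, 6)]     <- "W: Weekend"
  out
}
```

Usage would be targets_random$daypart <- label_daypart(targets_random$daypart). Whether this beats the nested version at 200 million rows is worth checking with a benchmark on a sample of the real data first.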

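I tried a data.table solution, but it was only slightly faster, and it converted my data.table into a list. For the life of me, I can't figure out why. Undoing that added enough time that the savings weren't worth it.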

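Any suggestions would be much appreciated. What I have works, and if I have to stick with it, that's fine. Running the full code currently takes about 3.5 hours. The biggest parts are the SQL queries and writing the result files, but it would be great to save as much time as I can!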

(As a side note: before I replaced large parts of the script with data.table syntax, it took almost 8 hours. I'm now an official fan!)

【Comments】:

  • You might be able to use parLapply to process multiple rows at once
  • ?cut. It looks like you could use something like cut(targets_random$daypart$hour, c(-Inf, 3, 6, 10, 17, 21, Inf), include.lowest = TRUE, right = FALSE), changing the "labels" argument to c("LP: Late Prime", "O: Overnight", etc...), and then replacing the result with "W: Weekend" wherever (targets_random$daypart$wday + 1) %in% c(1, 7)
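The cut() suggestion from the comment above might look like this as a base R sketch. It assumes daypart is POSIXct and converts it with as.POSIXlt() so that the $hour and $wday components exist (POSIXlt counts wday from 0 = Sunday, hence the + 1). The function name label_daypart_cut is hypothetical.

```r
# Hypothetical helper following the cut() comment; base R only.
label_daypart_cut <- function(ts) {
  lt <- as.POSIXlt(ts)
  # right = FALSE gives [-Inf,3), [3,6), [6,10), [10,17), [17,21), [21,Inf]
  out <- cut(lt$hour,
             breaks = c(-Inf, 3, 6, 10, 17, 21, Inf),
             labels = c("LP: Late Prime", "O: Overnight", "EM: Early Morning",
                        "D: Day", "F: Fringe", "P: Prime"),
             include.lowest = TRUE, right = FALSE)
  out <- as.character(out)  # factor -> character, matching the ifelse output
  # POSIXlt wday: 0 = Sunday ... 6 = Saturday, so + 1 maps to 1 ... 7
  out[(lt$wday + 1) %in% c(1, 7)] <- "W: Weekend"
  out
}
```

cut() returns a factor; converting it to character keeps the output identical to the nested ifelse. Keeping the factor could save memory on a very large table, but "W: Weekend" would then have to be added to its levels before the weekend assignment.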

Tags: r if-statement nested


【Solution 1】:

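Consider building a separate, static daytimes data frame that holds every possible weekday/hour combination and its result. In SQL terms this is a lookup table. Then merge it with the full data table.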

    # Lookup table (N = 168): 7 weekdays x 24 hours
    # wday() and hour() are the same helpers used in the question (lubridate or
    # data.table): wday() returns 1 = Sunday ... 7 = Saturday, hour() returns 0-23,
    # so the grid must use hours 0:23 (1:24 would drop every midnight-hour row)
    daytimes <- expand.grid(wday = 1:7,
                            hour = 0:23)
    # The nested ifelse now runs on only 168 rows
    daytimes$result <-
      ifelse((daytimes$wday == 1 | daytimes$wday == 7), "W: Weekend",
      ifelse(daytimes$hour <= 2, "LP: Late Prime",
      ifelse((daytimes$hour >= 3 & daytimes$hour <= 5), "O: Overnight",
      ifelse((daytimes$hour >= 6 & daytimes$hour <= 9), "EM: Early Morning",
      ifelse((daytimes$hour >= 10 & daytimes$hour <= 16), "D: Day",
      ifelse((daytimes$hour >= 17 & daytimes$hour <= 20), "F: Fringe",
      ifelse(daytimes$hour >= 21, "P: Prime", NA)))))))

    # CREATE MERGE FIELDS
    targets_random$wday <- wday(targets_random$daypart)
    targets_random$hour <- hour(targets_random$daypart)

    # MERGE WITH NEW COLUMN: result
    targets_random <- merge(targets_random, daytimes, by = c("wday", "hour"))
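A variant of the same lookup-table idea, sketched in base R: instead of merge(), which by default performs an inner join (dropping rows without a match) and sorts the result by the key columns, the 168 labels can be stored in a 7 x 24 matrix and indexed directly with a two-column (row, column) index. That returns one label per input row, in the original order. The object names are hypothetical, and as.POSIXlt() stands in for wday()/hour(); since POSIXlt counts wday from 0 = Sunday, adding 1 matches the 1 = Sunday convention of wday().

```r
# Hypothetical lookup matrix: rows = weekday 1-7 (1 = Sunday),
# columns = hour 0-23, stored at column hour + 1.
hour_labels <- rep(c("LP: Late Prime", "O: Overnight", "EM: Early Morning",
                     "D: Day", "F: Fringe", "P: Prime"),
                   times = c(3, 3, 4, 7, 4, 3))  # hours 0-2, 3-5, 6-9, 10-16, 17-20, 21-23
daypart_lookup <- matrix(hour_labels, nrow = 7, ncol = 24, byrow = TRUE)
daypart_lookup[c(1, 7), ] <- "W: Weekend"        # Sunday and Saturday rows

label_daypart_lookup <- function(ts, lookup = daypart_lookup) {
  lt <- as.POSIXlt(ts)  # assumes ts is POSIXct
  # Matrix indexing by (weekday, hour + 1) pairs; NA timestamps yield NA
  lookup[cbind(lt$wday + 1L, lt$hour + 1L)]
}
```

Usage would be targets_random$daypart <- label_daypart_lookup(targets_random$daypart), with no helper columns and no re-sorting of the table.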

【Discussion】:

  • Oh, I'm going to try this!