【问题标题】:R Language: Code Taking a Long Time to RunR 语言:需要很长时间才能运行的代码
【发布时间】:2021-04-08 13:00:08
【问题描述】:

我在another question 上发布了如何使用“plotly”库在 R 中绘制交互式时间序列。我收到了答复并尝试运行代码 - 但是,此代码已运行了 3 个小时。数据不是很大,我在 plotly 网站(https://plotly.com/r/cumulative-animations/https://plotly.com/r/custom-buttons/)上尝试了一个类似的例子,它们似乎运行良好。

这是我试图运行的代码(来自我之前的回答):

#load libraries and generate artificial time series data (this part works)

library(xts)
library(ggplot2)
library(dplyr)
library(plotly)

#create data

#time series 1
date_decision_made = seq(as.Date("2014/1/1"), as.Date("2016/1/1"),by="day")

date_decision_made <- format(as.Date(date_decision_made), "%Y/%m/%d")

property_damages_in_dollars <- rnorm(731,100,10)

final_data <- data.frame(date_decision_made, property_damages_in_dollars)

final_data %>%
  mutate(date_decision_made = as.Date(date_decision_made)) %>%
  add_count(week = format(date_decision_made, "%W-%y"))

final_data$class = "time_series_1"


#time series 2
date_decision_made = seq(as.Date("2014/1/1"), as.Date("2016/1/1"),by="day")

date_decision_made <- format(as.Date(date_decision_made), "%Y/%m/%d")

property_damages_in_dollars <- rnorm(731,10,10)

final_data_2 <- data.frame(date_decision_made, property_damages_in_dollars)

final_data_2 %>%
  mutate(date_decision_made = as.Date(date_decision_made)) %>%
  add_count(week = format(date_decision_made, "%W-%y"))

final_data_2$class = "time_series_2"

#combine
data = rbind(final_data, final_data_2)

第 1 部分:

#part 1:

data <- data %>%
 mutate(tmp_date = as.numeric(as.Date(date_decision_made, format = "%Y/%m/%d")))

accumulate_by <- function(dat, var) {
  var <- lazyeval::f_eval(var, dat)
  lvls <- plotly:::getLevels(var)
  dats <- lapply(seq_along(lvls), function(x) {
    cbind(dat[var %in% lvls[seq(1, x)], ], frame = lvls[[x]])
  })
  dplyr::bind_rows(dats)
}
data <- data %>% accumulate_by(~tmp_date)

fig <- data %>%
  plot_ly(
    x = ~tmp_date, 
    y = ~property_damages_in_dollars,
    split = ~class,
    frame = ~frame, 
    type = 'scatter',
    mode = 'lines', 
    line = list(simplyfy = F)
  )

fig

第 2 部分:

#part 2

updatemenus <- list(
  list(
    active = -1,
    type= 'buttons',
    buttons = list(
      list(
        label = "time_series_1",
        method = "update",
        args = list(list(visible = c(FALSE, TRUE)),
                    list(title = "series 1",
                         annotations = list(c(), high_annotations)))),
      list(
        label = "time_series_2",
        method = "update",
        args = list(list(visible = c(TRUE, FALSE)),
                    list(title = "series 2",
                         annotations = list(low_annotations, c() )))),
      
    )
  )

)

fig <- data %>% plot_ly(type = 'scatter', mode = 'lines') 
fig <- fig %>% add_lines(x=~date_decision_made,
  y=~property_damages_in_dollars, name="High",
  line=list(color="#33CFA5")) 
fig <- fig %>% add_lines(x=~date_decision_made, 
  y=~property_damage_in_dollars, name="Low",
  line=list(color="#F06A6A")) 
fig <- fig %>% layout(title = "Apple", showlegend=FALSE,
                      xaxis=list(title="Date"),
                      yaxis=list(title="Price ($)"),
                      updatemenus=updatemenus)




fig

谁能告诉我我做错了什么?还是我的电脑和 R 控制台有问题?数据不是很大,过去我对类似大小的数据运行过类似的程序,没有太多麻烦。

谢谢

注意:会话信息

 sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252    LC_MONETARY=English_Canada.1252 LC_NUMERIC=C                   
[5] LC_TIME=English_Canada.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] plotly_4.9.2.1 dplyr_1.0.2    ggplot2_3.3.2  xts_0.12.1     zoo_1.8-8     

loaded via a namespace (and not attached):
 [1] tinytex_0.26         tidyselect_1.1.0     xfun_0.15            purrr_0.3.4          reshape2_1.4.4       splines_4.0.2       
 [7] lattice_0.20-41      colorspace_1.4-1     vctrs_0.3.2          generics_0.0.2       viridisLite_0.3.0    htmltools_0.5.0     
[13] stats4_4.0.2         yaml_2.2.1           survival_3.2-7       prodlim_2019.11.13   rlang_0.4.7          ModelMetrics_1.2.2.2
[19] pillar_1.4.6         glue_1.4.1           withr_2.3.0          xgboost_1.1.1.1      foreach_1.5.1        lifecycle_0.2.0     
[25] plyr_1.8.6           lava_1.6.8           stringr_1.4.0        timeDate_3043.102    munsell_0.5.0        gtable_0.3.0        
[31] recipes_0.1.13       htmlwidgets_1.5.2    codetools_0.2-16     crosstalk_1.1.0.1    caret_6.0-86         class_7.3-17        
[37] Rcpp_1.0.5           scales_1.1.1         ipred_0.9-9          jsonlite_1.7.1       digest_0.6.25        stringi_1.4.6       
[43] grid_4.0.2           tools_4.0.2          magrittr_1.5         lazyeval_0.2.2       tibble_3.0.3         tidyr_1.1.0         
[49] crayon_1.3.4         pkgconfig_2.0.3      MASS_7.3-53          ellipsis_0.3.1       Matrix_1.2-18        data.table_1.12.8   
[55] pROC_1.16.2          lubridate_1.7.9      gower_0.2.2          httr_1.4.2           rstudioapi_0.11      iterators_1.0.13    
[61] R6_2.4.1             rpart_4.1-15         nnet_7.3-14          nlme_3.1-149         compiler_4.0.2 

【问题讨论】:

  • 您是否尝试过隔离代码的哪一部分花费了这么长时间?我个人会停止它,创建数据(并发现是否是减速),然后逐步尝试找出问题。
  • @Elin:谢谢你的回复。是的,这两个部分都是 plot_ly 函数。他们也需要很长时间吗?
  • 这段代码的某些方面对我来说没有意义(多次计算相同的东西,一遍又一遍地对相同的数据使用 as.Date()。不确定是否有必要也转换为数字。您确定不是 cumulate 函数是问题吗?github.com/ropensci/plotly/issues/957 也尝试切换到使用原始日期而不是 tmp_date 的 x..
  • 仅看这些图表,我确实认为您有很多数据。如果您只是尝试绘制前 4 个月或类似的图表会怎样?

标签: r time time-series plotly data-visualization


【解决方案1】:

虽然数据不小(accumulate_by 转换后的 535,092 行),但生成这些绘图图应该不会花费数小时。在我的机器上,整个过程不到 2 分钟。它似乎留下了一些只有重新启动 R 才能摆脱的数据,因此您可能需要检查内存占用情况。

下面的代码应该是可重现的:

tic <- Sys.time()
suppressPackageStartupMessages(invisible(
  lapply(c("xts", "ggplot2", "dplyr", "plotly"),
    require, character.only = TRUE)))

#create data

#time series 1
date_decision_made <- seq(as.Date("2014/1/1"), as.Date("2016/1/1"), by="day") %>% 
  {format(as.Date(.), "%Y/%m/%d")}
property_damages_in_dollars <- rnorm(731,100,10)

final_data <- data.frame(date_decision_made, property_damages_in_dollars) %>% 
  mutate(date_decision_made = as.Date(date_decision_made),
         class = "time_series_1") %>%
  add_count(week = format(date_decision_made, "%W-%y"))

#time series 2
date_decision_made <- seq(as.Date("2014/1/1"), as.Date("2016/1/1"), by="day") %>% 
  {format(as.Date(.), "%Y/%m/%d")}
property_damages_in_dollars <- rnorm(731,10,10)

final_data_2 <- data.frame(date_decision_made, property_damages_in_dollars) %>% 
  mutate(date_decision_made = as.Date(date_decision_made),
         class = "time_series_2") %>%
  add_count(week = format(date_decision_made, "%W-%y"))

#combine
data <- rbind(final_data, final_data_2)

#part 1:
data <- data %>%
    mutate(tmp_date = as.numeric(as.Date(date_decision_made, format = "%Y/%m/%d")))

accumulate_by <- function(dat, var) {
    var <- lazyeval::f_eval(var, dat)
    lvls <- plotly:::getLevels(var)
    dats <- lapply(seq_along(lvls), function(x) {
        cbind(dat[var %in% lvls[seq(1, x)], ], frame = lvls[[x]])
    })
    dplyr::bind_rows(dats)
}
data <- data %>% accumulate_by(~tmp_date)
lvls <- plotly:::getLevels(data$tmp_date)

## edit: defined custom tick marks with date labels
myticks <- floor(do.call(seq, as.list(c(range(data$tmp_date),length.out=8))))

fig <- data %>%
    plot_ly(
        x = ~tmp_date, 
        y = ~property_damages_in_dollars,
        split = ~class,
        frame = ~frame,
        type = 'scatter',
        mode = 'lines' #, 
        # line = list(simplify = FALSE)
    ) %>%                         # edit: added custom tick labels and text
    layout(xaxis = list(
      tickmode = "array",
      nticks = 8,
      tickvals = myticks, 
      ticktext = data$date_decision_made[myticks]
    ))

myticks <- floor(do.call(seq, as.list(c(range(data$tmp_date),length.out=8))))
fig <- data %>%
    plot_ly(
        x = ~tmp_date, 
        y = ~property_damages_in_dollars,
        text=~date_decision_made,  # edit: added hovertext label
        split = ~class,
        frame = ~frame,
        type = 'scatter',
        mode = 'lines'
    ) %>%                          # edit: added custom tick labels and text
    layout(xaxis = list(
      tickmode = "array",
      nticks = 8,
      tickvals = myticks, 
      ticktext = data$date_decision_made[myticks])
    )

fig


#part 2
high_annotations <- list(
    x=unique(data$date_decision_made[data$property_damages_in_dollars == max(data$property_damages_in_dollars)]), 
    y=max(data$property_damages_in_dollars),
    xref='x', yref='y',
    text=paste0('High: $',max(data$property_damages_in_dollars)),
    ax=0, ay=-40
)

low_annotations <- list(
    x=unique(data$date_decision_made[data$property_damages_in_dollars == min(data$property_damages_in_dollars)]), 
    y=min(data$property_damages_in_dollars),
    xref='x', yref='y',
    text=paste0('Low: $',min(data$property_damages_in_dollars)),
    ax=0, ay=40
)

updatemenus <- list(
    list(
        active = -1,
        type= 'buttons',
        buttons = list(
            list(
                label = "time_series_1",
                method = "update",
                args = list(list(visible = c(FALSE, TRUE)),
                            list(title = "series 1",
                                 annotations = list(c(), high_annotations)))),
            list(
                label = "time_series_2",
                method = "update",
                args = list(list(visible = c(TRUE, FALSE)),
                            list(title = "series 2",
                                 annotations = list(low_annotations, c() ))))
            
        )
    )
    
)

fig <- data %>% plot_ly(type = 'scatter', mode = 'lines') 
fig <- fig %>% add_lines(x=~date_decision_made,
                         y=~property_damages_in_dollars, name="High",
                         line=list(color="#33CFA5")) 
fig <- fig %>% add_lines(x=~date_decision_made, 
                         y=~property_damages_in_dollars, name="Low",
                         line=list(color="#F06A6A")) 
fig <- fig %>% layout(title = "Apple", showlegend=FALSE,
                      xaxis=list(title="Date"),
                      yaxis=list(title="Price ($)"),
                      updatemenus=updatemenus)

fig

toc <- Sys.time()
toc-tic
#> Time difference of 1.61937 mins

reprex package (v0.3.0) 于 2021-01-05 创建

other attached packages:
[1] plotly_4.9.2.2 dplyr_1.0.2    ggplot2_3.3.3  xts_0.12.1     zoo_1.8-8 

【讨论】:

  • 非常感谢您的回复!我花了很多时间试图弄清楚如何弄清楚第 2 部分!至于第 1 部分 - 你知道如何将日期格式从 16.1k 更改为更易于识别的日历日期吗?我非常感谢你的帮助!
  • 查看编辑后的帖子 - 您可以在悬停文本中添加日期,并添加带有日期的自定义 x 轴标签。不幸的是,我没有设法让框架使用日期格式。
猜你喜欢
  • 1970-01-01
  • 1970-01-01
  • 2013-10-11
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 2019-05-27
  • 1970-01-01
相关资源
最近更新 更多