【问题标题】:Append data.frame objects from an environment to corresponding data.frame objects in GlobalEnv (or another env) in R将环境中的 data.frame 对象附加到 R 中 GlobalEnv(或另一个环境)中的相应 data.frame 对象
【发布时间】:2021-11-27 23:00:19
【问题描述】:

我有几个现有的data.frame 对象需要从 Internet 更新。但是,由于更新与提到的现有对象具有相同的名称,因此我将更新作为 data.frame 对象放在单独的环境中。

然后,想法是将更新附加到现有的data.frame 对象。但我不知道如何使用rbind 从一个环境迭代到GlobalEnv(或另一个环境,就此而言)。

另外,我没有把它们放在这里,但是在GlobalEnv(或加载它们的环境)中会有几个其他data.frame 对象(具有其他名称)。

下面是一段应可重现的代码(带有 cmets 和源链接):

library(quantmod)

# Load ticker data from 2020-01-01 till 2021-02-02
tickers <- c("NKLA", "MPNGF", "RMO", "JD", "MSFT")
getSymbols.yahoo(tickers, auto.assign = TRUE, env = globalenv(), 
                 from = "2020-01-01", to = "2021-02-02")

# Close all Internet connections as a precaution
# https://stackoverflow.com/a/52758758/2950721
closeAllConnections()

# Find xts objects
xtsObjects <- names(which(unlist(eapply(.GlobalEnv, is.xts))))

# Convert xts to data.frame
# https://stackoverflow.com/a/69246047/2950721
for (i in seq_along(xtsObjects)) {
  assign(xtsObjects[i], fortify.zoo(get(xtsObjects[i])))
}


# Redo the previous process but in separate environment for updated
# values of the same tickers (comments and sources are not repeated)
symbolUpdates.env <- new.env()

getSymbols.yahoo(tickers, auto.assign = TRUE, env = symbolUpdates.env,
                 from = "2021-02-03")

closeAllConnections()

symbolUpdatesXtsObjects <- names(which(unlist(eapply(symbolUpdates.env, 
                                                     is.xts))))

for (i in seq_along(symbolUpdatesXtsObjects)) {
  assign(envir = symbolUpdates.env, symbolUpdatesXtsObjects[i], 
         fortify.zoo(get(symbolUpdatesXtsObjects[i], 
                         envir = symbolUpdates.env)))
}

# Find ```data.frame``` objects both in ```GlobalEnv``` and 
# ```symbolUpdates.env```
globalEnvDataframeObjects <- names(which(unlist(eapply(.GlobalEnv, 
                                                        is.data.frame))))
symbolUpdatesDataframeObjects <- names(which(unlist(eapply(symbolUpdates.env, 
                                                           is.data.frame))))


# This rbind definitely does not work!!!
for (i in seq_along(globalEnvDataframeObjects)) {
  rbind(envir = .GlobalEnv, globalEnvDataframeObjects[i], envir =
  symbolUpdates.env, symbolUpdatesDataframeObjects[i])
}

我的问题:

  • 除了基本的R 之外最好没有其他包,哪段代码可以迭代地将symbolUpdatesDataframeObjects 附加到相应的globalEnvDataframeObjects
  • 如果globalEnvDataframeObjects 在另一个环境中(即不是.GlobalEnv,而是像symbolUpdates.env 这样的“子环境”),代码是否相同?
    • 如果没有,会有什么变化?
  • 有没有比我尝试使用的方法更好/更明智的方法?

提前致谢。


使用的系统:

  • R 版本:4.1.1 (2021-08-10)
  • RStudio 版本:1.4.1717
  • 操作系统:macOS Catalina 版本 10.15.7 和 macOS Big Sur 版本 11.6

【问题讨论】:

  • 你想在哪里更新 rbinded 的对象
  • symbolUpdatesDataObjects 列有重复的索引列。因此,它提供 8 列,而 interObj 为每个数据提供 7 列。因此,我们需要删除额外的列。我用[-1]
  • 另外,我发现symbolUpdatesDateaObjects 中的列名的分配方式有所不同。您可以查看sapply(mget(symbolUpdatesDataframeObjects, envir = symbolUpdates.env), names)
  • 你能纠正这些错误,然后下面的代码应该可以工作
  • 好的,我得到了答案:代码末尾的fortify.zoo(get(symbolUpdatesXtsObjects[i]) 应该是fortify.zoo(get(symbolUpdatesXtsObjects[i], envir = symbolUpdates.env)。上面的代码现在对我有用,这意味着没有额外的 Index 列。 :-)

标签: r dataframe append environment


【解决方案1】:

我们这里可能需要intersect

interObj <- intersect(globalEnvDataframeObjects, symbolUpdatesDataframeObjects)
interObj <- interObj[match(interObj, symbolUpdatesDataframeObjects)]
nrow(get(interObj[1]))
[1] 273
for (i in seq_along(interObj)) {
  assign(interObj[i], rbind(get(interObj[i], envir = .GlobalEnv), 
    get(symbolUpdatesDataframeObjects[i], envir = symbolUpdates.env)), envir = .GlobalEnv)
}

【讨论】:

  • akrun,您介意指向intersect 所在的存储库吗,因为我不断收到package ‘Intersect’ is not available for this version of R 的警告。我当然使用setRepositories() 启用所有存储库并使用available.packages() 进行检查。
  • @pdeli intersect 是一个 base R 函数。都是小写的
  • 欧普斯。我的错。对不起阿克伦。您的代码在nrow(get(interObj[1])) 之前有效。但是,当我运行 for 循环时,我收到以下消息:Error in match.names(clabs, names(xi)) : names do not match previous names。可能是因为match之后的interObj在对象中显示了不同的顺序(即之前:[1] "NKLA" "MPNGF" "MSFT" "JD" "RMO",之后:[1] "MPNGF" "RMO" "JD" "MSFT" "NKLA")?
  • 我找到了解决方案。我使用了sort 命令,以便symbolUpdatesDataframeObjectsinterObj 具有相同的顺序。现在它似乎每次都有效。
  • @pdeli match 也应该可以工作。可能是因为在 interObj 中有更多元素导致NA
【解决方案2】:

如果需要在多个环境中存储 data.frames,请使用以下内容:

# Install pacakges if they are not already installed: necessary_packages => vector
necessary_packages <- c("quantmod")

# Create a vector containing the names of any packages needing installation:
# new_pacakges => vector
new_packages <- necessary_packages[!(necessary_packages %in%
                                       installed.packages()[, "Package"])]

# If the vector has more than 0 values, install the new pacakges
# (and their) associated dependencies:
if(length(new_packages) > 0){
  install.packages(
    new_packages, 
    dependencies = TRUE
  )
}

# Initialise the packages in the session: list of boolean => stdout (console)
lapply(
  necessary_packages, 
  require, 
  character.only = TRUE
)

# Load ticker data from 2020-01-01 till 2021-02-02
tickers <- c(
  "NKLA", 
  "MPNGF", 
  "RMO", 
  "JD", 
  "MSFT"
)

# Create a new environment: environment => symbolUpdates.env
symbolUpdates.env <- new.env()

# Create a vector of from dates: from_dates => Date Vector
from_dates <- as.Date(
  c(
    "2020-01-01", 
    "2020-02-03"
  )
)

# Create a vector of to dates:
to_dates <- as.Date(
  c(
    "2021-02-02", 
    format(
      Sys.Date(),
      "%Y-%m-%d"
    )
  )
)

# Create a vetor environments: env_vec => vector of environments
env_vec <- c(
  .GlobalEnv, 
  symbolUpdates.env
)

# Function to retreive ticker as a data.frame: 
# retrieve_ticker_df => function()
retrieve_ticker_df <- function(ticker_vec, from_date, to_date){
  
  # Create a list of size length(tickers):
  # df_list => empty list
  df_list <- vector(
    "list", 
    length(ticker_vec)
  )
  
  # Store each ticker's response as a data.frame in the list:
  # df_list => list of data.frames
  df_list <- setNames(
    lapply(
      seq_along(ticker_vec),
      function(i){
        # Retrieve the data.frame: tmp => data.frame
        tmp <- getSymbols.yahoo(
          ticker_vec[i],
          auto.assign = FALSE, 
          from = from_date,
          to = to_date,
          return.class = 'data.frame',
        )
        
        # Close all Internet connections as a precaution
        # https://stackoverflow.com/a/52758758/2950721
        closeAllConnections()
        
        # Create a data.frame and revert index to sequential
        # integers: data.frame => env
        data.frame(
          cbind(
            date = as.Date(
              row.names(
                tmp
              )
            ),
            tmp
          ),
          row.names = NULL
        )
      }
    ),
    ticker_vec
  )
  # Explicitly define returned object: list of data.frames => env
  return(df_list)
}

# Store all the data.frames in a list of data.frames, 
# store each list of data.frames in a list: 
# ticker_df_list_list => list of list of data.frames
ticker_df_list_list <- lapply(
  seq_along(env_vec),
  function(i){
    retrieve_ticker_df(
      tickers, 
      from_dates[i], 
      to_dates[i]
    )
  }
)

# Push each of the lists to the appropriate environment: 
# data.frames => env
lapply(
  seq_along(ticker_df_list_list),
  function(i){
    list2env(
      ticker_df_list_list[[i]],
      envir = env_vec[[i]]
    )
  }
)

# Initialise an empty list to create some memory
# bound_df_list => empty list
bound_df_list <- vector(
  "list", 
  length(tickers)
)

# Allocate some memory by initialising an
# empty list: ir_list => list
ir_list <- vector(
  "list",
  length(env_vec) * length(tickers)
)

# Unlist the env_vec, and retrieve the ticker
# data.frames: ir_list => list of data.frames
ir_list <- unlist(
  lapply(
    env_vec,
    function(x){
      mget(
        tickers, 
        envir = x
      )
    }
  ),
  recursive = FALSE
)

# Split-apply-combine based on the 
# data.frame names: bound_df_list => list of data.frames
bound_df_list <- lapply(
  split(
    ir_list,
    names(ir_list)
  ),
  function(x){
    do.call(
      rbind, 
      x
    )
  }
)

# Clear up the intermediate objects:
rm(ticker_df_list_list, ir_list, env_vec); gc()

如果不是强制使用多个环境:

# Install pacakges if they are not already installed: necessary_packages => vector
necessary_packages <- c("quantmod")

# Create a vector containing the names of any packages needing installation:
# new_pacakges => vector
new_packages <- necessary_packages[!(necessary_packages %in%
                                       installed.packages()[, "Package"])]

# If the vector has more than 0 values, install the new pacakges
# (and their) associated dependencies:
if(length(new_packages) > 0){
  install.packages(
    new_packages, 
    dependencies = TRUE
  )
}

# Initialise the packages in the session: list of boolean => stdout (console)
lapply(
  necessary_packages, 
  require, 
  character.only = TRUE
)

# Load ticker data from 2020-01-01 till 2021-02-02
tickers <- c(
  "NKLA", 
  "MPNGF", 
  "RMO", 
  "JD", 
  "MSFT"
)

# Create a new environment: environment => symbolUpdates.env
symbolUpdates.env <- new.env()

# Create a vector of from dates: from_dates => Date Vector
from_dates <- as.Date(
  c(
    "2020-01-01", 
    "2020-02-03"
  )
)

# Create a vector of to dates:
to_dates <- as.Date(
  c(
    "2021-02-02", 
    format(
      Sys.Date(),
      "%Y-%m-%d"
    )
  )
)

# Function to retreive ticker as a data.frame: 
# retrieve_ticker_df => function()
retrieve_ticker_df <- function(ticker_vec, from_date, to_date){

  # Create a list of size length(tickers):
  # df_list => empty list
  df_list <- vector(
    "list", 
    length(ticker_vec)
  )
  
  # Store each ticker's response as a data.frame in the list:
  # df_list => list of data.frames
  df_list <- setNames(
    lapply(
      seq_along(ticker_vec),
      function(i){
        # Retrieve the data.frame: tmp => data.frame
        tmp <- getSymbols.yahoo(
          ticker_vec[i],
          auto.assign = FALSE, 
          from = from_date,
          to = to_date,
          return.class = 'data.frame',
        )
        
        # Close all Internet connections as a precaution
        # https://stackoverflow.com/a/52758758/2950721
        closeAllConnections()
        
        # Create a data.frame and revert index to sequential
        # integers: data.frame => env
        data.frame(
          cbind(
            date = as.Date(
              row.names(
                tmp
              )
            ),
            tmp
          ),
          row.names = NULL
        )
      }
    ),
    ticker_vec
  )
  # Explicitly define returned object: list of data.frames => env
  return(df_list)
}

# Store all the data.frames in a list of data.frames, 
# store each list of data.frames in a list: 
# ticker_df_list_list => list of list of data.frames
ticker_df_list_list <- lapply(
  seq_along(from_dates),
  function(i){
    retrieve_ticker_df(
      tickers, 
      from_dates[i], 
      to_dates[i]
    )
  }
)

# Initialise an empty list to create some memory:
# ir_list => empty list
ir_list <- vector(
  "list",
  length(tickers) * length(from_dates)
)

# Populate the list with each of the named data.frames: 
# ir_list => list of data.frames
ir_list <- unlist(
  ticker_df_list_list, 
  recursive = FALSE
)

# Initialise an empty list to create some memory
# bound_df_list => empty list
bound_df_list <- vector(
  "list", 
  length(tickers)
)

# Split-apply-combine: bound_df_list => list of data.frames
bound_df_list <- lapply(
  split(
    ir_list,
    names(ir_list)
  ),
  function(x){
    do.call(
      rbind, 
      x
    )
  }
)

# Clear up the intermediate objects:
rm(ticker_df_list_list, ir_list); gc()

【讨论】:

  • 谢谢@hello_friend。您发布的两个代码都可以正常工作。但是,解决方案将从本地存储加载所有代码的代码。然后,代码检查那些不是最新的并下载它们最后日期和今天之间的增量(我们称它们为增量代码)。然后我需要将增量代码附加到非最新的代码中。这就是为什么我的问题是:“如何将非最新代码的最后日期与今天之间的增量附加到非最新代码?”。
  • @pdeli 很酷,那么您只需将起始日期和终止日期向量更新为动态的,然后您就可以获得增量导出。
  • 谢谢@hello_friend。会尝试您的建议并回复您。
猜你喜欢
  • 1970-01-01
  • 1970-01-01
  • 2021-11-06
  • 1970-01-01
  • 2016-03-07
  • 2011-03-31
  • 2019-08-13
  • 2017-08-01
  • 2019-10-23
相关资源
最近更新 更多