【Question title】: How to call a script in another script in R
【Posted】: 2024-09-23 02:25:02
【Question】:

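I have written a series of commands in R that do their work on one specific URL. I want to run the same series of commands over a list of URLs stored in a separate text file. How can I feed the list into the commands one URL at a time?

I don't know the correct term for this programming operation. I have looked into scripting and batch programming, but that is not what I want to do.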

# URL that comes from list
URL <- "http://www.urlfromlist.com"

# Load URL
theurl <- getURL(URL,.opts = list(ssl.verifypeer = FALSE) )

# Read the tables
tables <- readHTMLTable(theurl)

# Create a list
tables <- list.clean(tables, fun = is.null, recursive = FALSE)

# Convert the list to a data frame
df <- do.call(rbind.data.frame, tables)

# Save dataframe out as a csv file
write.csv(df, file = dynamicname, row.names=FALSE)

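The code above is what I am doing. The first variable needs to be a different URL from the list each time - rinse and repeat. Thanks!

Updated code - this runs, but it still does not write out any files.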

# Function to pull tables from list of URLs
URLfunction<- function(x){
  # URL that comes from list
  URL <- x

  # Load URL
  theurl <- RCurl::getURL(URL,.opts = list(ssl.verifypeer = FALSE) )

  # Read the tables
  tables <- XML::readHTMLTable(theurl)

  # Create a list
  tables <- rlist::list.clean(tables, fun = is.null, recursive = FALSE)

  # Convert the list to a data frame
  df <- do.call(rbind,tables)

  # Split date and time column out
  df2 <- separate(df, "Date / Time", c("Date", "Time"), sep = " ")

  # Fill the missing column with text, in this case shapename
  shapename <- qdapRegex::ex_between(URL, "ndxs", ".html")
  df2$Shape <- shapename

  # Save dataframe out as a csv file
  write.csv(result, paste0(shapename, '.csv', row.names=FALSE))

  return(df2)
}

URL <- read.csv("PATH", header = FALSE)
purrr::map_df(URL, URLfunction) ## Also tried purrr::map_df(URL[,1], URLfunction) 

【Comments】:

  • Is the URL list in a text document on your local machine, or at a URL?
  • Hi Andrew, yes, the URLs are in a csv.

Tags: r scripting


【Solution 1】:

If I understand your question correctly, my answer should solve your problem.

Libraries used

library(RCurl)
library(XML)
library(rlist)
library(purrr)

Define the function

URLfunction <- function(x){
  # URL that comes from list
  URL <- x

  # Load URL
  theurl <- RCurl::getURL(URL, .opts = list(ssl.verifypeer = FALSE))

  # Read the tables
  tables <- XML::readHTMLTable(theurl)

  # Create a list
  tables <- rlist::list.clean(tables, fun = is.null, recursive = FALSE)

  # Convert the list to a data frame
  df <- do.call(rbind, tables)

  # Return the data frame; the CSV is written after the function returns
  return(df)
}

Suppose you have data like the following

(I am not sure what your data looks like)

URL <- c("https://*.com/questions/56139810/how-to-call-a-script-in-another-script-in-r",
         "https://*.com/questions/56122052/labelling-points-on-a-highcharter-scatter-chart/56123057?noredirect=1#comment98909916_56123057")

result<- purrr::map(URL, URLfunction) 
result <- do.call(rbind, result)
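
To illustrate the combining step above without any network access, here is a minimal sketch using two made-up data frames. The name `per_url` and its contents are hypothetical stand-ins for what `URLfunction` would return for each URL. `do.call(rbind, ...)` stacks the data frames row-wise, which assumes every data frame has the same column names.

```r
# Hypothetical stand-ins for the per-URL results returned by URLfunction
per_url <- list(
  data.frame(City = c("Honolulu", "Austin"), State = c("HI", "TX")),
  data.frame(City = "Boise", State = "ID")
)

# Stack the data frames row-wise into one data frame
combined <- do.call(rbind, per_url)

nrow(combined)   # 3 rows in total
names(combined)  # "City" "State"
```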

write.csv is the last step

If you want to write a .csv for each URL, move this call into URLfunction

write.csv(result, file = dynamicname, row.names=FALSE)
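
`dynamicname` is not defined anywhere in the snippets above. One possible way to build it, assumed here purely for illustration, is from the last path segment of the URL. The helper `csv_name_for` and the one-row data frame are hypothetical, and the file is written to a temporary directory so the sketch has no side effects on your working directory.

```r
# Hypothetical helper: derive a CSV file name from the last path segment of a URL
csv_name_for <- function(url) {
  stem <- tools::file_path_sans_ext(basename(url))
  paste0(stem, ".csv")
}

# Stand-in for the data frame that URLfunction would build for one URL
df <- data.frame(City = "Honolulu", State = "HI")

url <- "nuforc.org/webreports/ndxsRound.html"
dynamicname <- file.path(tempdir(), csv_name_for(url))

# One file per URL, written to a temporary directory for this demo
write.csv(df, file = dynamicname, row.names = FALSE)
```

Placed inside the function, a call like this would run once per URL, so each URL gets its own file.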

Addendum

List version

URL <- list("https://*.com/questions/56139810/how-to-call-a-script-in-another-script-in-r",
        "https://*.com/questions/56122052/labelling-points-on-a-highcharter-scatter-chart/56123057?noredirect=1#comment98909916_56123057")


result<- purrr::map_df(URL, URLfunction) 

>result

   asked    today yesterday
1 viewed 35 times      <NA>
2 active    today      <NA>
3 viewed     <NA>  34 times
4 active     <NA>     today

CSV version

URL <- read.csv("PATH",header = FALSE)

result<- purrr::map_df(URL[,1], URLfunction) 

>result

   asked    today yesterday
1 viewed 35 times      <NA>
2 active    today      <NA>
3 viewed     <NA>  34 times
4 active     <NA>     today
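
The `URL[,1]` in the CSV version matters because `read.csv` returns a data frame, and mapping over a data frame visits each column rather than each row. The sketch below shows the difference with base R's `lapply` and a temporary file. `count_args` is a hypothetical stand-in for `URLfunction`, and the file contents are made up for the demo.

```r
# Write a small headerless CSV of URLs to a temporary file (made-up contents)
path <- tempfile(fileext = ".csv")
writeLines(c("nuforc.org/webreports/ndxsRectangle.html",
             "nuforc.org/webreports/ndxsRound.html"), path)

URL <- read.csv(path, header = FALSE, stringsAsFactors = FALSE)

# Hypothetical stand-in for URLfunction: reports how many URLs it received
count_args <- function(x) length(x)

# Iterating over the data frame: one call, receiving the whole column
by_column <- lapply(URL, count_args)

# Iterating over the first column: one call per URL
by_element <- lapply(URL[, 1], count_args)

length(by_column)   # 1
length(by_element)  # 2
```

As far as I know, `purrr::map` and `purrr::map_df` iterate the same way, so passing the whole data frame hands the function every URL at once.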

Added an edited version of your code.


URLfunction<- function(x){
  # URL that comes from list
  URL <- x
  
  # Load URL
  theurl <- RCurl::getURL(URL,.opts = list(ssl.verifypeer = FALSE) )
  
  # Read the tables
  tables <- XML::readHTMLTable(theurl)
  
  # Create a list
  tables <- rlist::list.clean(tables, fun = is.null, recursive = FALSE)
  
  # Convert the list to a data frame
  df <- do.call(rbind,tables)
  
  # Split date and time column out
  df2 <- tidyr::separate(df, "Date / Time", c("Date", "Time"), sep = " ")
  
  # Fill the missing column with text, in this case shapename

  shapename <- unlist(qdapRegex::ex_between(URL, "ndxs", ".html"))
  # qdapRegex::ex_between returns a list; when that list was added to df2,
  # the data frame could not be saved, so I added unlist().

  df2$Shape <- shapename
  
  # Save dataframe out as a csv file
  write.csv(df2, paste0(shapename, '.csv'), row.names=FALSE)
  # There were two errors here.
  # First, the data frame is named 'df2', not 'result', so I changed result --> df2.
  # Second, row.names is an argument of write.csv, not of paste0.
  return(df2)
}
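
The second fix noted in the comments above can be seen in isolation with base R alone. In this small sketch, `shapename` is set by hand rather than extracted from a URL. When `row.names = FALSE` sits inside `paste0`, it is treated as one more value to paste, so it ends up in the file name instead of reaching `write.csv`.

```r
shapename <- "Rectangle"

# Misplaced argument: paste0 treats row.names = FALSE as another string to paste
wrong <- paste0(shapename, ".csv", row.names = FALSE)

# Intended call: paste0 builds only the file name
right <- paste0(shapename, ".csv")

wrong  # "Rectangle.csvFALSE"
right  # "Rectangle.csv"
```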

After defining the function above,

URL = c("nuforc.org/webreports/ndxsRectangle.html",
        "nuforc.org/webreports/ndxsRound.html")

RESULT = purrr::map_df(URL, URLfunction)

Finally, I got the results below

1. Rectangle.csv and Round.csv files on your desktop (the save path).
2. The returned row-bound data frame looks like the following (2011 x 8)
> RESULT[1,]
    Date  Time     City State     Shape  Duration
1 5/2/19 00:20 Honolulu    HI Rectangle 3 seconds
                                                                                                                             Summary
1 Several of rectangles connected in different LED like colors.  Such as red, green, blue, etc. ;above Waikiki. ((anonymous report))
  Posted
1 5/9/19

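【Comments】:

  • Thanks. I have a csv file containing all the URLs, but I can try using the vector shown above.
  • Thanks, I can run the function, but it does not do what I want. I want to run the function on each URL separately and then write the file out. Please see the edited code above. Any ideas?
  • Sure, it is just a list of URLs; see the first two URLs below. However, I think what is happening is that the code compiles all the URLs into one data frame instead of processing each URL separately and then writing a CSV for each URL. I need to make changes for each URL individually before writing them out. The changes cannot be made because URL holds all of the links rather than one at a time. Let me know. nuforc.org/webreports/ndxsRectangle.html, nuforc.org/webreports/ndxsRound.html
  • Based on your reply, I added a new edited version (modified slightly from your code). It works well for me!
  • Thanks for the edit. It works for me too. I really appreciate it!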