【Question title】: Importing a password-protected xlsx file into R
【Posted】: 2021-03-08 03:41:32
【Question description】:

I found an old thread (How do you read a password protected excel file into r?) that suggests reading a password-protected file with the following code:

    install.packages("excel.link")
    library("excel.link")

    dat <- xl.read.file("TestWorkbook.xlsx", password = "pass", write.res.password = "pass")
    dat

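However, when I try this, R crashes immediately. I tried removing the write.res.password argument, but that doesn't seem to be the problem. I suspect excel.link may not work with the latest version of R, so I'd be grateful for any other way to do this.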

Edit: using read.xlsx gives this error:

    Error in .jcall("RJavaTools", "Ljava/lang/Object;", "newInstance", .jfindClass(class),  :
      org.apache.poi.poifs.filesystem.OfficeXmlFileException:
      The supplied data appears to be in the Office 2007+ XML.
      You are calling the part of POI that deals with OLE2 Office Documents.
      You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
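The error itself says something about the file: POI reports that it is an Office 2007+ XML package (a zip archive), not an OLE2 document. A workbook encrypted with an open password is stored inside an OLE2 compound-document container, whereas an unencrypted or merely sheet-protected .xlsx is a plain zip. So this error may mean the file is not encrypted at all, only sheet- or workbook-protected. A minimal base-R sketch for checking this is below; the helper name `excel_container_type` is hypothetical, and it assumes the standard zip signature `50 4B 03 04` and the OLE2 signature `D0 CF 11 E0 A1 B1 1A E1`.

```r
# Hypothetical helper: report which container an Excel file uses,
# based only on its leading "magic" bytes.
excel_container_type <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  header <- readBin(con, what = "raw", n = 8L)  # may be shorter for tiny files

  zip_sig  <- as.raw(c(0x50, 0x4B, 0x03, 0x04))                          # "PK\x03\x04"
  ole2_sig <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))  # OLE2 / CFB

  if (length(header) >= 4L && identical(header[1:4], zip_sig)) {
    "zip"      # plain .xlsx/.xlsm package: no open password
  } else if (length(header) >= 8L && identical(header[1:8], ole2_sig)) {
    "ole2"     # legacy .xls, or an encrypted .xlsx/.xlsm
  } else {
    "unknown"
  }
}
```

For example, excel_container_type("TestWorkbook.xlsx") returning "zip" would point towards sheet/workbook protection rather than file encryption.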

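【Question comments】:

  • Have you tried read.xlsx from the xlsx package? Unfortunately, it does require Java and the rJava package.
  • @neilfws I tried that package but got an error; I'll add it to the post. Can this be fixed by installing the rJava package?
  • You will need to install rJava and possibly configure it for your operating system. That isn't always easy, which is why readxl is preferred. Unfortunately, readxl doesn't handle password-protected files (as far as I know).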

Tags: r xlsx password-protection


【Solution 1】:

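You can strip the protection from an Excel file without knowing the password using the function below (a modified version of the code available at https://www.r-bloggers.com/2018/05/remove-password-protection-from-excel-sheets-using-r/). Note that this removes worksheet and workbook-structure protection, which is stored as XML elements inside the file; it does not decrypt a file that requires a password to open, because such a file is an encrypted container rather than a plain zip archive.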

    # Requires the stringr and zip packages:
    # install.packages(c("stringr", "zip"))
    remove_Password_Protection_From_Excel_File <- function(dir, file, bool_XLSXM = FALSE)
    {
      # use an absolute path so it stays valid after we change directories below
      dir <- normalizePath(dir)
      initial_Dir <- setwd(dir)
      # restore the original working directory even if something fails
      on.exit(setwd(initial_Dir), add = TRUE)

      # file name and path after removing protection
      if (bool_XLSXM)
      {
        file_unlocked <- stringr::str_replace(basename(file), "\\.xlsm$", "_unlocked.xlsm")
      } else
      {
        file_unlocked <- stringr::str_replace(basename(file), "\\.xlsx$", "_unlocked.xlsx")
      }
      # refuse to continue if the extension did not match, so the original is never overwritten
      if (file_unlocked == basename(file))
        stop("file must end in .xlsx (or .xlsm when bool_XLSXM = TRUE)")

      file_unlocked_path <- file.path(dir, file_unlocked)

      # create a temporary directory in the target folder
      # so we can see what is going on
      temp_dir <- "_tmp"

      # remove and recreate the _tmp folder in case it already exists
      unlink(temp_dir, recursive = TRUE)
      dir.create(temp_dir)

      # unzip the Excel file into the temp folder
      unzip(file, exdir = temp_dir)

      # get the full paths of the XML files for all worksheets
      worksheet_paths <- list.files(file.path(temp_dir, "xl", "worksheets"),
                                    full.names = TRUE, pattern = "\\.xml$")

      # remove the XML node that contains the sheet protection
      # (we could parse the XML with e.g. xml2, but this simple approach suffices here)
      for (ws in worksheet_paths)
      {
        # the parts of an .xlsx package are UTF-8 encoded XML
        file_Content <- readLines(ws, encoding = "UTF-8", warn = FALSE)

        # the "sheetProtection" node holds the hashed password: "<sheetProtection SOME INFO />"
        # we simply remove the whole node
        out <- stringr::str_replace_all(file_Content, "<sheetProtection.*?/>", "")
        writeLines(out, ws, useBytes = TRUE)
      }

      # workbook-level (structure) protection lives in xl/workbook.xml
      workbook_path <- file.path(temp_dir, "xl", "workbook.xml")
      file_Content <- readLines(workbook_path, encoding = "UTF-8", warn = FALSE)
      out <- stringr::str_replace_all(file_Content, "<workbookProtection.*?/>", "")
      writeLines(out, workbook_path, useBytes = TRUE)

      # create a new zip, i.e. Excel file, containing the modified XML files
      old_wd <- setwd(temp_dir)
      files <- list.files(recursive = TRUE, full.names = FALSE, all.files = TRUE, no.. = TRUE)

      # an Excel file is a zip archive, so we can write it directly under an .xlsx/.xlsm name
      # (utils::zip needs an external zip program, which may not be installed)
      zip::zip(file_unlocked_path, files = files)
      setwd(old_wd)

      # clean up and remove the temporary directory
      unlink(temp_dir, recursive = TRUE)

      invisible(file_unlocked_path)
    }
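The core of the function is a plain text substitution on the XML parts. A minimal base-R illustration on an invented worksheet fragment (the XML and its attribute values are made up for this example) shows the effect; `gsub()` with the negated class `[^>]*` stands in for the lazy `.*?` used in the stringr call above, and both stop at the end of the self-closing element.

```r
# Invented worksheet XML fragment containing sheet protection
sheet_xml <- c(
  '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">',
  '<sheetData/><sheetProtection password="CC3D" sheet="1" objects="1"/>',
  '</worksheet>'
)

# Drop the self-closing <sheetProtection .../> element; [^>]* cannot run
# past the closing ">" of that element, so nothing else is removed
unprotected <- gsub("<sheetProtection[^>]*/>", "", sheet_xml)
print(unprotected)  # the second element is now "<sheetData/>"
```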

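Once the protection is removed, you can read the unlocked copy (e.g. TestWorkbook_unlocked.xlsx, written next to the original) with the usual tools such as readxl. This approach worked for me.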
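To confirm that the unlocking worked, you can extract the unlocked copy and look for any remaining protection elements. A small base-R sketch follows; the helper name `find_protection_nodes` is hypothetical, and it only searches for the two element names the function above removes.

```r
# Hypothetical helper: list XML parts under an extracted workbook folder
# that still contain a sheet- or workbook-protection element.
find_protection_nodes <- function(xml_dir) {
  xml_files <- list.files(xml_dir, pattern = "\\.xml$", recursive = TRUE,
                          full.names = TRUE)
  has_protection <- vapply(xml_files, function(f) {
    txt <- readLines(f, warn = FALSE, encoding = "UTF-8")
    any(grepl("<(sheetProtection|workbookProtection)", txt))
  }, logical(1))
  unname(xml_files[has_protection])
}
```

For example, after tmp <- tempfile(); unzip("TestWorkbook_unlocked.xlsx", exdir = tmp), calling find_protection_nodes(tmp) should return an empty character vector if every protection node was stripped.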
