Save files into a specific subfolder in a loop in R (answers)

【Question title】: save files into a specific subfolder in a loop in R
【Posted】: 2017-07-11 02:06:30
【Question】:

I feel I'm very close to a solution, but right now I can't figure out how to get there.

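I have the following problem. In my "Test" folder I have stacked data files named M1_1, M1_2, M1_3 and so on, e.g. /Test/M1_1.dat. Now I want to split the files so that I get M1_1[1].dat, M1_1[2].dat, M1_1[3].dat, etc. I want to save these files in specific subfolders: Test/M1/M1_1[1], Test/M1/M1_1[2], etc., and Test/M2/M1_2[1], Test/M2/M1_2[2], etc.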
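The target layout can be expressed as a small path-building function. This is a minimal sketch: `target_path()` is a hypothetical helper name, and the rule that file M1_k goes into folder Mk is an assumption read off the examples above.

```r
# Hypothetical helper: output path for one piece of a split file.
# Assumption (inferred from the examples): Test/M1_k.dat -> Test/Mk/M1_k[i].dat
target_path <- function(in_file, piece) {
  stem   <- tools::file_path_sans_ext(basename(in_file))  # "M1_2.dat" -> "M1_2"
  k      <- sub(".*_", "", stem)                           # "M1_2" -> "2"
  folder <- file.path(dirname(in_file), paste0("M", k))    # "Test/M2"
  file.path(folder, paste0(stem, "[", piece, "].dat"))     # "Test/M2/M1_2[1].dat"
}

target_path("Test/M1_2.dat", 1)   # returns "Test/M2/M1_2[1].dat"
```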

I have already created the subfolders. I have the following code to split the files, which gives me M1_1.dat[1] and so on:

for (e in dir(path = "Test/", pattern = "\\.dat$", full.names = TRUE, recursive = TRUE)) {
  data <- read.table(e, header = TRUE)
  df <- data[-c(2)]                  # drop the second column
  out <- split(df, f = df$.imp)      # one data frame per value of .imp
  lapply(names(out), function(z) {
    write.table(out[[z]], paste0(e, "[", z, "].dat"),
                sep = "\t", row.names = FALSE, col.names = FALSE)
  })
}

The paste0 call gives me the split data I want (although the files are named M1_1.dat[1].dat rather than M1_1[1].dat), but I don't know how to get them into my subfolders.
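The extension ends up in the middle of the name because paste0(e, ...) appends to the whole path, ".dat" included. A minimal sketch of one fix, stripping the extension first with tools::file_path_sans_ext(); the path and piece number are example values:

```r
e <- "Test/M1_1.dat"   # example input path
z <- "1"               # example piece name from split()

# Appending to the full path keeps ".dat" in the middle of the name
paste0(e, "[", z, "].dat")                               # "Test/M1_1.dat[1].dat"

# Strip the extension first, then append the piece number and ".dat"
paste0(tools::file_path_sans_ext(e), "[", z, "].dat")    # "Test/M1_1[1].dat"
```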

Any ideas?

Thanks in advance.

【Question discussion】:

Tags: r loops directory subdirectory

【Solution 1】:

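I don't know what your data looks like, so I'll try to recreate the scenario with the baby names data set, which includes gender.

Assume all files from the zip archive have been extracted to "inst/data".

Store all the file paths in the `all_fi` variable: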

all_fi <- list.files("inst/data",
                     full.names = TRUE,
                     recursive = TRUE,
                     pattern = "\\.txt$")

    > head(all_fi, 3)
    [1] "inst/data/yob1880.txt" "inst/data/yob1881.txt"
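The pattern here is anchored as "\\.txt$" because list.files() treats pattern as a regular expression: in an unanchored pattern such as ".dat", the dot matches any character and the match can occur anywhere in the name. A quick check with made-up file names:

```r
files <- c("M1_1.dat", "Mydata.csv", "M1_1.dat.bak")   # made-up names

grepl(".dat", files)      # TRUE TRUE TRUE: "." is any character, no anchor
grepl("\\.dat$", files)   # TRUE FALSE FALSE: literal dot, name must end in .dat
```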

A function to apply to each file in the directory:

library(dplyr)   # %>%, select(), mutate()

f.it <- function(f_in = NULL) {
  # Create a new folder named after the input file, minus its extension
  # (e.g. "inst/data/yob1881.txt" -> "inst/data/yob1881")
  new_folder <- tools::file_path_sans_ext(f_in)
  dir.create(new_folder, showWarnings = FALSE)

  data.table::fread(f_in) %>%
    select(name = 1, gender = 2, freq = 3) %>%
    mutate(gender = ifelse(grepl("F", gender), "female", "male")) %>%
    (function(x) {
      # The data set contains names for males and females,
      # so splitting on gender mimics your split on .imp
      out <- split(x, x$gender)
      do.call(rbind, lapply(names(out), function(i) {
        # New file name for each piece of the split data frame
        ###### THIS IS WHERE YOU NEED TO TWEAK FOR YOUR NEEDS
        new_dest_file <- sprintf("%s/%s.txt", new_folder, i)
        # Write the sub-data-frame to the new file
        data.table::fwrite(out[[i]], new_dest_file)
        # Return a one-row data frame describing the new file
        data.frame(
          file_name = new_dest_file,
          file_size = file.size(new_dest_file),
          stringsAsFactors = FALSE)
      }))
    })
}

Now we can loop:

Note: for my purposes I won't spend the time processing every file; for yours, you would apply this to each of your initial files, i.e. to all_fi rather than all_fi[2:5].

> do.call(rbind, lapply(all_fi[2:5], f.it))

============================  =========
file_name                     file_size
============================  =========
inst/data/yob1881/female.txt      16476
inst/data/yob1881/male.txt        15306
inst/data/yob1882/female.txt      18109
inst/data/yob1882/male.txt        16923
inst/data/yob1883/female.txt      18537
inst/data/yob1883/male.txt        15861
inst/data/yob1884/female.txt      20641
inst/data/yob1884/male.txt        17300
============================  =========
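For the question's naming, the line marked "THIS IS WHERE YOU NEED TO TWEAK" is the one to change. A minimal sketch with hypothetical stand-in values for one iteration (`new_folder` as it would be for Test/M1_1.dat, `i` as one name of the split list):

```r
# Hypothetical values standing in for one iteration inside f.it()
new_folder <- "Test/M1_1"   # tools::file_path_sans_ext("Test/M1_1.dat")
i <- "1"                    # one name of the split list

# Original line: "<folder>/<piece>.txt"
sprintf("%s/%s.txt", new_folder, i)                              # "Test/M1_1/1.txt"

# Tweaked: "<folder>/<file stem>[<piece>].dat"
sprintf("%s/%s[%s].dat", new_folder, basename(new_folder), i)    # "Test/M1_1/M1_1[1].dat"
```

data.table::fwrite() writes comma-separated output with a header by default; setting sep = "\t" and col.names = FALSE would match the question's write.table() call.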

【Discussion】:

Thanks for your answer, but I figured it out; I only needed to change the paste0 call: write.table(out[[z]], paste0(gsub(".dat", "", e), "/[",z,"].dat"), sep="\t", row.names=FALSE, col.names = FALSE)}) This saves the pieces of each file in a folder named after that file: my file M1_1.dat is saved in /Test/M1_1 as [1].dat, [2].dat, etc., instead of Test/[1].dat.
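Put together, the approach from this comment can be sketched end to end. This is a self-contained sketch under assumptions: it builds a throwaway Test folder in tempdir() holding one made-up stacked file whose `.imp` column marks the pieces, anchors the extension regex as "\\.dat$", creates each per-file folder with dir.create() (write.table() does not create missing folders), and joins names with file.path().

```r
# Hypothetical setup: a throwaway "Test" folder holding one stacked file,
# with an .imp column that marks which piece each row belongs to
test_dir <- file.path(tempdir(), "Test")
dir.create(test_dir, showWarnings = FALSE)
toy <- data.frame(.imp = rep(1:2, each = 3), id = 1:6, x = seq(0.5, 3, by = 0.5))
write.table(toy, file.path(test_dir, "M1_1.dat"), sep = "\t", row.names = FALSE)

for (e in list.files(test_dir, pattern = "\\.dat$", full.names = TRUE)) {
  data <- read.table(e, header = TRUE)
  df <- data[-2]                      # drop the 2nd column, as in the question
  out <- split(df, f = df$.imp)       # one data frame per .imp value
  out_dir <- sub("\\.dat$", "", e)    # ".../Test/M1_1.dat" -> ".../Test/M1_1"
  dir.create(out_dir, showWarnings = FALSE)   # write.table() won't create it
  for (z in names(out)) {
    write.table(out[[z]], file.path(out_dir, paste0("[", z, "].dat")),
                sep = "\t", row.names = FALSE, col.names = FALSE)
  }
}

list.files(file.path(test_dir, "M1_1"))   # "[1].dat" "[2].dat"
```

Listing without recursive = TRUE is deliberate here: the pieces also end in .dat, so a recursive listing would pick them up again on a second run.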
