【Question title】: How to rename filenames stored as tibble/dataframe using dplyr
【Posted】: 2018-02-15 09:16:26
【Question】:

I have the following data frame containing a list of files.

library(tidyverse)
dat <- structure(list(source_file = structure(c("data/monroe_20180214/180131 WT PB d5/PB x10_01.tif", 
"data/monroe_20180214/180131 WT PB d5/PB x10_02.tif", "data/monroe_20180214/180131 WT PB d5/PB x10_03.tif", 
"data/monroe_20180214/180131 WT PB d5/PB x10_04.tif", "data/monroe_20180214/180131 WT PB d5/PB x10_05.tif", 
"data/monroe_20180214/180131 WT PB d5/PB x10_06.tif"), class = c("fs_path", 
"character"))), .Names = "source_file", row.names = c(NA, -6L
), class = c("tbl_df", "tbl", "data.frame"))


dat
#> # A tibble: 6 x 1
#>   source_file                                       
#>   <chr>                                             
#> 1 data/monroe_20180214/180131 WT PB d5/PB x10_01.tif
#> 2 data/monroe_20180214/180131 WT PB d5/PB x10_02.tif
#> 3 data/monroe_20180214/180131 WT PB d5/PB x10_03.tif
#> 4 data/monroe_20180214/180131 WT PB d5/PB x10_04.tif
#> 5 data/monroe_20180214/180131 WT PB d5/PB x10_05.tif
#> 6 data/monroe_20180214/180131 WT PB d5/PB x10_06.tif

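What I want to do is create a second column, new_filename, by replacing the first two directory levels with the new path pooled/, replacing spaces with ., and replacing the remaining forward slashes with __. How can I do this?

The desired result is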

  source_file                                         new_filename                                   
1 data/monroe_20180214/180131 WT PB d5/PB x10_01.tif  pooled/180131.WT.PB.d5__PB.x10_01.tif 
2 data/monroe_20180214/180131 WT PB d5/PB x10_02.tif  ...
3 data/monroe_20180214/180131 WT PB d5/PB x10_03.tif  .etc.
4 data/monroe_20180214/180131 WT PB d5/PB x10_04.tif  
5 data/monroe_20180214/180131 WT PB d5/PB x10_05.tif  
6 data/monroe_20180214/180131 WT PB d5/PB x10_06.tif  

【Comments】:

    Tags: r regex dplyr tidyverse


    【Solution 1】:

    You can also do this with gsub() from base R:

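    dat %>% mutate(new_var = gsub("data/monroe_20180214", "pooled", source_file), # swap the first two directories for "pooled"
                   new_var = gsub(" ", ".", new_var),              # spaces -> dots
                   new_var = gsub("/", "_", new_var),              # every slash -> underscore
                   new_var = gsub("pooled_", "pooled/", new_var))  # restore the slash after "pooled"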
    # A tibble: 6 x 2
                                             source_file                              new_var
                                                   <chr>                                <chr>
    1 data/monroe_20180214/180131 WT PB d5/PB x10_01.tif pooled/180131.WT.PB.d5_PB.x10_01.tif
    2 data/monroe_20180214/180131 WT PB d5/PB x10_02.tif pooled/180131.WT.PB.d5_PB.x10_02.tif
    3 data/monroe_20180214/180131 WT PB d5/PB x10_03.tif pooled/180131.WT.PB.d5_PB.x10_03.tif
    4 data/monroe_20180214/180131 WT PB d5/PB x10_04.tif pooled/180131.WT.PB.d5_PB.x10_04.tif
    5 data/monroe_20180214/180131 WT PB d5/PB x10_05.tif pooled/180131.WT.PB.d5_PB.x10_05.tif
    6 data/monroe_20180214/180131 WT PB d5/PB x10_06.tif pooled/180131.WT.PB.d5_PB.x10_06.tif
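The output above joins the directory and file name with a single underscore, while the question asked for a double underscore (__). A minimal sketch of a variant that produces __, assuming the prefix to drop is always the literal data/monroe_20180214/ as in the example data. It uses only base R and prepends pooled/ last, so the new slash never has to be restored afterwards:

```r
# Example path copied from the question
source_file <- "data/monroe_20180214/180131 WT PB d5/PB x10_01.tif"

# 1. Drop the leading "data/monroe_20180214/" prefix (literal match, no regex)
x <- sub("data/monroe_20180214/", "", source_file, fixed = TRUE)

# 2. Spaces become dots
x <- gsub(" ", ".", x, fixed = TRUE)

# 3. Remaining slashes become a double underscore
x <- gsub("/", "__", x, fixed = TRUE)

# 4. Prepend the new directory last, so its slash is never rewritten
new_filename <- paste0("pooled/", x)
new_filename
#> [1] "pooled/180131.WT.PB.d5__PB.x10_01.tif"
```

Because sub(), gsub() and paste0() are vectorised, the same four steps should work unchanged on the whole source_file column inside mutate().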
    

    【Comments】:

      【Solution 2】:

      A one-liner:

      paste0("pooled/",chartr(" /", "._",(sub("^(?:[^\\/]*\\/){2}","",dat$source_file))))
      
      
      #[1] "pooled/180131.WT.PB.d5_PB.x10_01.tif"
      #[2] "pooled/180131.WT.PB.d5_PB.x10_02.tif"
      #[3] "pooled/180131.WT.PB.d5_PB.x10_03.tif"
      #[4] "pooled/180131.WT.PB.d5_PB.x10_04.tif"
      #[5] "pooled/180131.WT.PB.d5_PB.x10_05.tif"
      #[6] "pooled/180131.WT.PB.d5_PB.x10_06.tif"
      

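      Here we first replace everything up to and including the second / with an empty string (""), then use the base R function chartr to replace spaces with dots (.) and forward slashes (/) with underscores (_), and finally paste0 the string pooled/ onto the front.

      The regex for the sub part is taken from here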
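To make the three steps easier to follow, here is the one-liner unrolled on a single path from the question, with each intermediate value shown. This is only an illustrative sketch: the step1/step2/step3 names are made up, the escaped slashes are written as plain /, and perl = TRUE is added so that the non-capturing group (?:...) is handled by the PCRE engine.

```r
path <- "data/monroe_20180214/180131 WT PB d5/PB x10_01.tif"

# Step 1: remove everything up to and including the second "/".
# [^/]*/ matches one directory name plus its slash; {2} repeats that twice.
step1 <- sub("^(?:[^/]*/){2}", "", path, perl = TRUE)
step1
#> [1] "180131 WT PB d5/PB x10_01.tif"

# Step 2: chartr() translates character by character: " " -> "." and "/" -> "_"
step2 <- chartr(" /", "._", step1)
step2
#> [1] "180131.WT.PB.d5_PB.x10_01.tif"

# Step 3: prepend the new directory
step3 <- paste0("pooled/", step2)
step3
#> [1] "pooled/180131.WT.PB.d5_PB.x10_01.tif"
```

Note that chartr() maps each character to exactly one character, so it cannot produce the double underscore (__) asked for in the question; that would need gsub("/", "__", ...) instead.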

      Adding this to a dplyr call:

      dat %>%
       mutate(new_filename =paste0("pooled/", chartr(" /", "._", 
                                  (sub("^(?:[^\\/]*\\/){2}", "", source_file))))) %>%
       select(new_filename)
      
      
      #new_filename                        
      #  <chr>                               
      #1 pooled/180131.WT.PB.d5_PB.x10_01.tif
      #2 pooled/180131.WT.PB.d5_PB.x10_02.tif
      #3 pooled/180131.WT.PB.d5_PB.x10_03.tif
      #4 pooled/180131.WT.PB.d5_PB.x10_04.tif
      #5 pooled/180131.WT.PB.d5_PB.x10_05.tif
      #6 pooled/180131.WT.PB.d5_PB.x10_06.tif
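If the transformation is needed in more than one place, it may be convenient to wrap it in a small function. The sketch below is one possible way to do that, not the answerer's code: make_new_filename(), new_dir and sep are hypothetical names introduced here, and the separator defaults to __ to match the result requested in the question. Unlike the pipeline above, it keeps source_file next to the new column. The call to dplyr::mutate() is guarded so the example also runs where dplyr is not installed.

```r
# Hypothetical helper (not part of any package): drop the first two
# directory levels, replace spaces with dots, join what is left with `sep`.
make_new_filename <- function(path, new_dir = "pooled", sep = "__") {
  stripped <- sub("^(?:[^/]*/){2}", "", path, perl = TRUE)
  dotted   <- gsub(" ", ".", stripped, fixed = TRUE)
  flat     <- gsub("/", sep, dotted, fixed = TRUE)
  paste0(new_dir, "/", flat)
}

# Two of the paths from the question, as a plain data frame
dat <- data.frame(
  source_file = c("data/monroe_20180214/180131 WT PB d5/PB x10_01.tif",
                  "data/monroe_20180214/180131 WT PB d5/PB x10_02.tif"),
  stringsAsFactors = FALSE
)

if (requireNamespace("dplyr", quietly = TRUE)) {
  # dplyr version: source_file is kept alongside new_filename
  dat <- dplyr::mutate(dat, new_filename = make_new_filename(source_file))
} else {
  # base R fallback with the same result
  dat$new_filename <- make_new_filename(dat$source_file)
}

dat$new_filename
#> [1] "pooled/180131.WT.PB.d5__PB.x10_01.tif"
#> [2] "pooled/180131.WT.PB.d5__PB.x10_02.tif"
```

Passing sep = "_" reproduces the single-underscore names shown in the answers above.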
      

      【Comments】:
