【Question title】: R: Sort strings according to a substring
【Posted】: 2015-08-13 02:15:27
【Question】:

I have a set of file names, for example:

filelist <- c("filea-10.txt", "fileb-2.txt", "filec-1.txt", "filed-5.txt", "filef-4.txt")

I want to sort them by the number that follows the "-".

In Python, for example, I can use the key parameter of the sorted function:

filelist = ["filea-10.txt", "fileb-2.txt", "filec-1.txt", "filed-5.txt", "filef-4.txt"]
sorted(filelist, key=lambda x: int(x.split("-")[1].split(".")[0]))

> ["filec-1.txt", "fileb-2.txt", "filef-4.txt", "filed-5.txt", "filea-10.txt"]

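In R, I have been experimenting with strsplit and lapply, with no luck so far.

What is the way to do this in R?

Edit: The file names can be many things and may contain more numbers. The only fixed pattern is that the number I want to sort by comes after the "-". Another (real) example: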

x <- c("boards10017-51.mp4",  "boards10065-66.mp4",  "boards10071-81.mp4",
       "boards10185-91.mp4",  "boards10212-63.mp4",  "boards1025-51.mp4",
       "boards1026-71.mp4",   "boards10309-89.mp4",  "boards10310-68.mp4",
       "boards10384-50.mp4",  "boards10398-77.mp4",  "boards10419-119.mp4",
       "boards10421-85.mp4",  "boards10444-87.mp4",  "boards10451-60.mp4",
       "boards10461-81.mp4",  "boards10463-52.mp4",  "boards10538-83.mp4",
       "boards10575-62.mp4",  "boards10577-249.mp4")

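【Comments】:

  • Is there always only one number? Can't you just extract the number and order by it?
  • No, sorry. There are more numbers. A real example is boards451-74. I'll edit.
  • OK. I've added an update.

Tags: regex r sorting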


【Solution 1】:

I'm not sure how complex your list of file names actually is, but something like the following may be sufficient:

filelist[order(as.numeric(gsub("[^0-9]+", "", filelist)))]
# [1] "filec-1.txt"  "fileb-2.txt"  "filef-4.txt"  "filed-5.txt"  "filea-10.txt"

Considering your edit, you may want to change the gsub call to:

gsub(".*-|\\..*", "", filelist)

Again, without more test cases, it's hard to say whether this is sufficient for your needs.
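To see why the pattern has to change for the edited example, it can help to look at the intermediate sort keys. The snippet below is only an illustration added here, using three names from the question; the variable names `keys_all_digits` and `keys_after_dash` are made up for this sketch.

```r
x <- c("boards10419-119.mp4", "boards10017-51.mp4", "boards1025-51.mp4")

# Stripping every non-digit glues all digit runs together,
# including the "4" from the ".mp4" extension.
keys_all_digits <- gsub("[^0-9]+", "", x)
# -> "104191194" "10017514" "1025514"

# Removing everything up to the last "-" and everything from the
# next "." onward leaves only the number after the dash.
keys_after_dash <- gsub(".*-|\\..*", "", x)
# -> "119" "51" "51"

# order() returns the permutation that sorts the numeric keys.
sorted_x <- x[order(as.numeric(keys_after_dash))]
# -> "boards10017-51.mp4" "boards1025-51.mp4" "boards10419-119.mp4"
```

Ties (here the two names ending in -51) keep their original relative order, since order() breaks ties by position.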


Example:

 x <- c("boards10017-51.mp4", "boards10065-66.mp4", "boards10071-81.mp4", 
     "boards10185-91.mp4", "boards10212-63.mp4", "boards1025-51.mp4",     
     "boards1026-71.mp4", "boards10309-89.mp4", "boards10310-68.mp4",     
     "boards10384-50.mp4", "boards10398-77.mp4", "boards10419-119.mp4",   
     "boards10421-85.mp4", "boards10444-87.mp4", "boards10451-60.mp4",    
     "boards10461-81.mp4", "boards10463-52.mp4", "boards10538-83.mp4",    
     "boards10575-62.mp4", "boards10577-249.mp4")  

x[order(as.numeric(gsub(".*-|\\..*", "", x)))]
##  [1] "boards10384-50.mp4"  "boards10017-51.mp4"  "boards1025-51.mp4"  
##  [4] "boards10463-52.mp4"  "boards10451-60.mp4"  "boards10575-62.mp4" 
##  [7] "boards10212-63.mp4"  "boards10065-66.mp4"  "boards10310-68.mp4" 
## [10] "boards1026-71.mp4"   "boards10398-77.mp4"  "boards10071-81.mp4" 
## [13] "boards10461-81.mp4"  "boards10538-83.mp4"  "boards10421-85.mp4" 
## [16] "boards10444-87.mp4"  "boards10309-89.mp4"  "boards10185-91.mp4" 
## [19] "boards10419-119.mp4" "boards10577-249.mp4" 

【Comments】:

  • You can also target the number directly: sub(".*-(\\d+).*", "\\1", x)
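
The capture-group variant from the comment above could be wrapped in a small helper. This is a sketch rather than anything from the thread: `sort_by_dash_number` is a hypothetical name, and the handling of names without a "-<digits>" part (they are placed last) is an assumption about what would be wanted.

```r
# Hypothetical helper: sort file names by the integer that follows the "-".
sort_by_dash_number <- function(files) {
  pattern <- ".*-(\\d+).*"
  has_key <- grepl(pattern, files)
  # sub() returns its input unchanged when the pattern does not match,
  # so names without "-<digits>" get an explicit NA key instead.
  keys <- rep(NA_real_, length(files))
  keys[has_key] <- as.numeric(sub(pattern, "\\1", files[has_key]))
  # order() puts NA keys last by default.
  files[order(keys)]
}

result <- sort_by_dash_number(
  c("filea-10.txt", "fileb-2.txt", "notes.txt", "filec-1.txt")
)
# -> "filec-1.txt" "fileb-2.txt" "filea-10.txt" "notes.txt"
```

Checking for a match first avoids the "NAs introduced by coercion" warning that as.numeric() would raise on an unmatched name.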

【Solution 2】:

I wrote a regex-based sort function:

Function:

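# Packages the function relies on (they were not loaded in the original post)
library(magrittr)    # %>% and %<>%
library(purrr)       # map, map_chr, map_lgl
library(stringr)     # str_extract
library(data.table)  # as.data.table, .SD

reg_sort <- function(x,...,verbose=FALSE) {
    # Capture the unevaluated arguments as text, so that -"\\d+" can flag a descending key
    ellipsis <-   sapply(as.list(substitute(list(...)))[-1], deparse, simplify="array")
    reg_list <-   paste0(ellipsis, collapse=',')
    reg_list %<>% strsplit(",") %>% unlist %>% gsub("\\\\","\\",.,fixed=TRUE)
    # Strip the quotes (and a leading minus) to recover each bare regex pattern
    pattern  <-   reg_list %>% map_chr(~sub("^-\\\"","",.) %>% sub("\\\"$","",.) %>% sub("^\\\"","",.) %>% trimws)
    # TRUE for patterns that were prefixed with "-", i.e. sort descending
    descInd  <-   reg_list %>% map_lgl(~grepl("^-\\\"",.)%>%as.logical)

    # One column of extracted keys per pattern, with the original strings as the last column
    reg_extr <-   pattern %>% map(~str_extract(x,.)) %>% c(.,list(x)) %>% as.data.table
    reg_extr[] %<>% lapply(., function(x) type.convert(as.character(x), as.is = TRUE))

    # Sort by the last key first so that earlier keys take precedence
    map(rev(seq_along(pattern)),~{reg_extr<<-reg_extr[order(reg_extr[[.]],decreasing = descInd[.])]})

    if(verbose) { tmp<-lapply(reg_extr[,.SD,.SDcols=seq_along(pattern)],unique);names(tmp)<-pattern;tmp %>% print }

    return(reg_extr[[ncol(reg_extr)]])
}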

Data:

filelist <- c("filea-10.txt", "fileb-2.txt", "filec-1.txt", "filed-5.txt", "filef-4.txt")

Calling the function:

reg_sort(filelist,"\\d+")
#[1] "filec-1.txt"  "fileb-2.txt"  "filef-4.txt"  "filed-5.txt"  "filea-10.txt"

Other features:

  • Descending order: reg_sort(filelist,-"\\d+")

    #[1] "filea-10.txt" "filed-5.txt" "filef-4.txt" "fileb-2.txt" "filec-1.txt"

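  • Multi-level sorting: reg_sort(filelist,-"\\d+","\\w") (not meaningful for this sample data)

  • Verbose mode: reg_sort(filelist,"\\d+",verbose=T) (prints the values the regex patterns extracted, so the sort keys can be checked)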

    $\\d+ [1] 1 2 4 5 10

    [1] "filec-1.txt" "fileb-2.txt" "filef-4.txt" "filed-5.txt" "filea-10.txt"
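
For comparison, descending and multi-level sorts can also be expressed with base R's order() alone, without the extra packages. This is a sketch added for illustration, not part of the answer above; the `tied` vector is made-up data chosen so that the second sort key actually matters.

```r
filelist <- c("filea-10.txt", "fileb-2.txt", "filec-1.txt", "filed-5.txt", "filef-4.txt")
num <- as.numeric(sub(".*-(\\d+).*", "\\1", filelist))

# Descending by the number after "-".
desc <- filelist[order(num, decreasing = TRUE)]
# -> "filea-10.txt" "filed-5.txt" "filef-4.txt" "fileb-2.txt" "filec-1.txt"

# Two-level sort on made-up names: number descending, then name ascending for ties.
tied <- c("b-2.txt", "a-2.txt", "c-10.txt", "d-1.txt")
tied_num <- as.numeric(sub(".*-(\\d+).*", "\\1", tied))
two_level <- tied[order(-tied_num, tied)]
# -> "c-10.txt" "a-2.txt" "b-2.txt" "d-1.txt"
```

Negating a numeric key is a simple way to mix descending and ascending keys in a single order() call.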
