Can I match regular expressions in switch statements in R? Answers

[Question title]: Can I match regular expressions in switch statements in R?
[Posted]: 2021-06-15 10:15:19
[Question description]:

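I have a vector of strings in which people were asked to guess someone's age. It contains statements such as "50-60", "ca. 50" or ">50". I want to use regular expressions to match these cases and get the actual numeric value. "50-60" should yield 55 (the mean of the two values), and the other two examples should yield 50.

I would like one case in the switch below for each variant, but it doesn't seem to work. Is it even possible to use regular expressions in a switch?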

switch (string,
          str_detect(string, "[:digit:]+[:blank:]*(-|_)[:blank:]*[:digit:]+") = {
            first <- str_sub(string, 1, 2) %>% as.numeric()
            second <- str_sub(string, str_length(string)-1, str_length(string)) %>% as.numeric()
            value <- mean(c(first, second))
          },
          str_detect(string, "((ca)\\.?)|>|~[:blank:]*[:digit:]+") = {
            value <- str_sub(string, str_length(string)-1, str_length(string)) %>% as.numeric()
          },
          str_detect(string, "[:digit:]+[:punct:]") = {
            value <- str_sub(string, 1, 2) %>% as.numeric()
          },
          print(string, " could not be matched")
  )

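The expressions themselves work as expected (as far as I have tested), so I assume I just can't use them inside switch like this. But I can't find a solution anywhere.

Edit: added the expected output for the examples.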
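For context, here is a small sketch of what switch() actually does with a character EXPR. It compares the value of EXPR against the literal argument names and never evaluates a name as a condition. The function age_kind() and its labels are hypothetical, made up only for illustration. Note also that switch() requires EXPR to have length 1, so it could not process the whole vector at once in any case.

```r
# Hypothetical helper: switch() matches the *value* of `label`
# against the literal argument names, not against conditions.
age_kind <- function(label) {
  switch(label,
         range  = "two numbers, take the mean",
         approx = "one number after a prefix",
         "unrecognised")  # an unnamed last argument acts as the default
}

age_kind("range")  # returns "two numbers, take the mean"
age_kind("50-60")  # returns "unrecognised": "50-60" is not one of the names
```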

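[Question discussion]:

Please add some relevant sample data and the desired output.
The examples are already in the text, but I have added their expected output.
No, you can't do that in switch. It is just a function, and the LHS of arg = expr in a function call must be a name, not a logical expression. The dplyr::case_when function is designed to accept a series of logical expressions.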
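Following that comment, a case_when() version might look like the sketch below. It is an assumption rather than code from the thread. The helper name parse_age() and the "unknown" input are hypothetical, the regular expressions are my own guesses covering only the three example formats, and str_extract() is used in place of the question's fixed-position str_sub(), so ages do not have to be two digits long. Unlike switch(), case_when() is vectorised and works on the whole vector at once.

```r
library(dplyr)
library(stringr)

mystring <- c("50-60", "ca. 50", ">50", "unknown")

# Hypothetical helper: one logical condition per input format
parse_age <- function(x) {
  case_when(
    # "50-60" or "50 _ 60": average the first and the last number
    str_detect(x, "\\d+\\s*[-_]\\s*\\d+") ~
      (as.numeric(str_extract(x, "\\d+")) +
         as.numeric(str_extract(x, "\\d+$"))) / 2,
    # "ca. 50", ">50", "~50": take the number after the prefix
    str_detect(x, "^(ca\\.?|>|~)\\s*\\d+") ~ as.numeric(str_extract(x, "\\d+")),
    # anything else becomes a numeric NA
    TRUE ~ NA_real_
  )
}

parse_age(mystring)
# [1] 55 50 50 NA
```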

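Tags: r regex switch-statement stringr

[Solution 1]:

We can do this with a tidyverse approach:

Convert the strings to a tibble/data.frame
Remove the unwanted characters with str_remove_all
Then separate the column into two by specifying sep
Get the rowMeans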

library(dplyr)
library(tidyr)
library(stringr)
tibble(mystring) %>%
    mutate(mystring = str_remove_all(mystring, "[A-Za-z.><]+")) %>% 
    separate(mystring, into = c('col1', 'col2'), sep="[- ]+", 
         convert = TRUE) %>%
    transmute(out = rowMeans(., na.rm = TRUE))

-output

# A tibble: 3 x 1
    out
  <dbl>
1    55
2    50
3    50

Data

mystring <- c("50-60", "ca. 50", ">50")
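To see why rowMeans() needs na.rm = TRUE here, it can help to look at the intermediate result of the cleaning step. This sketch uses str_split() only to make the pieces visible. It assumes that separate() splits on the same pattern in the same way, so the empty first piece of " 50" ends up as NA after convert = TRUE.

```r
library(stringr)

mystring <- c("50-60", "ca. 50", ">50")

# Cleaning step: drop letters, dots and comparison signs
cleaned <- str_remove_all(mystring, "[A-Za-z.><]+")
# cleaned is c("50-60", " 50", "50"); note the leading space in " 50"

# Splitting on the same separator pattern used in separate()
pieces <- str_split(cleaned, "[- ]+")
# pieces[[2]] is c("", "50"): the leading space produces an empty first piece,
# and ">50" yields only one piece, so one column is missing for rows 2 and 3
```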

[Discussion]:

[Solution 2]:

You can use a nested if/else approach:

library(stringr)

string <- "50-60"

if (str_detect(string, "[:digit:]+[:blank:]*(-|_)[:blank:]*[:digit:]+")) {
  first <- str_sub(string, 1, 2) %>% as.numeric()
  second <- str_sub(string, str_length(string) - 1, str_length(string)) %>% as.numeric()
  value <- mean(c(first, second))
  value
} else if (str_detect(string, "((ca)\\.?)|>|~[:blank:]*[:digit:]+")) {
  value <- str_sub(string, str_length(string) - 1, str_length(string)) %>% as.numeric()
  value
} else if (str_detect(string, "[:digit:]+[:punct:]")) {
  value <- str_sub(string, 1, 2) %>% as.numeric()
  value
} else NA

#[1] 55

For string <- "ca. 50", it returns 50.
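The if/else chain handles one string at a time. One possible way to apply it to the whole vector is to wrap it in a function and call vapply(). The wrapper name age_value() is hypothetical, and the body is the same logic as above with str_sub(string, -2, -1) as shorthand for the last two characters. Like the original, it still assumes two-digit ages.

```r
library(stringr)

# Hypothetical wrapper around the if/else chain from the answer above
age_value <- function(string) {
  if (str_detect(string, "[:digit:]+[:blank:]*(-|_)[:blank:]*[:digit:]+")) {
    # first two and last two characters, averaged
    mean(as.numeric(c(str_sub(string, 1, 2), str_sub(string, -2, -1))))
  } else if (str_detect(string, "((ca)\\.?)|>|~[:blank:]*[:digit:]+")) {
    as.numeric(str_sub(string, -2, -1))
  } else if (str_detect(string, "[:digit:]+[:punct:]")) {
    as.numeric(str_sub(string, 1, 2))
  } else {
    NA_real_
  }
}

mystring <- c("50-60", "ca. 50", ">50")
vapply(mystring, age_value, numeric(1), USE.NAMES = FALSE)
# [1] 55 50 50
```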

[Discussion]:

[Solution 3]:

mystring <- c("50-60", "ca. 50", ">50")

library(stringr)
lapply(str_extract_all(mystring, "[0-9]+"), 
       function(x) if (length(x) == 1) as.numeric(x[1]) else mean(as.numeric(x)))
[[1]]
[1] 55

[[2]]
[1] 50

[[3]]
[1] 50
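A possible simplification of this answer: mean() of a single number is that number, so the length check is not strictly needed. vapply() also returns a plain numeric vector instead of a list. This is a sketch, not the answerer's code. One difference to be aware of is that a string with no digits gives NaN (the mean of an empty vector) rather than an error.

```r
library(stringr)

mystring <- c("50-60", "ca. 50", ">50")

# Extract every run of digits, then average them per string;
# a single number averages to itself, so no special case is needed
ages <- vapply(str_extract_all(mystring, "[0-9]+"),
               function(x) mean(as.numeric(x)),
               numeric(1))
ages
# [1] 55 50 50
```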

[Discussion]: