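【Question title】:Mutate to remove all parentheses (and contents) from a string in R
【Posted】:2020-11-12 04:09:26
【Question description】:

I am trying to use mutate/str_replace to generate "Phenotype" from "Class" by removing the parentheses (including their contents), but I need some help with the regex. I would then also like to reorder the text in the "Phenotype" string so that the markers appear in the order PanCK>PD-L1>CD8>FoxP3>PD-1>CD68. Apologies for the non-standard dataset! Many thanks!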

test<- data.frame(Class = c("FoxP3 (Opal 570): PanCK (Opal 690): PD-1 (Opal 620): CD68 (Opal 780)"
, "CD8 (Opal 480): PanCK (Opal 690): CD68 (Opal 780): PD-L1 (Opal 520)", "PanCK (Opal 690): CD68 (Opal 780)", 
"FoxP3 (Opal 570): PanCK (Opal 690)"))

What I tried (this only removes the first parenthesized group in each string):

test.output<- test %>% mutate(Phenotype = str_replace(Class, "\\([^()]{0,}\\)", ""))

Desired output:

test.output <- data.frame(Class = c("FoxP3 (Opal 570): PanCK (Opal 690): PD-1 (Opal 620): CD68 (Opal 780)"
                                    , "CD8 (Opal 480): PanCK (Opal 690): CD68 (Opal 780): PD-L1 (Opal 520)", 
                                    "PanCK (Opal 690): CD68 (Opal 780)", "FoxP3 (Opal 570): PanCK (Opal 690)"), 
                          Phenotype = c("FoxP3:PanCK:PD-1:CD68", "CD8:PanCK:CD68:PD-L1", 
                                        "PanCK:CD68", "FoxP3:PanCK"))

Then reorder so that PanCK>PD-L1>CD8>FoxP3>PD-1>CD68:

ordered.output<- data.frame(Class = c("FoxP3 (Opal 570): PanCK (Opal 690): PD-1 (Opal 620): CD68 (Opal 780)"
                                            , "CD8 (Opal 480): PanCK (Opal 690): CD68 (Opal 780): PD-L1 (Opal 520)", 
                                            "PanCK (Opal 690): CD68 (Opal 780)", "FoxP3 (Opal 570): PanCK (Opal 690)"), 
                                  Phenotype = c("FoxP3:PanCK:PD-1:CD68", "CD8:PanCK:CD68:PD-L1", 
                                                "PanCK:CD68", "FoxP3:PanCK"),
                                  Phenotype_Ordered = c("PanCK:FoxP3:PD-1:CD68", "PanCK:PD-L1:CD8:CD68",
                                                        "PanCK:CD68", "PanCK:FoxP3"))

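【Comments】:

  • Removing the parentheses and their contents is a duplicate of this question - perhaps you could apply the answer from there and edit this question to focus on the reordering?
  • Your regex idea is right; I think you just need to change str_replace (replaces the first match) to str_replace_all (replaces all matches).
  • Thanks! I did read that thread, but I could not see how it applied to multiple parentheses in a single string - this is a good example of str_replace_all beyond the standard tidyverse examples.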
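
The difference the comments describe can be seen on a single string. This is only a minimal sketch using the asker's original pattern; the variable names `x`, `first_only` and `all_matches` are made up for illustration.

```r
library(stringr)

x <- "FoxP3 (Opal 570): PanCK (Opal 690)"
pattern <- "\\([^()]{0,}\\)"   # a "(", any non-parenthesis characters, then ")"

# str_replace() stops after the first match in each string
first_only <- str_replace(x, pattern, "")

# str_replace_all() keeps going until every match has been replaced
all_matches <- str_replace_all(x, pattern, "")

print(first_only)
print(all_matches)
```

Note that this pattern leaves the spaces around the parentheses behind, so the whitespace still needs separate handling. In base R, `sub()` and `gsub()` show the same first-match versus all-matches behaviour.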

Tags: r regex stringr dplyr


【Solution 1】:

Another approach:

library(dplyr)
library(purrr)
library(stringr)
# Lowest priority first, so sorting in decreasing order puts PanCK first
my_order <- c("CD68", "PD-1", "FoxP3", "CD8", "PD-L1", "PanCK")
test %>% 
  mutate(prototype = gsub('\\s*[(][^)]+[)]','',Class),
         ordered = map_chr(strsplit(prototype, '\\s*:\\s*'),
                      ~str_c(sort(ordered(.x,my_order), decreasing = TRUE), collapse = ":")))
                                                                 Class                prototype               ordered
1 FoxP3 (Opal 570): PanCK (Opal 690): PD-1 (Opal 620): CD68 (Opal 780) FoxP3: PanCK: PD-1: CD68 PanCK:FoxP3:PD-1:CD68
2  CD8 (Opal 480): PanCK (Opal 690): CD68 (Opal 780): PD-L1 (Opal 520)  CD8: PanCK: CD68: PD-L1  PanCK:PD-L1:CD8:CD68
3                                    PanCK (Opal 690): CD68 (Opal 780)              PanCK: CD68            PanCK:CD68
4                                   FoxP3 (Opal 570): PanCK (Opal 690)             FoxP3: PanCK           PanCK:FoxP3
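
The reordering step can also be written without an ordered factor. The following is a base R sketch, not the answerer's method: it ranks each marker by its position in a priority vector using `match()` and `order()`. The names `marker_priority` and `reorder_markers` are hypothetical helpers introduced here.

```r
# Highest priority first (the order requested in the question)
marker_priority <- c("PanCK", "PD-L1", "CD8", "FoxP3", "PD-1", "CD68")

# Hypothetical helper: split on ":", rank by position in `priority`, re-join
reorder_markers <- function(x, priority = marker_priority) {
  parts <- strsplit(x, "\\s*:\\s*")   # tolerate spaces around the colons
  vapply(parts, function(p) {
    # match() gives each marker's rank; unknown markers become NA and sort last
    paste(p[order(match(p, priority))], collapse = ":")
  }, character(1))
}

result <- reorder_markers(c("FoxP3: PanCK: PD-1: CD68", "CD8:PanCK:CD68:PD-L1"))
print(result)
```

Because the priority vector is written in the requested order, no `decreasing = TRUE` is needed, and a marker missing from the vector ends up at the end instead of being dropped.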

【Comments】:

    【Solution 2】:

    Does this work:

    library(dplyr)
    library(stringr)
    library(tidyr)
    st <- c('PanCK','PD-L1','CD8','FoxP3','PD-1','CD68')
    test %>% 
    mutate(Phenotype = str_remove_all(Class, '\\s\\(Opal [0-9]{3}\\)')) %>% 
    mutate(Phenotype = str_remove_all(Phenotype, '(\\s)')) %>% 
    mutate(Phenotype_Ordered = str_split(Phenotype, ':')) %>% unnest(Phenotype_Ordered) %>% 
    group_by(Class) %>% arrange(factor(Phenotype_Ordered, levels = st)) %>% 
    mutate(Phenotype_Ordered = paste(Phenotype_Ordered, collapse = ':')) %>% distinct()
    # A tibble: 4 x 3
    # Groups:   Class [4]
      Class                                                                Phenotype             Phenotype_Ordered    
      <chr>                                                                <chr>                 <chr>                
    1 FoxP3 (Opal 570): PanCK (Opal 690): PD-1 (Opal 620): CD68 (Opal 780) FoxP3:PanCK:PD-1:CD68 PanCK:FoxP3:PD-1:CD68
    2 CD8 (Opal 480): PanCK (Opal 690): CD68 (Opal 780): PD-L1 (Opal 520)  CD8:PanCK:CD68:PD-L1  PanCK:PD-L1:CD8:CD68 
    3 PanCK (Opal 690): CD68 (Opal 780)                                    PanCK:CD68            PanCK:CD68           
    4 FoxP3 (Opal 570): PanCK (Opal 690)                                   FoxP3:PanCK           PanCK:FoxP3   
    

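    【Comments】:

    • Brilliant! Why is it necessary to remove the whitespace with: mutate(Phenotype = str_remove_all(Phenotype, '(\\s)'))
    • The regex with str_replace_all also works for the first line here: test %>% mutate(Phenotype = str_replace_all(Class, "\\([^()]{0,}\\)", ""))
    • @JamesMonkman, that step is only there to match the expected output with respect to whitespace. And yes, in general, a regex problem can be approached in several different ways.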
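
On the whitespace question: removing only the parenthesized text leaves a space after each colon, which is why a second step is needed. A minimal base R sketch, with illustrative variable names, that normalizes the separators instead of stripping every space:

```r
x <- c("FoxP3 (Opal 570): PanCK (Opal 690)",
       "PanCK (Opal 690): CD68 (Opal 780)")

# Step 1: drop each "(...)" group together with the whitespace before it
no_parens <- gsub("\\s*\\([^()]*\\)", "", x)

# Step 2: collapse any whitespace around the ":" separators
phenotype <- gsub("\\s*:\\s*", ":", no_parens)

print(no_parens)
print(phenotype)
```

Unlike removing all `\\s` characters, this would keep a space inside a marker name, should one ever contain a space.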