【Question title】: Splitting and extracting strings in R
【Posted】: 2015-05-07 09:40:56
【Question】:

RULES

{Denny Frying Pan} => {Denny C-Size Batteries}

{Denny Scented Tissue} => {Denny Paper Plates}

{Blue Label Fancy Canned Clams} => {Blue Label Canned Tuna in Water}

{Denny Plastic Forks} => {Golden Frozen Peas}

{Denny Frying Pan} => {Denny D-Size Batteries}

{Denny Plastic Forks} => {Faux Products Apricot Shampoo}

{Golden Frozen Peas} => {Denny Plastic Forks}

{Faux Products Apricot Shampoo} => {Denny Plastic Forks}

{Blue Label Canned Tuna in Water} => {Blue Label Fancy Canned Clams}

{Blue Label Canned String Beans} => {Faux Products Buffered Aspirin}

{Denny D-Size Batteries} => {Denny Frying Pan}

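I have a single-column data frame like the one above. I want to split each rule into an LHS and an RHS.

The LHS should contain the characters inside the {} before =>; likewise, the RHS should contain the characters inside the {} after =>.

How can I do this in R?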
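For comparison, a base-R sketch that pulls both sides out with one regular expression and capture groups. It assumes every rule has exactly the shape `{LHS} => {RHS}` (one space on each side of `=>`, no braces inside item names); `pattern` and `split_rules` are illustrative names, not from the question.

```r
# A few of the rules from the question, as a character vector
RULES <- c("{Denny Frying Pan} => {Denny C-Size Batteries}",
           "{Denny Scented Tissue} => {Denny Paper Plates}",
           "{Golden Frozen Peas} => {Denny Plastic Forks}")

# Assumed rule shape: "{<LHS>} => {<RHS>}"; group 1 is the LHS, group 2 the RHS
pattern <- "^\\{(.*)\\} => \\{(.*)\\}$"

# sub() replaces the whole string with just the chosen capture group
split_rules <- data.frame(
  LHS = sub(pattern, "\\1", RULES),
  RHS = sub(pattern, "\\2", RULES),
  stringsAsFactors = FALSE
)
split_rules
```

If a rule does not match the pattern, `sub()` returns it unchanged, so unexpected formats show up as whole rules in both columns rather than as an error.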

【Question discussion】:

    Tags: r string substring extract substr


    【Solution 1】:
    RULES <- c("{Denny Frying Pan} => {Denny C-Size Batteries}",
               "{Denny Scented Tissue} => {Denny Paper Plates}",
               "{Blue Label Fancy Canned Clams} => {Blue Label Canned Tuna in Water}",
               "{Denny Plastic Forks} => {Golden Frozen Peas}",
               "{Denny Frying Pan} => {Denny D-Size Batteries}",
               "{Denny Plastic Forks} => {Faux Products Apricot Shampoo}",
               "{Golden Frozen Peas} => {Denny Plastic Forks}",
               "{Faux Products Apricot Shampoo} => {Denny Plastic Forks}",
               "{Blue Label Canned Tuna in Water} => {Blue Label Fancy Canned Clams}",
               "{Blue Label Canned String Beans} => {Faux Products Buffered Aspirin}",
               "{Denny D-Size Batteries} => {Denny Frying Pan}")
    
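    # Split each rule on the literal separator "} => {" and bind the pieces into two columns
    df <- as.data.frame(do.call(rbind, strsplit(RULES, "} => {", fixed = TRUE)),
                        stringsAsFactors = FALSE)
    # Remove the leading "{" from the LHS and the trailing "}" from the RHS
    df[,1] <- gsub("{", "", df[,1], fixed = TRUE)
    df[,2] <- gsub("}", "", df[,2], fixed = TRUE)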
    
    df
                                    V1                              V2
    1                 Denny Frying Pan          Denny C-Size Batteries
    2             Denny Scented Tissue              Denny Paper Plates
    3    Blue Label Fancy Canned Clams Blue Label Canned Tuna in Water
    4              Denny Plastic Forks              Golden Frozen Peas
    5                 Denny Frying Pan          Denny D-Size Batteries
    6              Denny Plastic Forks   Faux Products Apricot Shampoo
    7               Golden Frozen Peas             Denny Plastic Forks
    8    Faux Products Apricot Shampoo             Denny Plastic Forks
    9  Blue Label Canned Tuna in Water   Blue Label Fancy Canned Clams
    10  Blue Label Canned String Beans  Faux Products Buffered Aspirin
    11          Denny D-Size Batteries                Denny Frying Pan
    

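    【Discussion】:

    • df <- as.data.frame(do.call(rbind,strsplit(RULES,"} => {",fixed=TRUE))) gives: Error in strsplit(RULES, "} => {", fixed = TRUE) : non-character argument
    • @Nimish Jain That's probably because RULES is a factor. Try RULES <- as.character(RULES)
    • df[,2]: Error in `[.data.frame`(df, , 2) : undefined columns selected
    • Does your vector RULES have "RULES" as its first element? I assumed that was a header. If so, remove the first element. With the data I added to my answer, it works.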
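The "non-character argument" error above comes from `strsplit()` receiving a factor. A minimal sketch of the fix, assuming a hypothetical data frame `rules_df` whose `RULES` column is a factor: convert it to character first, then run Solution 1 unchanged. The column-count check is an added guard: if the separator `} => {` is never found (for example because of different spacing), every split has a single piece, `df` ends up with one column, and `df[,2]` fails with "undefined columns selected".

```r
# Hypothetical data frame where the rules column is a factor
# (the default for data.frame()/read.csv() before R 4.0.0)
rules_df <- data.frame(RULES = c("{Denny Frying Pan} => {Denny C-Size Batteries}",
                                 "{Golden Frozen Peas} => {Denny Plastic Forks}"),
                       stringsAsFactors = TRUE)

# strsplit() rejects factors ("non-character argument"), so convert first
RULES <- as.character(rules_df$RULES)

df <- as.data.frame(do.call(rbind, strsplit(RULES, "} => {", fixed = TRUE)),
                    stringsAsFactors = FALSE)

# Added guard: a missing separator leaves only one column
if (ncol(df) != 2) stop("separator '} => {' not found; check the spacing in RULES")

df[, 1] <- gsub("{", "", df[, 1], fixed = TRUE)
df[, 2] <- gsub("}", "", df[, 2], fixed = TRUE)
df
```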
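    【Solution 2】:

    You can try one of the following approaches. Both assume you start with a character vector named "rules". If "rules" is already a column in your data.frame, you'll need to modify them slightly.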

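    library(splitstackshape)  # cSplit()
    library(data.table)       # data.table(); attached explicitly in case splitstackshape doesn't
    library(dplyr)            # %>%
    
    # Turn "=>" into a tab, drop the braces, then split into two columns on the tab
    data.table(rules = gsub("[{}]", "", gsub("=>", "\t", rules))) %>%
      cSplit("rules", "\t")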
    #                             rules_1                         rules_2
    #  1:                Denny Frying Pan          Denny C-Size Batteries
    #  2:            Denny Scented Tissue              Denny Paper Plates
    #  3:   Blue Label Fancy Canned Clams Blue Label Canned Tuna in Water
    #  4:             Denny Plastic Forks              Golden Frozen Peas
    #  5:                Denny Frying Pan          Denny D-Size Batteries
    #  6:             Denny Plastic Forks   Faux Products Apricot Shampoo
    #  7:              Golden Frozen Peas             Denny Plastic Forks
    #  8:   Faux Products Apricot Shampoo             Denny Plastic Forks
    #  9: Blue Label Canned Tuna in Water   Blue Label Fancy Canned Clams
    # 10:  Blue Label Canned String Beans  Faux Products Buffered Aspirin
    # 11:          Denny D-Size Batteries                Denny Frying Pan
    
    library(dplyr)
    library(tidyr)
    
    data.frame(rules) %>%
      mutate(rules = gsub("\\s+=>\\s+", "=>", rules)) %>%
      mutate(rules = gsub("[{}]", "", rules)) %>%
      separate(rules, into = c("V1", "V2"), sep = "=>")
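For the case where `rules` is already a data.frame column, one possible base-R sketch, assuming a hypothetical data frame `rules_df` with a character (or factor) column `rules`. It mirrors the tidyr version above (strip the braces, split on `=>` plus any surrounding whitespace) and adds `LHS`/`RHS` columns to the existing data frame.

```r
# Hypothetical existing data frame with the rules in a column named `rules`
rules_df <- data.frame(rules = c("{Denny Frying Pan} => {Denny C-Size Batteries}",
                                 "{Denny Scented Tissue} => {Denny Paper Plates}"),
                       stringsAsFactors = FALSE)

# Drop the braces (as.character() also covers a factor column), then split on "=>"
cleaned <- gsub("[{}]", "", as.character(rules_df$rules))
parts <- strsplit(cleaned, "\\s*=>\\s*")

# Each element of `parts` is c(LHS, RHS); take the first and second pieces
rules_df$LHS <- vapply(parts, `[`, character(1), 1)
rules_df$RHS <- vapply(parts, `[`, character(1), 2)
rules_df
```

Using `vapply()` with `character(1)` makes the result type explicit, so a malformed rule yields `NA` in `RHS` instead of silently changing the column type.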
    

    【Discussion】:

      【Solution 3】:

      Here's one approach using qdapRegex, which I maintain:

      RULES <- c("{Denny Frying Pan} => {Denny C-Size Batteries}",
                 "{Denny Scented Tissue} => {Denny Paper Plates}",
                 "{Blue Label Fancy Canned Clams} => {Blue Label Canned Tuna in Water}",
                 "{Denny Plastic Forks} => {Golden Frozen Peas}",
                 "{Denny Frying Pan} => {Denny D-Size Batteries}",
                 "{Denny Plastic Forks} => {Faux Products Apricot Shampoo}",
                 "{Golden Frozen Peas} => {Denny Plastic Forks}",
                 "{Faux Products Apricot Shampoo} => {Denny Plastic Forks}",
                 "{Blue Label Canned Tuna in Water} => {Blue Label Fancy Canned Clams}",
                 "{Blue Label Canned String Beans} => {Faux Products Buffered Aspirin}",
                 "{Denny D-Size Batteries} => {Denny Frying Pan}")
      
      library(qdapRegex)
      setNames(do.call(rbind.data.frame, rm_curly(RULES, extract=TRUE)), c("LHS", "RHS"))
      
      ##                                LHS                             RHS
      ## 1                 Denny Frying Pan          Denny C-Size Batteries
      ## 2             Denny Scented Tissue              Denny Paper Plates
      ## 3    Blue Label Fancy Canned Clams Blue Label Canned Tuna in Water
      ## 4              Denny Plastic Forks              Golden Frozen Peas
      ## 5                 Denny Frying Pan          Denny D-Size Batteries
      ## 6              Denny Plastic Forks   Faux Products Apricot Shampoo
      ## 7               Golden Frozen Peas             Denny Plastic Forks
      ## 8    Faux Products Apricot Shampoo             Denny Plastic Forks
      ## 9  Blue Label Canned Tuna in Water   Blue Label Fancy Canned Clams
      ## 10  Blue Label Canned String Beans  Faux Products Buffered Aspirin
      ## 11          Denny D-Size Batteries                Denny Frying Pan
      

      We extract the contents of the curly braces, then coerce the result to a data.frame with do.call + rbind.data.frame.
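If you'd rather not add the qdapRegex dependency, a base-R sketch of the same idea (extract the text inside each pair of curly braces, then bind the rows) follows. The lookaround regex is an illustrative choice, not necessarily how `rm_curly` works internally, and it assumes item names never contain `}`.

```r
RULES <- c("{Denny Frying Pan} => {Denny C-Size Batteries}",
           "{Blue Label Canned String Beans} => {Faux Products Buffered Aspirin}")

# Match runs of non-"}" characters preceded by "{" and followed by "}"
# (perl = TRUE is required for the lookbehind/lookahead)
inside <- regmatches(RULES, gregexpr("(?<=\\{)[^}]*(?=\\})", RULES, perl = TRUE))

# `inside` is a list holding one c(LHS, RHS) vector per rule; bind them as rows
out <- setNames(as.data.frame(do.call(rbind, inside), stringsAsFactors = FALSE),
                c("LHS", "RHS"))
out
```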

      【Discussion】:
