[Question title]: Specific string manipulation of a long vector of characters
[Posted]: 2021-03-23 23:03:02
[Question]:

I am a beginner in R. I obtained the following vector of character strings from a larger data set, and I would like to keep only the numeric values in each string. How should I proceed? I have tried some functions from the stringr package, without success. Thank you for your help.

```
c("(799.88) (966.01) (1634.17) (4714.35) (2992.45) (3200.66)",
"Per capita monthly income 226.9 312.29 452.16 1037.67 1145.13 1178.85",
"(375.99) (293.48) (749.61) (1832.05) (980.07) (1224.46)", "Per capita income / Hour of work 4.10 10.63 8.91 14.40 22.52 18.12 ",
"(6.88) (20.87) (17.30) (27.44) (27.68) (24.47)", "Number of observations (with weight) 727,671 142,936 630,353 413,807 86,717 248,179"
)
```

[Comments]:

    Tags: r string tidyverse stringr


    [Solution 1]:

    You can use str_extract_all. To accept either . or , as the separator between the two runs of digits, use the character class [.,]:

    library(stringr)
    str_extract_all(x, "\\d+[.,]\\d+")
    [[1]]
    [1] "799.88"  "966.01"  "1634.17" "4714.35" "2992.45" "3200.66"
    
    [[2]]
    [1] "226.9"   "312.29"  "452.16"  "1037.67" "1145.13" "1178.85"
    
    [[3]]
    [1] "375.99"  "293.48"  "749.61"  "1832.05" "980.07"  "1224.46"
    
    [[4]]
    [1] "4.10"  "10.63" "8.91"  "14.40" "22.52" "18.12"
    
    [[5]]
    [1] "6.88"  "20.87" "17.30" "27.44" "27.68" "24.47"
    
    [[6]]
    [1] "727,671" "142,936" "630,353" "413,807" "86,717"  "248,179"
    

    To get them all in a single vector (rather than a list), use unlist:

    unlist(str_extract_all(x, "\\d+[.,]\\d+"))
     [1] "799.88"  "966.01"  "1634.17" "4714.35" "2992.45" "3200.66" "226.9"   "312.29"  "452.16"  "1037.67" "1145.13"
    [12] "1178.85" "375.99"  "293.48"  "749.61"  "1832.05" "980.07"  "1224.46" "4.10"    "10.63"   "8.91"    "14.40"  
    [23] "22.52"   "18.12"   "6.88"    "20.87"   "17.30"   "27.44"   "27.68"   "24.47"   "727,671" "142,936" "630,353"
    [34] "413,807" "86,717"  "248,179"
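
The extracted values are still character strings. If actual numbers are needed, one possible follow-up is sketched below. It assumes that "," is a thousands separator (as in the "Number of observations" row) and that "." is the decimal mark. The shortened sample vector and the helper name `to_numeric` are made up for illustration, and base R's `regmatches`/`gregexpr` stand in for `str_extract_all` so the snippet runs without extra packages.

```r
# Shortened sample in the same shape as the question's data
x <- c("(799.88) (966.01) (1634.17)",
       "Per capita monthly income 226.9 312.29 452.16",
       "Number of observations (with weight) 727,671 142,936 630,353")

# Base-R counterpart of str_extract_all(x, "\\d+[.,]\\d+")
tokens <- regmatches(x, gregexpr("\\d+[.,]\\d+", x, perl = TRUE))

# Assumption: "," only ever groups thousands, so it is dropped before conversion
to_numeric <- function(s) as.numeric(gsub(",", "", s, fixed = TRUE))

nums <- lapply(tokens, to_numeric)  # one numeric vector per input string
nums_flat <- unlist(nums)           # all numbers in a single vector

print(nums[[3]])
```

If the comma were a decimal mark in some rows instead, this conversion would give wrong values, so it is worth checking the source table first.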
    

    Data:

    x <- c("(799.88) (966.01) (1634.17) (4714.35) (2992.45) (3200.66)",
    "Per capita monthly income 226.9 312.29 452.16 1037.67 1145.13 1178.85",
    "(375.99) (293.48) (749.61) (1832.05) (980.07) (1224.46)", "Per capita income / Hour of work 4.10 10.63 8.91 14.40 22.52 18.12 ",
    "(6.88) (20.87) (17.30) (27.44) (27.68) (24.47)", "Number of observations (with weight) 727,671 142,936 630,353 413,807 86,717 248,179"
    )
    

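    [Comments]:

    • Thank you very much @Chris Ruehlemann, it works well. Could you explain what "\\d+[.,]\\d+" means?
    • Please consider accepting the answer and/or upvoting. \\d stands for any digit from 0 to 9, + is a quantifier meaning "one or more times", and [.,] is a character class that allows exactly one character at this position, either . or ,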
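
To make the explanation above concrete, here is a small sketch that applies each piece of the pattern on its own and then the full pattern. The test string `s` is invented for illustration, and base R's `regmatches`/`gregexpr` are used so that it runs without packages; `str_extract_all` is expected to give the same matches for these patterns.

```r
# Made-up test string: an integer, a decimal, a comma-grouped number, and ".9"
s <- "a 12 b 3.5 c 7,250 d .9"

# \\d+ : one or more digits (every run of digits is matched separately)
digit_runs <- regmatches(s, gregexpr("\\d+", s, perl = TRUE))[[1]]

# [.,] : exactly one character that is either "." or ","
separators <- regmatches(s, gregexpr("[.,]", s, perl = TRUE))[[1]]

# \\d+[.,]\\d+ : digits, then one separator, then digits again
full_match <- regmatches(s, gregexpr("\\d+[.,]\\d+", s, perl = TRUE))[[1]]

print(digit_runs)  # "12" "3" "5" "7" "250" "9"
print(separators)  # "." "," "."
print(full_match)  # "3.5" "7,250"
```

Note that "12" (no separator) and ".9" (no digits before the separator) are not matched by the full pattern.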
    [Solution 2]:

    Another option to get the numbers, using dplyr, tidyr and readr:

    library(dplyr)
    library(tidyr)
    library(readr)
    
    # dummy data as df with one column
    df <- data.frame(vec = c("(799.88) (966.01) (1634.17) (4714.35) (2992.45) (3200.66)",
    "Per capita monthly income 226.9 312.29 452.16 1037.67 1145.13 1178.85",
    "(375.99) (293.48) (749.61) (1832.05) (980.07) (1224.46)", "Per capita income / Hour of work 4.10 10.63 8.91 14.40 22.52 18.12 ",
    "(6.88) (20.87) (17.30) (27.44) (27.68) (24.47)", "Number of observations (with weight) 727,671 142,936 630,353 413,807 86,717 248,179"))
    
    df1 <- df %>% 
      # building a unique identifier from the rownames
      dplyr::mutate(ID = dplyr::row_number()) %>%
      # separate into rows by blanks 
      tidyr::separate_rows(vec, sep = " ") %>% 
      # use automatic number extraction from readr
      dplyr::mutate(NEW = readr::parse_number(vec)) 
    
    # we can now use the ID from before to get rectangular (wide) data:
    df1 %>% 
      dplyr::group_by(ID) %>% 
      dplyr::mutate(ID2 = dplyr::row_number()) %>% 
      dplyr::select(ID2, NEW) %>% 
      tidyr::pivot_wider(names_from= "ID2", values_from = "NEW")
    

    [Comments]:

      [Solution 3]:

      Matching numbers without a decimal part as well may be a useful option; this is modified from Chris R's answer:

      library(stringr)
      # the separator is optional, so integers such as "10" are matched too
      s <- c("(6.88) (10) (17.30) ", "Num obs: 7,671 48,179")
      str_extract_all(s, "\\d+[.,]?\\d+")
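
One caveat with the pattern above: it still ends in a mandatory \\d+, so it needs at least two digits in total and would skip a single-digit integer such as 5. A possible variant, offered only as a sketch, makes the whole "separator plus digits" part optional. The sample vector `s2` and the helper `extract_all` are hypothetical, and base R with `perl = TRUE` is used so the snippet runs without packages.

```r
# Hypothetical sample data (not from the question); note the single-digit "(5)"
s2 <- c("(6.88) (10) (17.30) (5)", "Num obs: 7,671 48,179")

# Small helper: all matches of a pattern, one list element per input string
extract_all <- function(x, pattern) {
  regmatches(x, gregexpr(pattern, x, perl = TRUE))
}

# Pattern from the answer above: needs at least two digits in total
two_digit_min <- extract_all(s2, "\\d+[.,]?\\d+")

# Variant: digits, optionally followed by one separator and more digits
optional_part <- extract_all(s2, "\\d+(?:[.,]\\d+)?")

print(two_digit_min[[1]])  # "6.88" "10" "17.30"
print(optional_part[[1]])  # "6.88" "10" "17.30" "5"
```

Both patterns return "7,671" and "48,179" for the second string; they differ only on the single-digit case.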
      

      [Comments]:
