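Answers: Specific string manipulation of a long vector of characters

【Question title】: Specific string manipulation of a long vector of characters
【Posted】: 2021-03-23 23:03:02
【Question】:

I am a beginner in R. I have the following character vector, taken from a larger data set. I would like to keep only the numeric values in each string. How should I go about it? I have tried some functions from the stringr package, without success. Thanks for your help.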

c("(799.88) (966.01) (1634.17) (4714.35) (2992.45) (3200.66)",
"Per capita monthly income 226.9 312.29 452.16 1037.67 1145.13 1178.85",
"(375.99) (293.48) (749.61) (1832.05) (980.07) (1224.46)", "Per capita income / Hour of work 4.10 10.63 8.91 14.40 22.52 18.12 ",
"(6.88) (20.87) (17.30) (27.44) (27.68) (24.47)", "Number of observations (with weight) 727,671 142,936 630,353 413,807 86,717 248,179"
)

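【Comments on the question】:

Tags: r string tidyverse stringr

【Solution 1】:

You can use str_extract_all from stringr. To match either . or , between the two groups of digits, use the character class [.,]: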

library(stringr)
str_extract_all(x, "\\d+[.,]\\d+")
[[1]]
[1] "799.88"  "966.01"  "1634.17" "4714.35" "2992.45" "3200.66"

[[2]]
[1] "226.9"   "312.29"  "452.16"  "1037.67" "1145.13" "1178.85"

[[3]]
[1] "375.99"  "293.48"  "749.61"  "1832.05" "980.07"  "1224.46"

[[4]]
[1] "4.10"  "10.63" "8.91"  "14.40" "22.52" "18.12"

[[5]]
[1] "6.88"  "20.87" "17.30" "27.44" "27.68" "24.47"

[[6]]
[1] "727,671" "142,936" "630,353" "413,807" "86,717"  "248,179"

To get them all in a single vector (rather than a list), use unlist:

unlist(str_extract_all(x, "\\d+[.,]\\d+"))
 [1] "799.88"  "966.01"  "1634.17" "4714.35" "2992.45" "3200.66" "226.9"   "312.29"  "452.16"  "1037.67" "1145.13"
[12] "1178.85" "375.99"  "293.48"  "749.61"  "1832.05" "980.07"  "1224.46" "4.10"    "10.63"   "8.91"    "14.40"  
[23] "22.52"   "18.12"   "6.88"    "20.87"   "17.30"   "27.44"   "27.68"   "24.47"   "727,671" "142,936" "630,353"
[34] "413,807" "86,717"  "248,179"
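
Since the goal is to keep the numeric values, the extracted strings usually still need to be converted to numbers. A minimal base R sketch follows, assuming the comma in values such as "727,671" is a thousands separator. The nums vector is a hypothetical, shortened sample of the output above, not the full result.

```r
# Hypothetical sample of the character vector returned by
# unlist(str_extract_all(x, "\\d+[.,]\\d+"))
nums <- c("799.88", "226.9", "4.10", "727,671", "86,717")

# Assumption: "," is a thousands separator, so remove it before converting
values <- as.numeric(gsub(",", "", nums, fixed = TRUE))
values
```

If the comma were a decimal mark instead, the corresponding change would be to replace it with "." before calling as.numeric.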

Data:

x <- c("(799.88) (966.01) (1634.17) (4714.35) (2992.45) (3200.66)",
"Per capita monthly income 226.9 312.29 452.16 1037.67 1145.13 1178.85",
"(375.99) (293.48) (749.61) (1832.05) (980.07) (1224.46)", "Per capita income / Hour of work 4.10 10.63 8.91 14.40 22.52 18.12 ",
"(6.88) (20.87) (17.30) (27.44) (27.68) (24.47)", "Number of observations (with weight) 727,671 142,936 630,353 413,807 86,717 248,179"
)

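【Comments】:

Thank you very much @Chris Ruehlemann, it works great. Could you explain what "\\d+[.,]\\d+" means?
Please consider accepting and/or upvoting the answer. \\d matches any digit from 0 to 9, + is a quantifier meaning "one or more times", and [.,] is a character class that matches exactly one character at that position, either . or ,.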
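
The pieces of the pattern can be checked one at a time. The following is a small sketch on a made-up input string (demo is not part of the question's data), showing what \\d+ matches on its own and what the full pattern matches:

```r
library(stringr)

# Made-up input, for illustration only
demo <- "income 1037.67 and 727,671 obs, code 42"

# \\d+ alone: every run of one or more digits
digits_only <- str_extract_all(demo, "\\d+")[[1]]

# Full pattern: digits, then exactly one "." or ",", then digits again
full <- str_extract_all(demo, "\\d+[.,]\\d+")[[1]]

digits_only
full
```

Here 42 is found by \\d+ but not by the full pattern, because the full pattern requires a separator followed by more digits.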

【Solution 2】:

Another option, using dplyr, tidyr and readr to extract the numbers:

library(dplyr)
library(tidyr)
library(readr)

# dummy data as df with one column
df <- data.frame(vec = c("(799.88) (966.01) (1634.17) (4714.35) (2992.45) (3200.66)",
"Per capita monthly income 226.9 312.29 452.16 1037.67 1145.13 1178.85",
"(375.99) (293.48) (749.61) (1832.05) (980.07) (1224.46)", "Per capita income / Hour of work 4.10 10.63 8.91 14.40 22.52 18.12 ",
"(6.88) (20.87) (17.30) (27.44) (27.68) (24.47)", "Number of observations (with weight) 727,671 142,936 630,353 413,807 86,717 248,179"))

df1 <- df %>% 
  # building a unique identifier from the rownames
  dplyr::mutate(ID = dplyr::row_number()) %>%
  # separate into rows by blanks 
  tidyr::separate_rows(vec, sep = " ") %>% 
  # use automatic number extraction from readr
  dplyr::mutate(NEW = readr::parse_number(vec)) 

# we can now use the ID from before to get rectangular (wide) data:
df1 %>% 
  dplyr::group_by(ID) %>% 
  dplyr::mutate(ID2 = dplyr::row_number()) %>% 
  dplyr::select(ID2, NEW) %>% 
  tidyr::pivot_wider(names_from= "ID2", values_from = "NEW")
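
In the pipeline above, splitting on blanks also produces rows for the words in the text (for example "Per" or "capita"), which parse_number cannot parse and turns into NA with a warning. A possible refinement, sketched here on a shortened, made-up data frame, drops those rows before numbering the values. The names small and wide are hypothetical.

```r
library(dplyr)
library(tidyr)
library(readr)

# Shortened, made-up data in the same shape as df above
small <- data.frame(vec = c("(6.88) (20.87)", "Per capita income 4.10 10.63"))

wide <- small %>%
  mutate(ID = row_number()) %>%
  separate_rows(vec, sep = " ") %>%
  # words such as "Per" give NA (with a parsing warning, silenced here)
  mutate(NEW = suppressWarnings(parse_number(vec))) %>%
  # keep only the tokens that contained a number
  filter(!is.na(NEW)) %>%
  group_by(ID) %>%
  mutate(ID2 = row_number()) %>%
  ungroup() %>%
  select(ID, ID2, NEW) %>%
  pivot_wider(names_from = ID2, values_from = NEW)

wide
```

With the NA rows removed, the numbered columns hold only the parsed values, one row per original string.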


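【Solution 3】:

To also match numbers without a decimal part, you can make the separator optional with ?. Modified from Chris R's answer:

library(stringr)
s <- c("(6.88) (10) (17.30) ", "Num obs: 7,671 48,179")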
str_extract_all(s, "\\d+[.,]?\\d+")
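
One caveat with this pattern: \\d+[.,]?\\d+ still needs at least two digits in total, so a lone single-digit number is skipped. A possible variant, offered as a sketch and not as part of the original answer, makes the whole "separator plus digits" group optional. The input s2 is made up.

```r
library(stringr)

# Made-up input containing single-digit numbers
s2 <- c("(6.88) (10) (5)", "Num obs: 7,671 9")

# Pattern from the answer above: a one-digit number cannot match
two_plus <- str_extract_all(s2, "\\d+[.,]?\\d+")

# Variant: digits, optionally followed by one separator and more digits
any_len <- str_extract_all(s2, "\\d+(?:[.,]\\d+)?")

two_plus
any_len
```

The variant picks up 5 and 9 as well, while still treating 6.88 and 7,671 as single matches.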
