【Question title】: After strsplit, the output is not in the expected format
【Posted】: 2013-07-08 10:36:28
【Question】:

My input, named "locaddr", contains the following records:

"Shelbourne Road, Dublin, Ireland"                                     
"1 Hatch Street Upper, Dublin, Ireland"                               
"98 Haddington Road, Dublin, Ireland"       
"11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland"
"Winterstraße 17, 69190 Walldorf, Germany"

I apply R's strsplit function to it with the following code:

testmat <- strsplit(locaddr,split=",")
outmat <- matrix(unlist(testmat), nrow=nrow(locaddr), ncol=3, byrow=T)

The final output I get is:

      Street                        City                    Country
 [1,] "Shelbourne Road"             " Dublin"               " Ireland"       
 [2,] "1 Hatch Street Upper"        " Dublin"               " Ireland"       
 [3,] "98 Haddington Road"          " Dublin"               " Ireland"       
 [4,] "11 Mount Argus Close"        " Harold's Cross"       " Dublin 6W"     
 [5,] " Co. Dublin"                 " Ireland"              "Winterstraße 17"
 [6,] " 69190 Walldorf"             " Germany"              "Caughley Road"  
 [7,] " Broseley"                   " Shropshire TF12 5AT"  " UK"            
 [8,] "Pappelweg 30"                " 48499 Salzbergen"     " Germany"       
 [9,] "60 Grand Canal Street Upper" " Dublin 4"             " Ireland"       
[10,] "Wieslocher Straße"           " 68789 Sankt Leon-Rot" " Germany"

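As the output above shows, what I want is the last three fields of each record. Instead, fields from different records end up mixed together in the same rows.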

My requirement: the addresses vary in length, but after strsplit I only need to pick the last three terms and put them into Street, City and Country.

Thank you very much for your help and time.
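
For reference, the mixing described above can be reproduced on the five addresses shown. This is a minimal sketch that assumes `locaddr` is a plain character vector (the question does not say how it was read in); `n_fields` and `flat` are names introduced here for illustration. The fourth address splits into five fields rather than three, so once `unlist()` flattens everything, filling a three-column matrix row by row shifts every later record.

```r
# The five addresses from the question (assumed to be a character vector)
locaddr <- c("Shelbourne Road, Dublin, Ireland",
             "1 Hatch Street Upper, Dublin, Ireland",
             "98 Haddington Road, Dublin, Ireland",
             "11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland",
             "Winterstraße 17, 69190 Walldorf, Germany")

testmat <- strsplit(locaddr, split = ",")

# Number of comma-separated fields per address
n_fields <- lengths(testmat)
print(n_fields)

# unlist() discards the per-address grouping and returns one flat vector,
# so its length is the sum of the field counts, not 3 * number of addresses
flat <- unlist(testmat)
print(length(flat))
```

Because the flat vector has 17 elements rather than 15, the row boundaries no longer line up with the addresses from the fourth record onward.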

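【Comments】:

  • How do you want to handle addresses like the one in your fourth row?
  • As in Roman's answer, I only want the last three fields, regardless of the length of the address. Thanks for your help, Thomas.
  • Makes sense, I just wanted to check that it wasn't something more complicated.
  • Hmm, that's interesting. Do you have any other suggestions for handling this kind of address in my data?
  • See my answer for a variant of Roman's answer that preserves multi-line street addresses.

Tags: r output strsplit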


【Solution 1】:

Next time, please include some reproducible code that is easy to copy with your question.

Here is how I would approach the problem.

x <- c("Shelbourne Road, Dublin, Ireland",                                     
       "1 Hatch Street Upper, Dublin, Ireland",                               
       "98 Haddington Road, Dublin, Ireland",      
       "11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland",
       "Winterstraße 17, 69190 Walldorf, Germany")

# split on ,
splitx <- strsplit(x, ",")

# for every list element (lapply climbs the list element-wise)
# subset last 3 elements
last3 <- lapply(splitx, tail, n = 3)

# merge them together by row
do.call("rbind", last3)

     [,1]                   [,2]              [,3]      
[1,] "Shelbourne Road"      " Dublin"         " Ireland"
[2,] "1 Hatch Street Upper" " Dublin"         " Ireland"
[3,] "98 Haddington Road"   " Dublin"         " Ireland"
[4,] " Dublin 6W"           " Co. Dublin"     " Ireland"
[5,] "Winterstraße 17"      " 69190 Walldorf" " Germany"
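
The result above still carries the leading space left behind by splitting on a bare comma, and its columns are unnamed. A minimal sketch of one way to tidy that up follows; it uses base R's `trimws()` and assumes every address has at least three comma-separated fields. The name `out` is introduced here for illustration only.

```r
x <- c("Shelbourne Road, Dublin, Ireland",
       "1 Hatch Street Upper, Dublin, Ireland",
       "98 Haddington Road, Dublin, Ireland",
       "11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland",
       "Winterstraße 17, 69190 Walldorf, Germany")

splitx <- strsplit(x, ",")

# keep the last 3 fields of each address and strip surrounding whitespace
last3 <- lapply(splitx, function(p) trimws(tail(p, n = 3)))

# bind by row and label the columns as requested in the question
out <- do.call(rbind, last3)
colnames(out) <- c("Street", "City", "Country")
print(out)
```

If some address had fewer than three fields, `rbind` would recycle values with a warning, so such records would need to be checked separately.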

【Comments】:

  • This was very helpful, Roman. I will keep the tip about working code in mind. Thanks for the advice.
【Solution 2】:

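This is basically a variant of Roman's answer, but it is meant to keep (possibly) multi-line street addresses together. It assumes that the last two comma-separated values are the city and the country, and pools all preceding elements into the street.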

# read data
y <- c("Shelbourne Road, Dublin, Ireland",                                     
       "1 Hatch Street Upper, Dublin, Ireland",                               
       "98 Haddington Road, Dublin, Ireland",      
       "11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland",
       "Winterstraße 17, 69190 Walldorf, Germany")
# split and output
result <- lapply(y, function(x) {
    splitx <- strsplit(x, ", ")[[1]]
    rowtail <- tail(splitx, n = 2)
    if(length(splitx)>3)
        multi <- paste(splitx[1:(length(splitx)-2)],collapse=", ")
    else
        multi <- splitx[1]
    return(c(multi,rowtail))
    })
# rbind back together
do.call(rbind,result)

This produces:

     [,1]                                              [,2]             [,3]     
[1,] "Shelbourne Road"                                 "Dublin"         "Ireland"
[2,] "1 Hatch Street Upper"                            "Dublin"         "Ireland"
[3,] "98 Haddington Road"                              "Dublin"         "Ireland"
[4,] "11 Mount Argus Close, Harold's Cross, Dublin 6W" "Co. Dublin"     "Ireland"
[5,] "Winterstraße 17"                                 "69190 Walldorf" "Germany"
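
The same idea can also be wrapped in a small function that names its output and tolerates addresses with fewer than three fields. This is only a sketch under assumptions: `split_address` is a hypothetical helper name, the `", "` separator is relaxed to a comma followed by optional whitespace, and the short address `"Dublin, Ireland"` is made up for illustration and does not come from the question.

```r
# Hypothetical helper: the last two fields are city and country,
# everything before them is pooled into the street
split_address <- function(addr) {
  parts <- strsplit(addr, ",\\s*")[[1]]
  n <- length(parts)
  # NA marks a component that the address does not provide
  street  <- if (n > 2) paste(parts[seq_len(n - 2)], collapse = ", ") else NA_character_
  city    <- if (n >= 2) parts[n - 1] else NA_character_
  country <- if (n >= 1) parts[n] else NA_character_
  c(Street = street, City = city, Country = country)
}

y <- c("11 Mount Argus Close, Harold's Cross, Dublin 6W, Co. Dublin, Ireland",
       "Winterstraße 17, 69190 Walldorf, Germany",
       "Dublin, Ireland")  # made-up short address, not from the question

# rbind takes the column names from the names of the first vector
result <- do.call(rbind, lapply(y, split_address))
print(result)
```

Returning `NA` for a missing street keeps the city and country in their own columns instead of letting a short record shift to the left.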

【Comments】:

  • @RomanLuštrik You have to give flodel credit for this, since it is in your answer as well.