【Question title】: Compare within row across multiple columns, remove non-matching and create new row
【Posted】: 2021-04-23 09:33:32
【Question description】:

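I am trying to count identical addresses and group them by row. I am fairly close, but there are slight differences between columns for some addresses. The aim is to remove any non-matching addresses from a row and add them to the df as a new row. The difference is usually in the street number or the block number. I have already extracted these numbers from the addresses, and I am trying to find the ones that do not match, remove them, create a new row for them, and adjust the count accordingly. The count can be adjusted afterwards, simply by checking each row for missing values.

The real dataset has 5000 rows, with up to 50 buildings in a row. This is a sample.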

 df<-data.frame(bldg1 = c("26 this street, big district","block8, fancy estate, small district", "11 normal lane, district"),
                bldg2 = c("27 this street, big district","block8, fancy estate, small district", "11 normal lane, district"),
                bldg3 = c("26 this street, big district","block6, fancy estate, small district", "11 normal lane, district"),
                bldg4 = c("26 this street, big district","block8, fancy estate, small district", NA),
                bldg5 = c("26 this street, big district","block6, fancy estate, small district", "11 normal lane, district"),
                bldg1strnum = c("26",NA, "11"),
                bldg2strnum = c("27",NA, "11"),
                bldg3strnum = c("26",NA, "11"),
                bldg4strnum = c("26",NA, "11"),
                bldg5strnum = c("26",NA, "11"),
                bldg1blck = c(NA,"8", NA),
                bldg2blck = c(NA,"8", NA),
                bldg3blck = c(NA,"6", NA),
                bldg4blck = c(NA,"8", NA),
                bldg5blck = c(NA,"6", NA),
                count = c(5,5,4))

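I was thinking of using dplyr's across() together with length(unique()), but I don't know how to run it properly, and in particular how to mutate the result into a long format with new rows.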
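The per-row check described above can be sketched roughly as follows. This is only an illustration, not a full answer: it reuses the extracted-number columns from the sample, and it swaps in `rowwise()` with `c_across()` instead of `across()`, because the comparison runs across columns within a single row. The names `nums`, `flagged`, `n_strnum` and `n_blck` are hypothetical.

```r
library(dplyr)

# Extracted street and block numbers, copied from the sample data above.
nums <- data.frame(
  bldg1strnum = c("26", NA, "11"),
  bldg2strnum = c("27", NA, "11"),
  bldg3strnum = c("26", NA, "11"),
  bldg4strnum = c("26", NA, "11"),
  bldg5strnum = c("26", NA, "11"),
  bldg1blck = c(NA, "8", NA),
  bldg2blck = c(NA, "8", NA),
  bldg3blck = c(NA, "6", NA),
  bldg4blck = c(NA, "8", NA),
  bldg5blck = c(NA, "6", NA),
  stringsAsFactors = FALSE
)

# For every row, count how many distinct non-missing numbers it contains.
flagged <- nums %>%
  rowwise() %>%
  mutate(
    n_strnum = n_distinct(c_across(ends_with("strnum")), na.rm = TRUE),
    n_blck   = n_distinct(c_across(ends_with("blck")), na.rm = TRUE)
  ) %>%
  ungroup()
```

Any row where `n_strnum` or `n_blck` is greater than 1 holds at least one address that does not match the others. This only flags such rows; it does not split them.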

The result I am hoping for looks like this. (The street number and block number columns are not needed after the mutate.)

 df<-data.frame(bldg1 = c("26 this street, big district","block8, fancy estate, small district", "11 normal lane, district", "27 this street, big district","block6, fancy estate, small district"),
               bldg2 = c(NA, "block8, fancy estate, small district", "11 normal lane, district",NA,"block6, fancy estate, small district"),
               bldg3 = c("26 this street, big district",NA, "11 normal lane, district", NA, NA),
               bldg4 = c("26 this street, big district","block8, fancy estate, small district", NA,NA,NA),
               bldg5 = c("26 this street, big district",NA, "11 normal lane, district",NA,NA),
               count = c("4","3","4","1","2"))

【Question comments】:

    Tags: r dplyr across


    【Solution 1】:

    Is this what you are looking for:

    library(dplyr)
    library(tidyr)

    df %>% 
      select(bldg1, bldg2, bldg3, bldg4, bldg5) %>% 
      pivot_longer(
        cols = everything()
      ) %>% 
      arrange(value) %>% 
      add_count(value)
    

    Output:

       name  value                                    n
       <chr> <chr>                                <int>
     1 bldg1 11 normal lane, district                 4
     2 bldg2 11 normal lane, district                 4
     3 bldg3 11 normal lane, district                 4
     4 bldg5 11 normal lane, district                 4
     5 bldg1 26 this street, big district             4
     6 bldg3 26 this street, big district             4
     7 bldg4 26 this street, big district             4
     8 bldg5 26 this street, big district             4
     9 bldg2 27 this street, big district             1
    10 bldg3 block6, fancy estate, small district     2
    11 bldg5 block6, fancy estate, small district     2
    12 bldg1 block8, fancy estate, small district     3
    13 bldg2 block8, fancy estate, small district     3
    14 bldg4 block8, fancy estate, small district     3
    15 bldg4 NA                                       1
    

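    【Comments】:

    • No, I need to keep identical addresses together in the same row. The addresses also have slight differences in spelling, so you cannot group on them directly. That is why I extracted the street/block number within each row and use it to tell whether the addresses differ.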
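One possible way to follow the approach described in this comment is sketched below. It is not from the thread, and it rests on several assumptions: the columns follow the `bldgN`, `bldgNstrnum`, `bldgNblck` naming of the sample, addresses with the same extracted street and block number within a row count as matching, and the helper names `row_id`, `bldg` and `addr` are hypothetical. Split-off addresses keep their original `bldg` column instead of being shifted left as in the desired output.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (count is recomputed below, so it is omitted).
df <- data.frame(
  bldg1 = c("26 this street, big district", "block8, fancy estate, small district", "11 normal lane, district"),
  bldg2 = c("27 this street, big district", "block8, fancy estate, small district", "11 normal lane, district"),
  bldg3 = c("26 this street, big district", "block6, fancy estate, small district", "11 normal lane, district"),
  bldg4 = c("26 this street, big district", "block8, fancy estate, small district", NA),
  bldg5 = c("26 this street, big district", "block6, fancy estate, small district", "11 normal lane, district"),
  bldg1strnum = c("26", NA, "11"), bldg2strnum = c("27", NA, "11"),
  bldg3strnum = c("26", NA, "11"), bldg4strnum = c("26", NA, "11"),
  bldg5strnum = c("26", NA, "11"),
  bldg1blck = c(NA, "8", NA), bldg2blck = c(NA, "8", NA),
  bldg3blck = c(NA, "6", NA), bldg4blck = c(NA, "8", NA),
  bldg5blck = c(NA, "6", NA),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(row_id = row_number()) %>%
  # Give the address columns a suffix so all three kinds share one pattern.
  rename_with(~ paste0(.x, "addr"), .cols = matches("^bldg\\d+$")) %>%
  # One line per (original row, building) with addr, strnum and blck columns.
  pivot_longer(
    cols = -row_id,
    names_to = c("bldg", ".value"),
    names_pattern = "^(bldg\\d+)(addr|strnum|blck)$"
  ) %>%
  filter(!is.na(addr)) %>%
  # Addresses sharing the same numbers within an original row stay together.
  group_by(row_id, strnum, blck) %>%
  mutate(count = n()) %>%
  ungroup() %>%
  # Each (row_id, strnum, blck) combination becomes one output row.
  pivot_wider(names_from = bldg, values_from = addr) %>%
  arrange(row_id, desc(count)) %>%
  select(row_id, starts_with("bldg"), count)
```

On the sample this should yield five rows, with the "27 this street" and "block6" addresses split away from their original rows. Addresses for which no number was extracted all fall into one group per row, so they would need separate handling.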