【Posted】:2021-04-23 09:33:32
【Question】:
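I am trying to count identical addresses and group them by row. I am fairly close, but within a row there are slight differences between the columns for certain addresses. The goal is to remove any non-matching addresses from a row and add them to the df as new rows. The difference is usually in the street number or the block number. I have already extracted these numbers from the addresses, and I am trying to find the ones that do not match, remove them, create a new row for them, and change the count accordingly. The count can be updated afterwards simply by checking each row for missing values.
The real dataset has 5000 rows, with up to 50 buildings in a row. Here is a sample.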
df<-data.frame(bldg1 = c("26 this street, big district","block8, fancy estate, small district", "11 normal lane, district"),
bldg2 = c("27 this street, big district","block8, fancy estate, small district", "11 normal lane, district"),
bldg3 = c("26 this street, big district","block6, fancy estate, small district", "11 normal lane, district"),
bldg4 = c("26 this street, big district","block8, fancy estate, small district", NA),
bldg5 = c("26 this street, big district","block6, fancy estate, small district", "11 normal lane, district"),
bldg1strnum = c("26",NA, "11"),
bldg2strnum = c("27",NA, "11"),
bldg3strnum = c("26",NA, "11"),
bldg4strnum = c("26",NA, "11"),
bldg5strnum = c("26",NA, "11"),
bldg1blck = c(NA,"8", NA),
bldg2blck = c(NA,"8", NA),
bldg3blck = c(NA,"6", NA),
bldg4blck = c(NA,"8", NA),
bldg5blck = c(NA,"6", NA),
count = c(5,5,4))
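I was thinking of using dplyr and across together with length(unique), but I don't know how to get it to run properly, and in particular how to get from mutate to a long format with the new rows.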
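A minimal sketch of the per-row count described above, for reference. Note that across() applies a function column by column, so counting distinct values within a row is usually done with rowwise() and c_across() instead. The names addr, addr_counts and n_unique are made up for illustration, and the regular expression assumes the address columns are exactly bldg1, bldg2, and so on (so that bldg1strnum and bldg1blck are not picked up).

```r
library(dplyr)

# Address columns from the sample above (the extracted numbers are left out)
addr <- data.frame(
  bldg1 = c("26 this street, big district", "block8, fancy estate, small district", "11 normal lane, district"),
  bldg2 = c("27 this street, big district", "block8, fancy estate, small district", "11 normal lane, district"),
  bldg3 = c("26 this street, big district", "block6, fancy estate, small district", "11 normal lane, district"),
  bldg4 = c("26 this street, big district", "block8, fancy estate, small district", NA),
  bldg5 = c("26 this street, big district", "block6, fancy estate, small district", "11 normal lane, district")
)

addr_counts <- addr %>%
  rowwise() %>%
  # n_distinct() is dplyr's equivalent of length(unique(x)); NA is ignored here
  mutate(n_unique = n_distinct(c_across(matches("^bldg[0-9]+$")), na.rm = TRUE)) %>%
  ungroup()

# Rows with n_unique > 1 contain at least one address that does not match
addr_counts$n_unique
```

This only flags the rows that need splitting; it does not move the non-matching addresses into new rows.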
The result I am hoping for is shown below. (The street numbers and names are not needed after the mutate.)
df<-data.frame(bldg1 = c("26 this street, big district","block8, fancy estate, small district", "11 normal lane, district", "27 this street, big district","block6, fancy estate, small district"),
bldg2 = c(NA, "block8, fancy estate, small district", "11 normal lane, district",NA,"block6, fancy estate, small district"),
bldg3 = c("26 this street, big district",NA, "11 normal lane, district", NA, NA),
bldg4 = c("26 this street, big district","block8, fancy estate, small district", NA,NA,NA),
bldg5 = c("26 this street, big district",NA, "11 normal lane, district",NA,NA),
count = c("4","3","4","1","2"))
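One possible route from the wide sample to this shape is to go long with tidyr, label each distinct address within a row, and go wide again. This is only a sketch under assumptions: the first address seen in a row is treated as the one that stays in place, split-off addresses are packed from bldg1 onwards in their new row, and count comes out as an integer rather than a string. The helper names row_id, slot, grp and result are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Address columns from the sample input (the extracted numbers are left out)
df <- data.frame(
  bldg1 = c("26 this street, big district", "block8, fancy estate, small district", "11 normal lane, district"),
  bldg2 = c("27 this street, big district", "block8, fancy estate, small district", "11 normal lane, district"),
  bldg3 = c("26 this street, big district", "block6, fancy estate, small district", "11 normal lane, district"),
  bldg4 = c("26 this street, big district", "block8, fancy estate, small district", NA),
  bldg5 = c("26 this street, big district", "block6, fancy estate, small district", "11 normal lane, district")
)
addr_cols <- names(df)

result <- df %>%
  mutate(row_id = row_number()) %>%
  # One line per (original row, building slot); empty slots are dropped
  pivot_longer(all_of(addr_cols), names_to = "slot", values_to = "address",
               values_drop_na = TRUE) %>%
  group_by(row_id) %>%
  # grp 1 = first address seen in the row (assumed to be the one that stays)
  mutate(grp = match(address, unique(address))) %>%
  group_by(row_id, grp) %>%
  mutate(
    # Kept addresses stay in their slot; split-off ones are packed from bldg1
    slot = if_else(grp == 1, slot, paste0("bldg", row_number())),
    count = n()
  ) %>%
  ungroup() %>%
  pivot_wider(id_cols = c(row_id, grp, count),
              names_from = slot, values_from = address) %>%
  # Original rows first, split-off rows appended afterwards
  arrange(grp, row_id) %>%
  select(any_of(addr_cols), count)

result
```

If the address that should stay is the most frequent one rather than the first one, grp would have to be derived from the per-address frequencies instead of match().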
【Discussion】: