【Posted】: 2021-05-04 00:21:38
【Question】:
I have a data frame "city_table" that looks like this:
+---+---------------------+
| | city |
+---+---------------------+
| 1 | Chicago-2234dxsw |
+---+---------------------+
| 2 | Chicago,IL |
+---+---------------------+
| 3 | Chicago |
+---+---------------------+
| 4 | Chicago - 124421xsd |
+---+---------------------+
| 5 | Chicago_2133xx |
+---+---------------------+
| 6 | Atlanta- 1234xx |
+---+---------------------+
| 7 | Atlanta, GA |
+---+---------------------+
| 8 | Atlanta - 123456T |
+---+---------------------+
I have another table, "city_lookup", that maps city names to city codes:
+---+--------------+-----------+
| | city_name | city_code |
+---+--------------+-----------+
| 1 | Chicago, IL | 001 |
+---+--------------+-----------+
| 2 | Atlanta, GA | 002 |
+---+--------------+-----------+
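As you can see, the city names in "city" are messy and inconsistently formatted, whereas the city names in "city_lookup" follow a uniform format (City, STATE).
I would like the final table to return the correct city code by matching the first n characters (say n = 7) of city_table$city against those of city_lookup$city_name, like this: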
+---+---------------------+-----------+
| | city_name | city_code |
+---+---------------------+-----------+
| 1 | Chicago-2234dxsw | 001 |
+---+---------------------+-----------+
| 2 | Chicago,IL | 001 |
+---+---------------------+-----------+
| 3 | Chicago | 001 |
+---+---------------------+-----------+
| 4 | Chicago - 124421xsd | 001 |
+---+---------------------+-----------+
| 5 | Chicago_2133xx | 001 |
+---+---------------------+-----------+
| 6 | Atlanta- 1234xx | 002 |
+---+---------------------+-----------+
| 7 | Atlanta, GA | 002 |
+---+---------------------+-----------+
| 8 | Atlanta - 123456T | 002 |
+---+---------------------+-----------+
I am doing this in R, preferably with tidyverse/dplyr. Any help is much appreciated!
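For reference, the prefix match described above can be sketched as a join on a truncated key. This is only a minimal illustration, not a definitive answer: the helper column `key` and the variable `result` are names made up for this example, the city codes are assumed to be stored as character strings (so the leading zeros survive), and it assumes the first n characters are unique within city_lookup.

```r
library(dplyr)

# Sample data copied from the tables in the question.
city_table <- data.frame(
  city = c("Chicago-2234dxsw", "Chicago,IL", "Chicago", "Chicago - 124421xsd",
           "Chicago_2133xx", "Atlanta- 1234xx", "Atlanta, GA", "Atlanta - 123456T"),
  stringsAsFactors = FALSE
)
city_lookup <- data.frame(
  city_name = c("Chicago, IL", "Atlanta, GA"),
  city_code = c("001", "002"),  # assumption: codes are character, not numeric
  stringsAsFactors = FALSE
)

n <- 7  # number of leading characters to compare

# Build the same truncated key on the lookup side (hypothetical column "key").
lookup_keys <- city_lookup %>%
  mutate(key = substr(city_name, 1, n)) %>%
  select(key, city_code)

# Truncate the messy names the same way, join on the key, then drop the key.
result <- city_table %>%
  mutate(key = substr(city, 1, n)) %>%
  left_join(lookup_keys, by = "key") %>%
  select(city_name = city, city_code)

print(result)
```

Rows whose prefix has no counterpart in city_lookup would get NA in city_code, since this is a left join. If two lookup entries shared the same first n characters, the join would duplicate rows, so a larger n or a fuzzy-matching approach might be needed in that case.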
【Discussion】: