[Posted]: 2021-08-11 05:23:41
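[Question]:
Goal: Fill a column of NAs in one data frame using a "key" from another data frame (similar to VLOOKUP, except in R).
Given df1 here (for simplicity I show only 6 rows; the key I actually have is 50 rows, one for each of the 50 states):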
| Index | State_Name | Abbreviation |
|---|---|---|
| 1 | California | CA |
| 2 | Maryland | MD |
| 3 | New York | NY |
| 4 | Texas | TX |
| 5 | Virginia | VA |
| 6 | Washington | WA |
And given df2 here (this is just an example; the real data frame I am working with has many more rows):
| Index | State | Article |
|---|---|---|
| 1 | NA | Texas governor, Abbott, signs new abortion bill |
| 2 | NA | Effort to recall California governor Newsome loses steam |
| 3 | NA | New York governor, Cuomo, accused of manipulating Covid-19 nursing home data |
| 4 | NA | Hogan (Maryland, R) announces plans to lift statewide Covid restrictions |
| 5 | NA | DC statehood unlikely as Manchin opposes |
| 6 | NA | Amazon HQ2 causing housing prices to soar in northern Virginia |
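Task: Write an R function that loops over the rows of df2$Article and reads the state mentioned in each one, then cross-references it with df1$State_Name so that the NA in df2$State is replaced by the corresponding df1$Abbreviation. I know that's a mouthful. I don't know how to start or finish this puzzle. Hard-coding is not an option, because the real data frame has thousands of rows like this and will keep growing as we add more articles to the text scrape.
The output should look like this: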
| Index | State | Article |
|---|---|---|
| 1 | TX | Texas governor, Abbott, signs new abortion bill |
| 2 | CA | Effort to recall California governor Newsome loses steam |
| 3 | NY | New York governor, Cuomo, accused of manipulating Covid-19 nursing home data |
| 4 | MD | Hogan (Maryland, R) announces plans to lift statewide Covid restrictions |
| 5 | NA | DC statehood unlikely as Manchin opposes |
| 6 | VA | Amazon HQ2 causing housing prices to soar in northern Virginia |
Note: The fifth entry, the one about DC, should remain NA.
Any links to guides and/or suggestions on how to code this are greatly appreciated. Thanks!
[Discussion]:
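One possible way to approach the task described in the question, as a minimal base-R sketch rather than a definitive answer. It rebuilds the two example data frames from the tables above, then checks every article against every state name with `grepl()`. The helper name `fill_state` is hypothetical (it does not come from the question), and the sketch assumes that state names contain no regex metacharacters and that each article should receive at most one abbreviation.

```r
# Lookup table from the question (the real one would have all 50 states).
df1 <- data.frame(
  State_Name   = c("California", "Maryland", "New York",
                   "Texas", "Virginia", "Washington"),
  Abbreviation = c("CA", "MD", "NY", "TX", "VA", "WA"),
  stringsAsFactors = FALSE
)

# Example articles from the question; State starts out as all NA.
df2 <- data.frame(
  State   = NA_character_,
  Article = c(
    "Texas governor, Abbott, signs new abortion bill",
    "Effort to recall California governor Newsome loses steam",
    "New York governor, Cuomo, accused of manipulating Covid-19 nursing home data",
    "Hogan (Maryland, R) announces plans to lift statewide Covid restrictions",
    "DC statehood unlikely as Manchin opposes",
    "Amazon HQ2 causing housing prices to soar in northern Virginia"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns one abbreviation per article, or NA if no
# state name from `key` appears in the text.
fill_state <- function(articles, key) {
  # Try longer names first, so that e.g. "West Virginia" would be matched
  # before "Virginia" once the full 50-state key is used.
  key <- key[order(-nchar(as.character(key$State_Name))), ]
  out <- rep(NA_character_, length(articles))
  for (i in seq_len(nrow(key))) {
    # \\b restricts the match to whole words; matching is case-insensitive.
    pattern <- paste0("\\b", key$State_Name[i], "\\b")
    hit <- is.na(out) & grepl(pattern, articles, ignore.case = TRUE)
    out[hit] <- as.character(key$Abbreviation[i])
  }
  out
}

df2$State <- fill_state(df2$Article, df1)
print(df2[, c("State", "Article")])
```

The loop runs over the 50 states rather than over the thousands of articles, since `grepl()` is vectorised over the article column. Two caveats: an article that mentions several states only gets the first match found, and a plain name match cannot tell Washington state from "Washington, DC", so such rows would need an extra rule.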
Tags: r dataframe dplyr na string-matching