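[Posted]: 2018-01-27 22:15:42
[Question]:
I am very new to R and have not found a solution to my problem. I really hope you can help me.
My data frame looks like the following, although the real one has more columns and observations: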
dt <- data.frame(hid = c(1, 2, 2, 2, 2, 4, 4, 4, 4, 4, 4),
                 syear = c(2000, 2001, 2003, 2003, 2003, 2000, 2000, 2001, 2001, 2002, 2002),
                 employlvl = c("Full-time", "Part-time", "Part-time", "Unemployed", "Unemployed",
                               "Full-time", "Full-time", "Full-time", "Unemployed", "Part-time",
                               "Full-time"),
                 relhead = c("Head", "Head", "Head", "Partner", "other", "Head",
                             "Partner", "Head", "Partner", "Head", "Partner"))
| hid | syear | employlvl | relhead |
|-----|-------|-------------|-----------------------|
| 1 | 2000 | Full-time | Head |
| 2 | 2001 | Part-time | Head |
| 2 | 2003 | Part-time | Head |
| 2 | 2003 | Unemployed | Partner |
| 2 | 2003 | Unemployed | other |
| 4 | 2000 | Full-time | Head |
| 4 | 2000 | Full-time | Partner |
| 4 | 2001 | Full-time | Head |
| 4 | 2001 | Unemployed | Partner |
| 4 | 2002 | Part-time | Head |
| 4 | 2002 | Full-time | Partner |
I want to create one more column that shows the partner's employment level, and I would like to get the following output:
| hid | syear | employlvl | relhead | Partner |
|-----|-------|-------------|-----------------------|-------------------|
| 1 | 2000 | Full-time | Head | NA |
| 2 | 2001 | Part-time | Head | NA |
| 2 | 2003 | Part-time | Head | Unemployed |
| 2 | 2003 | Unemployed | Partner | NA |
| 2 | 2003 | Unemployed | other | NA |
| 4 | 2000 | Full-time | Head | Full-time |
| 4 | 2000 | Full-time | Partner | NA |
| 4 | 2001 | Full-time | Head | Unemployed |
| 4 | 2001 | Unemployed | Partner | NA |
| 4 | 2002 | Part-time | Head | Full-time |
| 4 | 2002 | Full-time | Partner | NA |
At the moment I am using the following code (thanks again to user ycw):
library(dplyr)
library(tidyr)
dt2 <- dt %>%
  group_by(hid, syear) %>%
  filter(n() > 1) %>%
  filter(`relhead` != "Child") %>%
  spread(relhead, employlvl) %>%
  mutate(Relation = "Head") %>%
  rename(`Employment Partner` = Partner) %>%
  select(-Head)
dt3 <- dt %>%
  left_join(dt2, by = c("hid", "syear", "relhead" = "Relation"))
With this small data set the code works perfectly. As soon as I run it on the full data, however, I get the following:
Error: Data source must be a dictionary
Thank you very much for your help.
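
For reference, the desired `Partner` column can also be built without reshaping, by looking up the partner's row with a join. This is only a minimal sketch, not the code from ycw: it assumes there is at most one `Partner` row per `hid` and `syear` (otherwise the join would duplicate rows), and the names `partner` and `dt3` are illustrative.

```r
library(dplyr)

dt <- data.frame(
  hid = c(1, 2, 2, 2, 2, 4, 4, 4, 4, 4, 4),
  syear = c(2000, 2001, 2003, 2003, 2003, 2000, 2000, 2001, 2001, 2002, 2002),
  employlvl = c("Full-time", "Part-time", "Part-time", "Unemployed", "Unemployed",
                "Full-time", "Full-time", "Full-time", "Unemployed", "Part-time",
                "Full-time"),
  relhead = c("Head", "Head", "Head", "Partner", "other", "Head",
              "Partner", "Head", "Partner", "Head", "Partner"),
  stringsAsFactors = FALSE  # keep the text columns as character
)

# One row per household-year holding the partner's employment level.
# Assumption: no household-year has more than one "Partner" row.
partner <- dt %>%
  filter(relhead == "Partner") %>%
  select(hid, syear, Partner = employlvl)

# Attach the partner's level to every row of the household-year,
# then keep it only on the "Head" row and blank it elsewhere.
dt3 <- dt %>%
  left_join(partner, by = c("hid", "syear")) %>%
  mutate(Partner = if_else(relhead == "Head", Partner, NA_character_))
```

Because the join uses only `hid` and `syear` as keys, any additional columns in `dt` are carried along unchanged and do not affect the result.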
[Discussion]:
-
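If you run the code line by line, can you tell which line triggers the error? Since the error cannot be reproduced with the small data set, we need more information about where it occurs.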
-
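There must be some difference between the small example data set and the real one. Please check both carefully for any differences. You can also use the dput function to share all or part of the real data set and post it here for others to look at.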
-
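Unfortunately I am working with sensitive data and am not allowed to share it. But I think the problem is that I have more columns than the data frame above. If I add another column to the example data frame, I do not get the desired output. The error occurs on the last line of the code (select(-Head)).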
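
The effect of an extra column on `spread()` can be reproduced on the example data. The sketch below is an illustration only: the `age` column and its values are hypothetical stand-ins for the additional variables in the real data, and it shows why the output changes, not necessarily why the reported error is raised. `spread()` treats every column other than the key and value columns as part of the row identity, so `Head` and `Partner` no longer end up on the same row unless those other columns are dropped first.

```r
library(dplyr)
library(tidyr)

dt <- data.frame(
  hid = c(1, 2, 2, 2, 2, 4, 4, 4, 4, 4, 4),
  syear = c(2000, 2001, 2003, 2003, 2003, 2000, 2000, 2001, 2001, 2002, 2002),
  employlvl = c("Full-time", "Part-time", "Part-time", "Unemployed", "Unemployed",
                "Full-time", "Full-time", "Full-time", "Unemployed", "Part-time",
                "Full-time"),
  relhead = c("Head", "Head", "Head", "Partner", "other", "Head",
              "Partner", "Head", "Partner", "Head", "Partner"),
  stringsAsFactors = FALSE
)

# Hypothetical person-level column (values invented for illustration)
dt$age <- c(40, 35, 37, 36, 60, 50, 48, 51, 49, 52, 50)

# With `age` present, each (hid, syear, age) combination becomes its own row,
# so every row has a value in either Head or Partner, never in both.
wide_extra <- dt %>%
  filter(relhead %in% c("Head", "Partner")) %>%
  spread(relhead, employlvl)

# Keeping only the identifying columns plus the key/value pair before
# reshaping gives one row per household-year again.
wide_keys <- dt %>%
  filter(relhead %in% c("Head", "Partner")) %>%
  select(hid, syear, relhead, employlvl) %>%
  spread(relhead, employlvl)
```

Here `wide_extra` has one row per person, while `wide_keys` has one row per `hid` and `syear` with `Head` and `Partner` side by side, which is the shape the original pipeline relies on.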
Tags: r error-handling dplyr