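【Posted】: 2021-07-27 03:15:56
【Question】:
I have a data frame of household members with 3 integer columns: "hid", "sub", and "age". I want to create a new logical variable in the data frame called "hh", for head of household, defined as follows:
- If the household has only 1 member, the value is TRUE.
- If the household has 2 or more members, the head of household is the person aged 18 to 65 (inclusive) who has the smallest subject ID ("sub") among those aged 18 to 65.
- If no member of the household is aged 18 to 65, the head of household is the person with the smallest subject ID.
Each household must have 1 and only 1 head of household.
My data looks like this: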
# A tibble: 10 x 3
hid sub age
<dbl> <dbl> <dbl>
1 1 1 75
2 1 2 55
3 2 1 35
4 3 1 69
5 3 2 72
6 4 1 69
7 5 1 15
8 5 2 17
9 5 3 42
10 5 4 62
I would like the result to look like this:
> result
# A tibble: 10 x 4
hid sub age hh
<dbl> <dbl> <dbl> <lgl>
1 1 1 75 FALSE # Not 18-65 & there is another aged 18-65 within this household.
2 1 2 55 TRUE # Aged 18-65 and the smallest sub id within this household.
3 2 1 35 TRUE # Only 1 in this household.
4 3 1 69 TRUE # Not aged 18-65, but no other member is and smallest sub id.
5 3 2 72 FALSE # Not aged 18-65, and not the smallest sub id.
6 4 1 69 TRUE # Only 1 in this household.
7 5 1 15 FALSE # Not aged 18-65 and others in this household qualify.
8 5 2 17 FALSE # Not aged 18-65 and others in this household qualify.
9 5 3 42 TRUE # Aged 18-65 and the smallest sub id among those aged 18-65 within this household.
10 5 4 62 FALSE # Aged 18-65 but not the smallest sub id among those aged 18-65 within this household.
Thanks!
d <- structure(list(hid = c(1, 1, 2, 3, 3, 4, 5, 5, 5, 5),
sub = c(1, 2, 1, 1, 2, 1, 1, 2, 3, 4),
age = c(75, 55, 35, 69, 72, 69, 15, 17, 42, 62)),
row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
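【Comments】:
- What have you tried that didn't work? For something like this, where the logic can get complicated, it helps to draw a representation of the code by hand (e.g. a decision tree) to break it apart. It also helps to make temporary variables that track the different conditions, such as the number of people in the household or whether a person is aged 18-65, rather than trying to cram all the logic into one step.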
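The temporary-variable approach suggested in the comment could be sketched with dplyr roughly as follows. This is only one possible reading of the rules, not a confirmed answer from the thread. It assumes that "sub" is unique within each household and that "age" has no missing values. The helper columns `working_age` and `candidate` are names made up for illustration.

```r
library(dplyr)

# Data copied from the dput() output in the question.
d <- data.frame(
  hid = c(1, 1, 2, 3, 3, 4, 5, 5, 5, 5),
  sub = c(1, 2, 1, 1, 2, 1, 1, 2, 3, 4),
  age = c(75, 55, 35, 69, 72, 69, 15, 17, 42, 62)
)

result <- d %>%
  group_by(hid) %>%
  mutate(
    # Temporary flag (hypothetical name): member is aged 18 to 65 inclusive.
    working_age = age >= 18 & age <= 65,
    # Candidates for head: the working-age members if the household has any,
    # otherwise every member of the household.
    candidate = working_age | !any(working_age),
    # Head of household: the candidate with the smallest sub id.
    hh = candidate & sub == min(sub[candidate])
  ) %>%
  ungroup() %>%
  select(hid, sub, age, hh)

print(result)
```

A single-member household needs no special case here: its only member is always a candidate and trivially has the smallest "sub", so "hh" comes out TRUE. If "sub" could repeat within a household, ties would produce more than one head, and a tie-breaking rule would have to be added.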
Tags: r dplyr data-manipulation data-management