【Posted】: 2018-12-12 07:43:32
【Question】:
Given the following sample data:
library(tibble)

# Sample data: unequal numbers of each FAMILY_MEMBER_TYPE
test_data <- tibble(
  FAMILY_MEMBER_TYPE = c(rep("Father", times = 2), rep("Mother", times = 2),
                         rep("Daughter", times = 3), rep("Son", times = 3)),
  NAME = c("Fred", "Frank", "Mary", "Megan", "Diane", "Denise", "Daisy",
           "Sam", "Scott", "Steve"))
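If a family can contain only one member of each FAMILY_MEMBER_TYPE, how can I create a new grouping variable, FAMILY_NUMBER, that identifies every possible combination of members into families?
For example, the desired output (showing 2 of the possible families) would look like this: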
output_data <- tibble(
FAMILY_NUMBER = c(rep("FAMILY 1", 4), rep("FAMILY 2", 4)),
NAME = c("Fred", "Mary", "Diane", "Sam", "Fred", "Megan", "Diane","Sam"),
FAMILY_MEMBER_TYPE = c(rep(c("Father", "Mother", "Daughter", "Son"), 2)))
> output_data
# A tibble: 8 x 3
FAMILY_NUMBER NAME FAMILY_MEMBER_TYPE
<chr> <chr> <chr>
1 FAMILY 1 Fred Father
2 FAMILY 1 Mary Mother
3 FAMILY 1 Diane Daughter
4 FAMILY 1 Sam Son
5 FAMILY 2 Fred Father
6 FAMILY 2 Megan Mother
7 FAMILY 2 Diane Daughter
8 FAMILY 2 Sam Son
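One way to produce this long format for every possible family is sketched below in base R only: a plain data.frame stands in for the tibble so that no packages are needed. It enumerates the combinations with expand.grid and then reshapes the result. The reshaping step and the variable names members and wide are illustrative assumptions, not the asker's code.

```r
# Base-R stand-in for test_data (data.frame instead of tibble, so no
# packages are needed; "Daughter" spelled as in the desired output)
test_data <- data.frame(
  FAMILY_MEMBER_TYPE = c(rep("Father", 2), rep("Mother", 2),
                         rep("Daughter", 3), rep("Son", 3)),
  NAME = c("Fred", "Frank", "Mary", "Megan", "Diane", "Denise", "Daisy",
           "Sam", "Scott", "Steve"),
  stringsAsFactors = FALSE
)

# Split names by member type; ordering the levels by first appearance keeps
# the columns as Father, Mother, Daughter, Son
member_type <- factor(test_data$FAMILY_MEMBER_TYPE,
                      levels = unique(test_data$FAMILY_MEMBER_TYPE))
members <- split(test_data$NAME, member_type)

# One row per possible family, one column per member type
wide <- expand.grid(members, stringsAsFactors = FALSE)

# Reshape to long format: one row per member of each family
n <- nrow(wide)
output_data <- data.frame(
  FAMILY_NUMBER = rep(paste("FAMILY", seq_len(n)), each = ncol(wide)),
  NAME = as.vector(t(as.matrix(wide))),   # row-by-row through the families
  FAMILY_MEMBER_TYPE = rep(names(wide), times = n),
  stringsAsFactors = FALSE
)

head(output_data, 8)
```

Family numbers follow expand.grid's ordering, in which the first column (Father) varies fastest. In this ordering FAMILY 2 is Frank, Mary, Diane and Sam, so the numbering will not match the hand-written example exactly.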
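Edit: I have changed test_data to contain unequal numbers of each FAMILY_MEMBER_TYPE, because in the real data I need to apply this solution to, the groups contain unequal numbers of members.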
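Unequal group sizes are not a problem for the counting itself. Each family takes exactly one member of each type, so the number of possible families is the product of the group sizes. This also means the full table can grow very quickly on real data. The minimal base-R check below uses a data.frame stand-in for test_data (an assumption, to avoid needing tibble):

```r
# Base-R stand-in for test_data
test_data <- data.frame(
  FAMILY_MEMBER_TYPE = c(rep("Father", 2), rep("Mother", 2),
                         rep("Daughter", 3), rep("Son", 3)),
  NAME = c("Fred", "Frank", "Mary", "Megan", "Diane", "Denise", "Daisy",
           "Sam", "Scott", "Steve"),
  stringsAsFactors = FALSE
)

# One member per type, so number of families = product of group sizes
group_sizes <- table(test_data$FAMILY_MEMBER_TYPE)
n_families <- prod(group_sizes)   # 2 * 2 * 3 * 3 = 36

# In long format there is one row per member type per family
n_rows_long <- n_families * length(group_sizes)   # 36 * 4 = 144
```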
【Discussion】:
-
Check out expand.grid: expand.grid(split(test_data$NAME, test_data$FAMILY_MEMBER_TYPE)) -
Thanks Henrik, that seems to work. Too bad the vector it would create for my actual data is 5791818.1 Gb :/
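The full enumeration grows as the product of the group sizes, so materializing it can exhaust memory. A possible workaround is sketched below. It assumes families can be processed one (or a few) at a time rather than all at once. Instead of building the table, it decodes a family number directly into its members using mixed-radix indexing in the same order expand.grid uses. get_family is a hypothetical helper written for this sketch, not part of any package.

```r
# Base-R stand-in for test_data
test_data <- data.frame(
  FAMILY_MEMBER_TYPE = c(rep("Father", 2), rep("Mother", 2),
                         rep("Daughter", 3), rep("Son", 3)),
  NAME = c("Fred", "Frank", "Mary", "Megan", "Diane", "Denise", "Daisy",
           "Sam", "Scott", "Steve"),
  stringsAsFactors = FALSE
)
member_type <- factor(test_data$FAMILY_MEMBER_TYPE,
                      levels = unique(test_data$FAMILY_MEMBER_TYPE))
members <- split(test_data$NAME, member_type)

# Hypothetical helper: build the k-th family (1-based) in expand.grid order
# without creating the full table. Each member type is one "digit" of a
# mixed-radix number whose base is that group's size; the first type varies
# fastest, matching expand.grid.
get_family <- function(members, k) {
  sizes <- unname(lengths(members))
  stopifnot(k >= 1, k <= prod(sizes))
  idx <- k - 1                                # zero-based family index
  picks <- character(length(members))
  for (j in seq_along(members)) {
    picks[j] <- members[[j]][idx %% sizes[j] + 1]
    idx <- idx %/% sizes[j]
  }
  data.frame(
    FAMILY_NUMBER = paste("FAMILY", format(k, scientific = FALSE)),
    NAME = picks,
    FAMILY_MEMBER_TYPE = names(members),
    stringsAsFactors = FALSE
  )
}

n_families <- prod(lengths(members))   # total number of families

# Process a few families one at a time instead of building them all
for (k in seq_len(3)) {
  print(get_family(members, k))
}
```

Because k is stored as a double, this indexing is only exact while the number of families stays below 2^53. Beyond that, the family index would need to be split into per-type indices directly.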