【Posted】: 2020-11-10 09:24:41
【Question】:
I have two data frames in R. One contains one row per person, along with the area that person lives in. For example:
df1 = data.frame(Person_ID = seq(1,10,1), Area = c("A","A","A","B","B","C","D","A","D","C"))
The other data frame contains demographic information for each Area. For example, for gender:
df2 = data.frame(Area = c("A","A","B","B","C","C","D","D"), gender = c("M","F","M","F","M","F","M","F"), probability = c(0.4,0.6,0.55,0.45,0.6,0.4,0.5,0.5))
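In df1, I want to create a gender column where, for each row of df1, I draw a gender from the appropriate subset of df2.
For example, for row 1 of df1, I would draw a gender from df2 %>% filter(Area == "A").
The problem is how to do this for all rows without a for loop, since in practice df1 can have up to 5 million rows.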
【Comments】:
- Do you also want to take probability into account when sampling?
- Yes, good point. I do want the sampling to be based on the probabilities.
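One possible loop-free approach, given as a minimal sketch rather than a definitive answer: group the row indices of df1 by Area, make one vectorized sample() call per area using that area's probabilities from df2, and write the draws back by index. This needs one call per distinct Area, not one per row. The names rows_by_area and draws are made up for illustration, and the sketch assumes every Area in df1 also appears in df2.

```r
# Example data from the question
df1 <- data.frame(
  Person_ID = seq(1, 10, 1),
  Area = c("A", "A", "A", "B", "B", "C", "D", "A", "D", "C"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Area = c("A", "A", "B", "B", "C", "C", "D", "D"),
  gender = c("M", "F", "M", "F", "M", "F", "M", "F"),
  probability = c(0.4, 0.6, 0.55, 0.45, 0.6, 0.4, 0.5, 0.5),
  stringsAsFactors = FALSE
)

set.seed(1)  # only so the sketch is reproducible

# Row indices of df1, grouped by Area (hypothetical helper name)
rows_by_area <- split(seq_len(nrow(df1)), df1$Area)

# One vectorized draw per area, weighted by that area's probabilities.
# Assumes each area in df1 has matching rows in df2.
draws <- lapply(names(rows_by_area), function(a) {
  p <- df2[df2$Area == a, ]
  sample(p$gender,
         size = length(rows_by_area[[a]]),
         replace = TRUE,
         prob = p$probability)
})

# Write the draws back to the rows they were generated for
df1$gender <- NA_character_
df1$gender[unlist(rows_by_area)] <- unlist(draws)
```

Since the number of sample() calls equals the number of distinct areas, the cost for a 5-million-row df1 should be dominated by the split and the index assignment, not by per-row work.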