【Question title】: R - subset data frame into all possible combinations with constraints
【Posted】: 2020-05-17 23:54:57
【Question】:

I have the following data frame:

Person     City     Ethnicity
A            1          2
B            2          3
C            3          3
D            1          1
E            2          1 
F            3          1
G            2          2
H            1          1
I            2          2 
J            1          2
K            1          3 
L            1          3
M            2          2

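I want a data frame containing every possible combination of 6 people that satisfies the following constraints:

  • No group contains the same person twice
  • Every group includes all cities
  • Every group has at least 3 people from city 1
  • Every group includes all ethnicities

Is there a way to do this in R?

Thanks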


Data

structure(list(Person = structure(1:13, .Label = c("A", "B", 
"C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M"), class = "factor"), 
    City = c(1L, 2L, 3L, 1L, 2L, 3L, 2L, 1L, 2L, 1L, 1L, 1L, 
    2L), Ethnicity = c(2L, 3L, 3L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 
    3L, 3L, 2L)), class = "data.frame", row.names = c(NA, -13L
))

One possible combination is A, B, C, D, E, H.

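【Comments on the question】:

  • Hi Pietro - can you post the expected output table? Also, if you add the output of dput(your-data-frame), we can easily copy and paste your df into R.
  • Pietro, can you clarify? You want groups of four people, three of whom must come from city 1, yet all 3 cities must appear in every group. That is clearly impossible; can you explain?
  • Thanks @AllanCameron for adding the structure. There was a mistake in the question. Groups should consist of 6 people.
  • @PabloHerrerosCantis What I want is a data frame with 6 rows (one per person) and as many columns as there are combinations.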
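
The group-size point raised in the comments can be checked by listing how many members of a group would come from each city. This is only a sketch: `feasible_splits` is a hypothetical helper written for illustration, not something from the thread, and it looks at the city constraints alone (ethnicity is ignored here).

```r
# City column copied from the question's data.
city <- c(1L, 2L, 3L, 1L, 2L, 3L, 2L, 1L, 2L, 1L, 1L, 1L, 2L)

# Hypothetical helper: every split (n1, n2, n3) of a group across cities 1-3
# with at least 3 people from city 1 and at least 1 from each other city.
feasible_splits <- function(group_size, city) {
  available <- as.vector(table(city))  # people available per city
  splits <- expand.grid(
    n1 = 0:available[1],
    n2 = 0:available[2],
    n3 = 0:available[3]
  )
  ok <- rowSums(splits) == group_size &
    splits$n1 >= 3 & splits$n2 >= 1 & splits$n3 >= 1
  splits[ok, , drop = FALSE]
}

nrow(feasible_splits(4, city))  # no valid split for groups of four
feasible_splits(6, city)        # splits that remain for groups of six
```

With groups of four, three people from city 1 leave a single seat for two remaining cities, so nothing is returned; with six people a few splits remain.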

Tags: r combinations


【Solution 1】:

You can use combn to generate all combinations and then filter them with a few predicate functions to keep the ones you want, as follows:

# Data
data <- structure(list(
  Person = structure(1:13, .Label = c(
    "A", "B",
    "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M"
  ), class = "factor"),
  City = c(
    1L, 2L, 3L, 1L, 2L, 3L, 2L, 1L, 2L, 1L, 1L, 1L,
    2L
  ), Ethnicity = c(
    2L, 3L, 3L, 1L, 1L, 1L, 2L, 1L, 2L, 2L,
    3L, 3L, 2L
  )
), class = "data.frame", row.names = c(NA, -13L))


# Helpers
has_all_cities <- function(x, data) {
  all_cities <- unique(data$City)
  setequal(data[x, ]$City, all_cities)
}

has_ppl_from_city_one <- function(x, data) {
  is_from_city_one <- data[x, ]$City == 1  # logical vector, one entry per person
  sum(is_from_city_one) >= 3  # three or more
}

has_all_ethnicity <- function(x, data) {
  all_ethnicities <- unique(data$Ethnicity)
  setequal(data[x, ]$Ethnicity, all_ethnicities)
}

satisfy_all_constraints <- function(x, data) {
  has_all_cities(x, data) &&
    has_ppl_from_city_one(x, data) &&
    has_all_ethnicity(x, data)
}


# Main
row.names(data) <- data$Person

y <- combn(data$Person, m = 6)
dim(y)  # 6 x 1716, i.e. choose(13, 6) candidate groups

ind <- apply(y, 2, satisfy_all_constraints, data = data)
res <- y[, ind]
res[, 1:6]
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] A    A    A    A    A    A   
# [2,] B    B    B    B    B    B   
# [3,] C    C    C    C    C    C   
# [4,] D    D    D    D    D    D   
# [5,] E    E    E    E    F    F   
# [6,] H    J    K    L    H    J   
# Levels: A B C D E F G H I J K L M
ncol(res)
# 574

# Check requirements
data[as.character(res[, 1]), ]  # look up by row name rather than by factor code
#    Person City Ethnicity
# A      A    1         2
# B      B    2         3
# C      C    3         3
# D      D    1         1
# E      E    2         1
# H      H    1         1

# No duplicate person
# Has all cities: 1, 2, 3 
# Has all ethnicity: 1, 2, 3
# Has at least 3 people from city 1


# Convert into data.frame
df <- as.data.frame(structure(as.character(res), dim = dim(res)))
df[, 1:6]
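
As a cross-check, the same filter can be written against integer row numbers instead of the Person factor, which avoids relying on how combn and apply treat factors. This is a minimal sketch, not the answer's method: `people`, `is_valid_group`, and the `group_` column names are made up for illustration.

```r
# Same data as in the question, rebuilt with plain vectors.
people <- data.frame(
  Person    = LETTERS[1:13],
  City      = c(1L, 2L, 3L, 1L, 2L, 3L, 2L, 1L, 2L, 1L, 1L, 1L, 2L),
  Ethnicity = c(2L, 3L, 3L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 3L, 3L, 2L),
  stringsAsFactors = FALSE
)

# TRUE when the rows in `idx` cover all cities, include at least
# 3 people from city 1, and cover all ethnicities.
is_valid_group <- function(idx, d) {
  city <- d$City[idx]
  eth  <- d$Ethnicity[idx]
  setequal(city, unique(d$City)) &&
    sum(city == 1L) >= 3 &&
    setequal(eth, unique(d$Ethnicity))
}

# Each column of idx_mat holds 6 distinct row numbers, so no person repeats.
idx_mat <- combn(nrow(people), 6)
keep    <- apply(idx_mat, 2, is_valid_group, d = people)
valid   <- idx_mat[, keep, drop = FALSE]

# Map row numbers back to names: 6 rows, one column per valid group.
groups <- as.data.frame(
  matrix(people$Person[valid], nrow = 6),
  stringsAsFactors = FALSE
)
names(groups) <- paste0("group_", seq_len(ncol(groups)))

dim(groups)     # should agree with the answer above: 6 rows, 574 columns
groups[, 1:3]
```

Because every candidate group is checked separately, this stays practical for 13 people (1716 candidates) but grows quickly with larger inputs.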

【Comments】:

  • Thanks! @Jackson