【Question title】: Simulation based on set of rules in R
【Posted】: 2021-07-23 19:11:43
【Question】:

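I want to run a simulation that randomly selects rows and sums the rows' values according to a set of rules. I'm new to simulation, so I don't know where to start.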

Rules: each sim selects 9 rows in total. Each sim of 9 must contain the following number of players at each "position":

QB: 1
RB: 2
WR: 3
TE: 1
K: 1
DST: 1

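I'd like each sim to sum the group's values (the WAR column), and the output to show, for each player, the percentage of the time they appear in, say, the top 10% of groups by total WAR. Hopefully that makes sense. The end goal is to determine which players are most likely to be successful.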

Here is an example with the top ten players at each position.

dput

    structure(list(player = c("Justin Tucker", "Harrison Butker", 
    "Wil Lutz", "Greg Zuerlein", "Matt Gay", "Brandon McManus", "Jake Elliott", 
    "Robbie Gould", "Stephen Hauschka", "Dan Bailey", "Patrick Mahomes", 
    "Lamar Jackson", "Dak Prescott", "Russell Wilson", "Kyler Murray", 
    "Deshaun Watson", "Matt Ryan", "Josh Allen", "Tom Brady", "Carson Wentz", 
    "Christian McCaffrey", "Saquon Barkley", "Ezekiel Elliott", "Alvin Kamara", 
    "Dalvin Cook", "Clyde Edwards-Helaire", "Derrick Henry", "Miles Sanders", 
    "Joe Mixon", "Josh Jacobs", "Travis Kelce", "George Kittle", 
    "Mark Andrews", "Zach Ertz", "Darren Waller", "Evan Engram", 
    "Hayden Hurst", "Tyler Higbee", "Hunter Henry", "Mike Gesicki", 
    "Michael Thomas", "Davante Adams", "Julio Jones", "Tyreek Hill", 
    "DeAndre Hopkins", "Chris Godwin", "Kenny Golladay", "Allen Robinson", 
    "DJ Moore", "Odell Beckham"), adp = c(3, 3, 2, 2, 1, 1, 1, 1, 
    1, 1, 26, 23, 12, 11, 10, 9, 5, 4, 4, 4, 66, 57, 53, 50, 45, 
    43, 41, 40, 40, 39, 29, 26, 18, 15, 10, 8, 7, 6, 4, 4, 48, 40, 
    38, 37, 36, 34, 29, 27, 27, 27), WAR = c(0.27, 0.27, 0.1, 0.23, 
    0.09, 0.19, -0.83, -0.3, -0.1, -0.62, 2.26, 1.41, 0.91, 1.7, 
    2.28, 1.74, 0.28, 2.29, 1.12, 0.06, 1.02, -0.05, 1.36, 3.57, 
    3.48, 1.04, 2.91, 1.13, 0.69, 1.49, 2.79, 0.71, 0.85, -0.22, 
    1.67, 0.07, 0.26, 0.06, 0.35, 0.64, -0.04, 2.74, 0.63, 2.35, 
    1.49, 0.49, 0.33, 1.17, 0.61, 0.28), position = c("K", "K", "K", 
    "K", "K", "K", "K", "K", "K", "K", "QB", "QB", "QB", "QB", "QB", 
    "QB", "QB", "QB", "QB", "QB", "RB", "RB", "RB", "RB", "RB", "RB", 
    "RB", "RB", "RB", "RB", "TE", "TE", "TE", "TE", "TE", "TE", "TE", 
    "TE", "TE", "TE", "WR", "WR", "WR", "WR", "WR", "WR", "WR", "WR", 
    "WR", "WR")), row.names = c(NA, -50L), groups = structure(list(
    position = c("K", "QB", "RB", "TE", "WR"), .rows = structure(list(
        1:10, 11:20, 21:30, 31:40, 41:50), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -5L), class = c("tbl_df", 
    "tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
    "tbl_df", "tbl", "data.frame"))

【Question comments】:

Tags: r dplyr simulation


【Solution 1】:

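One idea is to use a lookup table that sets the number of samples per group, then write a function that runs one "simulation" by sampling n_samples rows from each group. I'm not entirely sure what you want from the WAR sum, but once the simulations are done, operations such as grouped sums should be straightforward.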

Note that your example data has no "DST" position, so each simulation returns only 8 rows.

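library(tidyverse)

# df is the data frame from the question's dput output

# lookup table: number of players to draw per position
df_sample <- data.frame(position = c("K", "QB", "RB", "TE", "WR", "DST"),
                        n_samples =   c(1,     1,    2,   1,    3,    1))

# attach n_samples to every player, then nest the players of each position
df_nest <- df %>%
  left_join(df_sample, by = "position") %>%
  group_by(position, n_samples) %>%
  nest()

# one simulation: draw n_samples players (without replacement) per position
run_sim <- function(nested_df = df_nest){
  nested_df %>%
    mutate(sim = map2(data, n_samples, sample_n)) %>%
    ungroup() %>%
    select(-data, -n_samples) %>%
    unnest(sim)
}

# run 10 simulations; .id adds a 'sim' column identifying each run
map_dfr(1:10, ~run_sim(df_nest), .id = 'sim')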

#----
# A tibble: 80 x 5
   sim   position player             adp   WAR
   <chr> <chr>    <chr>            <dbl> <dbl>
 1 1     K        Dan Bailey           1 -0.62
 2 1     QB       Patrick Mahomes     26  2.26
 3 1     RB       Miles Sanders       40  1.13
 4 1     RB       Joe Mixon           40  0.69
 5 1     TE       Evan Engram          8  0.07
 6 1     WR       Julio Jones         38  0.63
 7 1     WR       Michael Thomas      48 -0.04
 8 1     WR       DeAndre Hopkins     36  1.49
 9 2     K        Stephen Hauschka     1 -0.1 
10 2     QB       Russell Wilson      11  1.7 
# ... with 70 more rows
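
The grouped sum mentioned above could be sketched roughly as follows. This is not the answerer's code: it is a minimal, self-contained sketch that uses a small subset of the question's data so it runs on its own. The names `players`, `n_per_pos`, `draw_roster`, `totals` and `top_share` are illustrative, and "top 10%" is assumed to mean rosters whose total WAR is at or above the 90th percentile of all simulated totals.

```r
library(dplyr)
library(purrr)

# Subset of the question's data (assumption: trimmed only to keep the example short)
players <- data.frame(
  player = c("Justin Tucker", "Harrison Butker", "Wil Lutz",
             "Patrick Mahomes", "Lamar Jackson", "Dak Prescott",
             "Christian McCaffrey", "Saquon Barkley", "Ezekiel Elliott", "Alvin Kamara",
             "Travis Kelce", "George Kittle", "Mark Andrews",
             "Michael Thomas", "Davante Adams", "Julio Jones", "Tyreek Hill",
             "DeAndre Hopkins"),
  position = c(rep("K", 3), rep("QB", 3), rep("RB", 4), rep("TE", 3), rep("WR", 5)),
  WAR = c(0.27, 0.27, 0.10,
          2.26, 1.41, 0.91,
          1.02, -0.05, 1.36, 3.57,
          2.79, 0.71, 0.85,
          -0.04, 2.74, 0.63, 2.35, 1.49)
)

# Players to draw per position (DST left out because it is not in the data)
n_per_pos <- c(K = 1, QB = 1, RB = 2, TE = 1, WR = 3)

# One simulated roster: sample the required number of rows within each position
draw_roster <- function(data, counts) {
  data %>%
    split(.$position) %>%
    imap_dfr(~ slice_sample(.x, n = counts[[.y]]))
}

set.seed(42)
n_sims <- 500
sims <- map_dfr(seq_len(n_sims), ~ draw_roster(players, n_per_pos), .id = "sim")

# Total WAR per simulated roster, flagging rosters in the top 10% of totals
totals <- sims %>%
  group_by(sim) %>%
  summarise(total_WAR = sum(WAR), .groups = "drop") %>%
  mutate(top10 = total_WAR >= quantile(total_WAR, 0.9))

# For each player: share of the top-10% rosters that include them
top_share <- sims %>%
  inner_join(totals, by = "sim") %>%
  filter(top10) %>%
  count(position, player, name = "n_top") %>%
  mutate(pct_top = n_top / sum(totals$top10)) %>%
  arrange(desc(pct_top))

print(top_share)
```

With only a handful of players per position the shares are coarse; with the full data and more simulations, `pct_top` gives a rough ranking of which players show up most often in the best rosters.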

【Discussion】:

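  • This is great! Yes, sorry, I just noticed that DST isn't actually in the dataset I pulled. Awesome! If I wanted to run a large number of sims, would you suggest something like doParallel? Could you help me with the code?
  • Check out the furrr package. It is basically a 1:1 replacement for the purrr functions in the code above.
  • Again, amazing. Using 4 cores (or threads? I forget how that works), 1000 simulations went from 17 seconds to 7 seconds. You've been really helpful!
  • I have 6 cores and 12 logical processors. How many "workers" can I safely use? @nniloc
  • That's outside my wheelhouse. There is some interesting discussion here: stackoverflow.com/q/28954991/12400385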
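
For readers who want to try the furrr suggestion from the comments, a minimal sketch might look like the following. `one_sim` is a hypothetical stand-in for `run_sim()` so that the example runs on its own, and leaving one core free is a common rule of thumb rather than a guaranteed-safe setting; the best worker count depends on the machine and the workload.

```r
library(future)  # provides plan()
library(furrr)   # provides future_map_dfr() and furrr_options()

# Hypothetical stand-in for run_sim(): any function that performs one simulation
one_sim <- function(i) {
  data.frame(draw = sample(10, 3))
}

# Assumption: leave one core free for the rest of the system
n_workers <- max(1, parallelly::availableCores() - 1)
plan(multisession, workers = n_workers)

# Parallel counterpart of map_dfr(); seed = TRUE requests parallel-safe random numbers
sims <- future_map_dfr(1:100, one_sim, .id = "sim",
                       .options = furrr_options(seed = TRUE))

plan(sequential)  # return to sequential processing and release the workers
```

In the answer's code this would amount to replacing `map_dfr(1:10, ~run_sim(df_nest), .id = 'sim')` with the `future_map_dfr()` call after setting a `plan()`. For small jobs the cost of starting workers can outweigh the gain, so it is worth timing both versions.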