【Question Title】: Unnest and Pivot Longer with Duplicate Columns
【Posted】: 2019-10-02 16:26:14
【Question】:

I have a nested df that I am trying to clean up.

Sample Data:
df <- 
  tibble::tribble(
  ~idTeam, ~ptsTotalBehindFirst, ~ptsOverall, ~ptsDiffLastPeriod, ~rankOverall, ~ptsBattingBehindFirst, ~ptsBatting, ~ptsDiffBattingLastPeriod,                                                                                                                                                                            ~dataBatting, ~rankBatting, ~ptsPitchingBehindFirst, ~ptsPitching, ~ptsDiffPitchingLastPeriod,                                                                                                                                                                                                   ~dataPitching, ~rankPitching,
      "2",                  "0",       "111",               "-4",           1L,                    "0",        "65",                       "0", list(abbr = c("OBP", "HR", "RBI", "R", "SB"), roto_points = c(13, 13, 13, 13, 13), value = c(0.3663, 384, 1012, 1102, 164), diff = c(0, 0, 0, 0, 0), rank = c(1, 1, 1, 1, 1)),           1L,                     "5",         "46",                       "-4",                            list(abbr = c("S", "W", "K", "ERA", "WHIP"), roto_points = c(12, 6, 11, 8, 9), value = c(94, 89, 1576, 3.946, 1.2179), diff = c(0, -2, -2, 0, 0), rank = c(2, 8, 3, 6, 5)),            3L,
      "8",               "13.5",      "97.5",                "2",           2L,                   "13",        "52",                       "0",    list(abbr = c("OBP", "HR", "RBI", "R", "SB"), roto_points = c(12, 11, 11, 12, 6), value = c(0.3576, 323, 954, 1011, 89), diff = c(0, 0, 0, 0, 0), rank = c(2, 3, 3, 2, 8)),           3L,                   "5.5",       "45.5",                        "2", list(abbr = c("S", "W", "K", "ERA", "WHIP"), roto_points = c(2, 7.5, 10, "13", 13), value = c(56, 91, 1508, 3.688, 1.1474), diff = c(-1, 1.5, 0.5, 1, 0), rank = c(12, 6, 4, 1, 1)),            4L
  )

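The data I am trying to unnest is stored in the dataBatting and dataPitching columns. I want to unnest all of the columns inside both of them and bind the results together as rows. It is something like pivot_longer, but I am not sure of the right way to handle the duplicated columns nested inside 2 separate columns.

My attempt: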

  df %>% 
  unnest_wider(dataBatting) %>% 
  unnest(c(abbr, roto_points, value, diff, rank)) %>% 
  unnest_wider(dataPitching) %>% 
  unnest(c(abbr, roto_points, value, diff, rank))


Error is:
Error: Column names `abbr`, `roto_points`, `value`, `diff`, `rank` must not be duplicated.
Use .name_repair to specify repair.
Call `rlang::last_error()` to see a backtrace

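My problem is that I want to bind the columns in dataPitching to the columns in dataBatting that share the same names (abbr, roto_points, value, diff, rank).

I would also like to rename the duplicated columns. Is tidyr::hoist a better approach?

Desired df: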

tibble::tribble(
                                              ~idTeam, ~ptsTotalBehindFirst, ~ptsOverall, ~ptsDiffLastPeriod, ~rankOverall, ~ptsBattingBehindFirst, ~ptsBatting, ~ptsDiffBattingLastPeriod,  ~abbr, ~roto_points5, ~value, ~diff, ~rank, ~rankPitching, ~ptsPitchingBehindFirst, ~ptsPitching, ~ptsDiffPitchingLastPeriod,
                                                    2,                    0,         111,                 -4,            1,                      0,          65,                         0,  "OBP",            13, 0.3663,     0,     1,             3,                       5,           46,                         -4,
                                                    2,                    0,         111,                 -4,            1,                      0,          65,                         0,   "HR",            13,    384,     0,     1,             3,                       5,           46,                         -4,
                                                    2,                    0,         111,                 -4,            1,                      0,          65,                         0,  "RBI",            13,   1012,     0,     1,             3,                       5,           46,                         -4,
                                                    2,                    0,         111,                 -4,            1,                      0,          65,                         0,    "R",            13,   1102,     0,     1,             3,                       5,           46,                         -4,
                                                    2,                    0,         111,                 -4,            1,                      0,          65,                         0,   "SB",            13,    164,     0,     1,             3,                       5,           46,                         -4,
                                                    2,                    0,         111,                 -4,            1,                      0,          65,                         0,    "S",            12,     94,     0,     2,             3,                       5,           46,                         -4,
                                                    2,                    0,         111,                 -4,            1,                      0,          65,                         0,    "W",             6,     89,    -2,     8,             3,                       5,           46,                         -4,
                                                    2,                    0,         111,                 -4,            1,                      0,          65,                         0,    "K",            11,   1576,    -2,     3,             3,                       5,           46,                         -4,
                                                    2,                    0,         111,                 -4,            1,                      0,          65,                         0,  "ERA",             8,  3.946,     0,     6,             3,                       5,           46,                         -4,
                                                    2,                    0,         111,                 -4,            1,                      0,          65,                         0, "WHIP",             9, 1.2179,     0,     5,             3,                       5,           46,                         -4,
                                                    8,                 13.5,        97.5,                  2,            2,                     13,          52,                         0,  "OBP",            12, 0.3576,     0,     2,             4,                     5.5,         45.5,                          2,
                                                    8,                 13.5,        97.5,                  2,            2,                     13,          52,                         0,   "HR",            11,    323,     0,     3,             4,                     5.5,         45.5,                          2,
                                                    8,                 13.5,        97.5,                  2,            2,                     13,          52,                         0,  "RBI",            11,    954,     0,     3,             4,                     5.5,         45.5,                          2,
                                                    8,                 13.5,        97.5,                  2,            2,                     13,          52,                         0,    "R",            12,   1011,     0,     2,             4,                     5.5,         45.5,                          2,
                                                    8,                 13.5,        97.5,                  2,            2,                     13,          52,                         0,   "SB",             6,     89,     0,     8,             4,                     5.5,         45.5,                          2,
                                                    8,                 13.5,        97.5,                  2,            2,                     13,          52,                         0,    "S",             2,     56,    -1,    12,             4,                     5.5,         45.5,                          2,
                                                    8,                 13.5,        97.5,                  2,            2,                     13,          52,                         0,    "W",           7.5,     91,   1.5,     6,             4,                     5.5,         45.5,                          2,
                                                    8,                 13.5,        97.5,                  2,            2,                     13,          52,                         0,    "K",            10,   1508,   0.5,     4,             4,                     5.5,         45.5,                          2,
                                                    8,                 13.5,        97.5,                  2,            2,                     13,          52,                         0,  "ERA",            13,  3.688,     1,     1,             4,                     5.5,         45.5,                          2,
                                                    8,                 13.5,        97.5,                  2,            2,                     13,          52,                         0, "WHIP",            13, 1.1474,     0,     1,             4,                     5.5,         45.5,                          2
                                              )

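【Comments】:

  • I can't reproduce the error with the example you showed. df %>% unnest(c(dataBatting, dataPitching)) # A tibble: 10 x 15
  • Fixed that, since I realized my example was unclear.
  • It still works fine: df %>% unnest_wider(dataBatting) %>% unnest(c(abbr, roto_points, value, diff, rank)) # A tibble: 10 x 19. I don't get any error.
  • I've added more information to make it clearer. Sorry about that.
  • This problem is also related to "no common type".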

Tags: r tidyr


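【Solution 1】:

One option is to loop over the 'dataBatting' and 'dataPitching' column names, apply unnest_wider and then unnest to the columns of interest for each one, and bind the rows together (map_dfr: the 'dfr' suffix means the results, a list of data.frames/tibbles, are row-bound into a single data frame). One thing to note is that many tidyverse functions are type sensitive. Here, some of the list elements have different types, which causes a problem in unnest unless 'ptype' is specified. To avoid that, we can use type.convert to change the types automatically based on the values and then do the unnesting.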

library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
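# For each list column: widen it, harmonise types, unnest, then row-bind.
map_dfr(c('dataBatting', 'dataPitching'), ~ 
         df %>% 
           # spread the named list into list-columns abbr, roto_points, ...
           unnest_wider(.x) %>%
           # re-infer each element's type so mixed chr/dbl elements agree
           mutate_at(vars(c(abbr, roto_points, value, diff, rank)), 
                    type.convert) %>% 
           # one row per stat
           unnest(c(abbr, roto_points, value, diff, rank)) %>% 
           # type.convert may return factors; turn them back into character
           mutate_if(is.factor, as.character) %>%
           # drop whichever nested column was not unnested in this pass
           select(-one_of(c("dataBatting", "dataPitching")))) 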
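
Since the question asks for something like pivot_longer, the same shape can probably also be reached by stacking the two list columns first and unnesting once. The following is only a sketch on a cut-down stand-in for df: the toy tibble and the source/stats column names are made up for illustration, and it converts roto_points with as.numeric instead of type.convert.

```r
library(dplyr)
library(tidyr)

# Hypothetical, reduced version of df: two stats per team, and one
# roto_points vector stored as character, as in the question's data.
toy <- tibble(
  idTeam = c("2", "8"),
  dataBatting = list(
    list(abbr = c("OBP", "HR"), roto_points = c(13, 13)),
    list(abbr = c("OBP", "HR"), roto_points = c(12, 11))
  ),
  dataPitching = list(
    list(abbr = c("S", "W"), roto_points = c(12, 6)),
    list(abbr = c("S", "W"), roto_points = c("2", "7.5"))
  )
)

long <- toy %>%
  # stack both list columns into one; `source` records where each came from
  pivot_longer(c(dataBatting, dataPitching),
               names_to = "source", values_to = "stats") %>%
  # spread each named list into list-columns abbr and roto_points
  unnest_wider(stats) %>%
  # make every roto_points element numeric so unnest() finds a common type
  mutate(roto_points = lapply(roto_points, as.numeric)) %>%
  # one row per stat
  unnest(c(abbr, roto_points))

long
```

Because the nested names only ever exist once after pivot_longer, the duplicated-name error should not come up, and the source column keeps track of whether a row is a batting or a pitching stat.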

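【Comments】:

  • This works. Thanks! What does type.convert in the mutate do?
  • @Jazzmatazz. If you look, some of the list elements have different types in roto_points, which causes a problem when unnesting; otherwise you would need to specify the ptype argument. I didn't want to do that because it would be more manual.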
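
To make the type issue in the last comment concrete, here is a small base R sketch. The roto_points list below is a hand-made stand-in for what the list-column holds after unnest_wider, and passing as.is = TRUE is a choice made for this example (the answer calls type.convert without it).

```r
# Hypothetical stand-in for the roto_points list-column: the second
# element is character because one value was quoted ("13") in the data.
roto_points <- list(
  c(12, 6, 11, 8, 9),
  c("2", "7.5", "10", "13", "13")
)

# Before conversion the elements disagree on type (numeric vs character).
vapply(roto_points, typeof, character(1))

# type.convert() has a list method that re-infers the type of each
# element from its values; as.is = TRUE keeps real text as character
# instead of turning it into a factor.
converted <- type.convert(roto_points, as.is = TRUE)

# After conversion every element is numeric, so unnest() can combine them.
vapply(converted, is.numeric, logical(1))
```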