【Question title】: Convert irregular nested list into dataframe
【Posted】: 2019-03-20 16:30:13
【Question】:

I have a nested list as follows:

    mylist <- list(
      list(
        id = 1234,
        attributes = list(
             list(
               typeId = 11,
               type = 'Main',
               date = '2018-01-01', 
               attributes= list(
                 list(
                   team = 'team1',
                   values = list(
                     value1 = 1, 
                     value2 = 999)),
                 list(
                   team = 'team2',
                   values = list(
                     value1 = 2, 
                     value2 = 888))
                 )
               ),
             list(
               typeId = 12,
               type = 'Extra',
               date = '2018-01-02', 
               attributes= list(
                 list(
                   team = 'team1',
                   values = list(
                     value1 = 3, 
                     value2 = 1234)),
                 list(
                   team = 'team2',
                   values = list(
                     value1 = 4, 
                     value2 = 9876))
               )
             )
          )
        )
      )

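I want to convert it into a data frame in which each sub-entry becomes one row, together with all of its parent entries. So I would end up with a data frame that looks like:
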
    id typeId  type       date  team value1 value2
1 1234     11  Main 2018-01-01 team1      1    999
2 1234     11  Main 2018-01-01 team2      2    888
3 1234     12 Extra 2018-01-02 team1      3   1234
4 1234     12 Extra 2018-01-02 team2      4   9876

I don't always know the names in the list, so I need a generic way to do this without specifying column names.

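Edit

I have an answer to my original question, but in response to Parfait's comment, "If you post the original JSON and your R import code, a simpler solution may be available":

I get my original JSON from a URL using this R code: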

library(magrittr)  # provides %>%

# feed_url, username and password are defined elsewhere.
httr::GET(
    feed_url,
    httr::authenticate(username, password)
  ) %>%
    httr::content()

At the URL, the JSON looks like:

[{"id":[1234],"attributes":[{"typeId":[11],"type":["Main"],"date":["2018-01-01"],"attributes":[{"team":["team1"],"values":{"value1":[1],"value2":[999]}},{"team":["team2"],"values":{"value1":[2],"value2":[888]}}]},{"typeId":[12],"type":["Extra"],"date":["2018-01-02"],"attributes":[{"team":["team1"],"values":{"value1":[3],"value2":[1234]}},{"team":["team2"],"values":{"value1":[4],"value2":[9876]}}]}]}]
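One thing to watch for with this feed: every scalar in the JSON is wrapped in an array (for example `"id":[1234]`). Depending on how the parser is configured, such a value may come back as a length-one list rather than the bare value used in `mylist` above. Assuming that is what happens, a small base R helper along these lines could collapse those wrappers before flattening. `unwrap_singletons` is a hypothetical name, not something from httr or the original post, and the `raw` list below is a hand-written stand-in for the parsed response.

```r
# Hypothetical helper: collapse unnamed length-one lists that hold a single
# atomic value (e.g. list(1234)) into that value, recursing through the rest.
unwrap_singletons <- function(x) {
  if (!is.list(x)) return(x)
  is_wrapper <- length(x) == 1 && is.null(names(x)) &&
    is.atomic(x[[1]]) && length(x[[1]]) == 1
  if (is_wrapper) return(x[[1]])
  lapply(x, unwrap_singletons)
}

# Hand-written stand-in for a parsed response (an assumption, not real output).
raw <- list(
  list(
    id = list(1234),
    attributes = list(
      list(team = list("team1"),
           values = list(value1 = list(1), value2 = list(999)))
    )
  )
)

clean <- unwrap_singletons(raw)
str(clean)
```

Named lists and lists that contain further lists are left alone, so the overall nesting is preserved and only the scalar wrappers disappear.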

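【Comments】:

  • How did you end up with such a deeply nested list? Are you importing JSON or XML? We may be able to help with that step.
  • Imported from JSON. I now have a function that converts the list, but thanks; see the answer below.
  • If you post the original JSON and your R import code, a simpler solution may be available.

Tags: r list dataframe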


【Solution 1】:

I now have a function that does this:

library(dplyr)   # %>%, select(), bind_rows(), slice(), n()
library(tibble)  # add_column()

flattenList <- function(input) {

    output <- NULL

    ## Check which elements of the current list are also lists.
    isList <- sapply(input, class) == "list"

    ## Any non-list elements are added to the output data frame.
    if (any(!isList)) {

        ## Determine the number of rows in the output.
        maxRows <- max(sapply(input[!isList], length))

        output <-
            ## Initialise the output data frame with a dummy variable.
            data.frame(dummy = rep(NA, maxRows)) %>%

            ## Append the new columns.
            add_column(!!! input[!isList]) %>%

            ## Delete the dummy variable.
            select(- dummy)
    }

    ## If some elements of the current list are also lists, we apply the function again.
    if (any(isList)) {

        ## Apply the function to every sub-list, then bind the new output as rows.
        newOutput <- lapply(input[isList], flattenList) %>% bind_rows()

        ## Check if the current output is NULL.
        if (is.null(output)) {

            output <- newOutput

        } else {

            ## If the current output has fewer rows than the new output, we recycle it.
            if (nrow(output) < nrow(newOutput)) {
                output <- slice(output, rep(1:n(), times = nrow(newOutput) / n()))
            }


            ## Append the columns of the new output.
            output <- add_column(output, !!! newOutput)
        }
    }

    return(output)
}

> flattenList(mylist)
    id typeId  type       date  team value1 value2
1 1234     11  Main 2018-01-01 team1      1    999
2 1234     11  Main 2018-01-01 team2      2    888
3 1234     12 Extra 2018-01-02 team1      3   1234
4 1234     12 Extra 2018-01-02 team2      4   9876
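
For anyone who would rather avoid the dplyr/tibble dependency, the same recursion can be sketched in base R. This is only a rough variant under stated assumptions, not a drop-in replacement: `flatten_nested` is a hypothetical name, it assumes every non-list element has length one, and it assumes sibling sub-lists produce the same columns (otherwise `rbind` will fail). The `small` list is a cut-down version of `mylist` from the question.

```r
# Hypothetical helper, not from the original post: base R take on the recursion.
flatten_nested <- function(x) {
  is_sub <- vapply(x, is.list, logical(1))

  # Non-list elements at this level form a single parent row.
  own <- NULL
  if (any(!is_sub)) {
    own <- as.data.frame(x[!is_sub], stringsAsFactors = FALSE)
  }
  if (!any(is_sub)) return(own)

  # Flatten each sub-list and stack the results as rows.
  # Assumption: sibling sub-lists produce the same columns.
  kids <- do.call(rbind, lapply(unname(x[is_sub]), flatten_nested))
  if (is.null(own)) return(kids)

  # Repeat the parent row once per child row, then bind the columns.
  out <- cbind(own[rep(1L, nrow(kids)), , drop = FALSE], kids)
  rownames(out) <- NULL
  out
}

small <- list(
  list(
    id = 1234,
    attributes = list(
      list(team = "team1", values = list(value1 = 1, value2 = 999)),
      list(team = "team2", values = list(value1 = 2, value2 = 888))
    )
  )
)

res <- flatten_nested(small)
print(res)
```

As in `flattenList`, the parent values (`id` here) are repeated for every child row, and column names are taken from the list itself rather than specified by hand.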
