【Question title】: Reshaping wide dataframe to long format
【Posted】: 2021-05-01 16:08:14
【Question description】:

I have a df in the following format:

name other_info revenues_2015 ebitda_2015 ebitda_2016 revenues_2015 other_2017
A Info1 1 2 3 4 5
B Info2 6 7 8 9 10
C Info3 11 12 13 14 15

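I want to change it to long format, structured as follows:

Name | Info | Year | Metric name | Value

Can you tell me how to do this in R? Since the real data frame has more than 300 columns, is there a way to create the year column automatically?


Data: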


structure(list(name = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), other_info = structure(1:3, .Label = c("Info1", 
"Info2", "Info3"), class = "factor"), revenues_2015 = structure(c(1L, 
3L, 2L), .Label = c("1", "11", "6"), class = "factor"), ebitda_2015 = structure(c(2L, 
3L, 1L), .Label = c("12", "2", "7"), class = "factor"), ebitda_2016 = structure(c(2L, 
3L, 1L), .Label = c("13", "3", "8"), class = "factor"), revenues_2015 = structure(c(2L, 
3L, 1L), .Label = c("14", "4", "9"), class = "factor"), other_2017 = structure(c(3L, 
1L, 2L), .Label = c("10", "15", "5"), class = "factor")), class = "data.frame", row.names = c(NA, 
-3L))

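【Comments】:

  • Yes. You can do this with pivot_longer from the tidyr package (part of the tidyverse). It has an argument called names_sep that lets you split the column names at the underscore.

Tags: r dataframe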


【Solution 1】:

Does this work for you?

library(dplyr)
library(tidyr)

structure(list(name = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), other_info = structure(1:3, .Label = c("Info1", 
"Info2", "Info3"), class = "factor"), revenues_2015 = structure(c(1L, 
3L, 2L), .Label = c("1", "11", "6"), class = "factor"), ebitda_2015 = structure(c(2L, 
3L, 1L), .Label = c("12", "2", "7"), class = "factor"), ebitda_2016 = structure(c(2L, 
3L, 1L), .Label = c("13", "3", "8"), class = "factor"), revenues_2015 = structure(c(2L, 
3L, 1L), .Label = c("14", "4", "9"), class = "factor"), other_2017 = structure(c(3L, 
1L, 2L), .Label = c("10", "15", "5"), class = "factor")), class = "data.frame", row.names = c(NA, 
-3L)) %>% 
  pivot_longer(revenues_2015:other_2017, names_pattern = "(.+)_(\\d{4})", names_to = c("metric", "year"))

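【Discussion】:

  • This is exactly what I wanted! Thanks!
  • You can also take the shortcut @deschen suggested, pivot_longer(revenues_2015:other_2017, names_sep = "_", names_to = c("metric", "year")), but if you are comfortable with regular expressions, names_pattern is more powerful.
  • Yes, that works as long as there are no other "_" in the variable names.
【Solution 2】: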
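
For a data frame with hundreds of columns, a minimal sketch (assuming unique column names, and assuming every measure column ends in an underscore followed by a four-digit year) is to select everything except the id columns and let names_pattern split the names. The data frame `wide` and its net_income columns are made up for illustration; they show a metric name that itself contains "_", the case where names_sep = "_" no longer gives a clean two-way split.

```r
library(tidyr)

# Hypothetical data: unique column names, one metric name containing "_"
wide <- data.frame(
  name = c("A", "B"),
  other_info = c("Info1", "Info2"),
  revenues_2015 = c(1, 6),
  net_income_2015 = c(2, 7),
  net_income_2016 = c(3, 8)
)

long <- pivot_longer(
  wide,
  cols = -c(name, other_info),               # everything except the id columns
  names_to = c("metric", "year"),
  names_pattern = "(.+)_(\\d{4})$",          # greedy prefix, then a 4-digit year
  names_transform = list(year = as.integer), # store the year as an integer
  values_to = "value"
)

print(long)
```

Selecting with a negative `cols` avoids typing any of the measure column names. The names_transform argument is only available in reasonably recent tidyr versions; on an older version, converting `year` with as.integer() afterwards should do the same job.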

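You have two options: the reshape() function from the stats package (a base R function, so you do not need a library() call to use it) or the melt function from the reshape2 package.

Using the reshape() function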

 data = structure(list(name = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), other_info = structure(1:3, .Label = c("Info1", 
"Info2", "Info3"), class = "factor"), revenues_2015 = structure(c(1L, 
3L, 2L), .Label = c("1", "11", "6"), class = "factor"), ebitda_2015 = structure(c(2L, 
3L, 1L), .Label = c("12", "2", "7"), class = "factor"), ebitda_2016 = structure(c(2L, 
3L, 1L), .Label = c("13", "3", "8"), class = "factor"), revenues_2015 = structure(c(2L, 
3L, 1L), .Label = c("14", "4", "9"), class = "factor"), other_2017 = structure(c(3L, 
1L, 2L), .Label = c("10", "15", "5"), class = "factor")), class = "data.frame", row.names = c(NA, 
-3L))

LF_data = reshape(data=data, idvar = c("name","other_info"), varying =c("revenues_2015","ebitda_2015","ebitda_2016","revenues_2015","other_2017"), 
    v.names = c("Value"),times=c("revenues_2015","ebitda_2015","ebitda_2016","revenues_2015","other_2017"), direction = "long")

Using the melt() function from the reshape2 package:

  1. First you need a data frame created with stringsAsFactors = FALSE
       data=data.frame(structure(list(name = structure(1:3, .Label = c("A", "B", "C"
        ), class = "factor"), other_info = structure(1:3, .Label = c("Info1", 
        "Info2", "Info3"), class = "factor"), revenues_2015 = structure(c(1L, 
        3L, 2L), .Label = c("1", "11", "6"), class = "factor"), ebitda_2015 = structure(c(2L, 
        3L, 1L), .Label = c("12", "2", "7"), class = "factor"), ebitda_2016 = structure(c(2L, 
        3L, 1L), .Label = c("13", "3", "8"), class = "factor"), revenues_2015 = structure(c(2L, 
        3L, 1L), .Label = c("14", "4", "9"), class = "factor"), other_2017 = structure(c(3L, 
        1L, 2L), .Label = c("10", "15", "5"), class = "factor")), class = "data.frame", row.names = c(NA, 
        -3L)),stringsAsFactors=FALSE)

 2. Then:
LF_data=reshape2::melt(data,id.vars=c("name","other_info"), measure.vars=c("revenues_2015","ebitda_2015","ebitda_2016","revenues_2015.1","other_2017"))

melt will not give you a combination of "name", "other_info" and "variable" unless it is unique. In your example, the second revenues_2015 column is changed to revenues_2015.1.
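
The .1 suffix matches what base R's make.names(unique = TRUE) produces, and data.frame() applies that by default through check.names = TRUE. Assuming that is where the rename comes from here, a small sketch of the behavior (the `dup` data frame is a made-up example):

```r
# Column names from the question, including the duplicated revenues_2015
cols <- c("revenues_2015", "ebitda_2015", "ebitda_2016",
          "revenues_2015", "other_2017")

# make.names(unique = TRUE) appends .1, .2, ... to repeated names
unique_cols <- make.names(cols, unique = TRUE)
print(unique_cols)

# data.frame() does the same to duplicated argument names by default
dup <- data.frame(a = 1, a = 2)
print(names(dup))
```

Checking names(data) after building the data frame shows which name the duplicated column ended up with before it is passed to melt().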

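【Discussion】:

  • Thanks, this was very helpful.
  • @PietroFabbro You can use a variable to supply the column names and the times argument, so you do not have to type out every column name. Remember to remove from that variable the names of the columns you want to keep as columns in the long format.
【Solution 3】: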
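
The suggestion in the last comment can be sketched as follows, assuming unique column names. The data frame below is a hypothetical stand-in for the real one (revenues_2016 replaces the duplicated column), and `id_cols`, `measure_cols`, `Metric` and `Year` are illustrative names.

```r
# Hypothetical data with unique column names
data <- data.frame(
  name = c("A", "B", "C"),
  other_info = c("Info1", "Info2", "Info3"),
  revenues_2015 = c(1, 6, 11),
  ebitda_2015 = c(2, 7, 12),
  ebitda_2016 = c(3, 8, 13),
  revenues_2016 = c(4, 9, 14),
  other_2017 = c(5, 10, 15)
)

# Columns to keep as they are; everything else is reshaped
id_cols <- c("name", "other_info")
measure_cols <- setdiff(names(data), id_cols)

long <- reshape(
  data,
  idvar = id_cols,
  varying = measure_cols,
  v.names = "Value",
  timevar = "variable",
  times = measure_cols,
  direction = "long"
)

# Split "metric_year" into two columns
long$Metric <- sub("_[0-9]{4}$", "", long$variable)
long$Year <- as.integer(sub("^.*_([0-9]{4})$", "\\1", long$variable))

print(head(long))
```

Because `measure_cols` is computed with setdiff(), the same code applies unchanged to a data frame with 300 columns, as long as the id columns are listed in `id_cols`.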

A bit late: this is similar to mad-statter's solution, with the slight difference of using mutate:

library(tidyr)
library(dplyr)

df <- structure(list(name = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), other_info = structure(1:3, .Label = c("Info1", 
"Info2", "Info3"), class = "factor"), revenues_2015 = structure(c(1L, 
3L, 2L), .Label = c("1", "11", "6"), class = "factor"), ebitda_2015 = structure(c(2L, 
3L, 1L), .Label = c("12", "2", "7"), class = "factor"), ebitda_2016 = structure(c(2L, 
3L, 1L), .Label = c("13", "3", "8"), class = "factor"), revenues_2015 = structure(c(2L, 
3L, 1L), .Label = c("14", "4", "9"), class = "factor"), other_2017 = structure(c(3L, 
1L, 2L), .Label = c("10", "15", "5"), class = "factor")), class = "data.frame", row.names = c(NA, -3L)) %>% 
  pivot_longer(revenues_2015:other_2017, names_to = c("Metric name", "Year"),
               names_sep ="_", values_to = "Value") %>% 
  dplyr::mutate(Year = stringr::str_remove(Year, "\\D")) %>% 
  rename(Name=name, Info = other_info)

【Discussion】:
