【Question title】: Having trouble with pivoting data in R
【Posted】: 2022-12-10 12:51:01
【Question】:

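Below is sample data. My goal is to create one row for each area/industry/ownership combination. For this sample dataset, each area/industry/ownership combination will have 24 values (8 quarters × 3 months). I know this takes a series of pivots, but my attempts have not worked. The desired result is at the bottom.

In my larger dataset I have more than 3 years of data and more than one industry, but this sample keeps it manageable.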

 naicscode <- c("111","111","111","111","111","111","111","111","111","111","111","111","111","111","111","111")
 areavalue <- c("000000","000000","000000","000000","000000","000000","000000","000000","000003","000003","000003","000003","000003","000003","000003","000003")
 ownership <- c("50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50")
 period <- c("01","02","03","04","01","02","03","04","01","02","03","04","01","02","03","04")
 periodyear <- c("2020","2020","2020","2020","2021","2021","2021","2021", "2020","2020","2020","2020", "2021","2021","2021","2021")
 mnth1emp<- c(25000,25005,25010,25020,25025,20506,20510,21555,16000,16005,16025,16020,16035,13595,14010,13985)
 mnth2emp<- c(25005,25010,25000,24995,25005,25010,25060,24995,15995,16005,16015,16020,16030,14015,14000,14200)
 mnth3emp<- c(24985,25000,25005,25010,25009,25040,25090,25080,15990,16000,16065,16025,16030,14665,14550,14620)


 test <- data.frame(naicscode,areavalue,ownership,periodyear,period,mnth1emp,mnth2emp,mnth3emp)




  naicscode  areavalue  ownership  202001  202002  202003  202004  202005  202006  ... and so on until 202112
  111        000000     50         25000   25005   24985   25005   25010   25000

【Comments】:

  • Your pivoted data contains values that do not exist in test. Is 202003 (for example) a concatenation of periodyear + period, or of periodyear + mnth3emp?

Tags: r dplyr pivot


【Solution 1】:

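I am assuming that period denotes the quarter, and that the number in name (the column pivot_longer creates, holding mnth1emp, mnth2emp, or mnth3emp) is the month within that quarter.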

If that is the case, each column header is 100*periodyear + (period-1)*3 + the number in name. The quarter offset is 3 because each quarter spans three months; this matches the desired output in the question, where 202004 holds the first month of the second quarter (25005).

library(tidyverse)

test %>%
  # one row per quarter and month; creates the `name` and `value` columns
  pivot_longer(starts_with("mnth")) %>%
  # YYYYMM label: year * 100 + months in earlier quarters + month within the quarter
  mutate(period_num = as.numeric(periodyear)*100 + (as.numeric(period)-1)*3 + parse_number(name)) %>%
  # drop the columns that are now encoded in period_num
  select(-c(periodyear:name)) %>%
  pivot_wider(names_from = period_num, values_from = value)

Result

# A tibble: 2 × 27
  naicscode areavalue ownership 20200…¹ 20200…² 20200…³ 20200…⁴ 20200…⁵ 20200…⁶
  <chr>     <chr>     <chr>       <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 111       000000    50          25000   25005   24985   25005   25010   25000
2 111       000003    50          16000   15995   15990   16005   16005   16000
# … with 18 more variables: `202007` <dbl>, `202008` <dbl>, `202009` <dbl>,
#   `202010` <dbl>, `202011` <dbl>, `202012` <dbl>, `202101` <dbl>,
#   `202102` <dbl>, `202103` <dbl>, `202104` <dbl>, `202105` <dbl>,
#   `202106` <dbl>, `202107` <dbl>, `202108` <dbl>, `202109` <dbl>,
#   `202110` <dbl>, `202111` <dbl>, `202112` <dbl>, and abbreviated variable
#   names ¹​`202001`, ²​`202002`, ³​`202003`, ⁴​`202004`, ⁵​`202005`, ⁶​`202006`
# ℹ Use `colnames()` to see all variable names
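
One way to sanity-check the column labels is to rebuild the sample data and assert that the month columns run from 202001 to 202012 and from 202101 to 202112. The sketch below is only an illustration: it assumes a quarter offset of three months, and `yyyymm`, `wide`, `expected`, and `month_cols` are names introduced here, not part of the original answer.

```r
library(tidyverse)

# Same sample data as in the question, with the repeated columns built via rep().
test <- data.frame(
  naicscode  = rep("111", 16),
  areavalue  = rep(c("000000", "000003"), each = 8),
  ownership  = rep("50", 16),
  periodyear = rep(rep(c("2020", "2021"), each = 4), times = 2),
  period     = rep(c("01", "02", "03", "04"), times = 4),
  mnth1emp = c(25000, 25005, 25010, 25020, 25025, 20506, 20510, 21555,
               16000, 16005, 16025, 16020, 16035, 13595, 14010, 13985),
  mnth2emp = c(25005, 25010, 25000, 24995, 25005, 25010, 25060, 24995,
               15995, 16005, 16015, 16020, 16030, 14015, 14000, 14200),
  mnth3emp = c(24985, 25000, 25005, 25010, 25009, 25040, 25090, 25080,
               15990, 16000, 16065, 16025, 16030, 14665, 14550, 14620)
)

# Hypothetical helper: YYYYMM label from year, quarter, and month within
# the quarter, assuming each quarter spans three months.
yyyymm <- function(year, quarter, month_in_quarter) {
  as.numeric(year) * 100 + (as.numeric(quarter) - 1) * 3 + month_in_quarter
}

wide <- test %>%
  pivot_longer(starts_with("mnth")) %>%
  mutate(period_num = yyyymm(periodyear, period, parse_number(name))) %>%
  select(-c(periodyear:name)) %>%
  pivot_wider(names_from = period_num, values_from = value)

# Assumed target labels: every month of 2020 and 2021, in order.
expected   <- as.character(c(202001:202012, 202101:202112))
month_cols <- setdiff(names(wide), c("naicscode", "areavalue", "ownership"))

stopifnot(identical(month_cols, expected))  # 24 monthly columns, no gaps
stopifnot(nrow(wide) == 2)                  # one row per area in the sample
```

If either `stopifnot()` call fails, the quarter-to-month assumption probably does not hold for the data at hand, for example when `period` encodes something other than quarters.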
