【Question title】: Pivot longer with multiple variables in columns
【Posted】: 2020-09-19 20:27:12
【Question】:

My data looks like this:

# A tibble: 120 x 5
     age death_rate_male life_exp_male death_rate_fem life_exp_fem
   <dbl>           <dbl>         <dbl>          <dbl>        <dbl>
 1     0        0.00630           76.0       0.00523          81.0
 2     1        0.000426          75.4       0.000342         80.4
 3     2        0.00029           74.5       0.000209         79.4
 4     3        0.000229          73.5       0.000162         78.4
 5     4        0.000162          72.5       0.000143         77.4
 6     5        0.000146          71.5       0.000125         76.5
 7     6        0.000136          70.5       0.000113         75.5
 8     7        0.000127          69.6       0.000104         74.5
 9     8        0.000115          68.6       0.000097         73.5
10     9        0.000103          67.6       0.000093         72.5
# ... with 110 more rows

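I am trying to create a tidy table in which the variables are age, sex, life expectancy, and death rate.

I managed to do this by splitting the data frame in two (one with life expectancy, the other with death rate), tidying each with pivot_longer(), and then appending the two tables. Is there a more elegant way to do this with a single pivot_longer() call? Thanks in advance.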

Tags: r pivot data-wrangling


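【Answer 1】:

We can use names_pattern, where each capture group in the pattern fills one entry of names_to: the first group supplies the output column names (.value) and the second supplies the grouping column: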

library(dplyr)
library(tidyr)
df1 %>%
   pivot_longer(cols = -age, names_to = c( '.value', 'grp'), 
         names_pattern = "^(\\w+_\\w+)_(\\w+)")
# A tibble: 20 x 4
#     age grp   death_rate life_exp
#   <int> <chr>      <dbl>    <dbl>
# 1     0 male    0.0063       76  
# 2     0 fem     0.00523      81  
# 3     1 male    0.000426     75.4
# 4     1 fem     0.000342     80.4
# 5     2 male    0.00029      74.5
# 6     2 fem     0.000209     79.4
# 7     3 male    0.000229     73.5
# 8     3 fem     0.000162     78.4
# 9     4 male    0.000162     72.5
#10     4 fem     0.000143     77.4
#11     5 male    0.000146     71.5
#12     5 fem     0.000125     76.5
#13     6 male    0.000136     70.5
#14     6 fem     0.000113     75.5
#15     7 male    0.000127     69.6
#16     7 fem     0.000104     74.5
#17     8 male    0.000115     68.6
#18     8 fem     0.000097     73.5
#19     9 male    0.000103     67.6
#20     9 fem     0.000093     72.5
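
To see which part of each column name ends up in .value and which in grp, the same regular expression can be tried on the names alone. This is only a sketch in base R, not what pivot_longer() runs internally; the vector cols and the matrix parts are names made up for this illustration.

```r
# Column names from the question (everything except age)
cols <- c("death_rate_male", "life_exp_male", "death_rate_fem", "life_exp_fem")

# Apply the pattern from the answer and keep only the capture groups
m <- regmatches(cols, regexec("^(\\w+_\\w+)_(\\w+)", cols, perl = TRUE))
parts <- do.call(rbind, lapply(m, function(x) x[-1]))  # drop the full match
colnames(parts) <- c(".value", "grp")

print(parts)
```

The first column holds the names that become value columns (death_rate, life_exp) and the second holds the group labels (male, fem).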

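Or use names_sep, which here is a regular expression matching the last underscore, i.e. an underscore followed only by non-underscore characters up to the end of the name: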

df1 %>%
   pivot_longer(cols = -age, names_to = c( '.value', 'grp'), 
        names_sep = "_(?=[^_]+$)")
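
As a rough check of that separator, the lookahead can be tested with strsplit() before it is handed to pivot_longer(). The sketch below uses a two-row stand-in for the data and names the new column sex rather than grp, since sex is the variable the question asks for; that rename and the object names small and tidy are assumptions of this example, not part of the answer.

```r
library(tidyr)

# Two-row stand-in for the data in the question (values copied from it)
small <- data.frame(
  age = 0:1,
  death_rate_male = c(0.0063, 0.000426),
  life_exp_male = c(76, 75.4),
  death_rate_fem = c(0.00523, 0.000342),
  life_exp_fem = c(81, 80.4)
)

# The lookahead matches only the last underscore, so each name splits in two
pieces <- strsplit(names(small)[-1], "_(?=[^_]+$)", perl = TRUE)
print(pieces)

# Same separator inside pivot_longer(), with the group column called sex
tidy <- pivot_longer(
  small,
  cols = -age,
  names_to = c(".value", "sex"),
  names_sep = "_(?=[^_]+$)"
)
print(tidy)
```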

Data

df1 <- structure(list(age = 0:9, death_rate_male = c(0.0063, 0.000426, 
0.00029, 0.000229, 0.000162, 0.000146, 0.000136, 0.000127, 0.000115, 
0.000103), life_exp_male = c(76, 75.4, 74.5, 73.5, 72.5, 71.5, 
70.5, 69.6, 68.6, 67.6), death_rate_fem = c(0.00523, 0.000342, 
0.000209, 0.000162, 0.000143, 0.000125, 0.000113, 0.000104, 9.7e-05, 
9.3e-05), life_exp_fem = c(81, 80.4, 79.4, 78.4, 77.4, 76.5, 
75.5, 74.5, 73.5, 72.5)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"))

【Comments】:

  • The library statement should be tidyr.

【Answer 2】:

Borrowing akrun's data, here is a base R option using reshape:

# df1 is the data frame defined in Answer 1
reshape(
  setNames(df1, gsub("(.*)_(\\w+)", "\\1\\.\\2", names(df1))),
  direction = "long",
  varying = -1
)

which gives

        age time death_rate life_exp id
1.male    0 male   0.006300     76.0  1
2.male    1 male   0.000426     75.4  2
3.male    2 male   0.000290     74.5  3
4.male    3 male   0.000229     73.5  4
5.male    4 male   0.000162     72.5  5
6.male    5 male   0.000146     71.5  6
7.male    6 male   0.000136     70.5  7
8.male    7 male   0.000127     69.6  8
9.male    8 male   0.000115     68.6  9
10.male   9 male   0.000103     67.6 10
1.fem     0  fem   0.005230     81.0  1
2.fem     1  fem   0.000342     80.4  2
3.fem     2  fem   0.000209     79.4  3
4.fem     3  fem   0.000162     78.4  4
5.fem     4  fem   0.000143     77.4  5
6.fem     5  fem   0.000125     76.5  6
7.fem     6  fem   0.000113     75.5  7
8.fem     7  fem   0.000104     74.5  8
9.fem     8  fem   0.000097     73.5  9
10.fem    9  fem   0.000093     72.5 10
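
The reshape() result carries a time column, an id column, and compound row names. If the goal is the four variables from the question, a little post-processing gets there. This is a sketch on a three-row stand-in; the timevar = "sex" argument and the clean-up steps are additions for illustration and are not part of the answer above.

```r
# Three-row stand-in for the data (values copied from the question)
df1 <- data.frame(
  age = 0:2,
  death_rate_male = c(0.0063, 0.000426, 0.00029),
  life_exp_male = c(76, 75.4, 74.5),
  death_rate_fem = c(0.00523, 0.000342, 0.000209),
  life_exp_fem = c(81, 80.4, 79.4)
)

long <- reshape(
  # Turn the last underscore into ".", the separator reshape() splits on by default
  setNames(df1, gsub("(.*)_(\\w+)", "\\1.\\2", names(df1))),
  direction = "long",
  varying = -1,
  timevar = "sex"   # name the group column sex instead of the default time
)

long$id <- NULL                   # drop the row identifier added by reshape()
long <- long[order(long$age), ]   # interleave male and fem rows by age
rownames(long) <- NULL            # replace names such as "1.male" with 1, 2, ...
print(long)
```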
