【Question title】: Reshaping data using tidyr
【Posted】: 2016-01-22 11:01:02
【Question】:

I'm working with a data frame, dat, whose structure is similar to the one below.

  Gender         Age Number
1 Female 55-59 years      5
2 Female   65+ years     10
3   Male 25-29 years      4
4   Male 40-44 years      3
5   Male 50-54 years      1

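I'm trying to reshape the data with tidyr (unsuccessfully so far) so that each row is repeated as many times as its value in the Number column. The output I'm after should look like this: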

   Gender         Age
1  Female 55-59 years
2  Female 55-59 years
3  Female 55-59 years
4  Female 55-59 years
5  Female 55-59 years
6  Female   65+ years
7  Female   65+ years
8  Female   65+ years
9  Female   65+ years
10 Female   65+ years
11 Female   65+ years
12 Female   65+ years
13 Female   65+ years
14 Female   65+ years
15 Female   65+ years
16   Male 25-29 years
17   Male 25-29 years
18   Male 25-29 years
19   Male 25-29 years
20   Male 40-44 years
21   Male 40-44 years
22   Male 40-44 years
23   Male 50-54 years

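I've tried various combinations of the gather/spread functions, with little success. I'm fairly sure this is possible in tidyr!

I know I could use a number of other packages/functions to get the same result, but I'm very keen to find a tidyr solution so that I can include it in a larger dplyr/tidyr pipeline.

Any help is much appreciated.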

dat <- structure(list(Gender = structure(c(3L, 3L, 1L, 2L, 1L), .Label = c("   Male", 
    " Male", "Female"), class = "factor"), Age = structure(c(5L, 
    1L, 2L, 3L, 4L), .Label = c("65+ years", "25-29 years", "40-44 years", 
    "50-54 years", "55-59 years"), class = "factor"), Number = c(5L, 
    10L, 4L, 3L, 1L)), .Names = c("Gender", "Age", "Number"), class = "data.frame", row.names = c(NA, 
    -5L))

【Comments】:

  • Why not just use rep()? You can easily do with(df, data.frame(Gender = rep(Gender, Number), Age = rep(Age, Number)))
  • Or just library(splitstackshape) ; expandRows(df, "Number")

Tags: r dplyr tidyr


【Solution 1】:

This isn't tidyr either, but it feels quite natural to me:

library(dplyr)
dat %>% slice(rep(row_number(), Number)) %>% select(-Number)

    Gender         Age
1   Female 55-59 years
2   Female 55-59 years
3   Female 55-59 years
4   Female 55-59 years
5   Female 55-59 years
6   Female   65+ years
7   Female   65+ years
8   Female   65+ years
9   Female   65+ years
10  Female   65+ years
11  Female   65+ years
12  Female   65+ years
13  Female   65+ years
14  Female   65+ years
15  Female   65+ years
16    Male 25-29 years
17    Male 25-29 years
18    Male 25-29 years
19    Male 25-29 years
20    Male 40-44 years
21    Male 40-44 years
22    Male 40-44 years
23    Male 50-54 years

As @bramtayl suggested, the readability can (arguably) be improved:

dat %>% slice(row_number() %>% rep(Number)) %>% select(-Number)
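
To see why this works, it can help to look at the index vector that rep() builds before slice() consumes it. The following is only an illustrative sketch: it rebuilds a simplified copy of dat with character columns rather than the factors in the question (an assumption made for brevity), and the names idx and expanded are introduced here.

```r
library(dplyr)

# Simplified stand-in for the question's data (character columns, not factors)
dat <- data.frame(
  Gender = c("Female", "Female", "Male", "Male", "Male"),
  Age    = c("55-59 years", "65+ years", "25-29 years", "40-44 years", "50-54 years"),
  Number = c(5L, 10L, 4L, 3L, 1L),
  stringsAsFactors = FALSE
)

# rep(x, times) with a vector `times` repeats each element of x that many times,
# so row position 1 appears 5 times, position 2 appears 10 times, and so on
idx <- rep(seq_len(nrow(dat)), dat$Number)
idx

# slice() selects rows by position, so repeated positions give repeated rows
expanded <- dat %>% slice(idx) %>% select(-Number)
nrow(expanded)  # same as sum(dat$Number)
```

Inside slice(), row_number() plays the role that seq_len(nrow(dat)) plays here.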

【Comments】:

  • Not tidyr, but within the hadleyverse framework. Nice work, +1
  • dat %>% slice(n() %>% seq %>% rep(Number))
  • @bramtayl I just realized that n() %>% seq is the same as row_number(); thanks for the comment. I've added your version and edited mine.
  • Nice approach! I had df %>% do(data_frame(Gender = rep(.$Gender, .$Number), Age = rep(.$Age, .$Number)))

【Solution 2】:

Not tidyr, but very fast and efficient:

dat2 <- dat[rep(1:nrow(dat), dat[["Number"]]), 1:2]
rownames(dat2) <- NULL

##     Gender          Age
## 1   Female  55-59 years
## 2   Female  55-59 years
## 3   Female  55-59 years
## 4   Female  55-59 years
## 5   Female  55-59 years
## 6   Female    65+ years
## 7   Female    65+ years
## 8   Female    65+ years
## 9   Female    65+ years
## 10  Female    65+ years
## 11  Female    65+ years
## 12  Female    65+ years
## 13  Female    65+ years
## 14  Female    65+ years
## 15  Female    65+ years
## 16    Male  25-29 years
## 17    Male  25-29 years
## 18    Male  25-29 years
## 19    Male  25-29 years
## 20    Male  40-44 years
## 21    Male  40-44 years
## 22    Male  40-44 years
## 23    Male  50-54 years
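
Since the question asks for something that can sit inside a pipeline, the same base R indexing could be wrapped in a small function. This is a sketch only: expand_rows and its arguments are hypothetical names made up for illustration (not the expandRows() from splitstackshape mentioned in the comments), and the data frame is a simplified character-column copy of dat.

```r
# Hypothetical helper: repeat each row of `df` according to the counts
# stored in the column named by `count_col`, then drop that column
expand_rows <- function(df, count_col) {
  counts <- df[[count_col]]
  keep   <- setdiff(names(df), count_col)
  out    <- df[rep(seq_len(nrow(df)), counts), keep, drop = FALSE]
  rownames(out) <- NULL  # replace row names like "1.1", "1.2" with 1..n
  out
}

# Simplified stand-in for the question's data (character columns, not factors)
dat <- data.frame(
  Gender = c("Female", "Female", "Male", "Male", "Male"),
  Age    = c("55-59 years", "65+ years", "25-29 years", "40-44 years", "50-54 years"),
  Number = c(5L, 10L, 4L, 3L, 1L),
  stringsAsFactors = FALSE
)

dat2 <- expand_rows(dat, "Number")
```

Because the data frame is the first argument, a call such as dat %>% expand_rows("Number") should also fit in a dplyr chain.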

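【Comments】:

  • Thanks @TylerRinker - that's a nice, tidy solution. I'm really keen to see whether anyone can come up with a tidyr solution. I'm trying to better understand the syntax and what is and isn't possible with it. I figure others may find this useful too...
  • @vengefulsealion - I don't think tidyr has any function that replicates rows based on the values in a column

【Solution 3】: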

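We can do this with tidyr/dplyr. Convert the "Number" column to a list column in which each value is replaced by a sequence, unnest it, and then use select to drop the "Number" column from the output.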

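library(dplyr)
library(tidyr)
dat1 <- dat %>% 
          # replace each count with the sequence 1..count (a list column);
          # seq_len() also handles a count of 0, where seq() would return c(1, 0)
          mutate(Number = lapply(Number, seq_len)) %>%
          # one output row per element of each sequence
          unnest(Number) %>% 
          # the helper column is no longer needed
          select(-Number)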

Note that the output is a tbl_df, which is useful when we go on to do further operations with dplyr functions.

str(dat1)
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame':       23 obs. of  2 variables:
#  $ Gender: Factor w/ 3 levels "   Male"," Male",..: 3 3 3 3 3 3 3 3 3 3 ...
#  $ Age   : Factor w/ 5 levels "65+ years","25-29 years",..: 5 5 5 5 5 1 1 1 1 1 ...

dat1 %>%
     as.data.frame()
#   Gender         Age
#1   Female 55-59 years
#2   Female 55-59 years
#3   Female 55-59 years
#4   Female 55-59 years
#5   Female 55-59 years
#6   Female   65+ years
#7   Female   65+ years
#8   Female   65+ years
#9   Female   65+ years
#10  Female   65+ years
#11  Female   65+ years
#12  Female   65+ years
#13  Female   65+ years
#14  Female   65+ years
#15  Female   65+ years
#16    Male 25-29 years
#17    Male 25-29 years
#18    Male 25-29 years
#19    Male 25-29 years
#20    Male 40-44 years
#21    Male 40-44 years
#22    Male 40-44 years
#23    Male 50-54 years
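
For readers on a more recent tidyr than the one available when this thread was written: later releases added uncount(), which duplicates rows according to a weights column and drops that column by default. A minimal sketch, assuming such a version is installed; the data frame is a simplified character-column copy of dat and dat_long is a name introduced here.

```r
library(tidyr)

# Simplified stand-in for the question's data (character columns, not factors)
dat <- data.frame(
  Gender = c("Female", "Female", "Male", "Male", "Male"),
  Age    = c("55-59 years", "65+ years", "25-29 years", "40-44 years", "50-54 years"),
  Number = c(5L, 10L, 4L, 3L, 1L),
  stringsAsFactors = FALSE
)

# Repeat each row Number times; the Number column is removed by default
dat_long <- uncount(dat, Number)
nrow(dat_long)  # same as sum(dat$Number)
```

In a pipeline this would read dat %>% uncount(Number).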
