【Question title】: How to move data from one column to another in R
【Posted】: 2021-10-16 15:07:40
【Question】:

I am trying to move data from one column to another because the underlying form was filled in incorrectly.

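The form asks about a household and collects the age (AGE) and sex (SEX) of each member, allowing up to 5 people per household. However, some users filled in details for persons 1, 3 and 4 but nothing for person 2: they made a mistake when filling in person 2, crossed out the details, and then wrote person 2's details in person 3's boxes, and so on.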

The data look like this (refs 1 and 5 are correct; all the others are incorrect):

df <- data.frame(
  ref = c(1, 2, 3, 4, 5, 6),
  AGE1 = c(45, 36, 26, 47, 24, NA),
  AGE2 = c(NA, 24, NA, 13, 57, 28),
  AGE3 = c(NA, NA, 35, NA, NA, 26),
  AGE4 = c(NA, NA, 15, 11, NA, NA),
  AGE5 = c(NA, 15, NA, NA, NA, NA),
  SEX1 = c("M", "F", "M", "M", "M", NA),
  SEX2 = c(NA, "M", NA, "F", "F", "F"),
  SEX3 = c(NA, NA, "M", NA, NA, "M"),
  SEX4 = c(NA, NA, "F", "F", NA, NA),
  SEX5 = c(NA, "F", NA, NA, NA, NA)
)

This is what the table currently looks like (I have replaced NA with - for readability):

ref AGE1 AGE2 AGE3 AGE4 AGE5 SEX1 SEX2 SEX3 SEX4 SEX5
  1   45    -    -    -    -    M    -    -    -    -
  2   36   24    -    -   15    F    M    -    -    F
  3   26    -   35   15    -    M    -    M    F    -
  4   47   13    -   11    -    M    F    -    F    -
  5   24   57    -    -    -    M    F    -    -    -
  6    -   28   26    -    -    -    F    M    -    -

But I would like it to look like this:

ref AGE1 AGE2 AGE3 AGE4 AGE5 SEX1 SEX2 SEX3 SEX4 SEX5
  1   45    -    -    -    -    M    -    -    -    -
  2   36   24   15    -    -    F    M    F    -    -
  3   26   35   15    -    -    M    M    F    -    -
  4   47   13   11    -    -    M    F    F    -    -
  5   24   57    -    -    -    M    F    -    -    -
  6   28   26    -    -    -    F    M    -    -    -

Is there a way to correct this using dplyr? If not, is there another way in R to correct the data?
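As a possible non-dplyr option, here is a minimal base R sketch. It assumes exactly 5 person slots named AGE1..AGE5 and SEX1..SEX5, and it treats a slot as filled if either its AGE or its SEX is present. Each person's AGE and SEX move together as a pair and keep their original relative order. The helper name `compact_people()` is hypothetical, not from any package.

```r
# Data from the question (stringsAsFactors = FALSE keeps SEX as character on R < 4.0)
df <- data.frame(
  ref = c(1, 2, 3, 4, 5, 6),
  AGE1 = c(45, 36, 26, 47, 24, NA),
  AGE2 = c(NA, 24, NA, 13, 57, 28),
  AGE3 = c(NA, NA, 35, NA, NA, 26),
  AGE4 = c(NA, NA, 15, 11, NA, NA),
  AGE5 = c(NA, 15, NA, NA, NA, NA),
  SEX1 = c("M", "F", "M", "M", "M", NA),
  SEX2 = c(NA, "M", NA, "F", "F", "F"),
  SEX3 = c(NA, NA, "M", NA, NA, "M"),
  SEX4 = c(NA, NA, "F", "F", NA, NA),
  SEX5 = c(NA, "F", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: move the filled person slots of each row to the left,
# keeping each person's AGE and SEX together and in their original order
compact_people <- function(data, n = 5) {
  age_cols <- paste0("AGE", seq_len(n))
  sex_cols <- paste0("SEX", seq_len(n))
  for (i in seq_len(nrow(data))) {
    ages  <- unlist(data[i, age_cols])
    sexes <- unlist(data[i, sex_cols])
    # Assumption: a slot counts as filled if either AGE or SEX is present
    filled    <- which(!is.na(ages) | !is.na(sexes))
    # Filled slots first (original order), empty slots last
    new_order <- c(filled, setdiff(seq_len(n), filled))
    data[i, age_cols] <- ages[new_order]
    data[i, sex_cols] <- sexes[new_order]
  }
  data
}

fixed <- compact_people(df)
fixed
```

Because both variables are reordered with the same index, this does not rely on AGE and SEX being blank in exactly the same slots, which compacting each variable independently would.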

【Discussion】:

Tags: r dplyr data-cleaning columnsorting


【Solution 1】:

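Here is one way using dplyr and tidyr. The approach is to pivot the data to long format, sort the NA values to the end, renumber the column names, and then pivot back to wide format.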

library(dplyr)
library(tidyr)

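# df is the data frame defined in the question
df %>% 
  # AGE is numeric and SEX is character; pivot_longer() needs a single type
  mutate(across(starts_with("AGE"), as.character)) %>% 
  pivot_longer(2:11) %>%
  # split e.g. "AGE3" into cat = "AGE" and num = "3"
  separate(name, into = c("cat", "num"), 3) %>%
  # push the empty (NA) slots to the end, keeping the original order otherwise
  arrange(is.na(value)) %>%
  # renumber the slots 1, 2, 3, ... within each household and variable
  group_by(ref, cat) %>%
  mutate(num = seq_along(value)) %>%
  ungroup() %>%
  arrange(cat) %>%
  unite(name, cat, num, sep = "") %>%
  pivot_wider(id_cols = ref) %>%
  # restore the numeric type of the age columns
  mutate(across(starts_with("AGE"), as.numeric))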


# A tibble: 6 x 11
    ref  AGE1  AGE2  AGE3  AGE4  AGE5 SEX1  SEX2  SEX3  SEX4  SEX5 
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr>
1     1    45    NA    NA    NA    NA M     NA    NA    NA    NA   
2     2    36    24    15    NA    NA F     M     F     NA    NA   
3     3    26    35    15    NA    NA M     M     F     NA    NA   
4     4    47    13    11    NA    NA M     F     F     NA    NA   
5     5    24    57    NA    NA    NA M     F     NA    NA    NA   
6     6    28    26    NA    NA    NA F     M     NA    NA    NA  

【Discussion】:

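  • This also works nicely for keeping the same order as the original. Why do the age variables need to be converted to character and then back to numeric?
  • The pivot gets messy if the data types differ: pivot_longer() can't combine the numeric AGE columns and the character SEX columns into a single value column.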
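To make the type issue concrete, here is a small sketch on a one-row toy data frame (not the question's data). It assumes tidyr is installed and simply captures the error message with `tryCatch()` rather than relying on its exact wording.

```r
library(tidyr)

# Toy example: one numeric column and one character column
mixed <- data.frame(ref = 1, AGE1 = 45, SEX1 = "M", stringsAsFactors = FALSE)

# Without conversion, pivot_longer() cannot put a <double> and a <character>
# column into the same `value` column, so it stops with an error
res <- tryCatch(pivot_longer(mixed, -ref),
                error = function(e) conditionMessage(e))
print(res)

# After converting the age to character, the pivot succeeds
converted <- mixed
converted$AGE1 <- as.character(converted$AGE1)
long <- pivot_longer(converted, -ref)
print(long)

# as.numeric() turns "45" back into 45, so the round trip loses nothing
as.numeric(long$value[long$name == "AGE1"])
```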

【Solution 2】:

Try the following base R code:

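# Rename AGE1 -> AGE.1, SEX1 -> SEX.1, ... so reshape() can split each name into
# a variable and a time, then stack the five person slots into long format
u1 <- reshape(
  setNames(df, sub("(\\d)", ".\\1", names(df))),
  direction = "long",
  idvar = "ref",
  varying = -1
)

# Move the empty slots to the end, renumber the slots within each ref,
# and spread the result back to wide format
u2  <- reshape(
  transform(
    u1[with(u1, order(is.na(AGE), is.na(SEX))), ],
    time = ave(time, ref, FUN = seq_along)
  ),
  direction = "wide",
  idvar = "ref"
)

# Put the columns back in the original order
out <- u2[match(names(df),sub("\\.","",names(u2)))]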

You will get:

> out
    ref AGE.1 AGE.2 AGE.3 AGE.4 AGE.5 SEX.1 SEX.2 SEX.3 SEX.4 SEX.5
1.1   1    45    NA    NA    NA    NA     M  <NA>  <NA>  <NA>  <NA>
2.1   2    36    24    15    NA    NA     F     M     F  <NA>  <NA>
3.1   3    26    35    15    NA    NA     M     M     F  <NA>  <NA>
4.1   4    47    13    11    NA    NA     M     F     F  <NA>  <NA>
5.1   5    24    57    NA    NA    NA     M     F  <NA>  <NA>  <NA>
6.2   6    28    26    NA    NA    NA     F     M  <NA>  <NA>  <NA>

Data:

df <- data.frame(
  ref = c(1, 2, 3, 4, 5, 6),
  AGE1 = c(45, 36, 26, 47, 24, NA),
  AGE2 = c(NA, 24, NA, 13, 57, 28),
  AGE3 = c(NA, NA, 35, NA, NA, 26),
  AGE4 = c(NA, NA, 15, 11, NA, NA),
  AGE5 = c(NA, 15, NA, NA, NA, NA),
  SEX1 = c("M", "F", "M", "M", "M", NA),
  SEX2 = c(NA, "M", NA, "F", "F", "F"),
  SEX3 = c(NA, NA, "M", NA, NA, "M"),
  SEX4 = c(NA, NA, "F", "F", NA, NA),
  SEX5 = c(NA, "F", NA, NA, NA, NA)
)
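The key step in the reshape answer is "sort the blank slots to the end, then renumber the slots within each ref". Here is a minimal base R sketch of just that step on a toy long-format data frame built from household ref 3 in the question. It relies on `order()` leaving ties in their original order, so the filled slots keep their original sequence.

```r
# Toy long-format rows for household ref = 3 (values from the question):
# slot 2 was left blank and slots 3 and 4 were used instead
long <- data.frame(ref  = 3,
                   time = 1:5,
                   AGE  = c(26, NA, 35, 15, NA),
                   SEX  = c("M", NA, "M", "F", NA),
                   stringsAsFactors = FALSE)

# Filled slots move to the front; ties stay in their original order
sorted <- long[order(is.na(long$AGE), is.na(long$SEX)), ]

# Renumber the slots 1, 2, 3, ... within each ref
sorted$time <- ave(sorted$time, sorted$ref, FUN = seq_along)
sorted
```

After renumbering, slots 1 to 3 hold ages 26, 35 and 15, which matches row 3 of the desired output.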

【Discussion】:

【Solution 3】:

Here is an approach using the dplyr and tidyr libraries.

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = -ref, 
               names_to = c('.value', 'num'), 
               names_pattern = '([A-Z]+)(\\d+)') %>%
  arrange(ref, AGE, SEX) %>%
  group_by(ref) %>%
  mutate(num = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = num, values_from = c(AGE, SEX)) 

#    ref AGE_1 AGE_2 AGE_3 AGE_4 AGE_5 SEX_1 SEX_2 SEX_3 SEX_4 SEX_5
#  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr>
#1     1    45    NA    NA    NA    NA M     NA    NA    NA    NA   
#2     2    15    24    36    NA    NA F     M     F     NA    NA   
#3     3    15    26    35    NA    NA F     M     M     NA    NA   
#4     4    11    13    47    NA    NA F     F     M     NA    NA   
#5     5    24    57    NA    NA    NA M     F     NA    NA    NA   
#6     6    26    28    NA    NA    NA M     F     NA    NA    NA    
    

【Discussion】:

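  • Using pivot_longer + pivot_wider is really clever. Cheers, upvoted!
  • Thomas, Ronak, thanks for your help. Because I want to keep people in the order they were entered on the form (i.e., if I completed p1, p3 and p4, I want p3 to fill p2 and p4 to fill p3), I got rid of the arrange(ref, AGE, SEX) line. It seems to work fine, but I'm not sure whether I'm missing something. The real data has many more columns than I included here. Would you recommend selecting only the age and sex columns, applying this method, and then left_join-ing the result back to the data by ref?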
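On the follow-up above: as far as I can tell, dropping `arrange()` entirely leaves each household's slots in their original positions, so `row_number()` reproduces the old numbering and nothing moves. A sketch of one order-preserving alternative is to sort only on whether a slot is blank, relying on dplyr's `arrange()` keeping ties in their original order. It also shows the select-then-`left_join()` idea from the comment. The `household_id` column is a hypothetical stand-in for the real data's extra columns, and the `person_cols` pattern assumes the AGE/SEX naming from the question.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  ref = c(1, 2, 3, 4, 5, 6),
  AGE1 = c(45, 36, 26, 47, 24, NA),
  AGE2 = c(NA, 24, NA, 13, 57, 28),
  AGE3 = c(NA, NA, 35, NA, NA, 26),
  AGE4 = c(NA, NA, 15, 11, NA, NA),
  AGE5 = c(NA, 15, NA, NA, NA, NA),
  SEX1 = c("M", "F", "M", "M", "M", NA),
  SEX2 = c(NA, "M", NA, "F", "F", "F"),
  SEX3 = c(NA, NA, "M", NA, NA, "M"),
  SEX4 = c(NA, NA, "F", "F", NA, NA),
  SEX5 = c(NA, "F", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)
# Hypothetical extra column standing in for the other columns of the real data
df$household_id <- c("a", "b", "c", "d", "e", "f")

person_cols <- "^(AGE|SEX)\\d+$"

fixed <- df %>%
  select(ref, matches(person_cols)) %>%
  pivot_longer(cols = -ref,
               names_to = c(".value", "num"),
               names_pattern = "([A-Z]+)(\\d+)") %>%
  # Blank slots last; other slots keep their original order
  arrange(ref, is.na(AGE) & is.na(SEX)) %>%
  group_by(ref) %>%
  mutate(num = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = num, values_from = c(AGE, SEX), names_sep = "")

# Join the corrected person columns back onto the remaining columns
result <- df %>%
  select(-matches(person_cols)) %>%
  left_join(fixed, by = "ref")
result
```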

【Solution 4】:

Here is a solution using the dedupewider package:

library(dedupewider)

df <- data.frame(
  ref = c(1, 2, 3, 4, 5, 6),
  AGE1 = c(45, 36, 26, 47, 24, NA),
  AGE2 = c(NA, 24, NA, 13, 57, 28),
  AGE3 = c(NA, NA, 35, NA, NA, 26),
  AGE4 = c(NA, NA, 15, 11, NA, NA),
  AGE5 = c(NA, 15, NA, NA, NA, NA),
  SEX1 = c("M", "F", "M", "M", "M", NA),
  SEX2 = c(NA, "M", NA, "F", "F", "F"),
  SEX3 = c(NA, NA, "M", NA, NA, "M"),
  SEX4 = c(NA, NA, "F", "F", NA, NA),
  SEX5 = c(NA, "F", NA, NA, NA, NA)
)

age_moved <- na_move(df, cols = names(df)[grepl("^AGE\\d$", names(df))]) # 'right' is the default direction

sex_moved <- na_move(age_moved, cols = names(df)[grepl("^SEX\\d$", names(df))])

sex_moved

#>   ref AGE1 AGE2 AGE3 AGE4 AGE5 SEX1 SEX2 SEX3 SEX4 SEX5
#> 1   1   45   NA   NA   NA   NA    M <NA> <NA>   NA   NA
#> 2   2   36   24   15   NA   NA    F    M    F   NA   NA
#> 3   3   26   35   15   NA   NA    M    M    F   NA   NA
#> 4   4   47   13   11   NA   NA    M    F    F   NA   NA
#> 5   5   24   57   NA   NA   NA    M    F <NA>   NA   NA
#> 6   6   28   26   NA   NA   NA    F    M <NA>   NA   NA
    

【Discussion】:
