【Question title】: Removing incomplete cases from output of tidyr - gather()
【Posted】: 2014-07-30 19:27:12
【Question description】:

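I have untidy data in a data frame that looks like this.

Here you can see the names of some football teams in team. name1 to name3 are variables that list the different names used to refer to the team in the first column.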

               team             name1        name2      name3
1      Loughborough      Loughborough                        
2        Luton Town        Luton Town        Luton           
3      Macclesfield      Macclesfield                        
4  Maidstone United  Maidstone United                        
5   Manchester City   Manchester City     Man City           
6 Manchester United Manchester United Newton Heath Man United
7    Mansfield Town    Mansfield Town    Mansfield           
8      Merthyr Town      Merthyr Town                        

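My goal is to put the data into two columns containing the team-name1, team-name2 and team-name3 pairings. I only want to keep the pairings where name1, name2 or name3 actually contains data.

To do this, I'm trying tidyr's gather():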

library(tidyr)

temp <- dat %>% gather(key, value, 2:4)
temp$key <- NULL
temp

This gives the following output:

                team             value
1       Loughborough      Loughborough
2         Luton Town        Luton Town
3       Macclesfield      Macclesfield
4   Maidstone United  Maidstone United
5    Manchester City   Manchester City
6  Manchester United Manchester United
7     Mansfield Town    Mansfield Town
8       Merthyr Town      Merthyr Town
9       Loughborough                  
10        Luton Town             Luton
11      Macclesfield                  
12  Maidstone United                  
13   Manchester City          Man City
14 Manchester United      Newton Heath
15    Mansfield Town         Mansfield
16      Merthyr Town                  
17      Loughborough                  
18        Luton Town                  
19      Macclesfield                  
20  Maidstone United                  
21   Manchester City                  
22 Manchester United        Man United
23    Mansfield Town                  
24      Merthyr Town                  

I tried to remove the incomplete cases (e.g. rows 20, 21, 23 and 24, but not row 22) using:

temp[complete.cases(temp),]

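This doesn't work, because the seemingly empty observations in value contain an empty string, "". I guess this is how gather() returns missing data? I tried converting temp$value to a factor, but that didn't work either.

I'd love to hear how to get rid of the incomplete cases.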
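Why complete.cases() keeps the blank rows can be checked on a tiny, hypothetical data frame (toy is a made-up name): an empty string "" is an ordinary character value, and complete.cases() only looks for NA.

```r
# Hypothetical toy data frame: one real value, one empty string, one NA
toy <- data.frame(team  = c("A", "B", "C"),
                  value = c("x", "", NA),
                  stringsAsFactors = FALSE)

complete.cases(toy)  # TRUE TRUE FALSE: the "" row counts as complete
is.na(toy$value)     # FALSE FALSE TRUE: "" is not NA
toy$value == ""      # FALSE TRUE NA: blanks have to be tested for explicitly
```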

Sample data:

dat<-structure(list(team = structure(1:8, .Label = c("Loughborough", 
"Luton Town", "Macclesfield", "Maidstone United", "Manchester City", 
"Manchester United", "Mansfield Town", "Merthyr Town"), class = "factor"), 
    name1 = structure(1:8, .Label = c("Loughborough", "Luton Town", 
    "Macclesfield", "Maidstone United", "Manchester City", "Manchester United", 
    "Mansfield Town", "Merthyr Town"), class = "factor"), name2 = structure(c(1L, 
    2L, 1L, 1L, 3L, 5L, 4L, 1L), .Label = c("", "Luton", "Man City", 
    "Mansfield", "Newton Heath"), class = "factor"), name3 = structure(c(1L, 
    1L, 1L, 1L, 1L, 2L, 1L, 1L), .Label = c("", "Man United"), class = "factor")), .Names = c("team", 
"name1", "name2", "name3"), row.names = c(NA, -8L), class = "data.frame")

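【Question discussion】:

  • If your blanks were NA, you could take advantage of the na.rm argument of gather. You can turn blanks into NA when reading the dataset with read.table, via its na.strings argument.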
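A minimal sketch of the na.strings idea, using a hypothetical two-row CSV snippet passed through the text argument of read.table() (csv_text and d are made-up names, not from the question):

```r
# Hypothetical CSV snippet: the second row has an empty name2 field
csv_text <- "team,name2
Luton Town,Luton
Loughborough,"

# na.strings replaces the default "NA", so list both "" and "NA" to turn
# empty fields into NA while still treating the literal "NA" as missing
d <- read.table(text = csv_text, sep = ",", header = TRUE,
                na.strings = c("", "NA"), stringsAsFactors = FALSE)
is.na(d$name2)  # FALSE TRUE
```

With the blanks already stored as NA, gather(..., na.rm = TRUE) can then drop them directly.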

Tags: r dplyr reshape2 tidyr


【Solution 1】:

You can also add filter (to remove the blanks) and select (to drop the key column) from the dplyr package and do everything in one go:

library(dplyr)
library(tidyr)

temp <- dat %>% 
  gather(key, value, 2:4) %>% 
  filter(value != "") %>%
  select(-key)

#                 team             value
# 1       Loughborough      Loughborough
# 2         Luton Town        Luton Town
# 3       Macclesfield      Macclesfield
# 4   Maidstone United  Maidstone United
# 5    Manchester City   Manchester City
# 6  Manchester United Manchester United
# 7     Mansfield Town    Mansfield Town
# 8       Merthyr Town      Merthyr Town
# 9         Luton Town             Luton
# 10   Manchester City          Man City
# 11 Manchester United      Newton Heath
# 12    Mansfield Town         Mansfield
# 13 Manchester United        Man United
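For comparison, a base-R sketch of the same filtering step on a hypothetical long-format data frame (long and kept are made-up names). subset(), like dplyr's filter(), keeps only rows where the condition is TRUE, so a row whose value is NA is dropped along with the blanks.

```r
# Hypothetical long-format data after gather(): a real name, a blank, and an NA
long <- data.frame(team  = c("Luton Town", "Luton Town", "Macclesfield"),
                   value = c("Luton", "", NA),
                   stringsAsFactors = FALSE)

# "" gives FALSE and NA gives NA; subset() treats both as "drop"
kept <- subset(long, value != "")
kept
```

Plain bracket indexing, long[long$value != "", ], would instead return an extra all-NA row for the NA entry, so subset(), filter() or wrapping the condition in which() is safer when NAs may be present.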

【Discussion】:

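【Solution 2】:

You're looking for temp[temp$value != '', ]. The empty strings are not gather's fault: your initial data already contains them. You can replace them with NA first and then use the na.rm argument of gather:

library(tidyr)

dat[dat == ''] <- NA
temp <- dat %>% gather(key, value, 2:4, na.rm = TRUE)
temp$key <- NULL
temp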
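The same idea (blanks to NA first, then reshape and drop the missing values) can also be sketched in base R without tidyr. This is a hypothetical illustration on a three-team subset of the sample data, stored as character columns; dat_small, name_cols and long are made-up names.

```r
# Three-team subset of the sample data, as plain character columns
dat_small <- data.frame(
  team  = c("Luton Town", "Merthyr Town", "Manchester United"),
  name1 = c("Luton Town", "Merthyr Town", "Manchester United"),
  name2 = c("Luton", "", "Newton Heath"),
  name3 = c("", "", "Man United"),
  stringsAsFactors = FALSE
)
dat_small[dat_small == ""] <- NA  # blanks become real missing values

name_cols <- c("name1", "name2", "name3")
long <- data.frame(
  # repeat the team column once per name column
  team  = rep(dat_small$team, times = length(name_cols)),
  # stack name1, name2, name3 end to end, much like gather()
  value = unlist(dat_small[name_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
long <- long[!is.na(long$value), ]  # plays the role of na.rm = TRUE
rownames(long) <- NULL
long
```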
    

【Discussion】:

【Solution 3】:

A similar approach, but using na.omit:

library(dplyr)
library(tidyr)

dat %>% 
  gather(key, value, -team) %>% 
  select(-key) %>%
  mutate(value = ifelse(value == "", NA, value)) %>%
  na.omit %>%
  arrange(team)
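The mutate/na.omit/arrange steps can be sketched in base R on a hypothetical long-format data frame, such as gather() plus select(-key) might produce (long and clean are made-up names):

```r
# Hypothetical long-format data with one blank value
long <- data.frame(team  = c("Merthyr Town", "Luton Town", "Luton Town"),
                   value = c("", "Luton Town", "Luton"),
                   stringsAsFactors = FALSE)

long$value <- ifelse(long$value == "", NA, long$value)  # blanks -> NA
clean <- na.omit(long)                                  # drop rows containing any NA
clean <- clean[order(clean$team), ]                     # same role as arrange(team)
clean
```

If value were still a factor at this point, ifelse() would return its integer codes rather than the labels, so converting with as.character() first is the safer habit.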
      

【Discussion】:
