【Question title】: Get the latest updated value
【Posted】: 2021-10-04 20:47:38
【Question】:

This follows up on the thread "Get the latest updated value in R".

I would like to raise a new problem that appears when running the solution from @TarJae.

I made a new example:

df <- data.frame(name1 = c("Acacia pinnata", "Acer laurinum", "Acmella paniculata", "Aglaia lawii", NA, NA),
                 name2 = c(NA, NA, NA, NA, "Acer laurinum Hassk.", "Aglaia lawii (Wight)"),
                 name3 = c("Senegalia rugata (Lam.) Britton & Rose", "Acer laurinum", "Acmella paniculata", "Aglaia lawii", "Acer laurinum Hassk.", "Aglaia lawii (Wight)"))
               name1                name2                                  name3
1     Acacia pinnata                 <NA> Senegalia rugata (Lam.) Britton & Rose
2      Acer laurinum                 <NA>                          Acer laurinum
3 Acmella paniculata                 <NA>                     Acmella paniculata
4       Aglaia lawii                 <NA>                           Aglaia lawii
5               <NA> Acer laurinum Hassk.                   Acer laurinum Hassk.
6               <NA> Aglaia lawii (Wight)                   Aglaia lawii (Wight)

Now, using the code from @TarJae,

we get the output shown after the code (some rows appear to have been shifted):

df %>% 
  mutate(id = row_number()) %>% 
  pivot_longer(
    cols = -id
  ) %>% 
  mutate(helper= word(value, 1)) %>% 
  group_by(helper) %>% 
  mutate(value= last(value)) %>% 
  pivot_wider(
    names_from = name,
    values_from = value
  ) %>% 
  ungroup() %>% 
  select(-id, -helper) %>% 
  filter(if_any(everything(), ~ !is.na(.)))
  name1                name2                name3                                 
  <chr>                <chr>                <chr>                                 
1 Acacia pinnata       NA                   NA                                    
2 NA                   NA                   Senegalia rugata (Lam.) Britton & Rose
3 Acer laurinum Hassk. NA                   Acer laurinum Hassk.                  
4 Acmella paniculata   NA                   Acmella paniculata                    
5 Aglaia lawii (Wight) NA                   Aglaia lawii (Wight)                  
6 NA                   Acer laurinum Hassk. Acer laurinum Hassk.                  
7 NA                   Aglaia lawii (Wight) Aglaia lawii (Wight)       

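We can see that the first row has been split into two rows. I am thinking of merging those two rows back into one, but what if I have many such cases?

Any suggestions?

Updated code example: since another issue has come up, I would like to add another table here.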

df <- data.frame(name1 = c("Acacia pinnata", "Acer laurinum", "Acmella paniculata", "Aglaia lawii", NA, NA, "Alangium javanicum", "Alangium longiflorum", NA),
                 name2 = c(NA, NA, NA, NA, "Acer laurinum Hassk.", "Aglaia lawii (Wight)", NA,NA, "Alangium javanicum (Blume) Wangerin"),
                 name3 = c("Senegalia rugata (Lam.) Britton & Rose", "Acer laurinum", "Acmella paniculata", "Aglaia lawii", "Acer laurinum Hassk.", "Aglaia lawii (Wight)",
                           "Alangium javanicum", "Celtis cf. rigescens (Miq.) Planch.", "Alangium javanicum (Blume) Wangerin"))

Using the code from @TarJae, we can now see that `Celtis cf. rigescens (Miq.) Planch.` is matched with `Alangium javanicum (Blume) Wangerin`, not `Alangium longiflorum`.

Desired output

  name1                               name2                               name3
1 Acacia pinnata                      NA                                  Senegalia rugata (Lam.) Britton & ~
2 Acer laurinum Hassk.                NA                                  Acer laurinum Hassk.               
3 Acmella paniculata                  NA                                  Acmella paniculata                 
4 Aglaia lawii (Wight)                NA                                  Aglaia lawii (Wight)               
5 NA                                  Acer laurinum Hassk.                Acer laurinum Hassk.               
6 NA                                  Aglaia lawii (Wight)                Aglaia lawii (Wight)               
7 Alangium javanicum (Blume) Wangerin NA                                  Alangium javanicum (Blume) Wangerin
8 Alangium longiflorum                NA                                  Celtis cf. rigescens (Miq.) Planch.
9 NA                                  Alangium javanicum (Blume) Wangerin Alangium javanicum (Blume) Wangerin

【Question comments】:

  • Could you please show the expected output for the latest update?

Tags: r dplyr tidyverse


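【Solution 1】:

Update: The key is here: mutate(helper= word(value, 1,2)). We group by the first two words and then fill each group with its last value. If that is not enough, you can group by the first 3 words, and so on...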
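
To illustrate why the number of words in the helper key matters, here is a minimal sketch on a toy vector. The values are taken from the question's data; the names `toy`, `by_genus`, and `by_species` are made up for this illustration and are not part of the original answer.

```r
library(dplyr)
library(stringr)

# Toy column: two species of the same genus, plus an updated name
toy <- tibble(value = c("Alangium javanicum",
                        "Alangium longiflorum",
                        "Alangium javanicum (Blume) Wangerin"))

# Key = first word (genus): all three rows share one group,
# so every value is overwritten by the last one
by_genus <- toy %>%
  group_by(helper = word(value, 1)) %>%
  mutate(value = last(value)) %>%
  ungroup()

# Key = first two words (genus + species):
# "Alangium longiflorum" sits in its own group and is left alone
by_species <- toy %>%
  group_by(helper = word(value, 1, 2)) %>%
  mutate(value = last(value)) %>%
  ungroup()

by_genus$value
by_species$value
```

With the one-word key, "Alangium longiflorum" is replaced by "Alangium javanicum (Blume) Wangerin"; with the two-word key it is kept, and only "Alangium javanicum" is updated.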

df %>% 
  pivot_longer(
    cols = everything(),
  ) %>% 
  mutate(helper= word(value, 1,2)) %>% 
  group_by(helper) %>% 
  mutate(value= last(value)) %>% 
  ungroup() %>% 
  select(-helper) %>%
  pivot_wider(
    names_from = name,
    values_from = value,
  ) %>% 
  unnest(cols = c(name1, name2, name3))
  name1                               name2                               name3                              
  <chr>                               <chr>                               <chr>                              
1 Acacia pinnata                      NA                                  Senegalia rugata (Lam.) Britton & ~
2 Acer laurinum Hassk.                NA                                  Acer laurinum Hassk.               
3 Acmella paniculata                  NA                                  Acmella paniculata                 
4 Aglaia lawii (Wight)                NA                                  Aglaia lawii (Wight)               
5 NA                                  Acer laurinum Hassk.                Acer laurinum Hassk.               
6 NA                                  Aglaia lawii (Wight)                Aglaia lawii (Wight)               
7 Alangium javanicum (Blume) Wangerin NA                                  Alangium javanicum (Blume) Wangerin
8 Alangium longiflorum                NA                                  Celtis cf. rigescens (Miq.) Planch.
9 NA                                  Alangium javanicum (Blume) Wangerin Alangium javanicum (Blume) Wangerin

  1. We do not need id.
  2. We must ungroup() after mutate(value= last(value)).
  3. Remove helper before pivot_wider so that the rows stay intact.
library(dplyr)
library(tidyr)
library(stringr)

df %>% 
  pivot_longer(
    cols = everything(),
  ) %>% 
  mutate(helper= word(value, 1)) %>% 
  group_by(helper) %>% 
  mutate(value= last(value)) %>% 
  ungroup() %>% 
  select(-helper) %>% 
  pivot_wider(
    names_from = name,
    values_from = value,
  ) %>% 
  unnest(cols = c(name1, name2, name3))

Output:

  name1                name2                name3                                 
  <chr>                <chr>                <chr>                                 
1 Acacia pinnata       NA                   Senegalia rugata (Lam.) Britton & Rose
2 Acer laurinum Hassk. NA                   Acer laurinum Hassk.                  
3 Acmella paniculata   NA                   Acmella paniculata                    
4 Aglaia lawii (Wight) NA                   Aglaia lawii (Wight)                  
5 NA                   Acer laurinum Hassk. Acer laurinum Hassk.                  
6 NA                   Aglaia lawii (Wight) Aglaia lawii (Wight) 
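
As a rough sketch of why the pipeline works without an id column and why it ends with unnest(): with no identifying column left, pivot_wider() collapses each name into a single list-cell, and unnest() expands those cells back into one row per original row. The tiny data frame `toy` and the names `long`, `wide`, and `restored` are made up for illustration.

```r
library(dplyr)
library(tidyr)

toy <- tibble(name1 = c("a", "b"), name2 = c("c", NA))

long <- toy %>% pivot_longer(cols = everything())

# No id column: values are not uniquely identified, so pivot_wider()
# returns one row of list-columns (and warns about it)
wide <- suppressWarnings(
  long %>% pivot_wider(names_from = name, values_from = value)
)

# unnest() turns the list-cells back into ordinary rows,
# in the original row order
restored <- wide %>% unnest(cols = c(name1, name2))
restored
```

This only lines up because every column contributes the same number of values; leaving helper in the data would add an identifying column and split the rows again.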

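【Comments】:

  • It seems to work well. name3 is what I want, but the name1 column now looks wrong, because many names have been replaced by different names from the same genus. I think it is because of how you do it here: you extract the genus, and when we group_by and reshape, name3 collides on the genus. What if name1 contains two different names from the same genus? That is a problem I have only just noticed.
  • Let me edit my example!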