Python to R Translation: Answers

【Question title】: Python to R Translation
【Posted】: 2022-01-18 15:43:09
【Question】:

I have a few lines of Python code that I am trying to replicate in R, but I admit I am not yet skilled enough to figure it out.

Here is the code in Python:

import pandas as pd

# Build the example data frame
df = pd.DataFrame({'col_a': ["blue shovel 1024", "red shovel 1022", "green bucket 3021", "green rake 3021",
                             "yellow shovel 1023"],
                   'col_b': ["blue", "red", "green", "blue", "yellow"]},
                  columns=["col_a", "col_b"])

# Unique values of col_b, in order of first appearance
unique_words = list(df.col_b.unique())
unique_words
# ['blue', 'red', 'green', 'yellow']

# Keep the whitespace-separated tokens of col_a that appear in unique_words
df['result'] = df['col_a'].apply(lambda x: ','.join([item for item in str(x).split()
                                                     if item in unique_words]))

Running the code above gives this result:

    col_a                      col_b          result
1   blue shovel 1024           blue           blue
2   red shovel 1022            red            red
3   green bucket 3021          green          green
4   green rake 3021            blue           green
5   yellow shovel 1023         yellow         yellow

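The goal of this code is to build a list of the unique values in col_b, then search col_a for any of those values and, if any are found, put them in the result column. Note that in row 4 the result is green. This is correct: even though col_b shows blue for row 4, the actual value in col_a is green.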
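The matching logic described above can also be sketched without pandas, using plain lists. This is only a minimal illustration: the helper name `match_words` is hypothetical and does not come from the original post, and `dict.fromkeys` is used here as a stand-in for `unique()` because it also keeps values in order of first appearance.

```python
# Plain-Python sketch of the matching logic (no pandas required).
col_a = ["blue shovel 1024", "red shovel 1022", "green bucket 3021",
         "green rake 3021", "yellow shovel 1023"]
col_b = ["blue", "red", "green", "blue", "yellow"]

# Unique values of col_b, in order of first appearance.
unique_words = list(dict.fromkeys(col_b))


def match_words(text, words):
    """Comma-join the whitespace-separated tokens of `text` that occur in `words`.

    Hypothetical helper for illustration; mirrors the lambda in the question.
    """
    return ",".join(token for token in str(text).split() if token in words)


result = [match_words(x, unique_words) for x in col_a]
print(result)  # ['blue', 'red', 'green', 'green', 'yellow']
```

The fourth entry is `green` because the match is made against the text in col_a, not against that row's col_b value.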

I have tried to rewrite this part:

df['result'] = df['col_a'].apply(lambda x: ','.join([item for item in str(x).split()
                                                     if item in unique_words]))

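in R. My idea was to write a function and try lapply(), but either I am doing it wrong or this is not the right approach. Thanks in advance for any suggestions or help; I will check back to see whether there are questions I can answer or information I can clarify. Thanks again!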

【Question comments】:

Tags: python r

【Solution 1】:

library(tidyverse)

df <- tibble(
  col_a = c("blue shovel 1024", "red shovel 1022", "green bucket 3021", "green rake 3021", "yellow shovel 1023"),
  col_b = c("blue", "red", "green", "blue", "yellow")
)
df
#> # A tibble: 5 x 2
#>   col_a              col_b 
#>   <chr>              <chr> 
#> 1 blue shovel 1024   blue  
#> 2 red shovel 1022    red   
#> 3 green bucket 3021  green 
#> 4 green rake 3021    blue  
#> 5 yellow shovel 1023 yellow

unique_words <- unique(df$col_b)
unique_words
#> [1] "blue"   "red"    "green"  "yellow"
unique_words_regex <- unique_words %>% paste0(collapse = "|")

df <- mutate(df, result = col_a %>% str_extract(unique_words_regex))
df
#> # A tibble: 5 x 3
#>   col_a              col_b  result
#>   <chr>              <chr>  <chr> 
#> 1 blue shovel 1024   blue   blue  
#> 2 red shovel 1022    red    red   
#> 3 green bucket 3021  green  green 
#> 4 green rake 3021    blue   green 
#> 5 yellow shovel 1023 yellow yellow

Created on 2021-12-15 by the reprex package (v2.0.1)
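One caveat: a regex alternation such as `blue|red|green|yellow` is not strictly equivalent to the Python in the question. `str_extract` returns only the first match and matches substrings, while the original code joins every whole token that is in the list. For the sample data the two agree. The sketch below uses Python's `re` module to mimic both behaviours on made-up inputs; the function names and the extra test strings are assumptions for illustration only. On the R side, `str_extract_all` together with `\\b` word boundaries would probably be the closer analogue, though that is untested here.

```python
import re

unique_words = ["blue", "red", "green", "yellow"]

# Same idea as unique_words_regex in the answer: "blue|red|green|yellow"
pattern = "|".join(re.escape(w) for w in unique_words)


def first_regex_match(text):
    """First substring match or None (roughly what a first-match extract does)."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


def token_join(text):
    """Behaviour of the original Python: every whole token found in the list."""
    return ",".join(t for t in text.split() if t in unique_words)


def whole_word_matches(text):
    """All matches restricted to word boundaries, comma-joined."""
    return ",".join(re.findall(r"\b(?:" + pattern + r")\b", text))


for s in ["green rake 3021", "red blue shovel", "redwood plank"]:
    print(repr(s), first_regex_match(s), repr(token_join(s)), repr(whole_word_matches(s)))
```

For `"red blue shovel"` the first-match version yields only `red`, while the token-based versions yield `red,blue`; for `"redwood plank"` the plain alternation still finds `red`, whereas the token-based versions find nothing.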

【Comments】:

I think that's it! Thank you very much!