How to search for a string in an entire data.frame: answers

【Question title】: How to search for a string in an entire data.frame
【Posted】: 2018-07-25 19:01:05
【Question description】:

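I have the following table, which lists the item names of automobile spare parts. For each part I have the ITEM code of the part as made by the car manufacturer (the `OEM Part` column), and I also have the corresponding ITEM codes of the same part as made by parts manufacturers (the `OES` columns).

At regular intervals I receive an input that contains only the ITEM codes that were sold. How can I determine which parts were sold?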

> trial
# A tibble: 6 x 5
  Name         `OEM Part` `OES 1 Code`   `OES 2 Code` `OES 3 Code`
  <chr>        <chr>      <chr>          <chr>        <chr>       
1 Brakes       231049A76  1910290/230023 NA           NA          
2 Cables       2410ASD12  NA             219930       3213Q23     
3 Tyres        9412HJ12   231233         NA           NA          
4 Suspension   756634K71  782320/880716  NA           NA          
5 Ball Bearing 2IW2WD23   231224         NA           NA          
6 Clutches     9304JFW3   NA             QQW223       23RQR3

If I have the following input values

> item_code <- c("231049A76", "1910290", "1910290", "23RQR3")

I need the following output

Name
Brakes
Brakes
Brakes
Clutches

Note: 1910290 and 230023 are separate parts; both are brakes with slight modifications.

【Comments】:

Tags: r regex dplyr tidyr tidyverse

【Solution 1】:

If you reshape the data to long form, you can use a join:

library(tidyverse)

trial <- tibble(Name = c("Brakes", "Cables", "Tyres", "Suspension", "Ball Bearing", "Clutches"), 
                `OEM Part` = c("231049A76", "2410ASD12", "9412HJ12", "756634K71", "2IW2WD23", "9304JFW3"), 
                `OES 1 Code` = c("1910290/230023", NA, "231233", "782320/880716", "231224", NA), 
                `OES 2 Code` = c(NA, "219930", NA, NA, NA, "QQW223"), 
                `OES 3 Code` = c(NA, "3213Q23", NA, NA, NA, "23RQR3"))

trial_long <- trial %>% 
    gather('code_type', 'code', -Name) %>%    # reshape to long form
    separate_rows(code) %>%    # separate double values
    drop_na(code)    # drop unnecessary NA rows

# join to filter and duplicate; `by` is given explicitly to avoid the
# "Joining, by = ..." message
trial_long %>% 
    right_join(tibble(code = c("231049A76", "1910290", "1910290", "23RQR3")), by = "code")
#> # A tibble: 4 x 3
#>   Name     code_type  code     
#>   <chr>    <chr>      <chr>    
#> 1 Brakes   OEM Part   231049A76
#> 2 Brakes   OES 1 Code 1910290  
#> 3 Brakes   OES 1 Code 1910290  
#> 4 Clutches OES 3 Code 23RQR3
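
The same reshape-and-join idea can also be sketched with `pivot_longer()`, which tidyr (1.0.0 and later) offers as the successor to `gather()`, and with a `left_join()` that starts from the sold codes so the result follows the order of the input. This is a variation on the answer above, not the answerer's own code; the names `lookup` and `sold` are illustrative.

```r
library(dplyr)
library(tidyr)

trial <- tibble(
  Name = c("Brakes", "Cables", "Tyres", "Suspension", "Ball Bearing", "Clutches"),
  `OEM Part` = c("231049A76", "2410ASD12", "9412HJ12", "756634K71", "2IW2WD23", "9304JFW3"),
  `OES 1 Code` = c("1910290/230023", NA, "231233", "782320/880716", "231224", NA),
  `OES 2 Code` = c(NA, "219930", NA, NA, NA, "QQW223"),
  `OES 3 Code` = c(NA, "3213Q23", NA, NA, NA, "23RQR3")
)
item_code <- c("231049A76", "1910290", "1910290", "23RQR3")

# One row per (Name, code): reshape to long form, drop NAs, split "a/b" cells
lookup <- trial %>%
  pivot_longer(-Name, names_to = "code_type", values_to = "code",
               values_drop_na = TRUE) %>%
  separate_rows(code, sep = "/")

# Start from the sold codes so duplicates and input order are preserved
sold <- tibble(code = item_code) %>%
  left_join(lookup, by = "code")

sold$Name
#> [1] "Brakes"   "Brakes"   "Brakes"   "Clutches"
```

A code that is not in the table would show up here as a row with `NA` in `Name`, which may be easier to spot than a missing row.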

【Comments】:

【Solution 2】:

A not very efficient approach using sapply and apply: for each item_code we find which row of trial contains it, and then get the corresponding Name value.

sapply(item_code, function(x)   
            trial$Name[apply(trial[-1], 1,  function(y)  any(grepl(x, y)))])

# 231049A76    1910290    1910290     23RQR3 
#  "Brakes"   "Brakes"   "Brakes" "Clutches"

If you do not need the names, set USE.NAMES = FALSE in sapply.
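
One thing to keep in mind with `grepl(x, y)` is that it treats `x` as a regular expression and matches substrings, so a shorter code (say a hypothetical `"2312"`) would match both `231233` and `231224`. If exact matches are wanted, a base R sketch along the following lines may help. It splits the `/`-separated cells first and then uses `match()`; the names `cells`, `pieces`, `lookup` and `result` are illustrative, and the table is rebuilt as a plain data.frame so the block runs on its own.

```r
trial <- data.frame(
  Name = c("Brakes", "Cables", "Tyres", "Suspension", "Ball Bearing", "Clutches"),
  `OEM Part` = c("231049A76", "2410ASD12", "9412HJ12", "756634K71", "2IW2WD23", "9304JFW3"),
  `OES 1 Code` = c("1910290/230023", NA, "231233", "782320/880716", "231224", NA),
  `OES 2 Code` = c(NA, "219930", NA, NA, NA, "QQW223"),
  `OES 3 Code` = c(NA, "3213Q23", NA, NA, NA, "23RQR3"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
item_code <- c("231049A76", "1910290", "1910290", "23RQR3")

# Character matrix of all code cells (everything except the Name column)
cells <- as.matrix(trial[-1])

# Split every cell on "/"; NA cells stay as a single NA
pieces <- strsplit(as.vector(cells), "/", fixed = TRUE)

# One row per individual code, carrying the Name of the row it came from
lookup <- data.frame(
  Name = trial$Name[rep(as.vector(row(cells)), lengths(pieces))],
  code = unlist(pieces),
  stringsAsFactors = FALSE
)
lookup <- lookup[!is.na(lookup$code), ]

# Exact lookup; keeps input order and duplicates, NA for unknown codes
result <- lookup$Name[match(item_code, lookup$code)]
print(result)
# [1] "Brakes"   "Brakes"   "Brakes"   "Clutches"
```

Note that `match()` returns only the first hit, so this assumes each code belongs to a single part.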

【Comments】:

【Solution 3】:

Here is an example similar to yours, using base R:

## Create a dummy matrix
example <- cbind(matrix(1:4, 4, 1), matrix(letters[1:16], 4, 4))
colnames(example) <- c("names", "W", "X", "Y", "Z")
#     names W   X   Y   Z  
#[1,] "1"   "a" "e" "i" "m"
#[2,] "2"   "b" "f" "j" "n"
#[3,] "3"   "c" "g" "k" "o"
#[4,] "4"   "d" "h" "l" "p"

This table is similar to yours, with the names in the first column and the patterns to match in the other columns.

## The pattern of interest
pattern <- c("a","e", "f", "p")

For this pattern, we expect the following result: "1","1","2","4".

## Detecting the pattern per row
matching_rows <- row(example[,-1])[example[,-1] %in% pattern]
#[1] 1 1 2 4

## Returning the rows with the pattern
example[matching_rows,1]
#[1] "1" "1" "2" "4"
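
The `%in%` approach above returns the matching rows in the order the values appear in the table (column by column), and each table cell is reported once, so a pattern listed twice is not duplicated. If the result should follow the order of the input and repeat for repeated inputs, as in the question, a `match()`-based variation is one option. The sketch below reuses the dummy matrix; the reordered `pattern` and the names `codes`, `pos` and `rows` are illustrative.

```r
## Same dummy matrix as above
example <- cbind(matrix(1:4, 4, 1), matrix(letters[1:16], 4, 4))
colnames(example) <- c("names", "W", "X", "Y", "Z")

## A pattern with a repeated value, in a different order than the table
pattern <- c("p", "a", "a", "f")

codes <- example[, -1]

## Position of each pattern in the matrix (column-major), NA if absent
pos <- match(pattern, codes)

## Translate positions into row numbers, then into names
rows <- row(codes)[pos]
example[rows, "names"]
#[1] "4" "1" "1" "2"
```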

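【Comments】:

You should rarely need apply. In particular, %in% or == work across a whole matrix, e.g. row(example[,-1])[example[,-1] %in% pattern].
Thanks! I have updated the example, and it is now more readable too!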