Answers: keeping rows with a specific string value in R

【Question title】: Keep rows data with specific string value in R
【Posted】: 2016-03-26 04:20:25
【Question】:

First, I have a list of strings:

/index.php/abc/def
/link/view/id/123
/subject/view/id/456

Then, I have a dataset like this:

Date and Time          Request
2016-01-17 05:46:26    aladdine.com/view/id/786
2016-01-17 05:46:30    aladdine.com/subject/view/id/456
2016-01-17 05:46:31    aladdine.com/pub/link/view/id/123
2016-01-17 05:46:44    aladdine.com/index.php/abc/def/ghi
2016-01-17 05:46:58    aladdine.com/brs/view/id.266

How can I keep only the rows of the dataset whose Request contains one of the strings from the list above?

Expected output:

Date and Time          Request
2016-01-17 05:46:30    aladdine.com/subject/view/id/456
2016-01-17 05:46:31    aladdine.com/pub/link/view/id/123
2016-01-17 05:46:44    aladdine.com/index.php/abc/def/ghi

【Comments】:

Tags: r regex string-comparison

【Solution 1】:

Using the same dataset as @Cinnamon Star (the built-in CO2 data), you can do this:

dataSet <- CO2
iList <- list("Qn1", "Mn1", "Mc1")

Concatenate all the strings into a single regular expression pattern of the form (str1|str2|str3):

pat <- paste(unlist(iList), collapse = "|")
pat <- paste0("(", pat, ")")
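
The same idea carries over to the strings in the question, with one caveat: `.` is a regex metacharacter, so an unescaped `/index.php/abc/def` would also match, for example, `/indexXphp/abc/def`. A minimal sketch, assuming the question's strings are stored in a vector named `paths` (a hypothetical name), escapes the metacharacters before joining:

```r
# Hypothetical vector holding the strings from the question
paths <- c("/index.php/abc/def", "/link/view/id/123", "/subject/view/id/456")

# Put a backslash in front of every regex metacharacter so that each
# string is matched literally
escaped <- gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", paths)

# Join the escaped strings into one alternation pattern
pat <- paste0("(", paste(escaped, collapse = "|"), ")")
print(pat)
```

If the strings never need to be treated as regular expressions, matching each one separately with `grepl(..., fixed = TRUE)` avoids the escaping step altogether.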

Then run grepl to find which rows contain that text in the Plant column.

dataSet[grepl(pattern = pat, x = dataSet$Plant), ]

Result:

   Plant        Type  Treatment conc uptake
1    Qn1      Quebec nonchilled   95   16.0
2    Qn1      Quebec nonchilled  175   30.4
3    Qn1      Quebec nonchilled  250   34.8
4    Qn1      Quebec nonchilled  350   37.2
5    Qn1      Quebec nonchilled  500   35.3
6    Qn1      Quebec nonchilled  675   39.2
7    Qn1      Quebec nonchilled 1000   39.7
43   Mn1 Mississippi nonchilled   95   10.6
44   Mn1 Mississippi nonchilled  175   19.2
45   Mn1 Mississippi nonchilled  250   26.2
46   Mn1 Mississippi nonchilled  350   30.0
47   Mn1 Mississippi nonchilled  500   30.9
48   Mn1 Mississippi nonchilled  675   32.4
49   Mn1 Mississippi nonchilled 1000   35.5
64   Mc1 Mississippi    chilled   95   10.5
65   Mc1 Mississippi    chilled  175   14.9
66   Mc1 Mississippi    chilled  250   18.1
67   Mc1 Mississippi    chilled  350   18.9
68   Mc1 Mississippi    chilled  500   19.5
69   Mc1 Mississippi    chilled  675   22.2
70   Mc1 Mississippi    chilled 1000   21.9
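
Applied to the data from the question, the same approach might look like the sketch below. The data frame `requests` and its column name `DateTime` are assumptions made for illustration; only `Request` and the values come from the question.

```r
# Data from the question, rebuilt by hand (DateTime is an assumed column name)
requests <- data.frame(
  DateTime = c("2016-01-17 05:46:26", "2016-01-17 05:46:30",
               "2016-01-17 05:46:31", "2016-01-17 05:46:44",
               "2016-01-17 05:46:58"),
  Request = c("aladdine.com/view/id/786",
              "aladdine.com/subject/view/id/456",
              "aladdine.com/pub/link/view/id/123",
              "aladdine.com/index.php/abc/def/ghi",
              "aladdine.com/brs/view/id.266"),
  stringsAsFactors = FALSE
)

iList <- list("/index.php/abc/def", "/link/view/id/123", "/subject/view/id/456")

# Build the (str1|str2|str3) pattern; the "." in "index.php" is left
# unescaped here, so it matches any single character
pat <- paste0("(", paste(unlist(iList), collapse = "|"), ")")

# Keep only the rows whose Request matches the pattern
kept <- requests[grepl(pattern = pat, x = requests$Request), ]
print(kept)
```

This keeps the second, third and fourth rows, which are the three rows listed as the expected output in the question.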

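【Comments】:

【Solution 2】:

I took the CO2 example from R's built-in datasets. Assign your dataset to dataSet and your list to iList, and change every occurrence of dataSet$Plant to the column you are interested in (probably dataSet$Request).

The resulting dataset is stored in results.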

dataSet <- CO2

# Column to search; for the question's data this would be dataSet$Request
varsToCheck <- as.character(dataSet$Plant)

iList <- list("Qn1", "Mn1", "Mc1")

# Start with an empty result; rbind(NULL, x) simply returns x
results <- NULL

# Iterate over all rows
for (i in seq_along(varsToCheck)) {
  # Extract the string to check
  validateString <- varsToCheck[i]
  # Iterate over all match criteria
  for (j in seq_along(iList)) {
    # Extract the match criterion
    matchString <- iList[[j]]
    # Check whether part of the string matches the criterion
    if (grepl(matchString, validateString)) {
      results <- rbind(results, dataSet[i, ])
      # Stop after the first hit so a row is never added twice
      break
    }
  }
}
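
Because grepl is vectorized over its second argument, the nested loop can also be written without explicit row iteration. The following is a sketch of one alternative, not the answer author's code; the intermediate names `matches` and `keep` are made up for illustration, and `fixed = TRUE` is an added assumption that the criteria are plain strings rather than regular expressions.

```r
dataSet <- CO2
iList <- list("Qn1", "Mn1", "Mc1")

# One logical vector per criterion: TRUE where the Plant value contains it
matches <- lapply(iList, function(s) {
  grepl(s, as.character(dataSet$Plant), fixed = TRUE)
})

# Combine the vectors with an element-wise OR
keep <- Reduce(`|`, matches)

# Subset once instead of growing the result with rbind inside a loop
results <- dataSet[keep, ]
print(nrow(results))
```

Subsetting once also avoids the repeated copying that rbind causes when it is called inside a loop, which becomes noticeable on larger datasets.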

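【Comments】:

Thanks for your answer, it is helpful. But I have a question about the result: why does it only show the last row of the dataset that matches iList? The result always has just 1 row.
I don't know, maybe I'm missing something, but it works fine for me. Try it yourself with an example.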