Searching an array of strings in a file: answers

【Question title】: Searching an array of strings in a file
【Posted】: 2020-07-31 02:50:07
【Question】:

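I have a text file, say testFile.txt, and an array of strings to search for in the file, say ['year', 'weather', 'USD 34235.00', 'sportsman', 'ಕನ್ನಡ']. I could use the NodeJS natural package to break the file into tokens and perhaps build a large array from them (roughly 100-200x the number of entries in the string array), then sort both arrays and start searching. Or should I just use lodash directly?

The result is Found when at least one string from the search array occurs in the text file; otherwise it should be treated as NotFound.

What are the options for implementing such a search?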

【Question comments】:

I think this will help: dev.to/akhilpokle/the-algorithm-behind-ctrl-f-3hgh

Tags: node.js arrays nlp full-text-search stringtokenizer

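【Solution 1】:

I would suggest using a Set for the large array of tokens, then iterating over the array of search terms and checking whether the token set has any of them. If the array of terms is also large, you could consider using a Set for it as well (MDN docs for Set).

A linked comment gives a performance comparison between arrays and sets for large numbers of elements.

Here is a demo snippet: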

// Two sample token lists: tokens1 shares 'ಕನ್ನಡ' with the search terms, tokens2 shares nothing
const tokens1 = ['ಕನ್ನಡ', 'asdasd', 'zxczxc', 'sadasd', 'wqeqweqwe', 'xzczxc']
const tokens2 = ['xzczcxz', 'asdqwdaxcxzc', 'asdxzcxzc', 'wqeqwe', 'zxczcxzxcasd']
const terms = ['year', 'weather', 'USD 34235.00', 'sportsman', 'ಕನ್ನಡ']

// Build each Set once so every membership check avoids a linear array scan
const set1 = new Set(tokens1)
const set2 = new Set(tokens2)

// Return 'Found' as soon as any search term is present in the token set
const find = (tokensSet, termsArray) => {
  for (const term of termsArray) {
    if (tokensSet.has(term)) {
      return 'Found'
    }
  }
  return 'Not Found'
}

console.log(find(set1, terms))
console.log(find(set2, terms))
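
The snippet above works on ready-made token arrays. As a rough sketch of how it might be wired to an actual file, the code below reads the text, tokenizes it, and runs the same Set lookup. The helper names (tokenize, findInText, findInFile), the whitespace-and-punctuation tokenizer, and the temporary demo file are all assumptions made for illustration; they are not from the original answer, and the natural package mentioned in the question would tokenize differently. One caveat it tries to handle: a term containing a space, such as 'USD 34235.00', can never equal a single token, so it falls back to a plain substring search.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical tokenizer: split on whitespace, then trim common ASCII
// punctuation from both ends of each token.
const tokenize = (text) =>
  text
    .split(/\s+/)
    .map((token) => token.replace(/^[.,;:!?()"']+|[.,;:!?()"']+$/g, ''))
    .filter((token) => token.length > 0)

// Single-word terms go through the Set; terms containing whitespace cannot
// match a single token, so they fall back to a substring search.
const findInText = (text, terms) => {
  const tokens = new Set(tokenize(text))
  for (const term of terms) {
    const hit = /\s/.test(term) ? text.includes(term) : tokens.has(term)
    if (hit) return 'Found'
  }
  return 'Not Found'
}

// Read the whole file as UTF-8 and search it (fine for files that fit in memory).
const findInFile = (filePath, terms) =>
  findInText(fs.readFileSync(filePath, 'utf8'), terms)

// Demo with a throwaway file, so the sketch runs without testFile.txt.
const demoPath = path.join(os.tmpdir(), `search-demo-${process.pid}.txt`)
fs.writeFileSync(demoPath, 'The invoice total was USD 34235.00, paid in full.')
const terms = ['year', 'weather', 'USD 34235.00', 'sportsman', 'ಕನ್ನಡ']
console.log(findInFile(demoPath, terms))
fs.unlinkSync(demoPath)
```

Note that the Set lookup matches whole tokens only, so 'year' does not match 'yearly'. If substring matches are wanted for every term, text.includes(term) could be used throughout, at the cost of scanning the text once per term.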

【Comments】: