【Question title】: Split with multiple delimiters and special characters in JS
【Posted】: 2019-12-15 02:51:31
【Description】:

I am trying to create a regular expression that splits a string on various words and punctuation marks. How can this be done?

Here is my code:

var author;
var authorResult = {}; // plain object, since it is filled with named keys
var ByREGEX = /By|From|says\s|,/g;
author = authorByline.split(ByREGEX);
if(!author[1].trim()) {
   author[1] = author[2].trim();
   author[2] = '';
}
authorResult['name'] = author[1].trim();

if("2" in author){
   authorResult['role'] = author[2].trim();
} else {
   authorResult['role'] = '';
}

return authorResult;

Below are my strings, with the expected output for each:

From Bru Water (Delimiter: 'From'): Expected output (Author: Bru Water, Role: '')

By Matth Moo, Med Corresponde (Delimiter: 'By', ','): Expected output (Author: Matth Moo, Role: Med Corresponde)

Analysis by Davidd Cross in London (Delimiter: 'Analysis by', 'in'): Expected output (Author: Davidd Cross, Role: '')

left and right, says Daavid Aaronovi (Delimiter: 'says'): Expected output (Author: Daavid Aaronovi, Role: '')

From Dav Chart and Bo De (Delimiter: 'From', 'and'): Expected output (Author1: Dav Chart, Role1: '', Author2: Bo De, Role2: '')

By Oliv Wrig, Poli Edit, and Franc Ellio, Politic Edit (Delimiter: 'By', 'and'): Expected output (Author1: Oliv Wrig, Role1: 'Poli Edit', Author2: Franc Ellio, Role2: 'Politic Edit')

By RCAik Brbent (Delimiter: 'By'): Expected output (Author: RCAik Brbent, Role: '')

From TomTY Knowl, Technolog Reporte (Delimiter: 'From', ','): Expected output (Author: TomTY Knowl, Role: 'Technolog Reporte')

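【Comments】:

  • Could you add the delimiters for each of the strings?
  • Hi nina, I have added the delimiters for all the strings.
  • I think I would create a strategy pattern and implement a strategy for each case that returns the author and role, then run all strategies against each line to produce an array of results. In my opinion that would be a more readable, testable and maintainable solution. if (_.includes(authorByline.toLowerCase(), 'from')) could be the start of one of the strategies (using lodash).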

Tags: javascript regex


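【Solution 1】:

I managed to put together something fairly large using split and join.

It still has problems with cases such as "Davidd Cross in London".

It also returns an array rather than an object.

If you need me to clean up the data further, let me know in the comments, but I think you should be able to do that yourself.


It uses arrays to define the name identifiers and the separators between authors, and between an author and a role, and runs all of them against each string.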

let lines = [
  "From Bru Water", // : Expected output(Author: Bru Water, Role:'')
  "By Matth Moo, Med Corresponde", // : **Expected output(Author: Matth Moo, Role:Med Corresponde)**
  "Analysis by Davidd Cross in London", // : **Expected output(Author: Davidd Cross, Role:'')**
  "left and right, says Daavid Aaronovi", // : **Expected output(Author: Daavid Aaronovi, Role:'')**
  "From Dav Chart and Bo De", // : **Expected output(Author1: Dav Chart, Role1:'',Author2: Bo De, Role2:'')**
  "By Oliv Wrig, Poli Edit, and Franc Ellio, Politic Edit", //: **Expected output(Author1: Oliv Wrig, Role1:'Poli Edit',Author2: Franc Ellio, Role2:'Politic Edit')**
  "By RCAik Brbent", // : Expected output(Author: RCAik Brbent, Role:'')
  "From TomTY Knowl, Technolog Reporte" // : **Expected output(Author: TomTY Knowl, Role:'Technolog Reporte')**
]

let nameIdentifier = ["from", "says", "by"] // these are followed by an Author name
let authorsSeparator = ["and"] // these are between two Authors
let authorRoleSeparator = [","] // these are between an Author and its role
let tempSeparator = "somethingWhichAppearNowhereElse"

let result = lines.map(line => {
  // get authors
  let authors = line
  authorsSeparator.forEach(separator => {
    // split the accumulated string so that every separator is applied, not only the last one
    authors = authors.split(separator).join(tempSeparator)
  })
  authors = authors.split(tempSeparator)
  
  
  // remove the first element of the array if it does not contain a name identifier
  let keep = nameIdentifier.some(identifier =>
    authors[0].toLowerCase().includes(identifier)
  )
  if (!keep) { authors.shift() } // remove the first entry from the array

  // remove the identifiers to get the authors name
  authors.forEach((auth, i) => {
    nameIdentifier.forEach(identifier => {
      let identifierIndex = auth.toLowerCase().indexOf(identifier)
      if(identifierIndex !== -1) {
        auth = auth.substring(identifierIndex + identifier.length)
      }
      authors[i] = auth.trim()
    })
  })

  // separate each author's name from their role
  return authors.map(auth => {
    let author = auth
    authorRoleSeparator.forEach(separator => {
      // split the accumulated string so that every separator is applied
      author = author.split(separator).join(tempSeparator)
    })
    return author.split(tempSeparator)
  })
})

console.log(result)
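
Since the code above returns arrays rather than objects, here is a minimal sketch of how each `[name, role]` pair could be turned into an object. The helper name `pairsToObjects` and the `name`/`role` keys are assumptions made for illustration; they are not part of the original answer.

```javascript
// Hypothetical helper: converts [name, role] pairs into { name, role } objects.
// A missing role (an author with no comma-separated part) becomes ''.
function pairsToObjects(pairs) {
  return pairs.map(([name = '', role = '']) => ({
    name: name.trim(),
    role: role.trim()
  }));
}

// Sample input in the shape produced for "By Matth Moo, Med Corresponde"
const sample = [["Matth Moo", " Med Corresponde"]];
console.log(pairsToObjects(sample));
```

Because `result` holds one array of pairs per line, something like `result.map(pairsToObjects)` should convert the whole output.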

【Comments】:

    【Solution 2】:

    This should do it:

    function sentenceToAuthor(sentence) {
      //Check that sentence contains keyword
      if (sentence.match(/(\s|^)(by|from|says)\s/ig)) {
        //list of author names
        var returner = [];
        //flag if activation word triggered
        var found = false;
        //for each non-whitespace string-block
        sentence.match(/\S+/ig).forEach(function(word) {
          if (found === false) { // If activation word not reached
            if (['from', 'by', 'says'].indexOf(word.toLocaleLowerCase()) >= 0) { // check if word is activation word 
              found = true;
            }
          } else if (found === true) { // If activated
            if (word === 'and') { // special case "and" pushes a separator for later use
              returner.push(',');
            } else if (word[0] == word[0].toUpperCase()) { // If first letter is uppercase, add word to returner
              returner.push(word.replace(/\W/ig, ''));
              if (word.match(/\W$/ig)) { // If word ends in non-word symbol like ",", disable activation
                found = null;
              }
            } else { // If not uppercase word, disable activation
              found = null;
            }
          }
        });
        // join names and split by separator
        return returner.join(" ").split(',').map(function(w) {
          return w.trim();
        });
      }
      return false;
    }
    //TESTS
    var tests = [
      "From Bru Water",
      "By Matth Moo, Med Corresponde",
      "Analysis by Davidd Cross in London",
      "left and right, says Daavid Aaronovi",
      "From Dav Chart and Bo De",
      "By Oliv Wrig, Poli Edit, and Franc Ellio, Politic Edit",
      "By RCAik Brbent",
      "From TomTY Knowl, Technolog Reporte"
    ];
    //Run tests
    console.log(tests.map(sentenceToAuthor));

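    【Comments】:

      【Solution 3】:

      I am working on a solution that uses the strategy pattern suggested in the comments.

      It is not finished, but hopefully it illustrates the idea: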

      // Requires lodash to be available as `_`
      const lines = [
        "From Bru Water",
        "By Matth Moo, Med Corresponde",
        "Analysis by Davidd Cross in London",
        "left and right, says Daavid Aaronovi",
        "From Dav Chart and Bo De",
        "By Oliv Wrig, Poli Edit, and Franc Ellio, Politic Edit",
        "By RCAik Brbent",
        "From TomTY Knowl, Technolog Reporte"
      ];
      
      // naive, always assume name and role being 2 words
      const toUpperString = (wordArray) => {
      
        const noCommasUpperFirst = (str) => {
          return _.upperFirst(_.replace(str, ',', ''));
        };
      
        return _.join(_.map(_.take(wordArray, 2), noCommasUpperFirst), ' ');
      }
      
      // assumes author to be the first two entries 
      const createAuthorAndRole = (authorWordArray) => {
      
        const hasRole = _.includes(authorWordArray[1], ',');
      
        if (hasRole) {
          const roleWordArray = _.slice(authorWordArray, 2);
      
          return {
            author: toUpperString(authorWordArray),
            role: toUpperString(roleWordArray)
          }
        }
      
        return {
          author: toUpperString(authorWordArray)
        }
      }
      
      const simpleMatchStrategy  = (wordArray, word) => {
        const index = _.indexOf(wordArray, word);
        if (index !== -1) {
          return createAuthorAndRole(_.without(wordArray, word));
        }
      }
      
      const strategies = [
        (wordArray) => simpleMatchStrategy(wordArray, 'from'),
        (wordArray) => simpleMatchStrategy(wordArray, 'by'),
        (wordArray) => simpleMatchStrategy(wordArray, 'says')
      ]
      
      const results = [];
      
      lines.forEach((line) => {
        console.log("line:", line);
      
        const wordArray = line.toLowerCase().match(/\S+/g) || [];
      
        strategies.forEach((strategy) => {
          const result = strategy(wordArray);
          if (result) {
            results.push(result);
          }
        });
      });
      
      console.log(results)
      
      https://jsfiddle.net/tdgxs8b5/
      

      【Comments】:

        【Solution 4】:

        Here is a regular expression that captures the name and the role in capture groups:

        /(?:from|by|says|and)\s([a-z]+\s[a-z]+)(?:(?:,|\sand)\s([a-z]+\s[a-z]+))?/ig

        Group 1 holds the author and group 2 holds the role.

        You can try it out at https://regex101.com/
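
To show how the two capture groups might be read in code, here is a minimal sketch built on `String.prototype.matchAll`. The function name `extractAuthors` and the `name`/`role` keys are assumptions made for illustration. The character class is written as `[a-z]` here, which under the `i` flag covers both upper and lower case.

```javascript
// Two-word name in group 1, optional two-word role in group 2.
const bylineRegex = /(?:from|by|says|and)\s([a-z]+\s[a-z]+)(?:(?:,|\sand)\s([a-z]+\s[a-z]+))?/ig;

// Hypothetical helper: collects every match as a { name, role } object.
function extractAuthors(line) {
  return Array.from(line.matchAll(bylineRegex), match => ({
    name: match[1],
    role: match[2] || '' // group 2 is undefined when no role was matched
  }));
}

console.log(extractAuthors("By Matth Moo, Med Corresponde"));
console.log(extractAuthors("By Oliv Wrig, Poli Edit, and Franc Ellio, Politic Edit"));
```

One limitation worth knowing: because the optional group also accepts " and", a line such as "From Dav Chart and Bo De" puts "Bo De" into the role group instead of reporting a second author.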

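        Edit: the regular expression above assumes that the name and the role are 2 words each; the version below is an attempt to capture all capitalized words instead: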

        /(?:from|by|says|and)\s([A-Z\b\s]+)(?:(?:,|\sand)\s([A-Z\b\s]+))?/ig

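        【Comments】:

        • Thanks for the reply, @cYrixmorten. I have another case: From Tom Knowles, West Coast Technology Reporter. Here only the first 2 words are matched, but I want the role to be West Coast Technology Reporter.
        • You are right, it assumes the name and the role are 2 words each. I also tried dropping case-insensitivity and capturing all capitalized words after the delimiter, but that does not work in all cases.
        • I have edited the answer with something you can try. I am on my phone, so I cannot test it easily.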
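
For the multi-word role case raised in these comments, one possible approach is to stop counting words altogether: take everything after the first keyword, split co-authors on the word "and", and split each author from the role at the first comma. This is only a sketch under those assumptions; the function name `parseByline` and the `name`/`role` keys are hypothetical and do not come from any of the answers.

```javascript
// Hypothetical sketch: keyword-based parsing with no fixed word counts.
function parseByline(line) {
  // Locate the first whole-word keyword that introduces an author name.
  const start = line.match(/\b(?:from|by|says)\s+/i);
  if (!start) return [];
  const rest = line.slice(start.index + start[0].length);

  // Split co-authors on an optional comma followed by the word "and".
  return rest.split(/,?\s+and\s+/i).map(part => {
    // Everything before the first comma is the name; the remainder is the role.
    const [name, ...roleParts] = part.split(',');
    return { name: name.trim(), role: roleParts.join(',').trim() };
  });
}

console.log(parseByline("From Tom Knowles, West Coast Technology Reporter"));
console.log(parseByline("By Oliv Wrig, Poli Edit, and Franc Ellio, Politic Edit"));
```

This sketch does not strip a trailing location such as "in London", and a role that itself contains the word "and" would be split into two authors, so both cases would need extra rules.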