【Question Title】: How to split a string into an array based on array elements, retaining the split word, in JavaScript
【Posted】: 2021-06-23 08:14:10
【Question】:

I have a string:

sReasons =  "O9C2700021Not eligible for SDWCEOC3900015Service upgradeHJC3900015Service upgradeJ8C5000016Delivery Attempt";

I need to split the string above based on an array of separators:

const separator = ["O9", "EO", "HJ", "J8"];

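Here the first 2 characters (O9) are the network code, the next 4 characters (C270) are another code, and the next 4 characters (0021) give the length of the message text that follows ("Not eligible for SDWC" is 21 characters long).

The separator codes are unique, consist of 2 uppercase characters, and will not appear in textMessage; they only ever occur as inEligType.

I need to produce JSON in the following format: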

[
    {inEligType: "O9", msgCode: "C270", msgLen: "0021", textMsg: "Not eligible for SDWC"},
    {inEligType: "EO", msgCode: "C390", msgLen: "0015", textMsg: "Service upgrade"},
    {inEligType: "HJ", msgCode: "C390", msgLen: "0015", textMsg: "Service upgrade"},
    {inEligType: "J8", msgCode: "C500", msgLen: "0016", textMsg: "Delivery Attempt"}
]

I can't even get the string itself to split based on the given array. I tried the following:

sReasons =  "O9C2700021Not eligible for SDWCEOC0900015Service upgradeHJC3900015Service upgradeJ8C5HJ0016Delivery Attempt";    
const separator = ["O9", "EO", "HJ", "J8"];

function formatReasons(Reasons: string) {
var words: any[] = Reasons.split(this.spearator); 
for(let word in words)
    {
       console.log(word) ;
    }
}
var result = formatReasons(sHdnReasonsCreate);
console.log("Returned Result: "+result);

But it gives me this result:

["O9C2700021Not eligible for SDWCEOC0900015Service upgradeHJC3900015Service upgradeJ8C5HJ0016Delivery Attempt"]length: 1__proto__: Array(0)

Returned Address is: undefined

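【Comments】:

  • What will you do if one of the two-letter strings used as separators happens to appear in the middle of the textMessage field? You would be better off splitting according to the actual data format, by taking substrings of the appropriate lengths.
  • They won't appear, because they are unique and will not occur in textMessage or msgCode.

Tags: javascript angular regex typescript split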


【Solution 1】:

My regex-based approach:

const sReasons = "O9C2700021Not eligible for SDWCEOC0900015Service upgradeHJC3900015Service upgradeJ8C5HJ0016Delivery Attempt";
const separator = ["O9", "EO", "HJ", "J8"];

// build the regex based on separators
let regexPattern = '^';
separator.forEach(text => {
    regexPattern += `${text}(.*)`;
});
regexPattern += '$';

// match the reasons
let r = new RegExp(regexPattern);
let matches = sReasons.match(r);

// prepare to match each message
let msgMatcher = new RegExp('^(?<msgCode>.{4})(?<msgLen>.{4})(?<textMsg>.*)$');
let output = [];

for (let i=1; i<matches.length; i++) {
    // match the message
    const msg = matches[i].match(msgMatcher);

    // store
    let item = msg.groups;
    item.inEligType = separator[i-1];
    output.push(item);
}

console.log(JSON.stringify(output, null, 2));

This produces:

[
  {
    "msgCode": "C270",
    "msgLen": "0021",
    "textMsg": "Not eligible for SDWC",
    "inEligType": "O9"
  },
  {
    "msgCode": "C090",
    "msgLen": "0015",
    "textMsg": "Service upgrade",
    "inEligType": "EO"
  },
  {
    "msgCode": "C390",
    "msgLen": "0015",
    "textMsg": "Service upgrade",
    "inEligType": "HJ"
  },
  {
    "msgCode": "C5HJ",
    "msgLen": "0016",
    "textMsg": "Delivery Attempt",
    "inEligType": "J8"
  }
]
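
The pattern above only matches when every separator occurs exactly once and in the listed order; otherwise `match` returns null and the loop throws. As a minimal sketch of the same idea with that case guarded, assuming the separators are plain alphanumeric codes, a hypothetical helper `parseInOrder` (not part of the original answer) could look like this:

```javascript
// Hypothetical guarded variant of the ordered-separator regex idea.
// Assumption: separators contain no regex metacharacters.
function parseInOrder(input, separators) {
  // Builds e.g. ^O9(.*)EO(.*)$ from ["O9", "EO"].
  const pattern = new RegExp('^' + separators.map(s => `${s}(.*)`).join('') + '$');
  const matches = input.match(pattern);
  if (matches === null) {
    return []; // a separator is missing or out of order
  }

  const msgMatcher = /^(?<msgCode>.{4})(?<msgLen>.{4})(?<textMsg>.*)$/;
  const output = [];
  matches.slice(1).forEach((body, i) => {
    const msg = body.match(msgMatcher);
    // Bodies shorter than the two 4-character fields are skipped.
    if (msg !== null) {
      output.push({ inEligType: separators[i], ...msg.groups });
    }
  });
  return output;
}

const orderedSample = "O9C2700021Not eligible for SDWCEOC0900015Service upgrade";
console.log(parseInOrder(orderedSample, ["O9", "EO"]));
console.log(parseInOrder(orderedSample, ["EO", "O9"])); // separators out of order
```

Returning an empty array is only one possible choice; throwing a descriptive error may suit callers that need to distinguish "no records" from "malformed input".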

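【Comments】:

  • Of course, this only works for this particular example.
  • @georg How so? Which other, more abstract examples would you suggest applying it to? It can withstand changes to separator without any modification to the logic, and the two field lengths hard-coded as 4 are defined that way by the format, though they are straightforward to change.
  • You are assuming that each "separator" occurs only once and that they always occur in the given order. In the general case, that may not be true.
  • I see what you mean now. Yes, there is a better way to generalize it, as in the answer from @tarkh; I would modify this to something similar to that approach.

【Solution 2】: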

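It may well be that the textMsg field, or any other field, will never contain the two-letter strings you use for the inEligType field. But are you absolutely sure? To me the data format looks like it really wants to be parsed by substrings of specific lengths; why else would there be a msgLen field if you could simply split on separators? And what happens if the list of inEligType codes changes in the future?

For these reasons I strongly recommend parsing by substring length rather than by separator matching. Here is one possible way to do it: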

function formatReasons(reasons: string) {
  const ret = []
  while (reasons) {
    const inEligType = reasons.substring(0, 2);
    reasons = reasons.substring(2);
    const msgCode = reasons.substring(0, 4);
    reasons = reasons.substring(4);
    const msgLen = reasons.substring(0, 4);
    reasons = reasons.substring(4);
    const textMsg = reasons.substring(0, +msgLen);
    reasons = reasons.substring(+msgLen);
    ret.push({ inEligType, msgCode, msgLen, textMsg });
  }
  return ret;
}

You can verify that it produces the expected output for your example sReasons string:

const formattedReasons = formatReasons(sReasons);
console.log(JSON.stringify(formattedReasons, undefined, 2));
/* [
  {
    "inEligType": "O9",
    "msgCode": "C270",
    "msgLen": "0021",
    "textMsg": "Not eligible for SDWC"
  },
  {
    "inEligType": "EO",
    "msgCode": "C090",
    "msgLen": "0015",
    "textMsg": "Service upgrade"
  },
  {
    "inEligType": "HJ",
    "msgCode": "C390",
    "msgLen": "0015",
    "textMsg": "Service upgrade"
  },
  {
    "inEligType": "J8",
    "msgCode": "C5HJ",
    "msgLen": "0016",
    "textMsg": "Delivery Attempt"
  }
] */

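Note that the implementation above does not check that the string is well formed; as written, if you put garbage in, you get garbage out. If you want more safety, you can add runtime checks and throw an error when, for example, you unexpectedly run off the end of the reasons string or find a msgLen field that does not represent a number. You could also refactor it so that code like const s = reasons.substring(0, n); reasons = reasons.substring(n) is not repeated. But the basic algorithm is there.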
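
As a rough sketch of what those checks and that refactoring might look like, the version below reads fields through a small cursor helper. The names `parseReasons` and `take`, the choice of error types, and the rule that msgLen must be exactly four digits are assumptions made for illustration, not part of the answer above.

```javascript
// Hypothetical validated variant of the fixed-width parser.
function parseReasons(input) {
  const records = [];
  let pos = 0;

  // Reads the next n characters and advances the cursor,
  // failing loudly if the input ends too early.
  const take = (n, field) => {
    if (pos + n > input.length) {
      throw new RangeError(`Unexpected end of input while reading ${field} at offset ${pos}`);
    }
    const chunk = input.slice(pos, pos + n);
    pos += n;
    return chunk;
  };

  while (pos < input.length) {
    const inEligType = take(2, 'inEligType');
    const msgCode = take(4, 'msgCode');
    const msgLen = take(4, 'msgLen');
    // Assumption: msgLen is always four decimal digits.
    if (!/^\d{4}$/.test(msgLen)) {
      throw new SyntaxError(`msgLen "${msgLen}" is not a 4-digit number`);
    }
    const textMsg = take(Number(msgLen), 'textMsg');
    records.push({ inEligType, msgCode, msgLen, textMsg });
  }
  return records;
}

console.log(parseReasons("O9C2700021Not eligible for SDWCEOC3900015Service upgrade"));
```

Tracking an offset instead of repeatedly reassigning the string keeps the error messages able to report where parsing failed.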


【Comments】:

【Solution 3】:

Another option using RegExp, with less code:

    // Your data
    const data =  "O9C2700021Not eligible for SDWCEOC3900015Service upgradeHJC3900015Service upgradeJ8C5000016Delivery Attempt";
    
    // Set your data splitters from array
    const spl = ["O9", "EO", "HJ", "J8"].join('|');
    
    // Use regexp to parse data
    const results = [];
    data.replace(new RegExp(`(${spl})(\\w{4})(\\w{4})(.*?)(?=${spl}|$)`, 'g'), (m,a,b,c,d) => {
      // Form objects and push to res
      results.push({
        inEligType: a,
        msgCode: b,
        msgLen: c,
        textMsg: d
      });
    });
    
    // Result
    console.log(results);
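
Because this pattern relies on the separators never appearing inside a message, it may be worth cross-checking each captured text against its msgLen. The sketch below is one hedged way to do that; `splitBySeparators` and `findLengthMismatches` are hypothetical names, the first re-creates the same regex with `matchAll`, and the sample string containing "VIDEO" is made up to trigger the problem.

```javascript
// Same regex idea as above, collected with matchAll (hypothetical helper).
function splitBySeparators(data, separators) {
  const alt = separators.join('|');
  const pattern = new RegExp(`(${alt})(\\w{4})(\\w{4})(.*?)(?=${alt}|$)`, 'g');
  return [...data.matchAll(pattern)].map(
    ([, inEligType, msgCode, msgLen, textMsg]) => ({ inEligType, msgCode, msgLen, textMsg })
  );
}

// Returns the records whose declared length disagrees with the captured text.
function findLengthMismatches(records) {
  return records.filter(r => Number(r.msgLen) !== r.textMsg.length);
}

const codes = ["O9", "EO", "HJ", "J8"];
// Made-up input: "VIDEO fix" contains the separator "EO".
const tricky = "O9C2700009VIDEO fixJ8C5000002ok";
console.log(findLengthMismatches(splitBySeparators(tricky, codes)));
// The O9 record is reported: msgLen says 9, but only "VID" was captured.
```

A non-empty result suggests the separator-based split cut a message short, in which case length-based parsing is the safer route.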

【Comments】:

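【Solution 4】:

A first approach, based on a regex with capturing groups that is consumed by split, post-processed by a helper function, and finally reduced into the expected result ...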

      function chunkRight(arr, chunkLength) {
        const list = []; 
        arr = [...arr];
        while (arr.length >= chunkLength) {
          list.unshift(
            arr.splice(-chunkLength)
          );
        }
        return list;
      }
      
      // see also ... [https://regex101.com/r/tatBAB/1]
      // with e.g.
      // (?<inEligType>O9|EO|HJ|J8)(?<msgCode>\w{4})(?<msgLen>\d{4})
      // ... or ...
      // (O9|EO|HJ|J8)(\w{4})(\d{4})
      //
      function extractStatusItems(str, separators) {
        const regXSplit = RegExp(`(${ separators.join('|') })(\\w{4})(\\d{4})`);
      
        const statusValues = String(str).split(regXSplit).slice(1);
        const groupedValues = chunkRight(statusValues, 4);
      
        return groupedValues.reduce((list, [inEligType, msgCode, msgLen, textMsg]) =>
          list.concat({ inEligType, msgCode, msgLen, textMsg }), []
        );
      }
      
      const statusCode = 'O9C2700021Not eligible for SDWCEOC3900015Service upgradeHJC3900015Service upgradeJ8C5000016Delivery Attempt';
      
      console.log(
        `statusCode ... ${ statusCode } ...`,
        extractStatusItems(statusCode, ['O9', 'EO', 'HJ', 'J8'])
      );
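
The approach depends on the fact that `String.prototype.split` includes the contents of capturing groups in the array it returns. A small illustration with a shortened, made-up input (the variable name `demoParts` is only for this example):

```javascript
// split keeps captured groups in the output, interleaved with the text between matches.
const demoParts = 'O9C2700002okEOC3900002no'.split(/(O9|EO)(\w{4})(\d{4})/);

console.log(demoParts);
// [ '', 'O9', 'C270', '0002', 'ok', 'EO', 'C390', '0002', 'no' ]
```

The leading empty string is the text before the first match, which is why the answer calls slice(1) before grouping the values into chunks of four.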

... followed by a second approach, based almost entirely on a regex which captures named groups, consumed by matchAll, and finally mapped into the expected result ...

      // see also ... [https://regex101.com/r/tatBAB/2]
      // with e.g.
      // (?<inEligType>O9|EO|HJ|J8)(?<msgCode>\w{4})(?<msgLen>\d{4})(.*?)(?<textMsg>.*?)(?=O9|EO|HJ|J8|$)
      //
      function extractStatusItems(str, separators) {
        separators = separators.join('|');
      
        const regXCaptureValues = RegExp(
          `(?<inEligType>${ separators })(?<msgCode>\\w{4})(?<msgLen>\\d{4})(.*?)(?<textMsg>.*?)(?=${ separators }|$)`, 
          'g'
        );
        return [
          ...String(str).matchAll(regXCaptureValues)
        ].map(
          ({ groups }) => ({ ...groups })
        );
      }
      
      const statusCode = 'O9C2700021Not eligible for SDWCEOC3900015Service upgradeHJC3900015Service upgradeJ8C5000016Delivery Attempt';
      
      console.log(
        `statusCode ... ${ statusCode } ...`,
        extractStatusItems(statusCode, ['O9', 'EO', 'HJ', 'J8'])
      );
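
Both approaches interpolate the separators directly into a regular expression. That is fine for two-character alphanumeric codes, but if a code ever contained a regex metacharacter, it would need escaping first. A minimal sketch, where `escapeRegExp` and `buildSeparatorAlternation` are hypothetical helpers and not part of the answer:

```javascript
// Escapes characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Joins escaped separators into an alternation usable inside a RegExp source.
function buildSeparatorAlternation(separators) {
  return separators.map(escapeRegExp).join('|');
}

console.log(buildSeparatorAlternation(['O9', 'EO', 'HJ', 'J8'])); // O9|EO|HJ|J8
console.log(buildSeparatorAlternation(['A+', 'B.']));
```

The result could be passed in place of separators.join('|') in either function above.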

【Comments】:
