【Question title】: Efficient string manipulation in JavaScript
【Posted】: 2019-04-16 03:28:59
【Question】:

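I have a string (HTML content) and an array of position (index) objects. The string is about 1.6 million characters long, and there are about 700 position objects.

For example: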

var content = '<html><body><div class="c1">this is some text</div>....'
var positions = [{start: 20, end: 25}, {start: 35, end: 37}....]

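I have to insert an opening span tag at every start position in the string and a closing span tag at every end position.

What is the most efficient way to do this?

So far I have tried sorting the positions array in reverse order, looping over it, and using replace/splice to insert the tags, for example: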

content = content.slice(0, endPosition) + "</span>" + content.substring(endPosition);
content = content.slice(0, startPosition) + "<span>" + content.slice(startPosition);

(Note that I loop from the end so that earlier insertions do not shift the remaining start/end positions.)
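
For reference, the loop described above might look roughly like the following sketch. The function name `insertSpansReverse` and the short demo string are illustrative assumptions, not part of the original question, and the sketch assumes the ranges do not overlap.

```javascript
// Hypothetical helper: inserts tags by walking the ranges from the last one to the first.
function insertSpansReverse(content, positions) {
  // Copy before sorting so the caller's array is not reordered.
  const sorted = [...positions].sort((a, b) => b.start - a.start);
  let result = content;
  for (const { start, end } of sorted) {
    // Insert the closing tag first: it sits at the higher index,
    // so the opening-tag index is still valid afterwards.
    result = result.slice(0, end) + '</span>' + result.slice(end);
    result = result.slice(0, start) + '<span>' + result.slice(start);
  }
  return result;
}

const reverseDemo = insertSpansReverse('0123456789', [
  { start: 1, end: 3 },
  { start: 5, end: 7 },
]);
console.log(reverseDemo); // 0<span>12</span>34<span>56</span>789
```

Every iteration builds two new copies of the whole string, which is the likely reason this gets slow on a 1.6-million-character input.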

But this takes about 3 seconds, which seems slow and inefficient to me.

Is there a more efficient way to do this?

【Comments on the question】:

  • Do the positions refer to lines, or to indices in the string?
  • The positions are indices.

Tags: javascript string loops


【Solution 1】:

Instead of modifying the large string on every step, try accumulating the processed "chunks" in a new buffer:

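const content = '0123456789'
const positions = [
  [1, 3],
  [5, 7]
]

const buf = []
let lastPos = 0

// positions must be sorted in ascending order and must not overlap
for (const [s, e] of positions) {
  buf.push(
    content.slice(lastPos, s), // untouched text before the range
    '<SPAN>',
    content.slice(s, e),       // text inside the range
    '</SPAN>'
  )
  lastPos = e
}

// remainder of the string after the last range
buf.push(content.slice(lastPos))

const res = buf.join('')
console.log(res)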

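【Comments】:

  • I think you have to reverse the positions array and loop over it backwards; otherwise adding the span tags shifts the content and the positions will no longer be correct.
  • @joshuamiller: I don't think so. The original string is never modified, so nothing shifts.
  • Yes, I missed that part.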
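
The point made in the comments (the original string is only read, never modified) can be illustrated with a small sketch that packages the chunk-accumulation idea as a function taking the `{start, end}` objects from the question. The name `wrapRanges` and its tag parameters are hypothetical, and the sketch assumes the ranges do not overlap; it sorts them in ascending order itself so the input order does not matter.

```javascript
// Hypothetical helper built on the chunk-accumulation idea from this answer.
function wrapRanges(content, positions, openTag = '<span>', closeTag = '</span>') {
  // Assumption: ranges do not overlap. Sorting ascending by start lets us
  // read the original string from left to right exactly once.
  const sorted = [...positions].sort((a, b) => a.start - b.start);
  const chunks = [];
  let lastPos = 0;
  for (const { start, end } of sorted) {
    // All indices refer to the unmodified input, so no offset bookkeeping is needed.
    chunks.push(content.slice(lastPos, start), openTag, content.slice(start, end), closeTag);
    lastPos = end;
  }
  chunks.push(content.slice(lastPos)); // tail after the last range
  return chunks.join('');
}

const wrapped = wrapRanges('0123456789', [
  { start: 5, end: 7 },
  { start: 1, end: 3 },
]);
console.log(wrapped); // 0<span>12</span>34<span>56</span>789
```

Because each character of the input is copied into the output only once, the work grows with the length of the string plus the number of ranges, rather than with their product.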
【Solution 2】:

We can split content into an array of characters, insert <span> and </span> in a single loop, and then join the array back into a string:

var content = '<html><body><div class="c1">this is some text</div>....';
var positions = [{start: 20, end: 25}, {start: 35, end: 37}];
var arr = content.split('');

var arrPositions = {
  starts: positions.map(_ => _.start),
  ends: positions.map(_ => _.end)
}

var result = arr.map((char, i) => {
  if (arrPositions.starts.indexOf(i) > -1) {
    return '<span>' + char;
  }
  if (arrPositions.ends.indexOf(i) > -1) {
    return '</span>' + char;
  }
  return char
}).join('')

console.log(result)
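
Note that `indexOf` scans the `starts` and `ends` arrays linearly for every character, which adds up with roughly 700 positions and 1.6 million characters. A possible variant, sketched below, uses `Set` lookups instead. The function name `wrapWithSets` is a hypothetical one, and the sketch makes two extra assumptions not stated in this answer: a closing tag is emitted before an opening tag at the same index, and an end position equal to the string length is allowed.

```javascript
// Hypothetical variant of the per-character approach using Set lookups.
function wrapWithSets(content, positions) {
  const starts = new Set(positions.map(p => p.start));
  const ends = new Set(positions.map(p => p.end));
  const out = [];
  // Iterate one index past the last character so that an end position
  // equal to content.length still gets its closing tag.
  for (let i = 0; i <= content.length; i++) {
    // Close before opening so adjacent ranges produce </span><span>.
    if (ends.has(i)) out.push('</span>');
    if (starts.has(i)) out.push('<span>');
    if (i < content.length) out.push(content[i]);
  }
  return out.join('');
}

const setDemo = wrapWithSets('0123456789', [
  { start: 1, end: 3 },
  { start: 3, end: 10 },
]);
console.log(setDemo); // 0<span>12</span><span>3456789</span>
```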

【Comments】:

【Solution 3】:

You can do it like this:

const content = '<div class="c1">It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using Content here, content here, making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for lorem ipsum will uncover many web sites still in their infancy. Various versions have evolved over the years, sometimes by accident, sometimes on purpose (injected humour and the like).</div>';
const positions = [{start: 24,end: 40}, {start: 160,end: 202}];
const result = positions
  .reduce((a, c, i, loopArray) => {
    a.array.push(
      content.slice(a.lastPosition, c.start), '<span class="blue">', content.slice(c.start, c.end), '</span>'
    );

    a.lastPosition = c.end;

    // on the last range, also append the rest of the string
    if (i === loopArray.length - 1) {
      a.array.push(content.slice(a.lastPosition));
    }

    return a;
  }, {array: [], lastPosition: 0})
  .array
  .join('');

document.write(result);

/* CSS for the snippet above */
.blue {color: blue;}

【Comments】:

【Solution 4】:

You can do it like this:

const content = 'this is some text. this is some text. this is some text. this is some text. this is some text. this is some text. this is some text. this is some text. ';
const positions = [{start: 20, end: 26}, {start: 35, end: 37}];

// Using Sets removes duplicate positions.
const starts = new Set();
const ends = new Set();

const START_TAG = '<span>';
const END_TAG = '</span>';

const string_length = content.length;

positions.forEach(function(position) {
  // Keep only indices inside the string (an end may equal the length).
  if (position.start > -1 && position.start < string_length) starts.add(position.start);
  if (position.end > -1 && position.end <= string_length) ends.add(position.end);
});

// Merge all insertions and apply them from the highest index down, so that
// an insertion never shifts the positions that are still to be processed.
// Starts are listed first: with a stable sort, a start and an end at the
// same index come out as </span><span>.
const insertions = [
  ...[...starts].map(position => ({position, tag: START_TAG})),
  ...[...ends].map(position => ({position, tag: END_TAG}))
].sort((a, b) => b.position - a.position);

let updated_string = content;

insertions.forEach(function({position, tag}) {
  updated_string = updated_string.slice(0, position) + tag + updated_string.slice(position);
});

console.log(updated_string);
      

【Comments】:
