【Question title】: How to create sublists from a list based on start and end elements?
【Posted】: 2021-09-22 00:05:04
【Question】:

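I'm trying to create sublists from a list based on start and end elements, but I can't get all occurrences of the start and end elements.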

lst  = ['value0','<!program start>','value1','value2','<!program end>',
        'value3','<!program start>','value4','<!program end>','value5']

Expected output:

[['value0'],['<!program start>','value1','value2','<!program end>'],
 ['value3'],['<!program start>','value4','<!program end>'],['value5']]

Code:

start_idx = lst.index('<!program start>')
end_idx = lst.index('<!program end>')
final_result = lst[:start_idx] + [lst[start_idx:end_idx+1]] + lst[end_idx+1:]
print(final_result)
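A sketch of why only the first block is captured, and one way to extend the same slicing idea to every block: list.index reports only the first match, whereas enumerate can collect every marker position. This assumes start and end markers always come in ordered pairs and are never nested; the names starts, ends and result are illustrative.

```python
lst = ['value0', '<!program start>', 'value1', 'value2', '<!program end>',
       'value3', '<!program start>', 'value4', '<!program end>', 'value5']

# list.index only ever reports the first match.
print(lst.index('<!program start>'))  # 1

# enumerate lets us collect the position of every marker instead.
starts = [i for i, x in enumerate(lst) if x == '<!program start>']
ends = [i for i, x in enumerate(lst) if x == '<!program end>']
print(starts, ends)  # [1, 6] [4, 8]

# Assumes markers are paired in order and never nested.
result = []
prev = 0
for s, e in zip(starts, ends):
    # Wrap each standalone value that precedes this block.
    result.extend([x] for x in lst[prev:s])
    result.append(lst[s:e + 1])
    prev = e + 1
# Wrap any standalone values after the last block.
result.extend([x] for x in lst[prev:])
print(result)
```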

【Question comments】:

    Tags: python list loops iteration python-3.6


    【Solution 1】:

    There are actually some interesting comprehension-based approaches that rely on basic str and list handling.

    For example, you can first split lst into chunks on the substring that the start and end tags have in common:

    chunks = [s for s in " ".join(lst).split("<!program ")]
    

    Each chunk inherently carries what distinguishes standalone elements from elements that sit between the tags.

    A list comprehension is then a neat way to get the desired output:

    output = [[s.strip('end> ')] if not s.startswith('start>') else ["<!program start>"] + s.strip("start> ").split() + ["<!program end>"] for s in chunks]
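To see what the comprehension is working with, it can help to print the intermediate chunks. One caveat worth knowing: str.strip treats its argument as a set of characters rather than a literal prefix, so values that begin or end with any of those characters are trimmed too, and split() assumes values contain no spaces. The value 'dense' below is a made-up example chosen to show the effect.

```python
lst = ['value0', '<!program start>', 'value1', 'value2', '<!program end>',
       'value3', '<!program start>', 'value4', '<!program end>', 'value5']

# Split the space-joined string on the prefix shared by both tags.
chunks = " ".join(lst).split("<!program ")
print(chunks)
# ['value0 ', 'start> value1 value2 ', 'end> value3 ', 'start> value4 ', 'end> value5']

# str.strip removes any of 'e', 'n', 'd', '>' and ' ' from both ends,
# not the literal prefix "end> ", so this hypothetical value is mangled.
print("end> dense".strip('end> '))  # s
```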
    

    【Comments】:

      【Solution 2】:

      You can process the data with a relatively simple FSM (finite-state machine):

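      def fsm(lst):
          result = []
      
          # State 0: expect a standalone value.
          # State 1: expect a '<!program start>' marker.
          # State 2: inside a start...end block.
          state = 0
          for elem in lst:
              if state == 0:
                  result.append([elem])
                  state = 1
              elif state == 1:
                  if elem == '<!program start>':
                      subl = [elem]
                      state = 2
                  else:
                      break  # End of pattern: input no longer alternates value/block.
              elif state == 2:
                  subl.append(elem)
                  if elem == '<!program end>':
                      result.append(subl)
                      state = 0
      
          return result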
      
      
      lst  = ['value0','<!program start>','value1','value2','<!program end>',
              'value3','<!program start>','value4','<!program end>','value5']
      
      print(fsm(lst))
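The FSM above assumes the input strictly alternates between one standalone value and one start...end block: two consecutive plain values hit the break, and the rest of the list is dropped. A two-state sketch without that assumption might look like this. fsm_general, OUTSIDE and INSIDE are illustrative names, and keeping an unterminated trailing block is a design choice, not something the question specifies.

```python
OUTSIDE, INSIDE = 0, 1  # the two states of this sketch


def fsm_general(lst, start='<!program start>', end='<!program end>'):
    result = []
    state = OUTSIDE
    block = []
    for elem in lst:
        if state == OUTSIDE:
            if elem == start:
                block = [elem]          # open a new block
                state = INSIDE
            else:
                result.append([elem])   # standalone value
        else:  # INSIDE
            block.append(elem)
            if elem == end:
                result.append(block)    # close the block
                state = OUTSIDE
    if state == INSIDE:
        result.append(block)            # assumption: keep an unterminated block
    return result


lst = ['value0', '<!program start>', 'value1', 'value2', '<!program end>',
       'value3', '<!program start>', 'value4', '<!program end>', 'value5']
print(fsm_general(lst))
print(fsm_general(['a', 'b', '<!program start>', 'c', '<!program end>']))
```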
      
      

      【Comments】:

      • I'm afraid this doesn't give the desired output, sir.

      【Solution 3】:

      A similar solution using nested while loops.

      test_list = ['value0','<!program start>','value1','value2','<!program end>',
              'value3','<!program start>','value4','<!program end>','value5']
      
      answer_list = []
      i = 0
      while i < len(test_list):
          if test_list[i] == '<!program start>':
              sublist = []
              while test_list[i] != '<!program end>':
                  sublist.append(test_list[i])
                  i += 1
          elif test_list[i] == '<!program end>':
              sublist.append(test_list[i])
              answer_list.append(sublist)
              i += 1
          else:
              answer_list.append(test_list[i])
              i += 1
      
      print(answer_list)
      

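      Output (note that standalone values such as 'value0' are not wrapped in their own lists, unlike the expected output):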

      ['value0', ['<!program start>', 'value1', 'value2', '<!program end>'], 'value3', ['<!program start>', 'value4', '<!program end>'], 'value5']
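Because standalone values come out as bare strings, a one-line post-processing step can bring the result into the shape the question asks for; appending [test_list[i]] in the final else branch would have the same effect. Note also that the inner while loop raises IndexError if a start marker has no matching end marker. A minimal sketch, reusing the printed output above as input:

```python
# Output of the nested-while solution, where standalone values are bare strings.
answer_list = ['value0', ['<!program start>', 'value1', 'value2', '<!program end>'],
               'value3', ['<!program start>', 'value4', '<!program end>'], 'value5']

# Wrap every bare string in its own list so the shape matches the expected output.
normalized = [x if isinstance(x, list) else [x] for x in answer_list]
print(normalized)
```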
      

      【Comments】:

        【Solution 4】:

        The problem with your code is that index returns only the first matching index, not all of them. This can be done simply with a while loop instead.

        final_list = []
        i = 0
        while i < len(lst):
            inner_list = []
            word = lst[i]
            if word == "<!program start>":
                while word != '<!program end>':                  
                    word = lst[i]
                    inner_list.append(word)
                    i += 1    
            else:
                inner_list.append(word)
                i += 1
            final_list.append(inner_list)
        
        print(final_list)
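Building on the point that index only reports the first match: list.index also accepts an optional start position, so searching from the current position finds the next end marker each time. This sketch assumes every start marker has a matching end marker (list.index raises ValueError otherwise); split_blocks is a hypothetical name.

```python
def split_blocks(lst, start='<!program start>', end='<!program end>'):
    """Group lst into standalone values and start...end blocks."""
    result = []
    i = 0
    while i < len(lst):
        if lst[i] == start:
            # Search for the end marker from position i onward, so each
            # block finds its own end rather than the first one in the list.
            j = lst.index(end, i)
            result.append(lst[i:j + 1])
            i = j + 1
        else:
            result.append([lst[i]])
            i += 1
    return result


lst = ['value0', '<!program start>', 'value1', 'value2', '<!program end>',
       'value3', '<!program start>', 'value4', '<!program end>', 'value5']
print(split_blocks(lst))
```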
        

        【Comments】:

          【Solution 5】:

          Using iteration:

          lst = ['value0', '<!program start>', 'value1', 'value2', '<!program end>',
                 'value3', '<!program start>', 'value4', '<!program end>', 'value5']
          
          res = []
          start = False
          temp = []
          
          for item in lst:
              if item == '<!program start>':
                  start = True
                  temp.append(item)
          
              elif item == '<!program end>':
                  start = False
                  temp.append(item)
                  res.append(temp)
                  temp = []
          
              elif start:
                  temp.append(item)
              else:
                  res.append([item])
          
          print(res)
          

          Output:

          [['value0'], ['<!program start>', 'value1', 'value2', '<!program end>'], ['value3'], ['<!program start>', 'value4', '<!program end>'], ['value5']]
          

          The start flag tracks whether the current item lies between a start tag and an end tag.
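The same flag idea can be written as a generator that yields each sublist as soon as it is complete, without building the whole result first. This is a sketch; iter_blocks is a hypothetical name. Unlike the loop above, which silently drops a block whose end marker never appears, it yields an unterminated trailing block as-is, which is an assumption about the desired behavior.

```python
def iter_blocks(items, start='<!program start>', end='<!program end>'):
    inside = False  # True while between a start and an end marker
    block = []
    for item in items:
        if item == start:
            inside = True
            block.append(item)
        elif item == end:
            block.append(item)
            yield block
            inside = False
            block = []  # fresh list, so the yielded block is not mutated later
        elif inside:
            block.append(item)
        else:
            yield [item]
    if block:
        # Unterminated trailing block (assumption: keep it rather than drop it).
        yield block


lst = ['value0', '<!program start>', 'value1', 'value2', '<!program end>',
       'value3', '<!program start>', 'value4', '<!program end>', 'value5']
print(list(iter_blocks(lst)))
```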

          【Comments】:

            【Solution 6】:

            It's not as cool as your one-liner, but it seems to work:

            def process(input_list, start, end):
                output = []
                while len(input_list) != 0:
                    if input_list[0] != start:
                        # This isn't a start token, so just add it to the output
                        output.append([input_list[0]])
                        input_list = input_list[1:]
                        continue
            
                    # Looks like we've found a start token, look for the end
                    # associated with it and append that. NOTE: You could
                    # try/except here if you didn't know that the end token was
                    # actually there.
                    end_index = input_list.index(end)
                    output.append(input_list[:end_index + 1])
                    input_list = input_list[end_index + 1:]
                return output
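The NOTE in the code suggests a try/except for the case where the end token might be missing. One way to do that, assuming the preferred behavior is to keep the unterminated remainder as a final block (that choice is an assumption), is sketched below; process_tolerant is a hypothetical name.

```python
def process_tolerant(input_list, start, end):
    output = []
    while len(input_list) != 0:
        if input_list[0] != start:
            # Not a start token: emit it as a one-element sublist.
            output.append([input_list[0]])
            input_list = input_list[1:]
            continue

        try:
            end_index = input_list.index(end)
        except ValueError:
            # No end token left: keep the rest as one unterminated block.
            output.append(input_list)
            break
        output.append(input_list[:end_index + 1])
        input_list = input_list[end_index + 1:]
    return output


lst = ['value0', '<!program start>', 'value1', 'value2', '<!program end>',
       'value3', '<!program start>', 'value4', '<!program end>', 'value5']
print(process_tolerant(lst, '<!program start>', '<!program end>'))
print(process_tolerant(['a', '<!program start>', 'b'], '<!program start>', '<!program end>'))
```

Because each step slices off a new copy of the remaining list, this style is quadratic in the worst case; for long inputs, tracking an index instead of slicing avoids the copies.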
            

            I get:

            [['value0'],
             ['<!program start>', 'value1', 'value2', '<!program end>'],
             ['value3'],
             ['<!program start>', 'value4', '<!program end>'],
             ['value5']]
            

            which looks like the correct output to me.

            【Comments】:
