【Question title】: Convert list elements into a list of tuples
【Posted】: 2018-07-20 14:15:08
【Question description】:
header =  ['chr', 'pos', 'ms01e_PI', 'ms01e_PG_al', 'ms02g_PI', 'ms02g_PG_al', 'ms03g_PI', 'ms03g_PG_al', 'ms04h_PI', 'ms04h_PG_al']

I want to convert the elements of the list above into a list of tuples, like this:

sample_list = [('ms01e_PI', 'ms01e_PG_al'), ('ms02g_PI', 'ms02g_PG_al'),
               ('ms03g_PI', 'ms03g_PG_al'), ('ms04h_PI', 'ms04h_PG_al')]

I think a lambda or a list comprehension could solve this in a short, comprehensive way.

sample_list = [lambda (x,y): x = a if '_PI' in a for a in header ..]

Or:

[(x, y) if '_PI' and '_PG_al' in a for a in header]

Any suggestions?

【Question discussion】:

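  • It looks like you want pairs of consecutive elements. If so, this is a perfect use case for zip(): first drop the first two elements with header = header[2:], then do zip(header[::2], header[1::2]). See also: Understanding python's slice notation.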
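A minimal sketch of the zip()-based suggestion in this comment, assuming that the first two columns are always chr and pos and that every _PI column is immediately followed by its _PG_al column:

```python
header = ['chr', 'pos', 'ms01e_PI', 'ms01e_PG_al', 'ms02g_PI', 'ms02g_PG_al',
          'ms03g_PI', 'ms03g_PG_al', 'ms04h_PI', 'ms04h_PG_al']

# Drop the leading 'chr' and 'pos' columns
columns = header[2:]

# Even-indexed items are the _PI columns, odd-indexed items the _PG_al columns;
# zip() pairs them up element by element
sample_list = list(zip(columns[::2], columns[1::2]))
print(sample_list)
# [('ms01e_PI', 'ms01e_PG_al'), ('ms02g_PI', 'ms02g_PG_al'), ('ms03g_PI', 'ms03g_PG_al'), ('ms04h_PI', 'ms04h_PG_al')]
```

Because it relies purely on position, this only works when the columns are already ordered as PI, PG_al for each sample.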

Tags: python list lambda tuples list-comprehension


【Solution 1】:

You can filter the list and remove every element that does not match the desired grouping pattern:

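import re

header = ['chr', 'pos', 'ms01e', 'ms01e_PG_al', 'ms01e_PI', 'ms01e_PG_al', 'ms02g_PI', 'ms02g_PG_al', 'ms03g_PI', 'ms03g_PG_al', 'ms04h_PI', 'ms04h_PG_al']
# Keep only elements that look like sample columns (e.g. 'ms01e', 'ms01e_PI');
# 'chr' and 'pos' contain neither a digit nor an underscore, so they are dropped
new_headers = list(filter(lambda x: re.findall(r'^[a-zA-Z]+_[a-zA-Z]+|[a-zA-Z]+\d+[a-zA-Z]+', x), header))
# Pair consecutive elements by stepping through the list two at a time
final_data = [(new_headers[i], new_headers[i+1]) for i in range(0, len(new_headers), 2)]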

Output:

[('ms01e', 'ms01e_PG_al'), ('ms01e_PI', 'ms01e_PG_al'), ('ms02g_PI', 'ms02g_PG_al'), ('ms03g_PI', 'ms03g_PG_al'), ('ms04h_PI', 'ms04h_PG_al')]
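The step-by-two indexing in the last line can also be written by passing one shared iterator to zip() twice, so that each tuple consumes the next two items. This sketch hard-codes the new_headers list that the filter step produces for the header above. One difference to be aware of: zip() silently drops a trailing unpaired element, whereas the index-based comprehension raises IndexError when the list has an odd length.

```python
new_headers = ['ms01e', 'ms01e_PG_al', 'ms01e_PI', 'ms01e_PG_al', 'ms02g_PI',
               'ms02g_PG_al', 'ms03g_PI', 'ms03g_PG_al', 'ms04h_PI', 'ms04h_PG_al']

# zip() pulls from the same iterator twice per tuple, yielding consecutive pairs
it = iter(new_headers)
final_data = list(zip(it, it))
print(final_data)
```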

【Discussion】:

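  • I really like this idea. I was familiar with list comprehensions, but completely missed that I could step by two to solve the problem. Also, in case the situation arises, is there a way to do this by matching the sample name before the _?
  • @everestial007 Thanks. But I'm a bit confused. What do you mean by sample name?
  • The sample names are ms01e, ms02g, ..., and each sample has two values, ms01e_PI and ms01e_PG_al. So after removing chr and pos, every sample should have a pair of values. I was wondering whether there's a way to extract the unique samples from the data by matching the sample name before the underscore _ and then pairing the PI and PG values that follow the underscore _. The final output is the same list of tuples, just obtained differently. This would matter if the data is not sorted by name.
  • @everestial007 Please see my latest edit. I modified the solution so that elements of header such as ms01e are also kept by the filter function.
  • The part of the code that needs to change is final_data = [(new_headers[i], new_headers[i+1]) for i in range(0, len(new_headers), 2)], because it causes problems if the list is unordered. So finding the unique names before the _ and then building the [()] would be a better approach.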
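A minimal sketch of the approach discussed here: group the columns by the sample name before the first underscore, so the input order no longer matters. It assumes every sample has exactly one _PI column and one _PG_al column; the shuffled header is a made-up example for illustration.

```python
# Hypothetical shuffled input to show that column order does not matter
header = ['chr', 'pos', 'ms02g_PG_al', 'ms01e_PI', 'ms04h_PI', 'ms02g_PI',
          'ms01e_PG_al', 'ms03g_PG_al', 'ms04h_PG_al', 'ms03g_PI']

# Map each sample name to {suffix: column}, e.g. 'ms01e' -> {'PI': 'ms01e_PI', ...}
groups = {}
for col in header:
    if '_' not in col:  # skips 'chr' and 'pos'
        continue
    sample, suffix = col.split('_', 1)  # 'ms01e_PG_al' -> ('ms01e', 'PG_al')
    groups.setdefault(sample, {})[suffix] = col

# Build (PI, PG_al) tuples, ordered by sample name
sample_list = [(cols['PI'], cols['PG_al']) for sample, cols in sorted(groups.items())]
print(sample_list)
```

A missing _PI or _PG_al column for some sample would surface as a KeyError here rather than as a silently mismatched pair.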
【Solution 2】:

Try this:

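header = ['chr', 'pos', 'ms01e_PI', 'ms01e_PG_al', 'ms02g_PI', 'ms02g_PG_al', 'ms03g_PI', 'ms03g_PG_al', 'ms04h_PI', 'ms04h_PG_al']


def l_tuple(items):
    # Keep only the per-sample columns (drops 'chr' and 'pos')
    items = filter(lambda x: "PI" in x or "PG" in x, items)
    # Sort by sample name (text before the first '_'), with the _PI column first
    l = sorted(items, key=lambda x: (x.split('_')[0], not x.endswith('_PI')))
    # Pair consecutive elements
    return [(l[i], l[i + 1]) for i in range(0, len(l), 2)]

print(l_tuple(header))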

Output

[('ms01e_PI', 'ms01e_PG_al'), ('ms02g_PI', 'ms02g_PG_al'), ('ms03g_PI', 'ms03g_PG_al'), ('ms04h_PI', 'ms04h_PG_al')]

【Discussion】:

【Solution 3】:

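I was worried that the input header might not have its samples (the PI and PG values) in order or grouped together. I think it is better to first extract the sample names and then build the list of tuples as follows: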

header = ['chr', 'pos', 'ms01e_PI', 'ms01e_PG_al', 'ms02g_PI', 'ms02g_PG_al', 'ms03g_PI', 'ms03g_PG_al', 'ms04h_PI', 'ms04h_PG_al']

# Keep the names of all the samples: drop chr and pos (they have no underscore)
# and strip the suffix after the underscore (_).
samples = [x.split('_')[0] for x in header if '_' in x]

# Reduce to unique names (basically a set), but preserve the original order,
# which a plain set() would not.
seen = set()
sample_set = [x for x in samples if not (x in seen or seen.add(x))]

# Build the list of tuples
sample_list = [(x + '_PI', x + '_PG_al') for x in sample_set]
print('sample list: ', sample_list)

Output:

sample list:  [('ms01e_PI', 'ms01e_PG_al'), ('ms02g_PI', 'ms02g_PG_al'), ('ms03g_PI', 'ms03g_PG_al'), ('ms04h_PI', 'ms04h_PG_al')]
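On Python 3.7 and later, where dicts preserve insertion order, the seen-set trick can be replaced by dict.fromkeys(), which removes duplicates while keeping first-seen order. A sketch of the same solution with that swap:

```python
header = ['chr', 'pos', 'ms01e_PI', 'ms01e_PG_al', 'ms02g_PI', 'ms02g_PG_al',
          'ms03g_PI', 'ms03g_PG_al', 'ms04h_PI', 'ms04h_PG_al']

# Sample name = text before the first '_'; 'chr' and 'pos' have no '_'
samples = [x.split('_')[0] for x in header if '_' in x]

# dict keys are unique and keep insertion order (Python 3.7+)
sample_set = list(dict.fromkeys(samples))

sample_list = [(x + '_PI', x + '_PG_al') for x in sample_set]
print(sample_list)
```

Like the original, this builds the column names from the sample name rather than checking that those columns actually exist in header.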
    

【Discussion】:

【Solution 4】:

Here is one way:

# first, filter and sort
header = sorted(i for i in header if any(k in i for k in ('_PI', '_PG_al')))

# second, zip and order by suffix
header = [(x, y) if '_PI' in x else (y, x) for x, y in zip(header[::2], header[1::2])]

# [('ms01e_PI', 'ms01e_PG_al'),
#  ('ms02g_PI', 'ms02g_PG_al'),
#  ('ms03g_PI', 'ms03g_PG_al'),
#  ('ms04h_PI', 'ms04h_PG_al')]
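A variation on the same idea, sketched with itertools.groupby: after sorting, columns that share a sample name sit next to each other, so they can be grouped by the text before the first underscore. Like the code above, it assumes each sample has exactly one _PI and one _PG_al column; the helper sample_name is hypothetical, introduced only for this sketch.

```python
from itertools import groupby

header = ['chr', 'pos', 'ms01e_PI', 'ms01e_PG_al', 'ms02g_PI', 'ms02g_PG_al',
          'ms03g_PI', 'ms03g_PG_al', 'ms04h_PI', 'ms04h_PG_al']


def sample_name(col):
    # Hypothetical helper: text before the first underscore, 'ms01e_PG_al' -> 'ms01e'
    return col.split('_')[0]


# Keep only the per-sample columns and sort so each sample's columns are adjacent
cols = sorted(c for c in header if c.endswith(('_PI', '_PG_al')))

sample_list = []
for _, group in groupby(cols, key=sample_name):
    group = list(group)
    # Put the _PI column first regardless of where sorting placed it
    pi = next(c for c in group if c.endswith('_PI'))
    pg = next(c for c in group if c.endswith('_PG_al'))
    sample_list.append((pi, pg))

print(sample_list)
```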
      

【Discussion】:
