Convert list elements into a list of tuples: answers

【Question title】: convert list elements into list of tuples
【Posted】: 2018-07-20 14:15:08
【Question description】:

header =  ['chr', 'pos', 'ms01e_PI', 'ms01e_PG_al', 'ms02g_PI', 'ms02g_PG_al', 'ms03g_PI', 'ms03g_PG_al', 'ms04h_PI', 'ms04h_PG_al']

I want to convert the list elements above into a list of tuples, like this:

sample_list = [('ms01e_PI', 'ms01e_PG_al'), ('ms02g_PI', 'ms02g_PG_al'),
               ('ms03g_PI', 'ms03g_PG_al'), ('ms04h_PI', 'ms04h_PG_al')]

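I think a lambda or a list comprehension could solve this in a short and comprehensive way. My attempts so far (neither is valid Python):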

sample_list = [lambda (x,y): x = a if '_PI' in a for a in header ..]

Or,

[(x, y) if '_PI' and '_PG_al' in a for a in header]

Any suggestions?

【Question comments】:

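It seems you want pairs of consecutive elements. If so, this is a perfect use case for zip(): first drop the first two elements with header = header[2:], then do zip(header[::2], header[1::2]). See also: Understanding python's slice notation.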
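
The zip() approach from this comment can be sketched as follows. This is only a minimal sketch: it assumes the first two entries are always chr and pos, and that each sample's _PI column is immediately followed by its _PG_al column. The names columns and pairs are illustrative and do not come from the thread.

```python
header = ['chr', 'pos', 'ms01e_PI', 'ms01e_PG_al', 'ms02g_PI', 'ms02g_PG_al',
          'ms03g_PI', 'ms03g_PG_al', 'ms04h_PI', 'ms04h_PG_al']

# Drop the two leading non-sample columns ('chr' and 'pos').
columns = header[2:]

# columns[::2] takes the elements at even positions, columns[1::2] those at
# odd positions; zip() then pairs them up position by position.
pairs = list(zip(columns[::2], columns[1::2]))
print(pairs)
```

Because the pairing is purely positional, it gives wrong pairs if the columns are not already ordered sample by sample.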

Tags: python list lambda tuples list-comprehension

【Solution 1】:

You can filter the list and drop every element that does not match the desired grouping pattern:

import re

header =  ['chr', 'pos', 'ms01e', 'ms01e_PG_al', 'ms01e_PI', 'ms01e_PG_al', 'ms02g_PI', 'ms02g_PG_al', 'ms03g_PI', 'ms03g_PG_al', 'ms04h_PI', 'ms04h_PG_al']
# Keep only the elements that look like sample columns (this drops 'chr' and 'pos').
new_headers = list(filter(lambda x: re.findall(r'^[a-zA-Z]+_[a-zA-Z]+|[a-zA-Z]+\d+[a-zA-Z]+', x), header))
# Step through the filtered list two at a time to pair neighbouring elements.
final_data = [(new_headers[i], new_headers[i+1]) for i in range(0, len(new_headers), 2)]

Output:

[('ms01e', 'ms01e_PG_al'), ('ms01e_PI', 'ms01e_PG_al'), ('ms02g_PI', 'ms02g_PG_al'), ('ms03g_PI', 'ms03g_PG_al'), ('ms04h_PI', 'ms04h_PG_al')]

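【Comments】:

I really like this idea. I have used list comprehensions before, but it completely escaped me that I could step by two to solve this. Also, in case it comes up, is there a way to do this by matching the sample name before the _?
@everestial007 Thanks. However, I'm a bit confused. What do you mean by sample name?
The sample names are ms01e, ms02g, ..., and each sample has two values, ms01e_PI and ms01e_PG_al. So after removing chr and pos, every sample should have a pair of values. I was wondering whether there is a way to mine the unique samples from the data by matching the sample name before the underscore _ and then pairing the PI and PG suffixes after the underscore _. The final output is the same list of tuples, just obtained differently. This matters if the data is not sorted by name.
@everestial007 Please see my recent edit. I modified the solution so that header elements like ms01e are also kept by the filter function.
The part of the code that needs to change is final_data = [(new_headers[i], new_headers[i+1]) for i in range(0, len(new_headers), 2)], because it causes problems if the list is unordered. So finding the unique names before the _ and then building the [()] would be a better approach.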
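
One possible way to do what the last comment describes, as a rough sketch rather than the answerer's code: group the columns by the text before the first underscore, then build the tuples from each group. The deliberately shuffled header and the names groups and paired are assumptions made for illustration.

```python
# A deliberately unordered header (hypothetical input for illustration).
header = ['chr', 'ms02g_PG_al', 'pos', 'ms01e_PI', 'ms02g_PI', 'ms01e_PG_al']

# Map each sample name (the text before the first underscore) to its columns.
# Plain dicts keep insertion order on Python 3.7+, so samples stay in
# first-seen order.
groups = {}
for column in header:
    sample, sep, suffix = column.partition('_')
    if sep:  # skip entries without an underscore, such as 'chr' and 'pos'
        groups.setdefault(sample, {})[suffix] = column

# Build (PI, PG_al) tuples, keeping only samples that have both columns.
paired = [(cols['PI'], cols['PG_al'])
          for cols in groups.values()
          if 'PI' in cols and 'PG_al' in cols]
print(paired)
```

Since each tuple is looked up by suffix, the result does not depend on the order of the input columns, only the order of the samples follows their first appearance.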

【Solution 2】:

Try this:

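header = ['chr', 'pos', 'ms01e_PI', 'ms01e_PG_al', 'ms02g_PI', 'ms02g_PG_al', 'ms03g_PI', 'ms03g_PG_al', 'ms04h_PI', 'ms04h_PG_al']


def l_tuple(items):
    # Keep only the PI / PG columns (this drops 'chr' and 'pos').
    kept = filter(lambda x: "PI" in x or "PG" in x, items)
    # Stable sort on the first four characters, so columns of the same
    # sample end up next to each other in their original relative order.
    l = sorted(kept, key=lambda x: x[:4])
    return [(l[i], l[i + 1]) for i in range(0, len(l), 2)]

print(l_tuple(header))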

Output:

[('ms01e_PI', 'ms01e_PG_al'), ('ms02g_PI', 'ms02g_PG_al'), ('ms03g_PI', 'ms03g_PG_al'), ('ms04h_PI', 'ms04h_PG_al')]

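【Comments】:

【Solution 3】:

I was worried that the samples (the PI and PG values) in the input header might not be in order. I think it is better to mine the sample names first and then build the list of tuples as follows.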

header =  ['chr', 'pos', 'ms01e_PI', 'ms01e_PG_al', 'ms02g_PI', 'ms02g_PG_al', 'ms03g_PI', 'ms03g_PG_al', 'ms04h_PI', 'ms04h_PG_al']

# Keep the names of all the samples: drop chr and pos (they have no
# underscore) and strip the suffix after the first underscore (_).
samples = [x.split('_')[0] for x in header if '_' in x]

# Now reduce the list to unique names (basically a set), while
# preserving the original order.
seen = set()
# seen.add(x) returns None, so it only records x as a side effect.
sample_set = [x for x in samples if not (x in seen or seen.add(x))]

# Now create the list of tuples.
sample_list = [((x + '_PI'), (x + '_PG_al')) for x in sample_set]
print('sample list: ', sample_list)

sample list:  [('ms01e_PI', 'ms01e_PG_al'), ('ms02g_PI', 'ms02g_PG_al'), ('ms03g_PI', 'ms03g_PG_al'), ('ms04h_PI', 'ms04h_PG_al')]
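
As a side note, the order-preserving de-duplication step can also be written with dict.fromkeys(). The sketch below is an alternative to the seen-set idiom, not part of the original answer; it assumes Python 3.7+ (where plain dicts keep insertion order), and the shuffled header and the names sample_names and sample_pairs are illustrative.

```python
# A deliberately unordered header (hypothetical input for illustration).
header = ['chr', 'pos', 'ms02g_PI', 'ms01e_PG_al', 'ms01e_PI', 'ms02g_PG_al']

# dict.fromkeys() keeps the first occurrence of each key, in insertion order.
sample_names = list(dict.fromkeys(x.split('_')[0] for x in header if '_' in x))

# As in the answer above, the column names are rebuilt from the sample name;
# this does not check that both columns actually exist in the header.
sample_pairs = [(name + '_PI', name + '_PG_al') for name in sample_names]
print(sample_pairs)
```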

【Comments】:

【Solution 4】:

Here is one way:

# first, filter and sort
header = sorted(i for i in header if any(k in i for k in ('_PI', '_PG_al')))

# second, zip and order by suffix
header = [(x, y) if '_PI' in x else (y, x) for x, y in zip(header[::2], header[1::2])]

# [('ms01e_PI', 'ms01e_PG_al'),
#  ('ms02g_PI', 'ms02g_PG_al'),
#  ('ms03g_PI', 'ms03g_PG_al'),
#  ('ms04h_PI', 'ms04h_PG_al')]

【Comments】: