JOINing two List-of-Lists to one [closed]: answers

【Question title】: JOINing two List-of-Lists to one [closed]
【Posted】: 2022-01-26 01:21:58
【Question】:

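I need to join two arrays SQL-style, where the "key" is made of two columns, YEAR and MONTH. One array holds income per year and month, and the other holds expenses in the same layout. I want to join them on the key to produce a third array with four columns: YEAR, MONTH, INCOME, EXPENSE.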

The two arrays I have are:

income = [["2019","Jan.", 2000],
          ["2019","Feb.", 1500],
          [ ---- , ---  , --- ],
          ["2019","Dec.", 1200],
          ["2020","Jan.", 1400],
          [ ---- , ---  , --- ],
          ["2020","Dec.", 1300]]

Expenses = [["2019","Jan.", 1800],
            ["2019","Feb.", 1400],
            [ ---- , ---  , --- ],
            ["2019","Dec.", 1100],
            ["2020","Jan.", 1300],
            [ ---- , ---  , --- ],
            ["2020","Dec.", 1200]]

And the desired result is:

Joined =   [["2019","Jan.", 2000, 1800],
            ["2019","Feb.", 1500, 1400],
            [ ---- , ---  , ---   ----],
            ["2019","Dec.", 1200, 1100],
            ["2020","Jan.", 1400, 1300],
            [ ---- , ---  , ---   ----],
            ["2020","Dec.", 1300, 1200]]

How should I do this? A list comprehension? Loops? What is the pythonic way?

【Question discussion】:

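Do whatever you like. This would be hard to do with a list comprehension: list comprehensions are meant for map/filter operations, and doing a join with one would require a very inefficient algorithm. Instead, you should use a dict as an index. In general, if you find yourself thinking about joins, a list is the wrong data structure. The "pythonic" way is to use a more appropriate data structure (or maybe just use a database; Python already ships with sqlite).
You should look at the Pandas library. It makes working with data like this a breeze.
Also, you really do have to provide a minimal reproducible example. The example you gave raises a SyntaxError. What is [ ---- , --- , --- ] supposed to be?
Are the rows always in the same positions, or could row 1 of the income list be 2020 while row 1 of the expenses list is 2019, so that you need to match them up by key?
@juanpa.arrivillaga: [ ---- , --- , --- ] is just a kind of ditto mark. The actual arrays cover two or more years, 12 months each. Just delete those lines and you get two executable input arrays.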
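The first comment's advice to "use a dict as an index" amounts to a hash join. Below is a minimal sketch under stated assumptions: the function name `inner_join` and the three-row sample lists are hypothetical, the key is taken to be the first two columns, and each key is assumed to occur at most once in the right-hand list.

```python
# Shortened, hypothetical stand-ins for the question's full income/Expenses lists.
income = [["2019", "Jan.", 2000], ["2019", "Feb.", 1500], ["2020", "Jan.", 1400]]
expenses = [["2019", "Jan.", 1800], ["2019", "Feb.", 1400], ["2020", "Jan.", 1300]]


def inner_join(left, right, key_len=2):
    """Inner-join two lists of rows on their first `key_len` columns.

    One pass builds a dict index over `right`, and one pass probes it for each
    row of `left`, so the join is O(len(left) + len(right)) instead of the
    O(len(left) * len(right)) of comparing every pair of rows.
    Assumes each key occurs at most once in `right`.
    """
    # Index: (year, month) tuple -> the non-key columns of the matching row.
    index = {tuple(row[:key_len]): row[key_len:] for row in right}
    joined = []
    for row in left:
        key = tuple(row[:key_len])
        if key in index:  # inner join: left rows without a match are dropped
            joined.append(row + index[key])
    return joined


joined = inner_join(income, expenses)
print(joined)
# → [['2019', 'Jan.', 2000, 1800], ['2019', 'Feb.', 1500, 1400], ['2020', 'Jan.', 1400, 1300]]
```

Because rows are matched by key and not by position, the two lists do not have to be in the same order, which covers the case raised in the comments where row 1 of one list could belong to a different year than row 1 of the other.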

Tags: python python-3.x inner-join

【Solution 1】:

Without pandas:

import operator
import itertools

def join(*lists, exclude_positions=()):
    """Join list rows
    
    Example:
        >>> list_a = [["2019", "Jan.", 2000],
        ...           ["2019", "Feb.", 1500]]
        >>> list_b = [["2019", "Jan.", 1800],
        ...           ["2019", "Feb.", 1400]]
        >>> join(list_a, list_b, exclude_positions=(0, 1))
        [['2019', 'Jan.', 2000, 1800], ['2019', 'Feb.', 1500, 1400]]

    Args:
        *lists: lists to join
        exclude_positions: positions to exclude from merging. The equivalent
        positions from the first list will be used.
    """
    lists_length = len(lists[0])

    if lists_length == 0:
        return []

    if not all(len(l) == lists_length for l in lists):
        raise ValueError("Lists must have the same length")

    iterators = []
    iterators.append(iter(lists[0]))
    
    for l in lists[1:]:
        columns_taken = [i for i in range(len(l[0])) if i not in exclude_positions]
        if len(columns_taken) == 0:
            continue
        iterator = map(operator.itemgetter(*columns_taken), l)
        if len(columns_taken) == 1:
            iterator = ((i,) for i in iterator)
    
        iterators.append(iterator)


    return [list(itertools.chain.from_iterable(row))  for row in zip(*iterators)]

Checked with 4, 120 and 1200 rows, my solution is 100 to 500 times faster than the pandas solution:

py -m timeit -s "import temp2" "temp2.join(temp2.income, temp2.Expenses, exclude_positions=(0,1))"
10000 loops, best of 5: 24.2 usec per loop

py -m timeit -s "import pandas as pd; import temp2" "pd.DataFrame(temp2.income, columns=['Year', 'Month', 'X']).merge(pd.DataFrame(temp2.Expenses, columns=['Year', 'Month', 'Y']), on=['Year', 'Month']).values.tolist()"
50 loops, best of 5: 8.42 msec per loop

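This is because I use efficient C-level iterators and functions, create no intermediate data structures, and, following your comment, match rows by position rather than by key. I also don't need any modules outside the standard library.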
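Matching by row position is only correct if both lists are in exactly the same (year, month) order. Here is a minimal sketch, not part of the answer above, that keeps positional pairing but fails loudly when a row's key disagrees. The name `zip_join` and the two-month sample data are hypothetical.

```python
# Hypothetical two-month sample; the question's real lists cover whole years.
income = [["2019", "Jan.", 2000], ["2019", "Feb.", 1500]]
expenses = [["2019", "Jan.", 1800], ["2019", "Feb.", 1400]]


def zip_join(left, right, key_len=2):
    """Pair rows by position, like Solution 1, but verify that the keys agree."""
    if len(left) != len(right):
        raise ValueError("lists must have the same length")
    joined = []
    for i, (a, b) in enumerate(zip(left, right)):
        # Row i of both lists must carry the same (year, month) key.
        if a[:key_len] != b[:key_len]:
            raise ValueError(f"row {i}: key {a[:key_len]} != {b[:key_len]}")
        # Keep all of row `a`; append only the non-key columns of `b`.
        joined.append(a + b[key_len:])
    return joined


print(zip_join(income, expenses))
# → [['2019', 'Jan.', 2000, 1800], ['2019', 'Feb.', 1500, 1400]]
```

The extra slice comparison per row costs a little speed, but it turns a silently wrong result into an explicit error if the input order ever changes.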

A better pandas solution is to combine the rows directly, without a key lookup:

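import pandas as pd
import temp2  # the answerer's test module defining income and Expenses
a = pd.DataFrame(temp2.income, columns=['Year', 'Month', 'X'])
b = pd.DataFrame(temp2.Expenses, columns=['Year', 'Month', 'Y'])
# Columns are aligned on the default 0..n-1 index, i.e. by row position, not by key
result = pd.DataFrame({'Year': a['Year'], 'Month': a['Month'], 'X': a['X'], 'Y': b['Y']}).values.tolist()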

My solution is still about 3 times faster than this, but the pandas version is short and concise.

【Discussion】:

【Solution 2】:

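Just use Pandas: convert your lists (income and expenses) to DataFrames, merge them (in this case that is essentially an inner join on year and month), and then convert the resulting DataFrame back to a list of lists.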

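import pandas as pd
df1 = pd.DataFrame(income, columns=["Year", "Month", "X"])
df2 = pd.DataFrame(Expenses, columns=["Year", "Month", "Y"])
# merge defaults to how="inner": only (Year, Month) pairs present in both frames are kept
joined = df1.merge(df2, on=["Year", "Month"]).values.tolist()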

Output:

[['2019', 'Jan.', 2000, 1800], ['2019', 'Feb.', 1500, 1400], ['2019', 'Dec.', 1200, 1100], ['2020', 'Jan.', 1400, 1300], ['2020', 'Dec.', 1300, 1200]]

PS: In case you're wondering why they aren't in the output, I removed all the [ ---- , --- , --- ] rows from both lists.
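Since `merge` performs an inner join by default, a month present in only one list silently disappears. A hedged sketch of how to make such gaps visible uses `merge`'s documented `how="outer"` and `indicator=True` options. The sample data, with one unmatched month on each side, is hypothetical.

```python
import pandas as pd

# Hypothetical data: March exists only in income, April only in expenses.
income = [["2019", "Jan.", 2000], ["2019", "Feb.", 1500], ["2019", "Mar.", 1700]]
expenses = [["2019", "Jan.", 1800], ["2019", "Feb.", 1400], ["2019", "Apr.", 900]]

df1 = pd.DataFrame(income, columns=["Year", "Month", "X"])
df2 = pd.DataFrame(expenses, columns=["Year", "Month", "Y"])

# Default how="inner": only keys found in both frames survive (Jan. and Feb.).
inner = df1.merge(df2, on=["Year", "Month"])

# how="outer" keeps every key; indicator=True adds a "_merge" column whose
# values are "left_only", "right_only" or "both".
outer = df1.merge(df2, on=["Year", "Month"], how="outer", indicator=True)
print(outer)
```

In the outer result, the missing side of each unmatched row is filled with NaN, so the X and Y columns become floating point. Filtering on `outer["_merge"] != "both"` lists exactly the months that the inner join dropped.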

【Discussion】:

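Thanks! I'm not an experienced Pythonista (less than a year of experience), and Pandas wasn't in my toolbox! I guess it will be now... Thanks again!
And computing the balance is as simple as subtracting the two columns, just one more line: `df_joined['balance'] = df_joined['incomes'] - df_joined['expenses']`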
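The one-liner above assumes the merged frame has columns named `incomes` and `expenses`. With the `X`/`Y` names from Solution 2, those columns would need to be renamed first. The sketch below instead chooses descriptive names when the frames are built. The names `Income`, `Expense` and `Balance` and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; the column names Income/Expense/Balance are illustrative.
income = [["2019", "Jan.", 2000], ["2019", "Feb.", 1500]]
expenses = [["2019", "Jan.", 1800], ["2019", "Feb.", 1400]]

df_joined = pd.DataFrame(income, columns=["Year", "Month", "Income"]).merge(
    pd.DataFrame(expenses, columns=["Year", "Month", "Expense"]),
    on=["Year", "Month"],
)
# Column arithmetic is vectorised: one subtraction computes every row's balance.
df_joined["Balance"] = df_joined["Income"] - df_joined["Expense"]
print(df_joined["Balance"].tolist())
# → [200, 100]
```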