Pandas DataFrame 的多个列表答案

【问题标题】：Multiple lists to Pandas DataFramePandas DataFrame 的多个列表
【发布时间】：2019-05-19 19:34:21
【问题描述】：

我这里有三个列表

[1,2,3,4,5]

[5,4,6,7,2]

[1,2,4,5,6,7,8,9,0]

我想要这样的输出：

A     B    C
1     5    1
2     4    2
3     6    4
4     7    5
5     2    6
           7
           8
           9
           0

我尝试了一种语法，但它给了我这个错误arrays must all be same length，另一个错误是Length of values does not match length of index

有没有办法得到这种输出？

【问题讨论】：

标签： python pandas list dataframe

【解决方案1】：

这不容易支持，但可以做到。 DataFrame.from_dict 将带有“索引”方向。假设您的列表是 A、B 和 C：

pd.DataFrame([A, B, C]).T

     0    1    2
0  1.0  5.0  1.0
1  2.0  4.0  2.0
2  3.0  6.0  4.0
3  4.0  7.0  5.0
4  5.0  2.0  6.0
5  NaN  NaN  7.0
6  NaN  NaN  8.0
7  NaN  NaN  9.0
8  NaN  NaN  0.0

另一种选择是使用DataFrame.from_dict：

pd.DataFrame.from_dict({'A' : A, 'B' : B, 'C' : C}, orient='index').T

     A    B    C
0  1.0  5.0  1.0
1  2.0  4.0  2.0
2  3.0  6.0  4.0
3  4.0  7.0  5.0
4  5.0  2.0  6.0
5  NaN  NaN  7.0
6  NaN  NaN  8.0
7  NaN  NaN  9.0
8  NaN  NaN  0.0

zip_longest 和 DataFrame.from_records 的第三种解决方案：

from itertools import zip_longest
pd.DataFrame.from_records(zip_longest(A, B, C), columns=['A', 'B', 'C'])
# pd.DataFrame.from_records(list(zip_longest(A, B, C)), columns=['A', 'B', 'C'])

     A    B  C
0  1.0  5.0  1
1  2.0  4.0  2
2  3.0  6.0  4
3  4.0  7.0  5
4  5.0  2.0  6
5  NaN  NaN  7
6  NaN  NaN  8
7  NaN  NaN  9
8  NaN  NaN  0

【讨论】：

有趣的是最后一种方法是最快的，可能是缺少转置去掉了中间结构+1
@EdChum 感谢您的时间安排！很高兴看到你回答，已经投票了:)
@coldspeed 谢谢，暂时没有回答，没有太多有趣和新的问题 IMO，主要是执行清理任务。工作也很忙，偶尔浏览问题总是很有趣
@Coder 我找到了另一个解决方案，我已将其添加到顶部。

【解决方案2】：

另一种方法是对每个列表的Series 执行列表理解并由此构造一个df：

In[61]:
df = pd.DataFrame([pd.Series(x) for x in [A,B,C]], index=list('ABC')).T
df

Out[61]: 
     A    B    C
0  1.0  5.0  1.0
1  2.0  4.0  2.0
2  3.0  6.0  4.0
3  4.0  7.0  5.0
4  5.0  2.0  6.0
5  NaN  NaN  7.0
6  NaN  NaN  8.0
7  NaN  NaN  9.0
8  NaN  NaN  0.0

时间：

%timeit pd.DataFrame([pd.Series(x) for x in [A,B,C]], index=list('ABC')).T
%timeit pd.DataFrame.from_dict({'A' : A, 'B' : B, 'C' : C}, orient='index').T
from itertools import zip_longest
%timeit pd.DataFrame.from_records(list(zip_longest(A, B, C)), columns=['A', 'B', 'C'])

1.23 ms ± 12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
977 µs ± 1.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
545 µs ± 8.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

所以最后一种方法是最快的

【讨论】：

【解决方案3】：

自定义方式的想法。

定义几个方法来调整输入数据：

def longest(*lists):
  return max([ len(x) for x in lists])

def equalize(col, size):
  delta = size - len(col)
  if delta == 0: return col
  return col + [None for _ in range(delta)]

用于构建数据框：

import pandas as pd

size = longest(col1, col2, col3)
df = pd.DataFrame({'a':equalize(col1, size), 'b':equalize(col2, size), 'c':equalize(col3, size)})

     a    b  c
0  1.0  5.0  1
1  2.0  4.0  2
2  3.0  6.0  4
3  4.0  7.0  5
4  5.0  2.0  6
5  NaN  NaN  7
6  NaN  NaN  8
7  NaN  NaN  9
8  NaN  NaN  0

【讨论】：