How to convert DataFrame columns to rows in Python? Answers

【Question title】: How to convert DataFrame columns to rows in Python?
【Posted】: 2020-10-12 15:27:36
【Question description】:

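I have the following dataset in df_1, and I want to convert it to the format of df_2. In df_2, the columns of df_1 (excluding UserId and Date) have become rows. I looked for similar answers, but the solutions they offer are rather complicated. Is there a simple way to do this?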

df_1

   UserId       Date                   -7  -6  -5  -4  -3  -2  -1   0   1   2   3   4   5   6   7
    87      2011-05-10 18:38:55.030     0   0   0   0   0   0   1   0   0   0   0   0   0   0   0
    487     2011-11-29 14:46:12.080     0   0   1   0   0   0   0   0   0   0   0   0   0   0   0
    21      2012-03-02 14:35:06.867     0   1   0   1   2   0   2   2   0   1   2   2   1   3   1

df_2

day | count
-7   0
-7   0
-7   0
-6   0
-6   0
-6   1
-5   0
-5   1
-5   0 
.    .
.    .(Similarly for other columns in between)
.    .
6   0    
6   0
6   3
7   0
7   0
7   1

【Question comments】:

Tags: python python-3.x pandas dataframe pandas-groupby

【Solution 1】:

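pandas provides a built-in method, df.melt(), for exactly this purpose. It is the inverse operation of df.pivot() or df.pivot_table(). (I don't know why the function isn't given the more intuitive name unpivot.)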
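
To illustrate the inverse relationship between melt() and pivot() mentioned above, here is a minimal sketch on a tiny made-up frame. The names wide_df, long_df and back, and the columns id, x and y, are hypothetical and not part of the question's data.

```python
import pandas as pd

# Hypothetical wide table: one row per id, one column per variable.
wide_df = pd.DataFrame({"id": ["a", "b"], "x": [1, 2], "y": [3, 4]})

# melt: every (id, variable) pair becomes its own row.
long_df = wide_df.melt(id_vars="id", var_name="variable", value_name="value")

# pivot: spread the variable names back out into columns.
back = long_df.pivot(index="id", columns="variable", values="value")

print(long_df)
print(back)
```

The pivoted frame carries id as its index, so a reset_index() call would be needed to get back to the original column layout.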

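Advantages of this solution:

It does not reinvent the wheel: df.transpose() -> df.melt() is an easy-to-understand and generally applicable pattern.
It avoids concatenating columns and/or appending datasets.

Code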

# 1. preparation: get the "day" column in place.
# Note: The column names were strings ('-7', '-6', ...) as copy-pasted.
col_names = [str(i) for i in range(-7, 8)]
df_tr = df_1[col_names].transpose().reset_index()
df_tr.rename(columns={"index": "day"}, inplace=True)
df_tr["day"] = df_tr["day"].astype(int)  # str to int

# 2. unpivoting (melting)
df_2_unpivot = df_tr.melt(id_vars="day", var_name="col", value_name="count")
df_2 = df_2_unpivot.sort_values(by=["day", "col"])

# 3.cleanup
del df_2["col"]
df_2.reset_index(drop=True, inplace=True)

Result

df_2
Out[134]: 
    day  count
0    -7      0
1    -7      0
2    -7      0
3    -6      0
4    -6      0
5    -6      1
6    -5      0
7    -5      1
8    -5      0
9    -4      0
10   -4      0
11   -4      1
12   -3      0
13   -3      0
14   -3      2
15   -2      0
16   -2      0
17   -2      0
18   -1      1
19   -1      0
20   -1      2
21    0      0
22    0      0
23    0      2
24    1      0
25    1      0
26    1      0
27    2      0
28    2      0
29    2      1
30    3      0
31    3      0
32    3      2
33    4      0
34    4      0
35    4      2
36    5      0
37    5      0
38    5      1
39    6      0
40    6      0
41    6      3
42    7      0
43    7      0
44    7      1

You can also inspect the intermediate DataFrames and experiment with the options yourself.
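
As a possible shorter variant of the same idea, melt() can be applied to df_1 directly, with UserId and Date as the id columns, so the transpose step is not needed. This is only a sketch: it rebuilds df_1 from the question's sample rows and assumes the day columns are string labels, as in the answer above.

```python
import pandas as pd

# Sample data copied from the question; day columns are assumed to be strings.
day_cols = [str(i) for i in range(-7, 8)]
rows = [
    [87, "2011-05-10 18:38:55.030", 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [487, "2011-11-29 14:46:12.080", 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [21, "2012-03-02 14:35:06.867", 0, 1, 0, 1, 2, 0, 2, 2, 0, 1, 2, 2, 1, 3, 1],
]
df_1 = pd.DataFrame(rows, columns=["UserId", "Date"] + day_cols)

df_2 = (
    df_1.melt(id_vars=["UserId", "Date"], var_name="day", value_name="count")
    .astype({"day": int})                 # '-7' -> -7 so that sorting is numeric
    .sort_values("day", kind="mergesort")  # stable: keeps the original row order per day
    [["day", "count"]]                    # drop the id columns
    .reset_index(drop=True)
)
print(df_2)
```

Keeping UserId and Date in the result instead of dropping them would make each count traceable back to its source row.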

【Comments】:

【Solution 2】:

You can use apply to collect all the rows, concatenate them, and then sort them:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.random((3, 10)), columns=range(10))
df = df.T
pieces = []  # one Series per column of the transposed frame


def f(x):
    # Series.append was removed in pandas 2.0, so collect and concatenate later
    pieces.append(x)


df.apply(f, axis=0)
new_df = pd.concat(pieces).sort_index()
print(new_df)

0    0.020673
0    0.710004
0    0.590984
1    0.643964
1    0.719694
1    0.105075
2    0.270417
2    0.537349
2    0.610228
3    0.391562
3    0.760375
3    0.105794
4    0.726044
4    0.676487
4    0.851921
5    0.447779
5    0.798975
5    0.877853
6    0.807380
6    0.639440
6    0.435890
7    0.263091
7    0.722340
7    0.586944
8    0.142973
8    0.928533
8    0.438123
9    0.076326
9    0.385373
9    0.662350
dtype: float64
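
The same flattening can probably be done without apply, for example with DataFrame.stack(). The following is a sketch under that assumption and not the answerer's method; the seeded generator rng and the names stacked and flat are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
df = pd.DataFrame(rng.random((3, 10)), columns=range(10))

# stack() moves the column labels into an inner index level: (row, column) pairs.
stacked = df.stack()

# Keep only the column label as the index, then sort so equal labels are adjacent.
# mergesort is stable, so values keep their original row order within each label.
flat = stacked.droplevel(0).sort_index(kind="mergesort")
print(flat)
```

Here the original (untransposed) frame is stacked, so no df.T step is required.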

【Comments】:

【Solution 3】:

Is this what you want (transpose())?

import pandas as pd
from io import StringIO

# Prework to generate your data
data = """UserId       Date                   -7  -6  -5  -4  -3  -2  -1   0   1   2   3   4   5   6   7
87      2011-05-10 18:38:55.030     0   0   0   0   0   0   1   0   0   0   0   0   0   0   0
487     2011-11-29 14:46:12.080     0   0   1   0   0   0   0   0   0   0   0   0   0   0   0
21      2012-03-02 14:35:06.867     0   1   0   1   2   0   2   2   0   1   2   2   1   3   1"""

input_data = StringIO(data)
df_1 = pd.read_table(input_data, sep=r"\s{2,}", engine="python")

# remove unused columns
df_1.drop(["Date", "UserId"], axis=1, inplace=True)

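# and transpose
df_2 = df_1.transpose()

# concat all lines (Series.append was removed in pandas 2.0, so use pd.concat)
df_2 = pd.concat([df_2[0], df_2[1], df_2[2]])
df_2.index = df_2.index.astype(int)  # header labels are strings; sort numerically
df_2.sort_index(inplace=True)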

print(df_2)

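Output:

【Comments】:

No, I don't want a transpose. The resulting DataFrame must have only 2 columns.
I turned it into a 2-column table after the transpose.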
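
The comment above does not show how the two-column table was produced. One way it might be done, assuming the concatenated result is a Series whose index holds the day labels, is rename_axis() followed by reset_index(). The Series s below is a hypothetical stand-in, not the answer's actual output.

```python
import pandas as pd

# Hypothetical stand-in for the Series produced by transposing and concatenating.
s = pd.Series([0, 0, 0, 0, 0, 1], index=[-7, -7, -7, -6, -6, -6])

# Name the index "day" and turn it into a regular column next to the values.
df_2 = s.rename_axis("day").reset_index(name="count")
print(df_2)
```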