Adding a column to a dataframe after every nth column: answers

【Question title】: Adding a column to a dataframe after every nth column
【Posted】: 2019-08-14 23:35:28
【Question description】:

I have a dataframe with 9,000 columns and 100 rows. I want to insert a column after every 3 columns, with its value equal to 50 for all rows.

Existing dataframe

  0 1 2 3 4 5 6 7 8 9....9000
0 a b c d e f g h i j ....x
1 k l m n o p q r s t ....x
.
.

100 u v w x y z aa bb cc....x

Desired dataframe

  0 1 2 3 4 5 6 7 8 9....12000
0 a b c 50 d e f  50 g h i j ....x
1 k l m 50 n o p  50 q r s t ....x
.
.
100 u v w 50 x y z 50 aa bb cc....x

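【Question discussion】:

Tags: python pandas insert

【Solution 1】:

Create a new DataFrame by indexing every 3rd column, add .5 to its column labels so that each new column sorts directly after the column it follows, and use concat to add it to the original: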

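import numpy as np
import pandas as pd

# Reset the column labels to 0..n-1 so the arithmetic below works
df.columns = np.arange(len(df.columns))

# One column of 50s per 3rd column, labelled 2.5, 5.5, 8.5, ...
df1 = pd.DataFrame(50, index=df.index, columns=df.columns[2::3] + .5)

# Sorting the labels puts each 50 column right after columns 2, 5, 8, ...
df2 = pd.concat([df, df1], axis=1).sort_index(axis=1)
df2.columns = np.arange(len(df2.columns))
print(df2)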
   0  1  2   3  4  5  6   7  8  9  10  11 12
0  a  b  c  50  d  e  f  50  g  h  i  50  j
1  k  l  m  50  n  o  p  50  q  r  s  50  t

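【Discussion】:

@ash90 - What about df1 = pd.DataFrame(50, index=df.index, columns= df.columns[df.columns % 3 == 2] + .5)?
Or you may need to reset the columns with df.columns = np.arange(len(df.columns)) before applying the solution.
The posted answer works for me on a test df. However, I don't understand why 0.5 is added to the columns. I get the same result either way.
@Brennan - Yes, it is only there to be 100% sure, because the default method in sort_index is quicksort. If you change to sort_index(kind='mergesort'), I think the .5 could be removed.
@Brennan - From the docs: "mergesort is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label." - DataFrame.sort_index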
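
The point about stable sorting in the comments above can be illustrated with a small sketch. It does not use `sort_index` at all; instead it assumes that ordering the columns by position with NumPy's stable `argsort` is an acceptable substitute, which makes the `.5` offset unnecessary because ties keep their original order. The names `filler_pos`, `keys` and `order` are illustrative only, and the test frame is the one from the Setup under Solution 2.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.reshape(list("abcdefghijklmnopqrst"), (2, -1)))
n = df.shape[1]

# One filler column of 50s for every 3rd original column (positions 2, 5, 8, ...)
filler_pos = np.arange(2, n, 3)
filler = pd.DataFrame(50, index=df.index, columns=filler_pos)

# Each filler column gets the same sort key as the column it should follow.
keys = np.concatenate([np.arange(n), filler_pos])

# A stable sort keeps tied keys in input order: original column first, filler second.
order = np.argsort(keys, kind="stable")

result = pd.concat([df, filler], axis=1).iloc[:, order]
result.columns = np.arange(result.shape[1])
print(result)
```

With an unstable sort the relative order of the tied keys would not be guaranteed, which is the case the `.5` offset in Solution 1 guards against.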

【Solution 2】:

numpy

# How many columns to group
x    = 3
# Get the shape of things
a    = df.to_numpy()
m, n = a.shape
k    = n // x
# Get only a multiple of x columns and reshape
b    = a[:, :k * x].reshape(m, k, x)
# Get the other columns missed by b
c    = a[:, k * x:]
# array of 50's that we'll append to the last dimension
_50  = np.ones((m, k, 1), np.int64) * 50
# append 50's and reshape back to 2D
d    = np.append(b, _50, axis=2).reshape(m, k * (x + 1))
# Create DataFrame while appending the missing bit
pd.DataFrame(np.append(d, c, axis=1))

   0  1  2   3  4  5  6   7  8  9 10  11 12
0  a  b  c  50  d  e  f  50  g  h  i  50  j
1  k  l  m  50  n  o  p  50  q  r  s  50  t

Setup

df = pd.DataFrame(np.reshape([*'abcdefghijklmnopqrst'], (2, -1)))
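
For comparison, a shorter NumPy variant might use `np.insert` instead of the reshape-and-append steps. This is a sketch of an alternative, not the answer author's method. It assumes the frame is converted to an object array so that the inserted 50 stays an integer next to the string values; `positions` and `out` are names chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.reshape(list("abcdefghijklmnopqrst"), (2, -1)))

x = 3                              # how many columns to group
a = df.to_numpy(dtype=object)      # object dtype so 50 is not cast to a string
n = a.shape[1]

# np.insert places values before the given positions of the original array,
# so x, 2x, 3x, ... puts a 50 after every complete group of x columns.
positions = np.arange(x, n + 1, x)
out = pd.DataFrame(np.insert(a, positions, 50, axis=1), index=df.index)
print(out)
```

Leftover columns that do not fill a complete group (here the last column) are kept without a trailing 50, matching the output shown for Solution 2.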

【Discussion】:

【Solution 3】:

So here is one solution

s=pd.concat([y.assign(new=50) for x, y in df.groupby(np.arange(df.shape[1])//3,axis=1)],axis=1)
s.columns=np.arange(s.shape[1])

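【Discussion】:

This doesn't seem to work. ValueError: Grouper and axis must be same length. I don't understand what grouping by the array does..
Same here as @Brennan
Hard to understand, but it works well :) - @WeNYoBen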
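
Since the grouping array caused confusion in the comments, the same chunk-by-chunk idea can be sketched without `groupby`. `np.arange(df.shape[1]) // 3` labels columns 0-2 as group 0, columns 3-5 as group 1, and so on; the loop below walks the same groups with plain `iloc` slices. This is an illustrative rewrite under two assumptions: that avoiding `groupby(..., axis=1)` is desirable (recent pandas releases deprecate it), and that a trailing group with fewer than 3 columns should not get a 50, as in the outputs of Solutions 1 and 2. The one-liner in Solution 3 adds a 50 after the partial group as well.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.reshape(list("abcdefghijklmnopqrst"), (2, -1)))
x = 3  # group size

pieces = []
for start in range(0, df.shape[1], x):
    chunk = df.iloc[:, start:start + x]   # columns start .. start+x-1
    if chunk.shape[1] == x:               # only complete groups get a 50 column
        chunk = chunk.assign(new=50)
    pieces.append(chunk)

s = pd.concat(pieces, axis=1)
s.columns = np.arange(s.shape[1])         # renumber 0..k-1 as in Solution 3
print(s)
```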