Splitting a number and creating new individual columns for each number split using Pandas: answers

【Question title】: Splitting a number and creating new individual columns for each number split using Pandas
【Posted】: 2018-06-13 08:18:31
【Question description】:

Good morning,

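I have a dataframe with a column like this. Assume it has 1000 rows; here is a sample:

 A
12
24
36
48

I want to split each number into two separate digits. I want the output to look like this: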

 A    B    C
12    1    2
24    2    4
36    3    6
48    4    8

How can I achieve this using Pandas and Numpy? Any help would be appreciated. Thanks in advance!

【Question comments】:

Tags: python pandas numpy dataframe data-structures

【Solution 1】:

Use floor division and modulo:

df['B'] = df['A'] // 10
df['C'] = df['A'] % 10

print (df)
    A  B  C
0  12  1  2
1  24  2  4
2  36  3  6
3  48  4  8
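The two lines above assume two-digit values. As a rough sketch of how the same arithmetic could extend to a fixed number of digits, the loop below divides by a power of 10 before taking the remainder. The sample values, the `n_digits` variable and the `d0`, `d1`, `d2` column names are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Hypothetical three-digit sample (not from the question)
df = pd.DataFrame({"A": [487, 120, 905]})
n_digits = 3  # assumption: every value has exactly this many digits

for k in range(n_digits):
    # k-th digit from the left: drop the digits to its right,
    # then keep only the last remaining digit
    power = 10 ** (n_digits - 1 - k)
    df[f"d{k}"] = (df["A"] // power) % 10
```

Values with fewer digits than `n_digits` would get leading zeros in the left-hand columns, so this only fits data of uniform length.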

If the input data are strings, you can index by position with str[]:

print (df['A'].apply(type))
0    <class 'str'>
1    <class 'str'>
2    <class 'str'>
3    <class 'str'>
Name: A, dtype: object

df['B'] = df['A'].str[0]
df['C'] = df['A'].str[1]
# if necessary, convert all columns to integers
df = df.astype(int)
print (df)
    A  B  C
0  12  1  2
1  24  2  4
2  36  3  6
3  48  4  8

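【Comments】:

Would this still work if the numbers are random and not divisible by 12? @jezrael
@DeepakM - Hmm. This solution is for numbers of length 2; otherwise 487 is split into 48 and 7. It depends on what is needed.
When I run it, I get this error: TypeError: unsupported operand type(s) for //: 'str' and 'int' @jezrael
@DeepakM - That means there are strings. First use df['A'].astype(int) // 10 and df['A'].astype(int) % 10
Ah, OK, I realized there are NaN values in it. It works, but is there a way to make it ignore the NaN values? That would be great! @jezrael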
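The NaN question is left unanswered in the thread. One possible approach, sketched here under the assumption that the column holds digit strings mixed with missing values, is to convert with `pd.to_numeric` and then use the nullable `Int64` dtype, so missing rows stay missing instead of raising an error. The sample data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: string digits with one missing value
df = pd.DataFrame({"A": ["12", np.nan, "36", "48"]})

# Parse the strings; anything unparsable becomes NaN (errors="coerce")
numeric = pd.to_numeric(df["A"], errors="coerce")

# The nullable integer dtype keeps missing entries as <NA>
# rather than forcing the whole column to float
a = numeric.astype("Int64")
df["B"] = a // 10
df["C"] = a % 10
```

If the rows with missing values are not needed at all, calling `df.dropna(subset=["A"])` before the split is a simpler alternative.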

【Solution 2】:

For a df of this size, use floordiv and mod:

In[141]:
df['B'] = df['A'].floordiv(10)
df['C'] = df['A'].mod(10)
df

Out[141]: 
    A  B  C
0  12  1  2
1  24  2  4
2  36  3  6
3  48  4  8

There are also numpy equivalents, np.floor_divide and np.mod:

In[142]:
df['B'] = np.floor_divide(df['A'],10)
df['C'] = np.mod(df['A'],10)
df

Out[142]: 
    A  B  C
0  12  1  2
1  24  2  4
2  36  3  6
3  48  4  8

The numpy version is faster on this sample:

%%timeit
df['B'] = df['A'].floordiv(10)
df['C']= df['A'].mod(10)
1000 loops, best of 3: 733 µs per loop

%%timeit
df['B'] = np.floor_divide(df['A'],10)
df['C'] = np.mod(df['A'],10)

1000 loops, best of 3: 491 µs per loop

【Comments】:

【Solution 3】:

In [15]: df.A.astype(str).str.extractall(r'(.)')[0].unstack().astype(np.int8)
Out[15]:
match  0  1
0      1  2
1      2  4
2      3  6
3      4  8
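This answer comes without explanation, so here is a step-by-step sketch of the same idea. It uses `\d` instead of `.` in the pattern and a made-up sample with one three-digit value, to show what happens when lengths differ. Both choices are mine, not the original author's.

```python
import pandas as pd

# Hypothetical sample with values of different lengths
df = pd.DataFrame({"A": [12, 24, 36, 487]})

# extractall returns one row per matched character,
# indexed by (original row, match number)
digits = df["A"].astype(str).str.extractall(r"(\d)")[0]

# unstack pivots the match number into columns;
# shorter values get NaN in the columns they do not fill
wide = digits.unstack().astype(float)
```

Because of the NaN padding, casting straight to `np.int8` as in the original one-liner only works when every value has the same number of digits.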

【Comments】:

【Solution 4】:

Another approach, based on splitting each number into the characters of its string form:

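import pandas as pd

df = pd.DataFrame([12, 24, 36, 48], columns=['A'])

# turn each number into a list of its characters, e.g. 12 -> ['1', '2']
values = df['A'].values
split = [list(str(el)) for el in values]

# one column per digit (assumes every number has exactly two digits)
out = pd.DataFrame(split, columns=['B', 'C']).astype(int)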

This gives:

   B  C
0  1  2
1  2  4
2  3  6
3  4  8
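Note that `out` holds only the split digits (columns B and C). To get the A/B/C layout asked for in the question, one option is to join it back onto `df`, as in the sketch below. Passing `index=df.index` is my addition, so the rows still line up if `df` does not have the default 0..n-1 index.

```python
import pandas as pd

df = pd.DataFrame([12, 24, 36, 48], columns=['A'])

# Split each number into its characters, as in the answer above
split = [list(str(el)) for el in df['A'].values]

# Reuse df's index so the join below aligns row by row
out = pd.DataFrame(split, columns=['B', 'C'], index=df.index).astype(int)

# Attach the digit columns next to the original column
result = df.join(out)
```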

【Comments】: