Using str in split in pandas (answers)

[Question Title]: Using str in split in pandas
[Posted]: 2019-01-25 11:21:06
[Question]:

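Here is some dummy data I created for my question. I have two questions about it:

Why is str used with split in the first part of the query, but not in the second part?
How come [0] picks the first row in part 1, but the first element of each row in part 2?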

import pandas as pd

chess_data = pd.DataFrame({"winner": ['A:1','A:2','A:3','A:4','B:1','B:2']})

chess_data.winner.str.split(":")[0]
['A', '1']

chess_data.winner.map(lambda n: n.split(":")[0])
0    A
1    A
2    A
3    A
4    B
5    B
Name: winner, dtype: object

[Question Discussion]:

Tags: python string pandas dataframe split

[Answer 1]:

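chess_data is a DataFrame.
chess_data.winner is a Series.
chess_data.winner.str is an accessor to methods that are string-specific and optimized (to a degree).
chess_data.winner.str.split is one such method.
chess_data.winner.map is a different method that accepts either a dictionary or a callable. It calls the callable with each element of the Series, or looks each element up in the dictionary (much like the dictionary's get method).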
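To make the map description concrete, here is a small sketch that passes both kinds of argument. The partial dictionary `lookup` is a hypothetical mapping made up for illustration. Note that pandas turns any value missing from the dictionary into NaN, so a dict argument behaves like a lookup with a NaN default rather than raising a KeyError.

```python
import pandas as pd

chess_data = pd.DataFrame({"winner": ['A:1', 'A:2', 'A:3', 'A:4', 'B:1', 'B:2']})

# Callable: invoked once per element, results collected into a new Series
first_via_callable = chess_data.winner.map(lambda n: n.split(":")[0])

# Dict: each element is used as a key; elements missing from the dict become NaN
lookup = {'A:1': 'A', 'B:1': 'B'}  # hypothetical partial mapping, illustration only
first_via_dict = chess_data.winner.map(lookup)

print(first_via_callable.tolist())  # ['A', 'A', 'A', 'A', 'B', 'B']
print(first_via_dict.tolist())      # ['A', nan, nan, nan, 'B', nan]
```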

With chess_data.winner.str.split, pandas loops over the elements and performs a kind of str.split on each one. map is a cruder way of doing the same thing.
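A rough way to see the loop described above, assuming the same data as a standalone Series: the accessor call produces the same lists as splitting each element in an ordinary Python list comprehension. This is only a behavioural comparison, not pandas' actual internal code.

```python
import pandas as pd

winner = pd.Series(['A:1', 'A:2', 'A:3', 'A:4', 'B:1', 'B:2'], name='winner')

# Accessor call: splits every element, returns a Series of lists
via_accessor = winner.str.split(':')

# Explicit Python-level loop over the elements, wrapped back into a Series
via_loop = pd.Series([s.split(':') for s in winner], index=winner.index, name='winner')

print(via_accessor.tolist() == via_loop.tolist())  # True
```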

Using your data:

chess_data.winner.str.split(':')

0    [A, 1]
1    [A, 2]
2    [A, 3]
3    [A, 4]
4    [B, 1]
5    [B, 2]
Name: winner, dtype: object

To get the first element of each list, you need to use the string accessor again:

chess_data.winner.str.split(':').str[0]

0    A
1    A
2    A
3    A
4    B
5    B
Name: winner, dtype: object
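In pandas, `.str[i]` with a non-slice key delegates to the accessor method `.str.get(i)`, which takes position `i` of every element (here, every list). A quick sketch to check that the two spellings agree:

```python
import pandas as pd

winner = pd.Series(['A:1', 'A:2', 'A:3', 'A:4', 'B:1', 'B:2'], name='winner')
parts = winner.str.split(':')

# Both take position 0 of every list in the Series
first_bracket = parts.str[0]
first_get = parts.str.get(0)

print(first_bracket.tolist())           # ['A', 'A', 'A', 'A', 'B', 'B']
print(first_get.equals(first_bracket))  # True
```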

This is equivalent to what you did with map:

chess_data.winner.map(lambda x: x.split(':')[0])

You could also use a comprehension:

chess_data.assign(new_col=[x.split(':')[0] for x in chess_data.winner])

  winner new_col
0    A:1       A
1    A:2       A
2    A:3       A
3    A:4       A
4    B:1       B
5    B:2       B

[Discussion]:

[Answer 2]:

Your code,

chess_data['winner'].str.split(':')[0] 
['A', '1']

is the same as,

chess_data['winner'].str.split(':').loc[0] 
['A', '1']

And,

chess_data['winner'].map(lambda n: n.split(':')[0])
0    A
1    A
2    A
3    A
4    B
5    B
Name: winner, dtype: object

is the same as,

chess_data.winner.str.split(':').str[0]
0    A
1    A
2    A
3    A
4    B
5    B
Name: winner, dtype: object

which is also the same as,

pd.Series([x.split(':')[0] for x in chess_data['winner']], name='winner') 
0    A
1    A
2    A
3    A
4    B
5    B
Name: winner, dtype: object
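The `.loc[0]` equivalence above is the key point: plain `[0]` on a Series with an integer index is a label lookup, not a positional one. A sketch using a hypothetical index that starts at 10 instead of 0 makes this visible: `[0]` no longer finds a row, while `.iloc[0]` still returns the first one. (This assumes a recent pandas version; older versions had more lenient fallbacks for some index types.)

```python
import pandas as pd

# Same data, with a hypothetical index 10..15 to expose label-based lookup
winner = pd.Series(['A:1', 'A:2', 'A:3', 'A:4', 'B:1', 'B:2'],
                   index=range(10, 16), name='winner')
parts = winner.str.split(':')

print(parts[10])      # ['A', '1']  (label 10)
print(parts.loc[10])  # ['A', '1']  (explicit label lookup)
print(parts.iloc[0])  # ['A', '1']  (explicit position lookup)

try:
    parts[0]          # no row is labelled 0, so this raises
except KeyError:
    print("KeyError: no row labelled 0")
```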

[Discussion]:

[Answer 3]:

This is explained in the documentation under Indexing with .str:

The .str[index] notation indexes into each string by position, whereas [index] selects based on the index (the labels) of the Series.

Using an example:

import numpy as np
import pandas as pd

s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

s.str[3]

returns the character at position 3 of each row (NaN where the string is too short or the value is missing):

0    NaN
1    NaN
2    NaN
3      a
4      a
5    NaN
6      A
7    NaN
8    NaN

whereas

s[3]

'Aaba'
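A sketch of what `.str[3]` does per element, written with a hypothetical helper `positional_get` (not part of pandas): index into the value if possible, and return NaN when the string is too short or the value is missing. Applied to the example Series, it should agree with the built-in accessor on every row.

```python
import numpy as np
import pandas as pd

def positional_get(value, i):
    """Hypothetical helper approximating .str[i] for a single element."""
    try:
        return value[i]
    except (TypeError, IndexError):  # NaN is not subscriptable; short strings are out of range
        return np.nan

s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

manual = s.map(lambda v: positional_get(v, 3))
builtin = s.str[3]

print(manual.dropna().tolist())               # ['a', 'a', 'A']
print(manual.isna().equals(builtin.isna()))   # True
```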

[Discussion]:

[Answer 4]:

Use the apply method to extract the first value from each list in the split Series:

chess_data.winner.str.split(':')
Out: 
0    [A, 1]
1    [A, 2]
2    [A, 3]
3    [A, 4]
4    [B, 1]
5    [B, 2]
Name: winner, dtype: object

chess_data.winner.str.split(':').apply(lambda x: x[0])
Out:
0    A
1    A
2    A
3    A
4    B
5    B
Name: winner, dtype: object

When you use

chess_data.winner.str.split(":")[0]

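you just get the first item of the resulting Series (the row labelled 0). .apply(), on the other hand, applies a function, in this case an "itemgetter", to every value in the Series and returns another Series.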
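The "itemgetter" mentioned above can be used literally: `operator.itemgetter(0)` from the standard library builds a callable that returns `x[0]`, so passing it to `.apply()` is equivalent to `lambda x: x[0]`. A short sketch contrasting it with plain `[0]`:

```python
import operator

import pandas as pd

winner = pd.Series(['A:1', 'A:2', 'A:3', 'A:4', 'B:1', 'B:2'], name='winner')
parts = winner.str.split(':')

# apply calls itemgetter(0) on every list, i.e. lst[0] for each row
first = parts.apply(operator.itemgetter(0))

# Indexing the Series itself returns only the single row labelled 0
row_zero = parts[0]

print(first.tolist())  # ['A', 'A', 'A', 'A', 'B', 'B']
print(row_zero)        # ['A', '1']
```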

[Discussion]: