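【Posted】: 2020-10-25 21:48:03
【Question】:
I'm new to Python and I'm stuck here. I have a DataFrame like the one below, and I'm trying to create a new column that contains only the macro genre (the part before the first semicolon) from the Genres column.
DataFrame: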
import pandas as pd
d = {'Genres': ['Finance', 'Arcade', 'Business', 'Photography', 'Entertainment;Brain Games', 'Medical', 'Tools', 'Casual;Brain Games', 'Medical', 'Entertainment'],
'Last Updated': ['March 10, 2018', 'May 24, 2018', 'April 11, 2018', 'November 6, 2014', 'March 9, 2018', 'May 17, 2018', 'June 3, 2016', 'April 10, 2016', 'July 16, 2018', 'May 17, 2017']}
df = pd.DataFrame(data=d)
df
   Genres                     Last Updated
0  Finance                    March 10, 2018
1  Arcade                     May 24, 2018
2  Business                   April 11, 2018
3  Photography                November 6, 2014
4  Entertainment;Brain Games  March 9, 2018
5  Medical                    May 17, 2018
6  Tools                      June 3, 2016
7  Casual;Brain Games         April 10, 2016
8  Medical                    July 16, 2018
9  Entertainment              May 17, 2017
The desired output looks like this:
   Genres                     macro_genres   Last Updated
0  Finance                    Finance        March 10, 2018
1  Arcade                     Arcade         May 24, 2018
2  Business                   Business       April 11, 2018
3  Photography                Photography    November 6, 2014
4  Entertainment;Brain Games  Entertainment  March 9, 2018
5  Medical                    Medical        May 17, 2018
6  Tools                      Tools          June 3, 2016
7  Casual;Brain Games         Casual         April 10, 2016
8  Medical                    Medical        July 16, 2018
9  Entertainment              Entertainment  May 17, 2017
What I tried:
def macro_genre(i):
    for i in df['Genres']:
        if ';' in i:
            j = i.split(';')[0]
            return j
        else:
            return i
df['macro_genres'] = df['Genres'].apply(macro_genre)
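But it doesn't work. It creates the column, but it repeats the first value down the entire column.
When I try the for part outside the function, it works.
Any hints would be much appreciated! Thanks!!!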
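A likely explanation for the repeated value: `apply` already calls the function once per cell, but the `for` loop inside `macro_genre` overwrites the argument `i` with the first entry of `df['Genres']` and returns on the first iteration, so every call yields 'Finance'. The sketch below is one possible fix, not necessarily the accepted answer; it uses a shortened copy of the data from the question, and the column name `macro_genres_vectorized` is made up for illustration.

```python
import pandas as pd

# Shortened sample taken from the question's data
df = pd.DataFrame({
    'Genres': ['Finance', 'Entertainment;Brain Games',
               'Casual;Brain Games', 'Medical'],
})

def macro_genre(genre):
    # apply() passes one cell value at a time, so no loop over the column
    # is needed. split(';')[0] returns the whole string when no ';' is present.
    return genre.split(';')[0]

df['macro_genres'] = df['Genres'].apply(macro_genre)

# Alternative using pandas string methods (hypothetical column name)
df['macro_genres_vectorized'] = df['Genres'].str.split(';').str[0]

print(df)
```

Both columns should hold the same values; the `.str` version avoids a Python-level function call per row, which may matter on the larger DataFrame mentioned in the question.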
【Comments】:
-
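Sorry about that. I'm new here. Which part do you mean should be minimal and reproducible, the DataFrame itself? I typed it in by hand because it's only a small part of a larger DataFrame.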
-
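"Which part do you mean should be minimal and reproducible, the DataFrame itself?" It should be possible to copy/paste your code and data and run the code immediately.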
-
Thanks for the tip, AMC. Although a solution has already been given, I've included the code that generates the DataFrame.