Answers: compare and delete columns on a dataframe

【Question title】: compare and delete columns on a dataframe
【Posted】: 2021-04-18 15:28:30
【Question】:

(Screenshot of the DataFrame, not included.)

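Python:

I have a dataframe in which some genre columns are duplicated. I want to merge the columns that share a genre, keeping the value "1" if either of them has it.

For example, 0genero_adventure has the value "0" and 1genero_adventure has the value "1", so I want to keep the "1".

This should apply not only to these examples but to the whole table, which continues with more duplicated genre columns.

Thanks in advance :)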

【Question comments】:

Please don't post pictures of your dataframe. Instead, include reproducible Python code that generates it.
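For illustration, a reproducible frame of the kind this comment asks for might look like the sketch below. The column names follow the pattern described in the question (`0genero_adventure`, `1genero_adventure`); the `title` column, the `comedy` genre and all values are hypothetical, since the real data was only shown in the screenshot.

```python
import pandas as pd

# Hypothetical sample data: each genre appears twice, with a numeric prefix.
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "0genero_adventure": [0, 1, 0],
    "1genero_adventure": [1, 0, 0],
    "0genero_comedy": [0, 0, 1],
    "1genero_comedy": [0, 1, 0],
})

# Desired result: one column per genre, 1 if either duplicate column holds a 1.
expected = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "genero_adventure": [1, 1, 0],
    "genero_comedy": [0, 1, 1],
})

print(df)
print(expected)
```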

Tags: python data-science data-cleaning

【Answer 1】:

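If I understand your question correctly, the code below should work for you. However, you need to create a list containing the genre names, spelled exactly as they appear in your column names (the matching below is case-sensitive).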

genre_list = ["genero_Adventure", "genero_Biography", "genero_Comedy"]  # add all the genre names like this
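Rather than typing the list by hand, it could be derived from the column names. This is a sketch that assumes every duplicated column follows the `<digits>genero_<name>` pattern from the question; the sample column set is hypothetical. The derived names come out spelled exactly as in the columns, which matters because `genre in col` is case-sensitive.

```python
import re

import pandas as pd

# Hypothetical empty frame using the naming pattern from the question.
df = pd.DataFrame(columns=["title", "0genero_adventure", "1genero_adventure",
                           "0genero_comedy", "1genero_comedy"])

# Strip the leading digits from each genre column and keep the unique names,
# in the order they first appear (dict.fromkeys preserves insertion order).
pattern = re.compile(r"^\d+(genero_.+)$")
genre_list = list(dict.fromkeys(
    m.group(1) for col in df.columns if (m := pattern.match(col))
))
print(genre_list)  # ['genero_adventure', 'genero_comedy']
```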

Then this loop should do the job:

for genre in genre_list:
    # all columns whose name contains this genre name (e.g. 0genero_..., 1genero_...)
    genre_cols_list = [col for col in df.columns if genre in col]

    # row-wise max: 1 if any duplicate column holds a 1, stored in a new column named after the genre
    df[genre] = df[genre_cols_list].max(axis=1)
    # delete the original duplicate columns
    df.drop(columns=genre_cols_list, inplace=True)
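A self-contained sketch of this loop on a small hypothetical frame (the `title` column and all values are made up) follows. It swaps the substring test `genre in col` for an exact match on `<digits><genre>` via `re.fullmatch`, which assumes that naming scheme, so a genre whose name is contained in another genre's name would not pull in the wrong columns. It also reassigns the result of `drop` instead of using `inplace=True`.

```python
import re

import pandas as pd

# Hypothetical sample frame; replace with your own data.
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "0genero_adventure": [0, 1, 0],
    "1genero_adventure": [1, 0, 0],
    "0genero_comedy": [0, 0, 1],
    "1genero_comedy": [0, 1, 0],
})

genre_list = ["genero_adventure", "genero_comedy"]

for genre in genre_list:
    # Exact match on "<digits><genre>" rather than a substring test.
    genre_cols_list = [
        col for col in df.columns if re.fullmatch(rf"\d+{re.escape(genre)}", col)
    ]
    # Row-wise max of 0/1 columns: 1 if any duplicate column is 1, else 0.
    df[genre] = df[genre_cols_list].max(axis=1)
    # Remove the original prefixed columns.
    df = df.drop(columns=genre_cols_list)

print(df)
```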

【Comments】:

【Answer 2】:

I would store the genres and loop over them, keeping 1 if either of the two columns is 1 and 0 otherwise.

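import numpy as np

genres = ["action", "adventure"]  # ... add the remaining genres
for col in genres:
    # element-wise OR needs | with parentheses; Python's `or` fails on a pandas Series
    df[col] = np.where((df["0genero_" + col] == 1) | (df["1genero_" + col] == 1), 1, 0)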
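If a genre can have more than two prefixed copies (a hypothetical `2genero_adventure`, say), the same idea can be sketched without hard-coding the `0`/`1` prefixes: `DataFrame.filter(regex=...)` selects every copy, and `.any(axis=1)` acts as a row-wise OR. The sample data is invented for illustration.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample frame, including a third copy of one genre.
df = pd.DataFrame({
    "0genero_action": [1, 0, 0, 0],
    "1genero_action": [0, 0, 1, 0],
    "0genero_adventure": [0, 1, 0, 0],
    "1genero_adventure": [1, 0, 0, 0],
    "2genero_adventure": [0, 0, 1, 0],
})

genres = ["action", "adventure"]
for col in genres:
    # Every copy of this genre, whatever the numeric prefix.
    copies = df.filter(regex=rf"^\d+genero_{re.escape(col)}$")
    # (copies == 1) is a boolean frame; .any(axis=1) is the row-wise OR.
    df[col] = np.where((copies == 1).any(axis=1), 1, 0)

print(df[genres])
```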

Then delete the remaining columns you no longer need.
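One way to do that, assuming every original duplicate column follows the `<digits>genero_<name>` pattern, is to select them with a regex and drop them in one call. The sample frame below is hypothetical.

```python
import pandas as pd

# Hypothetical frame after the merge step: originals plus the merged column.
df = pd.DataFrame({
    "0genero_action": [1, 0, 0],
    "1genero_action": [0, 0, 1],
    "action": [1, 0, 1],
})

# Drop every column whose name starts with digits followed by "genero_",
# keeping the merged genre columns.
df = df.drop(columns=df.filter(regex=r"^\d+genero_").columns)
print(df.columns.tolist())  # ['action']
```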

【Comments】: