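Rename multiple columns of a pandas DataFrame based on a condition

【Question title】: rename multiple columns of pandas dataframe based on condition
【Posted】: 2019-09-15 15:38:28
【Question】:

I have a df in which I need to rename 40 column names to the empty string. This can be done with .rename(), but then I have to supply every column name that needs renaming in a dict. I am looking for a better way to rename the columns through some kind of pattern matching: wherever NULL/UNNAMED appears in a column name, replace that name with an empty string.

df1: the original df (in the actual df I have about 20 columns named NULL1-NULL20 and 20 columns named UNNAMED1-UNNAMED20)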

   NULL1  NULL2  C1  C2  UNNAMED1  UNNAMED2
0      1     11  21  31        41        51
1      2     22  22  32        42        52
2      3     33  23  33        43        53
3      4     44  24  34        44        54

Desired output df:

          C1  C2
0  1  11  21  31  41  51
1  2  22  22  32  42  52
2  3  33  23  33  43  53
3  4  44  24  34  44  54

This can be achieved with:

df.rename(columns={'NULL1':'', 'NULL2':'', 'UNNAMED1':'', 'UNNAMED2':''}, inplace=True)

But I don't want to create a long dictionary with 40 elements.

【Question comments】:

Tags: python python-3.x pandas

【Solution 1】:

If you want to stick with rename:

def renaming_fun(x):
    if "NULL" in x or "UNNAMED" in x:
        return "" # or None
    return x

df = df.rename(columns=renaming_fun)

This comes in handy if the mapping function gets more complex. Otherwise a list comprehension will do:

df.columns = [renaming_fun(col) for col in df.columns]

Another possibility:

df.columns = map(renaming_fun, df.columns)

But as already mentioned, renaming columns to an empty string is not something you would usually do.
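As a quick check, the sketch below rebuilds the sample frame from the question and applies the three variants from this answer. The names `via_rename`, `via_listcomp` and `via_map` are illustrative only, and the `map` result is wrapped in `list()` here as a precaution rather than because the answer requires it.

```python
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    "NULL1": [1, 2, 3, 4],
    "NULL2": [11, 22, 33, 44],
    "C1": [21, 22, 23, 24],
    "C2": [31, 32, 33, 34],
    "UNNAMED1": [41, 42, 43, 44],
    "UNNAMED2": [51, 52, 53, 54],
})

def renaming_fun(x):
    # Blank out any label containing one of the marker substrings.
    if "NULL" in x or "UNNAMED" in x:
        return ""
    return x

# Variant 1: rename() accepts a callable and applies it to every label.
via_rename = df.rename(columns=renaming_fun)

# Variant 2: assign a list built with a comprehension.
via_listcomp = df.copy()
via_listcomp.columns = [renaming_fun(col) for col in df.columns]

# Variant 3: map(), materialised into a list before assignment.
via_map = df.copy()
via_map.columns = list(map(renaming_fun, df.columns))

print(list(via_rename.columns))  # ['', '', 'C1', 'C2', '', '']
```

All three produce the same labels; `rename` returns a new frame, while the other two assign to `columns` on a copy.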

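【Discussion】:

【Solution 2】:

It is possible, but be careful: because the column names are then duplicated, selecting the empty-named column returns all of the empty-named columns: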

print (df[''])

0  1  11  41  51
1  2  22  42  52
2  3  33  43  53
3  4  44  44  54
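The duplicate-label behaviour can be reproduced with a small frame. This is only a sketch: the two-row data and the names `blank` and `first` are made up for illustration, and position-based selection with `iloc` is shown as one possible way to still reach a single column.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 11, 21, 31, 41, 51],
     [2, 22, 22, 32, 42, 52]],
    columns=["", "", "C1", "C2", "", ""],
)

# A duplicated label selects every matching column, so the result is a DataFrame.
blank = df[""]
print(blank.shape)  # (2, 4)

# Selecting by position still returns one specific column as a Series.
first = df.iloc[:, 0]
print(first.tolist())  # [1, 2]
```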

Use startswith with a tuple of prefixes in a list comprehension to catch all the columns:

df.columns = ['' if c.startswith(('NULL','UNNAMED')) else c for c in df.columns]

Your own solution (the dict passed to rename) can be changed to:

d = dict.fromkeys(df.columns[df.columns.str.startswith(('NULL','UNNAMED'))], '')
print (d)
{'NULL1': '', 'NULL2': '', 'UNNAMED1': '', 'UNNAMED2': ''}
df = df.rename(columns=d)
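Since the question asks for pattern matching, a regular expression is another option. This is not part of the answer above, just a sketch of a different technique using `Index.str.replace`; the `pattern` below assumes the labels to blank out are exactly `NULL` or `UNNAMED` followed by optional digits.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 11, 21, 31, 41, 51]],
    columns=["NULL1", "NULL2", "C1", "C2", "UNNAMED1", "UNNAMED2"],
)

# Assumed naming scheme: NULL<digits> or UNNAMED<digits>, matched as a whole label.
pattern = r"^(?:NULL|UNNAMED)\d*$"

# Labels that match the whole pattern become '', all others are left alone.
df.columns = df.columns.str.replace(pattern, "", regex=True)
print(list(df.columns))  # ['', '', 'C1', 'C2', '', '']
```

Anchoring the pattern with `^` and `$` keeps a label such as `C1` untouched; drop the anchors only if partial matches should be stripped as well.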

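【Discussion】:

Is it possible to pass a Django model's verbose_name as the column names of a dataframe?
@user12379095 - I have never used Django, but if you can convert it to a list with the same length as the number of columns, then yes: df.columns = L
What I did was build a list of column names and verbose_name values from the table (as tuples), like [('client_name', 'Client Name'), ('country', 'Country'), ('product', 'Product'), ('price', 'Price')]. The column list of the dataframe that holds the actual data is ['client_name', 'country', 'price', 'product'] (i.e. the actual column names). What I am trying to do is use the associated verbose_name, Client Name, in place of the column name client_name, and so on. Is that possible?
@user12379095 - If you have tups = [('client_name', 'Client Name'), ('country', 'Country'), ('product', 'Product'), ('price', 'Price')] df = pd.DataFrame(columns=['client_name', 'country', 'price', 'product']) then the solution is to convert the tuples to a dictionary and use rename, like df = df.rename(columns=dict(tups))
Thanks, this finally worked. Actually, your first hint is what did it in the end. Thanks for all the help and for bearing with me.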
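The tuple-to-dict suggestion from the comments can be run as is. The sketch below uses the pairs quoted in the thread; how they are obtained from a Django model is outside this discussion and is not shown.

```python
import pandas as pd

# (column name, verbose name) pairs as quoted in the comment thread.
tups = [
    ("client_name", "Client Name"),
    ("country", "Country"),
    ("product", "Product"),
    ("price", "Price"),
]
df = pd.DataFrame(columns=["client_name", "country", "price", "product"])

# dict(tups) maps each old label to its new one; rename looks labels up by name,
# so the order of the pairs does not need to match the column order.
df = df.rename(columns=dict(tups))
print(list(df.columns))  # ['Client Name', 'Country', 'Price', 'Product']
```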

【Solution 3】:

You can use a dict comprehension inside df.rename():

import numpy as np
# SOME_STRING_CONDITION is a placeholder for the substring to look for
idx_filter = np.asarray([i for i, col in enumerate(df.columns) if SOME_STRING_CONDITION in col])
df.rename(columns={col: '' for col in df.columns[idx_filter]}, inplace=True)

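In your case, it sounds like SOME_STRING_CONDITION would be "NULL" or "UNNAMED".

I found this while looking for help with a problem of my own on the thread about the more general column-renaming question (Renaming columns in pandas). I don't have enough reputation to add my solution there as an answer or a comment (I am new to Stack Overflow), so I am posting it here!

This solution is also helpful if you need to keep part of the string you are filtering on. For example, if you want to replace "C" with "col_" in the column names: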

idx_filter = np.asarray([i for i, col in enumerate(df.columns) if 'C' in col])
df.rename(columns={col: col.replace('C', 'col_') for col in df.columns[idx_filter]}, inplace=True)
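The same idea also works without the intermediate index array, by filtering inside the dict comprehension itself. This is a sketch of that variant, not the answer's exact code; `patterns` and `mapping` are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 11, 21, 31, 41, 51]],
    columns=["NULL1", "NULL2", "C1", "C2", "UNNAMED1", "UNNAMED2"],
)

patterns = ("NULL", "UNNAMED")  # hypothetical: substrings that mark a label for blanking

# Build the rename mapping directly from the labels that match any pattern.
mapping = {col: "" for col in df.columns if any(p in col for p in patterns)}
df = df.rename(columns=mapping)

# Keep part of the old label: swap "C" for "col_" in the remaining names.
df = df.rename(columns={col: col.replace("C", "col_") for col in df.columns if "C" in col})
print(list(df.columns))  # ['', '', 'col_1', 'col_2', '', '']
```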

【Discussion】:

【Solution 4】:

If only a few columns keep their names, use a list comprehension as follows:

df.columns = [col if col in ('C1','C2') else "" for col in df.columns]

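【Discussion】:

In my case I have 30 more columns to keep. I could put them in a list and pass that to the list comprehension..
Then @Adrien's solution will do the job for you.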
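For a longer keep-list, the membership test can use a separately defined collection. A minimal sketch, assuming the names to keep are known up front; `keep` is a hypothetical name and holds only two labels here to stay short.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 11, 21, 31, 41, 51]],
    columns=["NULL1", "NULL2", "C1", "C2", "UNNAMED1", "UNNAMED2"],
)

# Hypothetical keep-list; in the commenter's case it would hold about 30 names.
keep = {"C1", "C2"}

# Every label not in the keep-set is blanked out.
df.columns = [col if col in keep else "" for col in df.columns]
print(list(df.columns))  # ['', '', 'C1', 'C2', '', '']
```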

【Solution 5】:

df.columns = [col if "NULL" not in col else "" for col in df.columns]

This should work, because you can change the column names by assigning a list to the DataFrame's columns attribute.
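One detail worth knowing about list assignment: the list must have exactly as many entries as there are columns. The sketch below uses a made-up three-column frame and also covers the UNNAMED labels from the question; `mismatch_rejected` is an illustrative flag.

```python
import pandas as pd

df = pd.DataFrame([[1, 11, 21]], columns=["NULL1", "C1", "UNNAMED1"])

# Same length as the existing columns: the assignment succeeds.
df.columns = ["" if ("NULL" in col or "UNNAMED" in col) else col for col in df.columns]
print(list(df.columns))  # ['', 'C1', '']

# A list of the wrong length is rejected with a ValueError.
try:
    df.columns = ["only", "two"]
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)  # True
```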

【Discussion】: