Convert column values to lower case only if they are strings (answers)

【Question title】: Convert column values to lower case only if they are strings
【Posted】: 2025-11-24 12:10:01
【Question】:

I'm having trouble converting a column to lower case. It isn't as simple as just using:

df['my_col'] = df['my_col'].str.lower()

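because I'm iterating over many DataFrames, and some of them (but not all) contain both strings and integers in the column of interest. Applied as above, this makes the lower() call throw an exception: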

AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
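A hedged note on when this error actually appears: pandas raises this AttributeError when the column has a non-string dtype such as int64 (for example, a column that holds only numbers). On an object column that mixes strings and integers, .str.lower() does not raise; it returns NaN for the non-string cells. A minimal sketch with made-up sample Series illustrating both cases:

```python
import pandas as pd

numeric = pd.Series([3, 4, 234])          # int64 column: no strings at all
mixed = pd.Series(["Linux", 7, "VMS"])    # object column: strings and an int

# The .str accessor refuses non-string dtypes outright.
error_message = None
try:
    numeric.str.lower()
except AttributeError as exc:
    error_message = str(exc)
print("AttributeError:", error_message)

# On an object column the accessor works, but non-strings become NaN.
lowered = mixed.str.lower()
print(lowered.tolist())  # ['linux', nan, 'vms']
```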

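Rather than forcing the type to string, I want to check whether each value is a string and, if it is, convert it to lower case; if it isn't, leave it as it is. I thought this would work: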

df = df.apply(lambda x: x.lower() if(isinstance(x, str)) else x)

But it doesn't work... I'm probably overlooking something obvious, but I can't see what it is!
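A likely explanation, sketched under assumptions: DataFrame.apply passes each whole column (a Series) to the lambda, not individual cells, so isinstance(x, str) is always False and every column comes back unchanged without any error. One way to do the per-value check is to map a helper over each column's cells with Series.map (recent pandas versions also offer an element-wise DataFrame.map, formerly applymap). The helper name lower_if_str and the sample data are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"OS": ["Microsoft Windows", "Mac OS X", 42],
                   "Count": [3, 4, 234]})

def lower_if_str(value):
    """Hypothetical helper: lower-case a str, return anything else unchanged."""
    return value.lower() if isinstance(value, str) else value

# The attempt from the question: x is a whole column, never a str,
# so nothing is lower-cased.
unchanged = df.apply(lambda x: x.lower() if isinstance(x, str) else x)

# Element-wise version: Series.map calls lower_if_str on every cell.
lowered = df.apply(lambda col: col.map(lower_if_str))
print(lowered)
```

Because the check runs per cell, the integer 42 in "OS" and the whole "Count" column keep their original values.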

My data looks like this:

                          OS  Count
0          Microsoft Windows      3
1                   Mac OS X      4
2                      Linux    234
3    Don't have a preference      0
4  I prefer Windows and Unix      3
5                       Unix      2
6                        VMS      1
7         DOS or ZX Spectrum      2

【Comments】:

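You can force everything to str with df['my_col'] = df['my_col'].astype(str).str.lower(). Is there a reason for mixing dtypes? It isn't performant.
Ah... I'm not sure. Wouldn't astype(str) turn all the integers in the "Count" column into strings? If so, wouldn't that prevent arithmetic on them later? I should have added that to the original question...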
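A minimal sketch (with made-up sample values) of the point behind the first comment: the suggested astype(str) only targets the single column it is applied to, so converting just the text column leaves the integers in "Count" numeric:

```python
import pandas as pd

df = pd.DataFrame({"OS": ["Linux", "VMS", "Unix"], "Count": [234, 1, 2]})

# Convert and lower-case only the text column; "Count" is not touched.
df["OS"] = df["OS"].astype(str).str.lower()

print(df.dtypes)          # "Count" keeps its integer dtype
print(df["Count"].sum())  # arithmetic still works: 237
```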

Tags: python string pandas dataframe

【Answer 1】:

The test in your lambda function isn't quite right, but you were close:

df.apply(lambda x: x.str.lower() if(x.dtype == 'object') else x)

Using your DataFrame, here is the output:

>>> import pandas as pd
>>> df = pd.DataFrame(
    [
        {'OS': 'Microsoft Windows', 'Count': 3},
        {'OS': 'Mac OS X', 'Count': 4},
        {'OS': 'Linux', 'Count': 234},
        {'OS': 'Dont have a preference', 'Count': 0},
        {'OS': 'I prefer Windows and Unix', 'Count': 3},
        {'OS': 'Unix', 'Count': 2},
        {'OS': 'VMS', 'Count': 1},
        {'OS': 'DOS or ZX Spectrum', 'Count': 2},
    ]
)
>>> df = df.apply(lambda x: x.str.lower() if x.dtype=='object' else x)
>>> print(df)
                          OS  Count
0          microsoft windows      3
1                   mac os x      4
2                      linux    234
3     dont have a preference      0
4  i prefer windows and unix      3
5                       unix      2
6                        vms      1
7         dos or zx spectrum      2

【Comments】:

Thanks. The other answer is good too, but this one checks for the object dtype before applying lower(), which is exactly what I wanted.

【Answer 2】:

What dtype are these columns? object? If so, you should convert them:

df['my_col'] = df.my_col.astype(str).str.lower()

MVCE：

In [1120]: df
Out[1120]: 
   Col1
0   VIM
1   Foo
2  test
3     1
4     2
5     3
6   4.5
7   OSX

In [1121]: df.astype(str).Col1.str.lower()
Out[1121]: 
0     vim
1     foo
2    test
3       1
4       2
5       3
6     4.5
7     osx
Name: Col1, dtype: object

In [1118]: df.astype(str).Col1.str.lower().dtype
Out[1118]: dtype('O')

If you want to do arithmetic on these values, you probably shouldn't mix strs and numeric types.

However, if that really is your situation, you can convert the values to numbers with pd.to_numeric(..., errors='coerce'):

In [1123]: pd.to_numeric(df.Col1, errors='coerce')
Out[1123]: 
0    NaN
1    NaN
2    NaN
3    1.0
4    2.0
5    3.0
6    4.5
7    NaN
Name: Col1, dtype: float64

You can work with the NaNs, but note the dtype now: it is float64.

【Comments】:

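Thanks! I agree the types shouldn't be mixed. The problem is that this is user-entered data, and it's hard to stop people from doing it! Next time I'll need to restrict user input better.
@user4896331 Until then, pd.to_numeric is your friend. Take a look at my answer if you didn't see the edit earlier.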

【Answer 3】:

Building on the two answers above, I think this is a bit safer:

Note the astype(str):

df_lower=df.apply(lambda x: x.astype(str).str.lower() if(x.dtype == 'object') else x)

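This matters because if your string column happens to contain only a number in some rows, leaving out astype(str) turns those rows into NaN. It may be a little slower, but the number-only rows are kept (as strings) instead of becoming NaN.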
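A small comparison sketch of the two variants on a made-up object column in which one row holds the int 7 instead of text. Without astype(str) that cell becomes NaN; with it, the value survives but is now the string '7':

```python
import pandas as pd

# An object column where one row happens to hold a number instead of text.
df = pd.DataFrame({"OS": ["Linux", 7, "VMS"], "Count": [234, 1, 2]})

without_astype = df.apply(
    lambda x: x.str.lower() if x.dtype == "object" else x)
with_astype = df.apply(
    lambda x: x.astype(str).str.lower() if x.dtype == "object" else x)

print(without_astype["OS"].tolist())  # ['linux', nan, 'vms']
print(with_astype["OS"].tolist())     # ['linux', '7', 'vms']
```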

【Comments】:

【Answer 4】:

This also works and is quite readable:

for column in df.select_dtypes("object").columns:
    df[column] = df[column].str.lower()

One possible drawback is that it uses a Python for loop over the subset of columns.
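If you want this loop to match the question's original goal (lower-case only the values that are strings and leave everything else exactly as it is), one possible variant masks the string cells first. It assumes text columns are stored with object dtype, as in the answers above; the sample data is made up:

```python
import pandas as pd

df = pd.DataFrame({"OS": ["Linux", 7, "VMS"], "Count": [234, 1, 2]})

for column in df.select_dtypes("object").columns:
    # Boolean mask of the cells that really hold a str.
    is_str = df[column].map(lambda v: isinstance(v, str)).astype(bool)
    # Lower-case only those cells; the int 7 is left untouched.
    df.loc[is_str, column] = df.loc[is_str, column].str.lower()

print(df)
```

Unlike astype(str), this keeps non-string values with their original type, at the cost of one extra pass per column to build the mask.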

【Comments】: