Splitting/extracting part of a column in a DataFrame - Python answers

【Question title】: Splitting/Extracting part of a column in a DataFrame - python
【Posted】: 2023-03-30 18:42:06
【Question description】:

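I am trying to split/extract part of the "Time" column so that it shows only hours and minutes, e.g. 18:15 instead of 18:15:34.

I have seen many examples online that use .str.split() with the colon as the delimiter, but that splits the Time column into three columns: hours, minutes, and seconds.

Input DataFrame: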

df =

Index   Time
0       18:15:21
1       19:15:21
2       20:15:21
3       21:15:21
4       22:15:21

Output DataFrame:

df =

Index   Time
0       18:15
1       19:15
2       20:15
3       21:15
4       22:15

Thanks :)

【Question comments】:

Tags: python dataframe split extract

【Solution 1】:

You can use a regular expression:

df.Time.str.replace(r':\d\d$', '', regex=True)

Or split from the right:

df.Time.str.rsplit(':', n=1).str[0]
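
For anyone who wants to try both variants end to end, here is a minimal self-contained sketch. It assumes the Time column holds plain strings and a recent pandas version (where `regex=True` has to be passed explicitly and `n` is a keyword argument); the sample frame is rebuilt from the question.

```python
import pandas as pd

# Sample data rebuilt from the question; Time is assumed to hold strings.
df = pd.DataFrame(
    {"Time": ["18:15:21", "19:15:21", "20:15:21", "21:15:21", "22:15:21"]}
)

# Regex variant: drop the trailing ":SS" group.
by_regex = df["Time"].str.replace(r":\d\d$", "", regex=True)

# Reverse-split variant: split once from the right, keep the left part.
by_rsplit = df["Time"].str.rsplit(":", n=1).str[0]

print(by_regex.tolist())   # ['18:15', '19:15', '20:15', '21:15', '22:15']
print(by_rsplit.tolist())  # same values as above
```

Both calls return a new Series, so assign the result back to `df["Time"]` if the column should be overwritten.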

【Comments】:

【Solution 2】:

You can use:

df['Time'].apply(lambda x : ':'.join(x.split(':')[0:2]))

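【Comments】:

【Solution 3】:

With pandas.Series.str you have a fair choice here: replace, extract, or split.

First, note that these are solutions tailored to this particular case.

The following solution removes the last colon and the two digits after it throughout the Time column.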

>>> df['Time'] = df['Time'].str.replace(r':\d{2}$', '', regex=True)
>>> df
    Time
0  18:15
1  19:15
2  20:15
3  21:15
4  22:15

The second approach uses str.extract with a regular expression:

>>> df['Time'] = df['Time'].str.extract(r'(\d{2}:\d{2})', expand=False)
>>> df
    Time
0  18:15
1  19:15
2  20:15
3  21:15
4  22:15

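In the extract pattern:

\d{2} matches the first two digits (the hours)

: matches the colon immediately after them

\d{2} matches the two digits that follow the colon (the minutes)

In the replace pattern, $ anchors the match at the end of the string, so only the final :SS group is removed.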
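
The two patterns from this answer can also be checked against a single string with the standard `re` module, without pandas. This is only an illustrative sketch; the names `sample`, `trimmed`, and `extracted` are made up for the example.

```python
import re

sample = "18:15:21"

# Replace-style pattern: a colon plus two digits, anchored at the end.
trimmed = re.sub(r":\d{2}$", "", sample)

# Extract-style pattern: the first "two digits, colon, two digits" run.
extracted = re.search(r"(\d{2}:\d{2})", sample).group(1)

print(trimmed)    # 18:15
print(extracted)  # 18:15
```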

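【Comments】:

It returns NaN. I wonder whether the type of this data matters, i.e. whether a time split out of a timestamp needs to be converted to a string before it can be replaced/extracted.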
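
The thread does not show the commenter's data, so the cause of the NaN is not known. If the column does hold non-string values, such as the `datetime.time` objects produced by `.dt.time`, a sketch of two ways to get HH:MM might look like this. The timestamps below are hypothetical.

```python
import pandas as pd

# Hypothetical timestamps; the commenter's actual data is not shown.
stamps = pd.Series(pd.to_datetime(["2023-03-30 18:15:21", "2023-03-30 19:15:21"]))

# .dt.time yields datetime.time objects, not strings.
times = stamps.dt.time

# Option 1: cast to str first, then apply the string method.
via_str = times.astype(str).str.extract(r"(\d{2}:\d{2})", expand=False)

# Option 2: skip the string step and format the timestamps directly.
via_strftime = stamps.dt.strftime("%H:%M")

print(via_str.tolist())       # ['18:15', '19:15']
print(via_strftime.tolist())  # ['18:15', '19:15']
```

If the original timestamp column is still available, the second option avoids the round trip through time objects altogether.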