[Posted]: 2025-12-13 07:55:02
[Question]:
I have a DataFrame that looks like this:
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| | Date | Professional | Description |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 0 | 2019-12-19 00:00:00 | Katie Cool | Travel to Space ... |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 1 | 2019-12-20 00:00:00 | Jenn Blossoms | Review stuff; prepare cancellations of ... |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 2 | 2019-12-27 00:00:00 | Jenn Blossoms | Review lots of stuff/o... |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 3 | 2019-12-27 00:00:00 | Jenn Blossoms | Draft email to world leader... |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 4 | 2019-12-30 00:00:00 | Jenn Blossoms | Review this thing. |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 5 | 12-30-2019 Jenn Blossoms Telephone Call to A. Bell return her multiple | NaN | NaN |
| | voicemails. | | |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
I would like it to look like this:
+---+---------------------+---------------+-------------------------------------------------------------+
| | Date | Professional | Description |
+---+---------------------+---------------+-------------------------------------------------------------+
| 0 | 2019-12-19 00:00:00 | Katie Cool | Travel to Space ... |
+---+---------------------+---------------+-------------------------------------------------------------+
| 1 | 2019-12-20 00:00:00 | Jenn Blossoms | Review stuff; prepare cancellations of ... |
+---+---------------------+---------------+-------------------------------------------------------------+
| 2 | 2019-12-27 00:00:00 | Jenn Blossoms | Review lots of stuff/o... |
+---+---------------------+---------------+-------------------------------------------------------------+
| 3 | 2019-12-27 00:00:00 | Jenn Blossoms | Draft email to world leader... |
+---+---------------------+---------------+-------------------------------------------------------------+
| 4 | 2019-12-30 00:00:00 | Jenn Blossoms | Review this thing. |
+---+---------------------+---------------+-------------------------------------------------------------+
| 5 | 12-30-2019 | Jenn Blossoms | Telephone Call to A. Bell return her multiple |
| | | | voicemails. |
+---+---------------------+---------------+-------------------------------------------------------------+
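@Datanovice provided a great answer back when my question was not specific enough and needed revising.
I have since edited my question and tried to adapt his code: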
s = pd.to_datetime(dftopdata['Date'],errors='coerce').isna()
# gives us the error rows to filter.
# split out our datetime column so we can extract the values.
date_err = (
dftopdata[s]["Date"]
.str.extract(r"\d{2}-\d{2}-\d{4}\s+(\w+.*)")[0]
.str.split(r"\s", expand=True)
)
# set your values with `.loc`
dftopdata.loc[s,'Professional'] = date_err[0] + date_err[1]
dftopdata.loc[s,'Description'] = date_err[2]
But when I run the code above, I get a DataFrame that looks like this:
+---+---------------------+---------------+--------------------------------------------+
| | Date | Professional | Description |
+---+---------------------+---------------+--------------------------------------------+
| 0 | 2019-12-19 00:00:00 | Katie Cool | Travel to Space ... |
+---+---------------------+---------------+--------------------------------------------+
| 1 | 2019-12-20 00:00:00 | Jenn Blossoms | Review stuff; prepare cancellations of ... |
+---+---------------------+---------------+--------------------------------------------+
| 2 | 2019-12-27 00:00:00 | Jenn Blossoms | Review lots of stuff/o... |
+---+---------------------+---------------+--------------------------------------------+
| 3 | 2019-12-27 00:00:00 | Jenn Blossoms | Draft email to world leader... |
+---+---------------------+---------------+--------------------------------------------+
| 4 | 2019-12-30 00:00:00 | Jenn Blossoms | Review this thing. |
+---+---------------------+---------------+--------------------------------------------+
| 5 | 12-30-2019 | JennBlossoms | |
+---+---------------------+---------------+--------------------------------------------+
I also get this error: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
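A likely cause of the result above: `.str.split("\s", expand=True)` splits on every whitespace character, so `date_err[2]` holds at most the third token rather than the rest of the text, and `date_err[0] + date_err[1]` joins the two name parts without a space. The following is a minimal sketch of one possible fix, not a definitive answer. It assumes the professional's name is always exactly two words, and the sample frame is a hypothetical stand-in built from the tables above. The message quoted above is pandas' SettingWithCopyWarning, which typically appears when `dftopdata` was itself created by slicing another DataFrame; taking an explicit `.copy()` at that point (shown only as a comment, since that code is not in the question) usually avoids it.

```python
import re

import pandas as pd

# Hypothetical stand-in for the asker's dftopdata (values taken from the tables above).
# If the real frame comes from slicing another one, e.g. dftopdata = df[some_mask],
# use dftopdata = df[some_mask].copy() to avoid SettingWithCopyWarning.
dftopdata = pd.DataFrame(
    {
        "Date": [
            "2019-12-30 00:00:00",
            "12-30-2019 Jenn Blossoms Telephone Call to A. Bell return her multiple\nvoicemails.",
        ],
        "Professional": ["Jenn Blossoms", None],
        "Description": ["Review this thing.", None],
    }
)

# One named group per target column. ASSUMPTION: the name is exactly two words.
# re.DOTALL lets ".*" run across the line break inside the wrapped cell.
pattern = (
    r"^(?P<Date>\d{2}-\d{2}-\d{4})\s+"
    r"(?P<Professional>\w+\s+\w+)\s+"
    r"(?P<Description>.*)$"
)
parts = dftopdata["Date"].astype(str).str.extract(pattern, flags=re.DOTALL)

# Rows where the pattern matched are the malformed ones.
bad = parts["Date"].notna()

# Write the three pieces back with .loc; well-formed rows are left untouched.
for col in ["Date", "Professional", "Description"]:
    dftopdata.loc[bad, col] = parts.loc[bad, col]

print(dftopdata)
```

Using a single regular expression with named groups keeps the date, the name, and the description aligned with their target columns, so nothing depends on counting whitespace-separated tokens. If names can have more or fewer than two words, the `Professional` group would need a different rule, for example matching against a list of known names.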
[Comments]:
-
That seems like a good plan for solving the problem. Can you post the code you have tried so far?
-
Is the error consistent? Does the malformed Date column mean you also have errors in col3?
-
@Datanovice Yes, exactly.
-
@JuanEstevez I'm actually still stuck on step 1.
-
You really should ask a new question for this, since it is not similar at all. I have edited it for you.
Tags: python pandas dataframe data-cleaning