【问题标题】:pandas dropna dropping the whole dataframe, need only to drop empty rowspandas dropna 删除整个数据框,只需要删除空行
【发布时间】:2022-11-20 07:24:43
【问题描述】:

我正在使用这段代码:

import pandas as pd
df = pd.read_excel('input.xls', sheet_name='Nouveau concept')
print(f"Dataframe:\n{df}")
new_df = df.dropna()
print(f"Dataframe now:\n{new_df}")

读取 Excel 文件(必须是 xls 而不是 xlsx)并删除所有空行,即根本不包含任何数据的行。

当我运行上面的命令时,我得到这个:

Anibals-New-MacBook-Air:UCNI anibal$ python3 test.py
Dataframe:
     Source Terminology Version  Requestor Internal ID    Parent ID                    Parent FSN  ... Unnamed: 77 Unnamed: 78 Unnamed: 79  Unnamed: 80
0                september 2022                    NaN  283403005.0  Cut of ear region (disorder)  ...         NaN         NaN         NaN          NaN
1                september 2022                    NaN  283403005.0  Cut of ear region (disorder)  ...         NaN         NaN         NaN          NaN
2                september 2022                    NaN  283412007.0   Cut of upper arm (disorder)  ...         NaN         NaN         NaN          NaN
3                september 2022                    NaN  283412007.0   Cut of upper arm (disorder)  ...         NaN         NaN         NaN          NaN
4                september 2022                    NaN  283413002.0       Cut of elbow (disorder)  ...         NaN         NaN         NaN          NaN
...                         ...                    ...          ...                           ...  ...         ...         ...         ...          ...
5056                        NaN                    NaN          NaN                           NaN  ...         NaN         NaN         NaN          NaN
5057                        NaN                    NaN          NaN                           NaN  ...         NaN         NaN         NaN          NaN
5058                        NaN                    NaN          NaN                           NaN  ...         NaN         NaN         NaN          NaN
5059                        NaN                    NaN          NaN                           NaN  ...         NaN         NaN         NaN          NaN
5060                        NaN                    NaN          NaN                           NaN  ...         NaN         NaN         NaN          NaN

[5061 rows x 81 columns]
Dataframe now:
Empty DataFrame
Columns: [Source Terminology Version, Requestor Internal ID, Parent ID, Parent FSN, FSN (*), Semantic Tag (*), PT (*), Synonym (1), Synonym (2), Definition, Reason for Change, Notes, References, Unnamed: 13, Unnamed: 14, Unnamed: 15, Unnamed: 16, Unnamed: 17, Unnamed: 18, Unnamed: 19, Unnamed: 20, Unnamed: 21, Unnamed: 22, Unnamed: 23, Unnamed: 24, Unnamed: 25, Unnamed: 26, Unnamed: 27, Unnamed: 28, Unnamed: 29, Unnamed: 30, Unnamed: 31, Unnamed: 32, Unnamed: 33, Unnamed: 34, Unnamed: 35, Unnamed: 36, Unnamed: 37, Unnamed: 38, Unnamed: 39, Unnamed: 40, Unnamed: 41, Unnamed: 42, Unnamed: 43, Unnamed: 44, Unnamed: 45, Unnamed: 46, Unnamed: 47, Unnamed: 48, Unnamed: 49, Unnamed: 50, Unnamed: 51, Unnamed: 52, Unnamed: 53, Unnamed: 54, Unnamed: 55, Unnamed: 56, Unnamed: 57, Unnamed: 58, Unnamed: 59, Unnamed: 60, Unnamed: 61, Unnamed: 62, Unnamed: 63, Unnamed: 64, Unnamed: 65, Unnamed: 66, Unnamed: 67, Unnamed: 68, Unnamed: 69, Unnamed: 70, Unnamed: 71, Unnamed: 72, Unnamed: 73, Unnamed: 74, Unnamed: 75, Unnamed: 76, Unnamed: 77, Unnamed: 78, Unnamed: 79, Unnamed: 80]
Index: []

因此,第二个数据框完全是空的。为什么?

我只需要读取包含任何数据的行,即,如果一行只是空的,则跳过它。

输入文件 input.xls 可以在这里找到:

https://docs.google.com/spreadsheets/d/1pXfhPHklnd0v45yLbff5E5dp2ypVIbxG/edit?usp=share_link&ouid=117900420544251849196&rtpof=true&sd=true

有任何想法吗?

顺便说一句,我无法清理文件。这个输入文件是由另一个系统生成的,我的作品应该自动处理这个文件,所以我不能只在 Excel 中加载它并清理它。

我尝试了一大堆 dropna 组合都无济于事。我还尝试了在 stackoverflow 中找到的其他几种解决方案,但都无济于事。

【问题讨论】:

  • df.dropna 的默认值为 how='any',它会删除至少包含一个 NA 值的轴(行或列)。你想要how='all'

标签: python pandas dataframe


【解决方案1】:

首先,仅导入所需的列(即使用 use_cols 排除空白列)

df = pd.read_excel('input.xls', sheet_name='Nouveau concept',usecols="A:M")

然后,要删除空行,您必须考虑列的子集。目前,有几列完全是空的,因此这就是删除所有行的原因。为了解决这个问题,请使用以下内容:

new_df =df.dropna(subset=['Source Terminology Version'], how = 'all')
# In this example, I used only one column, but you can pass in a list.

【讨论】:

    猜你喜欢
    • 2016-03-21
    • 1970-01-01
    • 1970-01-01
    • 2018-05-22
    • 2016-02-12
    • 1970-01-01
    • 1970-01-01
    • 2017-01-13
    • 2020-05-09
    相关资源
    最近更新 更多