【Question title】: Remove empty dataframe from pandas.core.groupby.generic.DataFrameGroupBy
【Posted】: 2026-02-19 19:50:02
【Question】:

How can I remove the empty DataFrames from a pandas.core.groupby.generic.DataFrameGroupBy?

My aggregation code:

cols = ["col1", "col2","col3","col4"]  
joined = pd.concat(df.reset_index() for df in collectData)
joined = joined.replace({np.nan:1, 0:1})
joined[cols] = joined[cols].mask(joined[cols] < 0, 1)

df = joined.set_index('sensor').groupby(pd.Grouper(freq='D'))

The grouped data:

list(df)

[(Timestamp('2020-02-04 00:00:00+0000', tz='UTC', freq='D'),
                                 col1       col2      col3    col4  
  sensor                                                                   
  2020-02-04 00:00:00+00:00    2.586569   0.015321  0.000149    0.884470   
  2020-02-04 00:00:00+00:00    4.429571   4.049798  1.820845    2.882445   
  2020-02-04 00:00:00+00:00   12.883314   6.900607  1.002138    3.613021    
  ...                               ...        ...       ...         ...    
  2020-02-04 23:45:00+00:00    3.798017   1.605979  0.176515    2.400820   
  2020-02-04 23:45:00+00:00    5.546771   2.232437  0.233292    3.750547   
  2020-02-04 23:45:00+00:00    4.910360   3.730932  0.985459    1.238469       
  
  [48945 rows x 4 columns]),
 (Timestamp('2020-02-05 00:00:00+0000', tz='UTC', freq='D'),
  Empty DataFrame
  Columns: [col1, col2, col3, col4]
  Index: []),
 (Timestamp('2020-02-06 00:00:00+0000', tz='UTC', freq='D'),
  Empty DataFrame
  Columns: [col1, col2, col3, col4]
  Index: []),
 (Timestamp('2020-02-07 00:00:00+0000', tz='UTC', freq='D'),
                                 col1       col2      col3    col4  
  sensor                                                                   
  2020-02-07 00:00:00+00:00   17.065174   3.065422  0.171053    9.048574   
  2020-02-07 00:00:00+00:00   30.181997  20.651204  4.413567   15.200674   
  2020-02-07 00:00:00+00:00    1.864378   1.726365  0.819459    1.441588   
  ...                               ...        ...       ...         ...   
  2020-02-07 23:45:00+00:00   39.644320   0.234830  0.002289   13.642480   
  2020-02-07 23:45:00+00:00   30.778517  10.540318  0.944788   13.165241   
  2020-02-07 23:45:00+00:00   34.610439  25.342142  6.184292   22.725937      
  
  [50112 rows x 4 columns]),]

Group sizes from df.size():

sensor
2020-02-02 00:00:00+00:00    47574
2020-02-03 00:00:00+00:00    49353
2020-02-04 00:00:00+00:00    48945
2020-02-05 00:00:00+00:00        0
2020-02-06 00:00:00+00:00        0
                             ...  
2020-09-26 00:00:00+00:00    83680
2020-09-27 00:00:00+00:00    84293
2020-09-28 00:00:00+00:00    84873
2020-09-29 00:00:00+00:00    84306
2020-09-30 00:00:00+00:00    84875
Freq: D, Length: 242, dtype: int64
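
The zero entries in this size output are exactly the empty groups, so their positions can be read off the result of size(). A minimal sketch on toy data (the frame and its values are made up for illustration and are not the asker's data):

```python
import pandas as pd

# Toy data: readings on 1 Feb and 3 Feb only, so 2 Feb becomes an empty bin.
idx = pd.DatetimeIndex(
    ["2020-02-01 00:00", "2020-02-01 12:00", "2020-02-03 06:00"],
    tz="UTC",
    name="sensor",
)
frame = pd.DataFrame({"col1": [1.0, 4.0, 2.0]}, index=idx)

# On a GroupBy object, size() is a method returning one row count per daily bin.
sizes = frame.groupby(pd.Grouper(freq="D")).size()

# Days whose bin holds no rows, and days that actually have data.
empty_days = sizes[sizes == 0].index
non_empty_days = sizes[sizes > 0].index
```

Here empty_days contains only 2020-02-02, and non_empty_days can be used to select the groups worth processing.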

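I need to remove the empty DataFrames before applying std = df.apply(gstd), and I don't know in advance where they are. The approaches in https://*.com/a/51052536/14338086 and https://*.com/a/16916611/14338086 raise errors. Likewise, df.filter(lambda x: x.size() != 0) raises TypeError: 'numpy.int64' object is not callable, and dropna() is not available.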
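
A likely reason for the TypeError: inside filter, x is a DataFrame, and DataFrame.size is an integer attribute rather than a method, so x.size() tries to call a number. (size() with parentheses works only on the GroupBy object itself.) The sketch below, on made-up toy data, shows that distinction and one possible way to keep only the non-empty groups by iterating over the GroupBy:

```python
import pandas as pd

# Toy data: readings on 1 Feb and 3 Feb only, so 2 Feb is an empty daily bin.
idx = pd.DatetimeIndex(
    ["2020-02-01 00:00", "2020-02-01 12:00", "2020-02-03 06:00"],
    tz="UTC",
    name="sensor",
)
frame = pd.DataFrame({"col1": [1.0, 4.0, 2.0]}, index=idx)
grouped = frame.groupby(pd.Grouper(freq="D"))

# DataFrame.size is a plain integer (rows * columns), so it cannot be called.
size_is_callable = callable(frame.size)

# Iterating a GroupBy yields (key, sub-frame) pairs; keep those with rows.
non_empty = {day: group for day, group in grouped if not group.empty}
```

The dictionary non_empty maps each day with data to its sub-frame, so a function such as gstd could then be applied per entry without ever seeing an empty frame.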

【Comments】:

  • Try replacing all the 0s with NaN, then try dropna().
  • That gives: AttributeError: 'DataFrameGroupBy' object has no attribute 'replace'
  • If x is empty, then x.size() will give you an error.

Tags: python pandas dataframe pandas-groupby


【Solution 1】:

I solved the problem with the following code; maybe it will help someone.

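import numpy as np
import pandas as pd
from scipy.stats import gmean, gstd  # assumption: gmean/gstd are SciPy's

cols = ["col1", "col2", "col3", "col4"]

# collectData is the asker's list of DataFrames (not shown in the question)
joined = pd.concat(df.reset_index() for df in collectData)
joined = joined.replace({np.nan: 1, 0: 1})
joined[cols] = joined[cols].mask(joined[cols] < 0, 1)

df = joined.set_index('sensor').groupby(pd.Grouper(freq='D'))
# Stitch the groups back together; the empty ones contribute no rows
dff = pd.concat(map(lambda x: x[1], df))
# Grouping by the floored index only creates groups for days that have data
means = dff.groupby(dff.index.floor('d')).agg(gmean)
std = dff.groupby(dff.index.floor('d')).agg(gstd)

df_result = pd.merge(left=means, right=std, how='left', on='sensor')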
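
What seems to make this work is the switch of grouping key: pd.Grouper(freq='D') creates a bin for every calendar day in the range, including days with no rows, while grouping by the floored index creates groups only for days that actually occur. The following sketch illustrates the difference on made-up toy data; geo_mean is a hypothetical NumPy stand-in for gmean, used here only to keep the example self-contained.

```python
import numpy as np
import pandas as pd

# Toy data: readings on 1 Feb and 3 Feb only; 2 Feb has no rows.
idx = pd.DatetimeIndex(
    ["2020-02-01 00:00", "2020-02-01 12:00",
     "2020-02-03 06:00", "2020-02-03 18:00"],
    tz="UTC",
    name="sensor",
)
frame = pd.DataFrame({"col1": [1.0, 4.0, 2.0, 8.0]}, index=idx)


def geo_mean(values):
    """Hypothetical stand-in for gmean: exp of the mean of the logs."""
    return float(np.exp(np.log(values).mean()))


# Grouper bins every calendar day in the range, including the empty 2 Feb.
by_bin = frame.groupby(pd.Grouper(freq="D")).size()

# Grouping by the floored timestamps yields only the days present in the data.
by_day = frame.groupby(frame.index.floor("D"))
sizes = by_day.size()
means = by_day["col1"].agg(geo_mean)
```

With this key the aggregation never receives an empty frame, which suggests the intermediate pd.concat over the Grouper groups may not be strictly necessary.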

【Comments】: