【Posted】: 2025-11-26 01:05:01
【Problem description】:
I have a pandas DataFrame with a single column whose elements are dictionaries. It is the result of the following code:
dg # is a pandas dataframe with columns ID and VALUE. Many rows contain the same ID
def seriesFeatures(series):
    """This function receives a series of VALUE for the same ID and extracts
    tens of complex features from the series, storing them into a dictionary"""
    dico = dict()
    dico['feature1'] = calculateFeature1
    dico['feature2'] = calculateFeature2
    # Many more features
    dico['feature50'] = calculateFeature50
    return dico
grouped = dg.groupby(['ID'])
dh = grouped['VALUE'].agg( { 'all_features' : lambda s: seriesFeatures(s) } )
dh.reset_index()
# Here I get a dh DataFrame of a single column 'all_features' and
# dictionaries stored on its values. The keys are the feature's names
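I need an efficient way to split this 'all_features' column into one column per feature (I have too many rows and columns, and I cannot change the seriesFeatures function), so that the output is a DataFrame with the columns ID, FEATURE1, FEATURE2, FEATURE3, ..., FEATURE50. What is the best way to do this?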
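To make the desired transformation concrete, here is a minimal sketch, not a definitive answer. The frame dh below is a hypothetical stand-in for the result described above (ID as the index, a single 'all_features' column of dictionaries), and the name wide is chosen only for illustration. It rebuilds a DataFrame from the list of dictionaries so that each key becomes a column.

```python
import pandas as pd

# Hypothetical stand-in for dh: one row per ID, with a dictionary of
# features stored in the single column 'all_features'.
dh = pd.DataFrame(
    {"all_features": [
        {"feature1": 3, "feature2": 38},
        {"feature1": 2, "feature2": 30},
    ]},
    index=pd.Index([1, 2], name="ID"),
)

# Build a new frame from the list of dicts; each key becomes a column.
# Passing index=dh.index keeps the rows aligned with their IDs.
wide = pd.DataFrame(dh["all_features"].tolist(), index=dh.index)

# Upper-case the feature names and turn ID back into a regular column.
wide.columns = [c.upper() for c in wide.columns]
wide = wide.reset_index()
print(wide)
```

This leaves seriesFeatures untouched and only post-processes its output. Whether it is fast enough for a very large frame is something to measure on the real data.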
EDIT
A concrete, simple example:
import pandas as pd
dg = pd.DataFrame( [ [1,10] , [1,15] , [1,13] , [2,14] , [2,16] ] , columns=['ID','VALUE'] )
def seriesFeatures(series):
    dico = dict()
    dico['feature1'] = len(series)
    dico['feature2'] = series.sum()
    return dico
grouped = dg.groupby(['ID'])
dh = grouped['VALUE'].agg( { 'all_features' : lambda s: seriesFeatures(s) } )
dh.reset_index()
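But when I try to wrap the result in pd.Series or pd.DataFrame, I get an error saying that an index must be provided if the data are scalar values. When I pass index=['feature1','feature2'], I get strange results, for example with: dh = grouped['VALUE'].agg( { 'all_features' : lambda s: pd.DataFrame( seriesFeatures(s) , index=['feature1','feature2'] ) } )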
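For context, pd.DataFrame raises that error when it is given a dictionary whose values are all scalars, because it cannot infer how many rows to create. One way to sidestep the wrapping entirely is sketched below, using the concrete example above: collect one dictionary per ID and build the wide frame in a single step. The names rows and wide are illustrative only, and this is one possible approach rather than the only one.

```python
import pandas as pd

dg = pd.DataFrame(
    [[1, 10], [1, 15], [1, 13], [2, 14], [2, 16]],
    columns=["ID", "VALUE"],
)

def seriesFeatures(series):
    # Unchanged from the example above: returns a plain dict of features.
    dico = dict()
    dico["feature1"] = len(series)
    dico["feature2"] = series.sum()
    return dico

# Iterating over the grouped column yields (ID, Series of VALUE) pairs,
# so this builds a mapping {ID: feature dict}.
rows = {key: seriesFeatures(values) for key, values in dg.groupby("ID")["VALUE"]}

# orient="index" makes each outer key a row and each feature name a column.
wide = pd.DataFrame.from_dict(rows, orient="index")
wide.index.name = "ID"
wide = wide.reset_index()
print(wide)
```

Grouping by the string "ID" rather than the list ['ID'] keeps each group key a plain scalar instead of a one-element tuple in recent pandas versions.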
【Comments】:
- Thanks for the example! I've updated my answer.