【Question title】: featuretools: manual derivation of the features generated by dfs?
【Posted】: 2021-03-05 20:00:48
【Question description】:

Code example:

import featuretools as ft
es = ft.demo.load_mock_customer(return_entityset=True)

# Normalize once more: split "device" out of "sessions" into its own entity
es = es.normalize_entity(
    new_entity_id="device",
    base_entity_id="sessions",
    index="device", 
)
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_entity="customers",
    agg_primitives=["std",],
    groupby_trans_primitives=['cum_count'],
    max_depth=2
)

I want to dig deeper into the STD(sessions.CUM_COUNT(device) by customer_id) feature.

I tried to generate this feature manually, but the result is different:

import pandas as pd

df = ft.demo.load_mock_customer(return_single_table=True)

a = df.groupby("customer_id")['device'].cumcount()
a.name = "cumcount_device"
a = pd.concat([df, a], axis=1)
b = a.groupby("customer_id")['cumcount_device'].std()

>>> b
customer_id
1   36.517
2   26.991
3   26.991
4   31.610
5   22.949
Name: cumcount_device, dtype: float64

What am I missing?

【Question discussion】:

    Tags: featuretools


    【Solution 1】:

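    Thanks for the question. The calculation needs to be based on the sessions DataFrame, which has one row per session. The single table returned with return_single_table=True has one row per transaction, so cumcount() on it counts transactions rather than sessions.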

    df = es['sessions'].df
    cumcount = df['device'].groupby(df['customer_id']).cumcount()
    std = cumcount.groupby(df['customer_id']).std()
    std.round(3).loc[feature_matrix.index]
    
    customer_id
    5    1.871
    4    2.449
    1    2.449
    3    1.871
    2    2.160
    dtype: float64
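
    A quick sanity check on the two sets of numbers, offered as a sketch rather than as part of the original answer: within each group, cumcount() yields 0, 1, ..., n-1, and the sample standard deviation (ddof=1, which is also the pandas default) of that sequence is sqrt(n(n+1)/12). The group sizes below are inferred from the printed values and are an assumption, not read from the data; the helper name std_of_cumcount is hypothetical.

```python
import math
import statistics


def std_of_cumcount(n: int) -> float:
    """Sample standard deviation (ddof=1) of the integers 0, 1, ..., n-1."""
    return math.sqrt(n * (n + 1) / 12)


# Group sizes inferred from the printed values (an assumption):
# 6, 7 and 8 look like sessions per customer, 126 like transactions for customer 1.
for n in (6, 7, 8, 126):
    closed_form = std_of_cumcount(n)
    # Cross-check with the standard library's sample standard deviation.
    empirical = statistics.stdev(range(n))
    print(n, round(closed_form, 3), round(empirical, 3))
```

    Under that assumption, 6, 7 and 8 rows reproduce 1.871, 2.160 and 2.449 from the sessions-based calculation, while 126 rows reproduces the 36.517 seen in the question. That is consistent with the question's attempt counting transactions instead of sessions.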
    

    You should get the same output as from DFS.
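
    To see why the row granularity matters, here is a minimal sketch on hypothetical toy data (the tables, values, and the helper std_cumcount_by_customer are made up for illustration and are not the mock customer dataset). The same groupby logic is applied once to a denormalized table with one row per transaction and once to a sessions table with one row per session.

```python
import pandas as pd

# Hypothetical toy data: 2 customers, 4 sessions, 8 transactions.
sessions = pd.DataFrame({
    "session_id": [1, 2, 3, 4],
    "customer_id": [1, 1, 2, 2],
    "device": ["mobile", "desktop", "mobile", "tablet"],
})
transactions = pd.DataFrame({
    "transaction_id": range(8),
    "session_id": [1, 1, 1, 2, 2, 3, 4, 4],
})

# Denormalized table: one row per transaction, session columns repeated.
single = transactions.merge(sessions, on="session_id")


def std_cumcount_by_customer(df: pd.DataFrame) -> pd.Series:
    """Cumulative count of rows per customer, then its std per customer."""
    cumcount = df.groupby("customer_id")["device"].cumcount()
    return cumcount.groupby(df["customer_id"]).std()


per_transaction = std_cumcount_by_customer(single)   # counts transactions
per_session = std_cumcount_by_customer(sessions)     # counts sessions

print(per_transaction)
print(per_session)
```

    The two results differ because the denormalized table repeats each session once per transaction, so the cumulative count runs over more rows per customer. Only the per-session version corresponds to a feature defined on the sessions entity.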

    【Discussion】:

    • Now I fully understand! Thanks, Jeff