【Posted】: 2022-01-11 02:00:22
【Question】:
Input data in the JSON files (transaction history, one JSON object per line):
{"customer_id": "C1", "basket": [{"product_id": "P3", "price": 506}, {"product_id": "P4", "price": 121}], "date_of_purchase": "2018-09-01 11:09:00"}
{"customer_id": "C27", "basket": [{"product_id": "P57", "price": 154}, {"product_id": "P42", "price": 349}, {"product_id": "P47", "price": 180}], "date_of_purchase": "2021-09-06 04:52:08.505909"}
{"customer_id": "C1", "basket": [{"product_id": "P3", "price": 506}, {"product_id": "P4", "price": 121}], "date_of_purchase": "2018-10-01 11:09:00"}
Data frame:
   customer_id  basket                                             date_of_purchase
0  C4           [{'product_id': 'P31', 'price': 26}]               2021-09-06 05:47:08.505909
1  C13          [{'product_id': 'P36', 'price': 566}]              2021-09-06 03:52:08.505909
2  C15          [{'product_id': 'P02', 'price': 839}]              2021-09-06 05:48:08.505909
3  C22          [{'product_id': 'P37', 'price': 1235}]             2021-09-05 20:52:08.505909
4  C27          [{'product_id': 'P57', 'price': 154}, {'produc...  2021-09-06 04:52:08.505909
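My code for reading the JSON into a data frame:
import glob
import pandas

def read_json_folder(json_folder: str):
    # Collect every .json file one directory level below json_folder
    transactions_files = glob.glob("{}*/*.json".format(json_folder))
    # Each file is JSON Lines (one transaction per line); stack them into one frame
    return pandas.concat(pandas.read_json(tf, lines=True) for tf in transactions_files)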
For each customer, I need the customer ID and the number of times they purchased each product.
Expected output:
customer_id product_id purchase_count
C1 P2 11
C1 P3 5
C2 P9 7
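One possible way to get counts of this shape from the data frame above, given as a minimal sketch rather than a definitive answer. It assumes the `basket` column holds Python lists of dicts, as in the printed frame; the function name `count_purchases` and the three sample rows (copied from the JSON lines above) are only for illustration.

```python
import pandas as pd


def count_purchases(df: pd.DataFrame) -> pd.DataFrame:
    """Count how often each customer bought each product (hypothetical helper)."""
    # One row per basket item instead of one row per transaction
    items = df[["customer_id", "basket"]].explode("basket", ignore_index=True)
    # An empty basket explodes to NaN; drop those rows
    items = items.dropna(subset=["basket"])
    # Pull the product id out of each item dict
    items = items.assign(product_id=items["basket"].map(lambda item: item["product_id"]))
    return (
        items.groupby(["customer_id", "product_id"])
        .size()
        .reset_index(name="purchase_count")
    )


# Sample rows taken from the JSON lines in the question
transactions = pd.DataFrame(
    [
        {"customer_id": "C1",
         "basket": [{"product_id": "P3", "price": 506}, {"product_id": "P4", "price": 121}],
         "date_of_purchase": "2018-09-01 11:09:00"},
        {"customer_id": "C27",
         "basket": [{"product_id": "P57", "price": 154}, {"product_id": "P42", "price": 349},
                    {"product_id": "P47", "price": 180}],
         "date_of_purchase": "2021-09-06 04:52:08.505909"},
        {"customer_id": "C1",
         "basket": [{"product_id": "P3", "price": 506}, {"product_id": "P4", "price": 121}],
         "date_of_purchase": "2018-10-01 11:09:00"},
    ]
)

result = count_purchases(transactions)
print(result)
```

With these three sample transactions, C1 appears twice with the same basket, so P3 and P4 each get a count of 2, while each of C27's products gets a count of 1.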
【Comments】:
-
Is the JSON already in your data frame?
-
@user17242583 Yes, it is already in the data frame.
-
How did you get it in there? Like this?
pd.json_normalize(j, record_path='basket', meta='customer_id') (where j is a list of JSON objects)
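The `json_normalize` call from the comment above can be sketched as follows, starting from the raw records rather than from an already built data frame. This is a rough illustration under assumptions: `j` is taken to be a list of dicts parsed from the JSON lines in the question, `meta` is written as a list, and the follow-up `groupby` is an addition that was not part of the comment.

```python
import pandas as pd

# j: list of JSON objects (dicts), here the three sample lines from the question
j = [
    {"customer_id": "C1",
     "basket": [{"product_id": "P3", "price": 506}, {"product_id": "P4", "price": 121}],
     "date_of_purchase": "2018-09-01 11:09:00"},
    {"customer_id": "C27",
     "basket": [{"product_id": "P57", "price": 154}, {"product_id": "P42", "price": 349},
                {"product_id": "P47", "price": 180}],
     "date_of_purchase": "2021-09-06 04:52:08.505909"},
    {"customer_id": "C1",
     "basket": [{"product_id": "P3", "price": 506}, {"product_id": "P4", "price": 121}],
     "date_of_purchase": "2018-10-01 11:09:00"},
]

# Flatten: one row per basket item, with customer_id carried along as metadata
flat = pd.json_normalize(j, record_path="basket", meta=["customer_id"])

# Count rows per (customer, product) pair
counts = (
    flat.groupby(["customer_id", "product_id"])
    .size()
    .reset_index(name="purchase_count")
)
print(counts)
```

Here `flat` has the columns `product_id`, `price`, and `customer_id`, one row per purchased item, so counting rows per pair gives the purchase count directly.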
Tags: python pandas dataframe data-science