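[Posted]: 2022-01-05 12:15:00
[Question]:
I have 2 lists:
- customer_ids
- recommendations (a list of lists, each with 6000 shop_ids)
Each list in recommendations holds the shops recommended to the corresponding customer in customer_ids.
I have to filter each list down to 20 shop_ids, keeping only shops located in the customer's city.
Desired output:
- recommendations (a list of lists, each with 20 shop_ids)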
customer_ids = ['1','2','3',...]
recommendations = [['110','589','865'...], ['422','378','224'...],['198','974','546'...]]
Filter: shop city == customer city.
To fetch the city for customers and shops, I have 2 SQL queries:
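import pandas as pd
# conn is assumed to be an already open database connection (not shown in the post)
df_cust_city = pd.read_sql_query("SELECT id, city_id FROM customer_table", conn)
df_shop_city = pd.read_sql_query("SELECT shop_id, city FROM shop_table", conn)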
Code using lists
from itertools import islice

filtered_list = []
for cust_id, shop_id in zip(customer_ids, recommendations):
    cust_city = df_cust_city.loc[df_cust_city['id'] == cust_id, 'city_id'].iloc[0]  # get customer city
    df_city_filter = (df_shop_city.where(df_shop_city['city'] == cust_city)).dropna()  # get all shops in customer city
    df_city_filter = df_city_filter.astype(int)
    filter_shop = df_city_filter['shop_id'].astype(str).values.tolist()  # make a list of shop_ids in customer city
    filtered = [x for x in shop_id if x in filter_shop]  # filter recommended shop_ids based on city-filtered list
    shop_filtered = list(islice(filtered, 20))
    filtered_list.append(shop_filtered)  # create recommendation list of lists with only 20 filtered shop_ids
Code using pandas
filtered_list = []
for cust_id, shop_id in zip(customer_ids, recommendations):
    cust_city = df_cust_city.loc[df_cust_city['id'] == cust_id, 'city_id'].iloc[0]  # get customer city
    df_city_filter = (df_shop_city.where(df_shop_city['city'] == cust_city)).dropna()  # get all shops in customer city
    recommended_shop = pd.DataFrame(shop_id, columns=['id'])
    recommended_shop['id'] = recommended_shop['id'].astype(int)
    # shop ids live in the 'shop_id' column; rename to 'id' so both frames share the merge key
    shop_city_filter = pd.DataFrame({'id': df_city_filter['shop_id'].astype(int)})
    shops_common = recommended_shop.merge(shop_city_filter, how='inner', on='id')  # recommended shops in customer city
    shops_common.drop_duplicates(subset="id", keep=False, inplace=True)
    filtered = shops_common.head(20)
    shop_filtered = filtered['id'].values.tolist()
    filtered_list.append(shop_filtered)
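Time taken for the for loop to finish:
Using lists: ~8000 seconds
Using pandas: ~3000 seconds
I have to run the for loop 22 times.
Is there a way to get rid of the for loop entirely? Any tips/pointers on how to achieve this, so that it takes less time for 50000 customers at once, would be appreciated. I am currently trying it out with dictionaries.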
df_cust_city:
id city_id
00919245 1
02220205 2
02221669 2
02223750 2
02304202 2
df_shop_city:
shop_id city
28 1
29 1
30 1
31 1
32 1
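For reference, the dictionary idea mentioned above could be sketched roughly as follows. This is only a sketch under assumptions, not a tested solution for the real tables: the helper name filter_by_city, the limit parameter, and the small sample dictionaries are all made up for illustration. It does not remove the loop over customers, but it builds the per-city shop sets once, so each iteration does set lookups instead of scanning the DataFrames again.

```python
from collections import defaultdict
from itertools import islice

# Hypothetical sample data standing in for the two SQL results.
# With the real frames, the mappings could be built once, for example:
#   cust_city = dict(zip(df_cust_city['id'], df_cust_city['city_id']))
#   shop_city = dict(zip(df_shop_city['shop_id'].astype(str), df_shop_city['city']))
# Keys must have the same type as the ids in customer_ids / recommendations.
cust_city = {"1": 1, "2": 2, "3": 1}  # customer id -> city
shop_city = {"110": 1, "589": 2, "865": 1, "422": 2,
             "378": 2, "224": 1, "198": 1}  # shop id -> city

customer_ids = ["1", "2", "3"]
recommendations = [["110", "589", "865"],
                   ["422", "378", "224"],
                   ["198", "589", "110"]]


def filter_by_city(customer_ids, recommendations, cust_city, shop_city, limit=20):
    """Keep, per customer, the first `limit` recommended shops in the customer's city."""
    # Group shop ids by city once, instead of once per customer.
    shops_in_city = defaultdict(set)
    for shop_id, city in shop_city.items():
        shops_in_city[city].add(shop_id)

    filtered_list = []
    for cust_id, shops in zip(customer_ids, recommendations):
        # Unknown customers or cities without shops fall back to an empty set.
        local_shops = shops_in_city.get(cust_city.get(cust_id), set())
        # The generator preserves recommendation order; islice stops after `limit` hits.
        matches = (s for s in shops if s in local_shops)
        filtered_list.append(list(islice(matches, limit)))
    return filtered_list


filtered_list = filter_by_city(customer_ids, recommendations, cust_city, shop_city)
print(filtered_list)  # [['110', '865'], ['422', '378'], ['198', '110']]
```

Because membership tests on a set take roughly constant time, the cost per customer should be about proportional to the length of that customer's recommendation list, and islice stops early once 20 matches are found.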
[Comments]:
Tags: python pandas list dictionary for-loop