Posted: 2020-03-21 01:42:34
Question:
I have a dataframe like this:
import pandas as pd
df = pd.DataFrame(columns = ['id', 'tag'])
df['id'] = (['1925782942580621034', '1925782942580621034',
'1925782942580621034', '1925782942580621034',
'1930659617975470678', '1930659617975470678',
'1930659617975470678', '1930659617975470678',
'1930659617975470678', '1930659617975470678',
'1930659617975470678', '1930659617975470678',
'1971229370376634911', '1971229370376634911',
'1971229370376634911', '1971229370376634911',
'1971229370376634911', '1971229370376634911',
'1971229370376634911', '1971229370376634911',
'1971229370376634911'])
df['tag'] = (['nintendo', 'cosmetic', 'pen', 'office supplies', 'holding',
'person', 'hand', 'text', 'design', 'pen', 'office supplies',
'cosmetic', 'tool', 'office supplies', 'weapon', 'indoor',
'everyday carry', 'pen', 'knife', 'electronics', 'case'])
df
I'm trying to get something like this:
df_wish = pd.DataFrame(columns = ['id_source', 'id_target', 'common_tags'])
where:
df_wish['id_source']    # the "id" currently being processed
df_wish['id_target']    # an "id" that shares at least one "tag" with "id_source"
df_wish['common_tags']  # the number of tags shared by "id_source" and "id_target"
Can you help me? Thanks a lot.
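A minimal sketch of one way to build df_wish, assuming that each (source, target) pair should appear in both directions, that an id paired with itself is not wanted, and that a duplicated (id, tag) row should only count once. It joins the table with itself on `tag`, so every row of the join is "id_source has this tag and so does id_target", then counts rows per pair. The function name `common_tag_pairs` is hypothetical, not part of pandas.

```python
import pandas as pd

# Sample data from the question: one row per (id, tag) pair.
df = pd.DataFrame({
    'id': ['1925782942580621034'] * 4
          + ['1930659617975470678'] * 8
          + ['1971229370376634911'] * 9,
    'tag': ['nintendo', 'cosmetic', 'pen', 'office supplies', 'holding',
            'person', 'hand', 'text', 'design', 'pen', 'office supplies',
            'cosmetic', 'tool', 'office supplies', 'weapon', 'indoor',
            'everyday carry', 'pen', 'knife', 'electronics', 'case'],
})


def common_tag_pairs(df):
    """Hypothetical helper: count shared tags for every ordered pair of ids."""
    # Drop duplicate (id, tag) rows so a repeated tag is not counted twice.
    pairs = df[['id', 'tag']].drop_duplicates()
    # Self-join on 'tag': the two 'id' columns become id_source / id_target.
    merged = pairs.merge(pairs, on='tag', suffixes=('_source', '_target'))
    # Remove self-pairs (an id trivially shares all its tags with itself).
    merged = merged[merged['id_source'] != merged['id_target']]
    # One joined row per shared tag, so the group size is the shared-tag count.
    return (merged.groupby(['id_source', 'id_target'])
                  .size()
                  .reset_index(name='common_tags'))


df_wish = common_tag_pairs(df)
print(df_wish)
```

On the sample data, the first two ids share 3 tags (cosmetic, pen, office supplies) and every pair involving the third id shares 2 (pen, office supplies).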
Discussion:
-
How many tags/IDs do you have?
-
See also the comments under my answer.
-
Something like 15k unique IDs and about 100k unique tags. I have 32 GB of RAM and an i7 CPU. Thanks.
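At this scale the risk in a self-join on `tag` is less the number of ids than how many ids share a popular tag: a tag attached to k ids produces k × k joined rows before any counting happens. A minimal sketch of one way to cap peak memory, assuming the same output as a full self-join is wanted: process the source ids in chunks and aggregate each chunk right away, so only the compact counts are kept. The function name `common_tag_pairs_chunked` and the default `chunk_size=1000` are hypothetical and untuned.

```python
import pandas as pd


def common_tag_pairs_chunked(df, chunk_size=1000):
    """Hypothetical helper: same result as a full self-join on 'tag',
    but built from chunks of source ids to bound peak memory."""
    pairs = df[['id', 'tag']].drop_duplicates()
    ids = pairs['id'].unique()
    parts = []
    for start in range(0, len(ids), chunk_size):
        chunk_ids = ids[start:start + chunk_size]
        # Only rows whose id is in this chunk act as sources.
        source = pairs[pairs['id'].isin(chunk_ids)]
        merged = source.merge(pairs, on='tag', suffixes=('_source', '_target'))
        merged = merged[merged['id_source'] != merged['id_target']]
        # Aggregate immediately; the large joined frame is discarded each loop.
        parts.append(merged.groupby(['id_source', 'id_target'])
                           .size()
                           .reset_index(name='common_tags'))
    if not parts:
        return pd.DataFrame(columns=['id_source', 'id_target', 'common_tags'])
    return pd.concat(parts, ignore_index=True)


# Tiny hypothetical example (not the question's data).
toy = pd.DataFrame({'id': ['a', 'a', 'b', 'b', 'c'],
                    'tag': ['pen', 'case', 'pen', 'case', 'pen']})
result = common_tag_pairs_chunked(toy, chunk_size=2)
print(result)
```

If direction does not matter (for example, an undirected graph in Gephi), keeping only rows with `id_source < id_target` halves the output. A sparse id-by-tag incidence matrix multiplied by its transpose (for example with scipy.sparse) is another common way to get the same counts, not shown here.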
Tags: python python-3.x pandas gephi