【Posted】: 2016-03-14 05:01:28
【Question】:
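I have a set of sentences that I want to group so that all rows in a group share a particular word. A sentence can belong to many groups, though, because it contains many words.
So in the example below, there should be groups like these:
- a "temperature" group containing all rows (0, 1, 2, 3 and 4)
- a "freezes" group containing rows 2 and 4
- a "the" group containing rows 0, 1, 2 and 3
- a "metal" group containing only row 0
- a group for every other word in the data set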
import pandas as pd
# An example data set
df = pd.DataFrame({"sentences": [
"two long pieces of metal fixed together, each of which bends a different amount when they are both heated to the same temperature",
"the temperature at which a liquid boils",
"a system for measuring temperature that is part of the metric system, in which water freezes at 0 degrees and boils at 100 degrees",
"a unit for measuring temperature. Measurements are often expressed as a number followed by the symbol °",
"a system for measuring temperature in which water freezes at 32º and boils at 212º"
]})
# Create a new column holding the list of words in each sentence
df['words'] = df['sentences'].apply(lambda sentence: sentence.split(" "))
# Try to group by this new column
df.groupby('words').count()
# TypeError: unhashable type: 'list'
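However, my code throws the error shown in the last comment above (TypeError: unhashable type: 'list').
Since my task is a bit complicated, I know it probably involves more than just calling groupby(). Can someone help me do this word grouping with pandas?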
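To make the target concrete, the grouping described above can be sketched in plain Python as an inverted index from each word to the row numbers that contain it. This only illustrates the desired output and is not a pandas solution. The names `sentences` and `groups` and the shortened sentences are made up for the example, and words are split on whitespace without stripping punctuation.

```python
from collections import defaultdict

# Shortened, hypothetical sentences standing in for the data set above
sentences = [
    "pieces of metal heated to the same temperature",
    "the temperature at which a liquid boils",
    "water freezes at 0 degrees",
]

# Map each word to the set of row numbers whose sentence contains it
groups = defaultdict(set)
for row, sentence in enumerate(sentences):
    for word in sentence.split():
        groups[word].add(row)

print(sorted(groups["temperature"]))  # [0, 1]
print(sorted(groups["the"]))          # [0, 1]
print(sorted(groups["metal"]))        # [0]
```

A row appears under every word it contains, which is exactly the "one sentence, many groups" behaviour a plain groupby() on a single key column does not give.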
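Edit: After fixing the error by returning tuple(sentence.split()) (thanks, ethan-furman), I tried printing the result, but it doesn't seem to do anything useful. I think it may just be putting each row in its own group: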
print(df.groupby('words').count())
# sentences 5
# dtype: int64
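A likely explanation is that groupby() treats each whole tuple as a single key, and since no two sentences share the same word sequence, every row lands in its own group. One way to get a group per word instead is to expand the word lists so that each word gets its own row, then collect the original row labels per word. The sketch below assumes pandas 0.25 or later (for `Series.explode`). The sentences are shortened stand-ins, and `exploded` and `word_groups` are names chosen for illustration.

```python
import pandas as pd

# Shortened, hypothetical sentences standing in for the data set above
df = pd.DataFrame({"sentences": [
    "pieces of metal heated to the same temperature",
    "the temperature at which a liquid boils",
    "water freezes at 0 degrees",
]})

# One row per (original row label, word); reset_index() turns the repeated
# row labels into a regular column named "index"
exploded = df["sentences"].str.split().explode().rename("word").reset_index()

# For each word, collect the distinct row labels of the sentences containing it
word_groups = exploded.groupby("word")["index"].unique()

print(word_groups["temperature"].tolist())  # [0, 1]
print(word_groups["metal"].tolist())        # [0]
```

Punctuation is not stripped here, so a token such as "temperature." would form its own group unless the text is cleaned first.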
【Discussion】:
Tags: python python-3.x pandas group-by