【Posted】: 2019-08-12 18:05:16
【Question】:
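I can't seem to find the right wording to Google this problem: the answers I get are for very similar questions, but not the one I have.
I'm working with the Titanic dataset and want to count how many members of each family survived. The dataset looks like this: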
+-------------+----------+-----------+-------------+
| PassengerId | Survived | Surname   | NumSurvived |
+-------------+----------+-----------+-------------+
| 1           | 0        | Braund    |             |
| 2           | 1        | Cumings   |             |
| 3           | 1        | Heikkinen |             |
| 4           | 1        | Futrelle  |             |
| 5           | 0        | Braund    |             |
| 6           | 0        | Moran     |             |
| 7           | 0        | Futrelle  |             |
| 8           | 0        | Braund    |             |
| 9           | 1        | Cumings   |             |
+-------------+----------+-----------+-------------+
I need to sum the Survived values for each surname and store the result in the NumSurvived column, like this:
+-------------+----------+-----------+-------------+
| PassengerId | Survived | Surname   | NumSurvived |
+-------------+----------+-----------+-------------+
| 1           | 0        | Braund    | 0           |
| 2           | 1        | Cumings   | 2           |
| 3           | 1        | Heikkinen | 1           |
| 4           | 1        | Futrelle  | 1           |
| 5           | 0        | Braund    | 0           |
| 6           | 0        | Moran     | 0           |
| 7           | 0        | Futrelle  | 1           |
| 8           | 0        | Braund    | 0           |
| 9           | 1        | Cumings   | 2           |
+-------------+----------+-----------+-------------+
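For reference, here is a minimal sketch of one way to produce that column. It assumes the data is loaded into a pandas DataFrame; the post does not name a library, so pandas and the variable names `df` and `survivors_per_family` are assumptions made for illustration. The idea is to group by Surname, sum Survived within each group, and broadcast the group total back onto every row of that group.

```python
import pandas as pd

# Sample rows copied from the tables above (assumption: the data is in a pandas DataFrame).
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 0, 1],
    "Surname": ["Braund", "Cumings", "Heikkinen", "Futrelle", "Braund",
                "Moran", "Futrelle", "Braund", "Cumings"],
})

# transform("sum") computes the total per surname and returns a Series aligned
# with the original rows, so it can be assigned directly as a new column.
df["NumSurvived"] = df.groupby("Surname")["Survived"].transform("sum")

# If only one total per surname is wanted, a plain aggregation is enough.
survivors_per_family = df.groupby("Surname")["Survived"].sum()

print(df)
print(survivors_per_family)
```

`transform` keeps the original row count, which is what the target table needs, whereas the plain `sum()` collapses the data to one row per surname.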
【Comments】:
-
@ZackJoubert No problem
-
Yes, I only just realized that. How do I extract the summed "Survived" values?