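【Posted】: 2019-03-15 08:44:05
【Question】:
I am trying to add a new calculated column. I am following the second-highest-voted answer in "Adding calculated column(s) to a dataframe in pandas", because it looks like the tidiest approach to me. Feel free to suggest a better alternative.
Either way, my initial code is as follows: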
import pandas as pd
#https://github.com/sivabalanb/Data-Analysis-with-Pandas-and-Python/blob/master/nba.csv
dt_nba = pd.read_csv("data//nba.csv")
# Note: this is just a basic function. I want to pass partitioned data, like a team's average salary.
def GetSalaryIncrement(val):
    return val * 1.1
dt_nba["SalaryPlus10Percent"] = map(GetSalaryIncrement,dt_nba["Salary"])
dt_nba[["Name","Team","Salary","SalaryPlus10Percent"]][:5]
However, the result is not what I expected:
+----+---------------+----------------+--------------+--------------------------------+
| ID | Name | Team | Salary | SalaryPlus10Percent |
+----+---------------+----------------+--------------+--------------------------------+
| 0 | Avery Bradley | Boston Celtics | 7730337.0000 | <map object at 0x7fb819e9b7b8> |
| 1 | Jae Crowder | Boston Celtics | 6796117.0000 | <map object at 0x7fb819e9b7b8> |
| 2 | John Holland | Boston Celtics | nan | <map object at 0x7fb819e9b7b8> |
| 3 | R.J. Hunter | Boston Celtics | 1148640.0000 | <map object at 0x7fb819e9b7b8> |
| 4 | Jonas Jerebko | Boston Celtics | 5000000.0000 | <map object at 0x7fb819e9b7b8> |
+----+---------------+----------------+--------------+--------------------------------+
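A note on the output above: in Python 3 the built-in `map` returns a lazy iterator rather than a list, which would explain why the column shows `<map object ...>` instead of numbers. The following is a minimal sketch of two alternatives, assuming a small stand-in frame (the three rows are copied from the output above, not loaded from the CSV): plain vectorized arithmetic, or `Series.map` with the same function. In both cases a NaN salary stays NaN.

```python
import numpy as np
import pandas as pd

# Small stand-in for nba.csv; values copied from the output shown above.
dt_nba = pd.DataFrame({
    "Name": ["Avery Bradley", "Jae Crowder", "John Holland"],
    "Team": ["Boston Celtics"] * 3,
    "Salary": [7730337.0, 6796117.0, np.nan],
})

def GetSalaryIncrement(val):
    return val * 1.1

# Option 1: vectorized arithmetic on the whole column.
dt_nba["SalaryPlus10Percent"] = dt_nba["Salary"] * 1.1

# Option 2: Series.map applies the function to each element of the column.
# The column name below is hypothetical, chosen only to keep both results.
dt_nba["SalaryPlus10PercentMap"] = dt_nba["Salary"].map(GetSalaryIncrement)

print(dt_nba[["Name", "Team", "Salary", "SalaryPlus10Percent"]])
```

The vectorized form is usually the simpler choice when the calculation is plain arithmetic; `Series.map` is mainly useful when the per-row logic cannot be expressed as column operations.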
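I am particularly interested in passing "windowed/aggregated data" to the function, and it should gracefully ignore NaN values.
For example, in T-SQL I could do this: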
-- INCREASE EACH PLAYER'S SALARY BY 10% OF THE TEAM'S AVERAGE SALARY
SELECT NewSalary= Salary + (.1 * AVG(Salary) OVER (PARTITION BY Team))
FROM nba_data
If possible, I would like to do the same in pandas. Thanks.
【Comments】:
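One possible pandas counterpart to `AVG(Salary) OVER (PARTITION BY Team)` is `groupby(...).transform("mean")`, which computes a per-group mean and broadcasts it back to every row of that group. The sketch below assumes a small made-up frame (team names and salaries are illustrative, not from nba.csv). The mean skips NaN by default, while a row whose own salary is NaN still ends up with NaN, much as NULL would in SQL.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; teams and salaries are illustrative only.
df = pd.DataFrame({
    "Team": ["A", "A", "A", "B", "B"],
    "Salary": [100.0, 200.0, np.nan, 50.0, 150.0],
})

# Per-team mean aligned to the original rows.
# NaN salaries are ignored when the mean is computed.
team_avg = df.groupby("Team")["Salary"].transform("mean")

# Raise each player's salary by 10% of the team's average salary.
df["NewSalary"] = df["Salary"] + 0.1 * team_avg

print(df)
```

Because `transform` returns a Series with the same index as the original frame, the result can be used directly in column arithmetic without a merge.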