Formulating a 'Weighted Average' variable for each client: answers

【Question title】: Formulating a 'Weighted Average' variable for each client
【Posted】: 2020-07-26 02:25:32
【Question】:

I'm not sure the title captures everything I'm trying to do here.

I'm analysing a client database. Dataframe 1 has one row per unique client (identified by client ID).

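I also have a second data frame listing the assets the clients hold. However, each of its rows represents a unique asset (identified by asset ID), so a client's ID can appear several times. That means I can't merge the two data frames without first creating another variable.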
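One common way around the repeated client IDs is to collapse the asset table to one row per client before joining. The sketch below assumes hypothetical names: `clients` for Dataframe 1, `assets` for the asset table, and columns `clientId`, `AssetCategory`, `AssetValue` (plus an illustrative `region` column). Because the summary has exactly one row per client, the join cannot duplicate client rows.

```r
library(dplyr)

# Hypothetical client table (Dataframe 1): one row per client
clients <- data.frame(clientId = c(1, 2, 3), region = c("N", "S", "N"))

# Hypothetical asset table: one row per asset, so clientId repeats
assets <- data.frame(
  assetId       = 1:6,
  clientId      = c(1, 1, 2, 2, 2, 3),
  AssetCategory = c("a", "b", "a", "a", "c", "b"),
  AssetValue    = c(100, 300, 50, 50, 100, 200)
)

# Collapse to one row per client first ...
totals <- assets %>%
  group_by(clientId) %>%
  summarise(totalAssets = sum(AssetValue), .groups = "drop")

# ... so this join keeps exactly one row per client
clients_with_totals <- left_join(clients, totals, by = "clientId")
print(clients_with_totals)
```

Clients with no assets would simply get `NA` in `totalAssets`.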

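I want to create some variables representing the share of each client's total assets invested in each asset type, as well as each client's total assets.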

Is there an easy way to do this? Something like a group_by on clientid, then grouping by asset type and taking the mean?
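The group_by idea can work, with one adjustment: instead of a mean, each client/asset-type sum is divided by that client's total. Below is a minimal sketch, assuming the same hypothetical `assets` columns as above and the tidyr package for `pivot_wider()`, which spreads the shares into one column per asset type (prefixed `share_`, an arbitrary choice) so the result has one row per client.

```r
library(dplyr)
library(tidyr)

# Hypothetical asset table (one row per asset); column names are assumptions
assets <- data.frame(
  clientId      = c(1, 1, 2, 2, 2, 3),
  AssetCategory = c("a", "b", "a", "a", "c", "b"),
  AssetValue    = c(100, 300, 50, 50, 100, 200)
)

shares_wide <- assets %>%
  group_by(clientId, AssetCategory) %>%
  summarise(value = sum(AssetValue), .groups = "drop_last") %>% # still grouped by clientId
  mutate(share = value / sum(value)) %>%                         # share of the client's total
  ungroup() %>%
  select(clientId, AssetCategory, share) %>%
  pivot_wider(names_from = AssetCategory, values_from = share,
              names_prefix = "share_", values_fill = 0)          # absent categories become 0
print(shares_wide)
```

Since `shares_wide` has one row per client, it could then be attached to the client table with `left_join(clients, shares_wide, by = "clientId")`.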

【Discussion】:

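We need data to illustrate. Could you edit the question to include the output of dput(head(df2, 30))?
It's easier to help you if you include a simple reproducible example with sample input and the desired output, which can be used to test and verify possible solutions.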

Tags: r

【Solution 1】:

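I recreated a scenario that simulates the problem you're facing, as far as I understand your situation. Hopefully it at least puts you on the path to the answer you're looking for.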

You can copy and paste the following code into your R console to run all the steps.

library(dplyr)

# Create a toy asset table (one row per asset), assuming 4 asset classes and an asset value of 1 for every asset.
# stringsAsFactors = FALSE keeps AssetCategory as character (it would otherwise be a factor in R < 4.0).
df <- data.frame(clientId = c(1,1,2,3,3,3,4,4,4,5,5,6,6,7,8,9,9,10,10,10),
                 AssetCategory = rep(c('a','b','c','d'), 5),
                 AssetValue = rep(1, 20),
                 stringsAsFactors = FALSE)

#Calculating the clients' total assets
totalAssetByClient <- df %>% group_by(clientId) %>% summarize(totalAssetByClient = sum(AssetValue))

# Append the totalAssetByClient variable to the data frame (client database) <- answer to your FIRST question
df2 <- left_join(df,totalAssetByClient,by = "clientId")


# Create an empty data frame to hold the AssetShareByClient table
AssetShareByClient <- data.frame(clientId = integer(), AssetCategory = character(), AssetShareByClient = double(),
                                 stringsAsFactors = FALSE)

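# Fill the AssetShareByClient table with a nested for loop, one row per client/asset-category pair.
# Binding a one-row data.frame (rather than c(...)) keeps clientId and the share numeric instead of character.
for (client in unique(df2$clientId)) {
  for (asset in unique(df2$AssetCategory)) {
    df3 <- filter(df2, clientId == client, AssetCategory == asset)
    AssetShareByClient <- rbind(AssetShareByClient,
                                data.frame(clientId = client,
                                           AssetCategory = asset,
                                           AssetShareByClient = sum(df3$AssetValue) / mean(df3$totalAssetByClient),
                                           stringsAsFactors = FALSE))
  }
}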

# We now have a standalone table with a column showing the share of each asset category for each client <- answer to your SECOND question
# When a client holds nothing in an asset category, the share is NaN (0 divided by the mean of an empty vector).
# For each client, the non-NaN shares sum to 1 (i.e. 100%).
names(AssetShareByClient) = c("clientId","AssetCategory","AssetShareByClient")
print(AssetShareByClient)
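For comparison, here is a sketch of a loop-free version built from a grouped dplyr pipeline. It is not the answerer's code, and it additionally assumes the tidyr package: `complete()` adds the missing client/category pairs with a value of 0, so those rows show a share of 0 instead of NaN. It rebuilds the same toy data so it runs on its own.

```r
library(dplyr)
library(tidyr)

# Same toy data as above: 20 assets of value 1 spread over 10 clients
df <- data.frame(
  clientId      = c(1,1,2,3,3,3,4,4,4,5,5,6,6,7,8,9,9,10,10,10),
  AssetCategory = rep(c("a", "b", "c", "d"), 5),
  AssetValue    = rep(1, 20),
  stringsAsFactors = FALSE
)

AssetShareByClient <- df %>%
  group_by(clientId, AssetCategory) %>%
  summarise(value = sum(AssetValue), .groups = "drop") %>%
  complete(clientId, AssetCategory, fill = list(value = 0)) %>% # every client x category pair
  group_by(clientId) %>%
  mutate(AssetShareByClient = value / sum(value)) %>%           # share of the client's total
  ungroup() %>%
  select(clientId, AssetCategory, AssetShareByClient)
print(AssetShareByClient)
```

The grouped version avoids growing a data frame with `rbind()` inside a loop, which gets slow as the number of clients increases.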

【Discussion】:

Thanks mate, this is exactly what I wanted. Spot on, cheers!!