【Posted】:2018-03-30 09:41:04
【Question】:
I have a preprocessed table of customer loan disbursements and repayments that looks like this:
customerID | balanceChange | trxDate | TYPE
242105 | 500 | 20170605 | loan
242105 | 1500 | 20170605 | loan
242105 | -1000 | 20170607 | payment
242111 | 500 | 20170605 | loan
242111 | -500 | 20170606 | payment
242111 | 500 | 20170607 | loan
242111 | -500 | 20170609 | payment
242151 | 500 | 20170605 | loan
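What I want to do is (1) count, for the loans disbursed on each day, how many have been fully repaid, and (2) work out how many days customers took to repay them.
The repayment rule is of course FIFO (first in, first out), so the oldest loan is repaid first.
For the example above, the solution is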
trxDate | nRepayments | timeGap(days)
20170605 | 2 | 1.5
20170606 | 0 | 0
20170607 | 1 | 2
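The reasoning behind this result is that on 20170605, 4 loans were disbursed (2 to customerID 242105, and the other two to 242111 and 242151), but only 2 of them were repaid (500 by 242105 and 500 by 242111). timeGap is the average number of days each customer took to repay (242105 repaid on 20170607, i.e. after 2 days; 242111 repaid on 20170606, i.e. after 1 day), so (2+1)/2 = 1.5.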
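A minimal sketch of how the table above could be reproduced, assuming payments are applied oldest-first and a loan counts as repaid on the first payment date where the customer's cumulative payments reach the cumulative amount owed up to that loan. The names `cum_loan`, `cum_paid`, `repaidDate`, `gap` and `daily` are illustrative and not part of the original data; loans on the same day are assumed to stay in their input order.

```r
library(data.table)

# Sample data copied from the question
dt <- data.table(
  customerID    = c(242105, 242105, 242105, 242111, 242111, 242111, 242111, 242151),
  balanceChange = c(500, 1500, -1000, 500, -500, 500, -500, 500),
  trxDate       = c(20170605L, 20170605L, 20170607L, 20170605L,
                    20170606L, 20170607L, 20170609L, 20170605L),
  TYPE          = c("loan", "loan", "payment", "loan", "payment", "loan", "payment", "loan")
)
dt[, trxDate := as.Date(as.character(trxDate), format = "%Y%m%d")]
setorder(dt, customerID, trxDate)

loans    <- dt[TYPE == "loan"]
payments <- dt[TYPE == "payment"]

# Running totals per customer: amount owed up to each loan, amount paid up to each payment
loans[, cum_loan := cumsum(balanceChange), by = customerID]
payments[, cum_paid := cumsum(-balanceChange), by = customerID]

# Rolling join: for each loan, take the first payment whose running total
# is at least the running loan total (NA if the loan is never covered)
repaid_date <- payments[loans, x.trxDate,
                        on = .(customerID, cum_paid = cum_loan),
                        roll = -Inf, mult = "first"]
loans[, repaidDate := repaid_date]
loans[, gap := as.numeric(repaidDate - trxDate)]

# Aggregate per disbursement day
daily <- loans[, .(nRepayments = sum(!is.na(gap)),
                   timeGap     = mean(gap, na.rm = TRUE)),
               keyby = trxDate]
print(daily)
```

Days on which no loan was disbursed (20170606 here) do not appear in `daily`; they would have to be added separately if a zero row is wanted.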
I tried to compute nRepayments with the following R script (I figured that once I had that, timeGap would be a piece of cake).
library(dplyr)
library(data.table)

# Recoveries
data_loans_rec <- data_loans %>% arrange(customerID, trxDate) %>% as.data.table()
data_loans_rec[is.na(data_loans_rec)] <- 0
# Drop a leading payment (one with no earlier loan) per customer, then re-index
data_loans_rec <- data_loans_rec[, index := seq_len(.N), by = customerID][!(index == 1 & TYPE == "payment")][, index := seq_len(.N), by = customerID]
# Loans given per day (grouped by trxDate; the original grouped by a column `payment` that does not exist)
n_loans_given <- data_loans_rec[TYPE == "loan", .(nloans = .N), by = trxDate][order(trxDate)]
n_loans_rec <- copy(n_loans_given)
n_loans_rec[, nloans := 0L]
unique_cust <- unique(data_loans_rec$customerID)
# Check repayment for every customer ================
for (i in seq_along(unique_cust)) {
  cur_cust <- unique_cust[i]
  # Extract plain vectors rather than one-column data.tables
  list_loan <- data_loans_rec[customerID == cur_cust & TYPE == "loan", balanceChange]
  list_loan_time <- data_loans_rec[customerID == cur_cust & TYPE == "loan", trxDate]
  list_pay <- data_loans_rec[customerID == cur_cust & TYPE == "payment", balanceChange]
  sum_paid <- sum(abs(list_pay))  # 0 when the customer has no payments
  # seq_along() also handles a customer with no loans (the loop is skipped)
  for (i_loantime in seq_along(list_loan_time)) {
    loan_curr <- list_loan[i_loantime]
    loan_left <- loan_curr - sum_paid
    if (loan_left <= 0) {
      # This loan is fully covered: count it on its disbursement date
      cur_date <- list_loan_time[i_loantime]
      n_loans_rec[trxDate == cur_date, nloans := nloans + 1L]
      sum_paid <- sum_paid - loan_curr
      print(paste(i_loantime, cur_date, n_loans_rec[trxDate == cur_date, nloans]))
    } else {
      # FIFO: once one loan is not covered, later loans cannot be either
      break
    }
  }
  print(i)
}
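The idea is to build, for each customer, a list of loans, a list of loan dates, and a list of payments. The best case is when the customer's total loan amount is equal to or less than (because of dirty data) the total amount paid, i.e. everything is paid in full. Then the number of repayments equals the number of loans given to that customer. The general case is a customer who has made only partial payments. In that case I sum up the total amount paid and iterate over each of the customer's loans, accumulating the loan total as I go. Once the loan total exceeds the amount repaid, I count how many loans the customer's payments actually cover.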
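The partial-payment case described above can be sketched for a single customer without an explicit loop. This is only an illustration in base R: `fifo_covered` is a hypothetical helper name, and it assumes the loans are given in issue order with positive amounts.

```r
# Hypothetical helper: amount of each loan covered when the total paid
# is applied to the loans oldest-first (FIFO).
fifo_covered <- function(loans, total_paid) {
  owed_before <- cumsum(loans) - loans            # owed on all earlier loans
  pmin(loans, pmax(0, total_paid - owed_before))  # what is left for this loan
}

# Customer 242105 from the sample: loans of 500 and 1500, one payment of 1000
loans        <- c(500, 1500)
covered      <- fifo_covered(loans, total_paid = 1000)
fully_repaid <- covered >= loans

print(covered)            # amount covered per loan
print(sum(fully_repaid))  # number of fully repaid loans
```

Here the first loan is fully covered and the second only partly, so the customer contributes one repaid loan.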
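The problem is that I have millions of customers, and each of them has taken out and paid back at least 5 loans. Because I'm using nested loops, the script takes hours to finish.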
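One possible way to avoid the nested loops is to let data.table do the per-customer work in grouped operations. The sketch below is an assumption about what a vectorized version of the same counting logic might look like, not a tested replacement on the real data: a loan is counted as repaid when the customer's cumulative loan amount up to and including that loan does not exceed the customer's total payments. The column names `total_paid`, `cum_loan` and `repaid` are illustrative.

```r
library(data.table)

# Sample data copied from the question
dt <- data.table(
  customerID    = c(242105, 242105, 242105, 242111, 242111, 242111, 242111, 242151),
  balanceChange = c(500, 1500, -1000, 500, -500, 500, -500, 500),
  trxDate       = c(20170605, 20170605, 20170607, 20170605,
                    20170606, 20170607, 20170609, 20170605),
  TYPE          = c("loan", "loan", "payment", "loan", "payment", "loan", "payment", "loan")
)
setorder(dt, customerID, trxDate)

# Total paid per customer (0 for customers without any payment)
dt[, total_paid := sum(abs(balanceChange[TYPE == "payment"])), by = customerID]

# Running loan total per customer, in FIFO order
loans <- dt[TYPE == "loan"]
loans[, cum_loan := cumsum(balanceChange), by = customerID]

# A loan is fully repaid if everything owed up to and including it is covered
loans[, repaid := cum_loan <= total_paid]

# Number of fully repaid loans per disbursement day
n_loans_rec <- loans[, .(nloans = sum(repaid)), keyby = trxDate]
print(n_loans_rec)
```

Each step is a single grouped pass over the table, so the cost should grow roughly with the number of rows rather than with one filtered lookup per customer.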
So I'm asking here whether anyone has run into this problem and/or has a better, more efficient solution.
Thanks in advance!
【Comments】:
- Not an answer, but is this a case where you could parallelize? cran.r-project.org/web/packages/doParallel/vignettes/…
Tags: r optimization data.table