【Question title】: Efficient loan repayment calculation
【Posted】: 2018-03-30 09:41:04
【Question】:

I have a preprocessed table of customer loan disbursements and repayments, like this:

customerID | balanceChange | trxDate        | TYPE
242105     | 500           | 20170605       | loan
242105     | 1500          | 20170605       | loan
242105     | -1000         | 20170607       | payment
242111     | 500           | 20170605       | loan
242111     | -500          | 20170606       | payment
242111     | 500           | 20170607       | loan
242111     | -500          | 20170609       | payment
242151     | 500           | 20170605       | loan

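What I want to do is (1) count, for each day, how many of the loans issued on that day have been fully repaid, and (2) compute how many days the customers took to repay them.

The repayment rule is of course FIFO (first in, first out), so the oldest loan is repaid first.

For the example above, the solution is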

trxDate      | nRepayments   | timeGap(days)
20170605     | 2             | 1.5
20170606     | 0             | 0
20170607     | 1             | 2

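The explanation for this result is that on 20170605, 4 loans were issued (2 to customerID 242105, and one each to 242111 and 242151), but only 2 of them were repaid (500 by 242105 and 500 by 242111). timeGap is the mean number of days each customer took to repay (242105 repaid on 20170607, i.e. after 2 days; 242111 repaid on 20170606, i.e. after 1 day), so (2+1)/2 = 1.5.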

I tried to compute nRepayments with the following R script (I figure that once I have that, timeGap should be a piece of cake).

library(dplyr)
library(data.table)

# Recoveries
data_loans_rec <- data_loans %>% arrange(customerID, trxDate) %>% as.data.table()
data_loans_rec[is.na(data_loans_rec)] <- 0
# Drop a leading payment that has no preceding loan, then re-index
data_loans_rec <- data_loans_rec[, index := seq_len(.N), by = customerID][!(index == 1 & TYPE == "payment")][, index := seq_len(.N), by = customerID]
# Loans given per day, and a same-shaped counter for loans repaid per day
n_loans_given <- data_loans_rec[TYPE == "loan", .(nloans = .N), by = trxDate][order(trxDate)]
n_loans_rec <- copy(n_loans_given)
n_loans_rec[, nloans := 0]

unique_cust <- unique(data_loans_rec$customerID)

# Check repayment for every customer ================
for (i in seq_along(unique_cust)) {
  cur_cust <- unique_cust[i]
  # Extract plain vectors rather than one-column data.tables
  list_loan <- data_loans_rec[customerID == cur_cust & TYPE == "loan", balanceChange]
  list_loan_time <- data_loans_rec[customerID == cur_cust & TYPE == "loan", trxDate]
  list_pay <- data_loans_rec[customerID == cur_cust & TYPE == "payment", balanceChange]

  # Total amount paid; sum() of an empty vector is 0, so no special case is needed
  sum_paid <- sum(abs(list_pay))

  # Walk the loans oldest first (FIFO) until the payments run out
  for (i_loantime in seq_along(list_loan)) {
    loan_curr <- list_loan[i_loantime]
    loan_left <- loan_curr - sum_paid
    if (loan_left <= 0) {
      # This loan is fully covered: count it on its issue date
      cur_date <- list_loan_time[i_loantime]
      n_loans_rec[trxDate == cur_date, nloans := nloans + 1]
      sum_paid <- sum_paid - loan_curr
      print(paste(i_loantime, cur_date, n_loans_rec[trxDate == cur_date, nloans]))
    } else {
      break
    }
  }

  print(i)
}

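The idea is to build, for each customer, a list of loans, a list of loan dates and a list of payments. The best case is when the customer's total loan amount is equal to or smaller than (because of dirty data) the total amount paid, i.e. everything is paid in full. Then the number of repayments equals the number of loans issued to that customer. The general case is when the customer has made only partial payments. In that case I sum the total amount paid and iterate over every loan the customer took, subtracting each loan from the remaining paid amount as I go. As soon as a loan exceeds what is left, I stop, and the loans counted so far are the ones the customer's payments actually cover.

The problem is that I have millions of customers, each of whom has at least 5 loans and payments. Because I am using nested loops, it takes hours to finish.

So I am asking here whether anyone has run into this problem and/or has a better, more efficient solution.

Thanks in advance!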

【Question discussion】:

Tags: r optimization data.table


【Solution 1】:

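Your logic is quite complex and I won't try to replicate it fully in this answer; my aim is only to give you some ideas on how to optimise it.

Also, as mentioned in the comments, you could try parallelising, or using another programming language.

In any case, since your setup already uses data.table, you can try to use whole-table (grouped) operations as much as possible, which will usually be faster than your big loop. For example, something like this.

I first compute the balance and the sum of the payments made, per customer ID: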

data_loans_rec <- data_loans_rec[, balance := sum(balanceChange), by = customerID]
data_loans_rec <- data_loans_rec[, sumPayments := sum(balanceChange[TYPE == "payment"]), by = customerID]

This way, you already know that every customer with a balance of 0 has repaid everything:

data_loans_rec <- data_loans_rec[TYPE == "loan" & balance == 0, repaid := TRUE, by = list(customerID, index)]

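With millions of customers these operations will of course read a lot of data, but I would say data.table should handle them quickly.

For the remaining customers, and only for the loan rows whose repayment status is still unknown, you can apply a function per customer through data.table.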
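As a small variation, the two grouped assignments above can be combined into a single grouped pass using the functional form of `:=`. The sketch below is only an illustration on the sample data from the question; the table is rebuilt by hand here and is assumed to be already sorted by customer and date.

```r
library(data.table)

# Sample data from the question, assumed already sorted by customer and date
data_loans_rec <- data.table(
  customerID    = c(242105L, 242105L, 242105L, 242111L, 242111L, 242111L, 242111L, 242151L),
  balanceChange = c(500, 1500, -1000, 500, -500, 500, -500, 500),
  trxDate       = c(20170605L, 20170605L, 20170607L, 20170605L,
                    20170606L, 20170607L, 20170609L, 20170605L),
  TYPE          = c("loan", "loan", "payment", "loan", "payment", "loan", "payment", "loan")
)

# One grouped pass that adds both per-customer totals
data_loans_rec[, `:=`(
  balance     = sum(balanceChange),
  sumPayments = sum(balanceChange[TYPE == "payment"])
), by = customerID]

# Customers with a zero balance have repaid every loan
data_loans_rec[TYPE == "loan" & balance == 0, repaid := TRUE]

print(data_loans_rec)
```

Whether one pass is noticeably faster than two depends on the data; it mainly avoids grouping by customerID twice.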

setRepaid <- function(balanceChange, sumPayments) {
  # note that here you get a vector for all the loans of a customer
  sumPay <- (-1) * sumPayments[1]
  if (sumPay == 0)
    return(rep(FALSE, length(balanceChange)))
  number_of_loans_paid <- 0
  for (i in 1:length(balanceChange)) {
    if (sum(balanceChange[1:i]) > sumPay)
      break
    number_of_loans_paid <- number_of_loans_paid + 1
  }
  return(c(rep(TRUE, number_of_loans_paid), rep(FALSE, length(balanceChange)-number_of_loans_paid)))
}
data_loans_rec <- data_loans_rec[TYPE == "loan" & is.na(repaid), repaid := setRepaid(balanceChange, sumPayments), by = list(customerID) ]

This gives you the desired result, at least for your example.

   customerID balanceChange  trxDate    TYPE index balance sumPayments repaid
1:     242105           500 20170605    loan     1    1000       -1000   TRUE
2:     242105          1500 20170605    loan     2    1000       -1000  FALSE
3:     242105         -1000 20170607 payment     3    1000       -1000     NA
4:     242111           500 20170605    loan     1       0       -1000   TRUE
5:     242111          -500 20170606 payment     2       0       -1000     NA
6:     242111           500 20170607    loan     3       0       -1000   TRUE
7:     242111          -500 20170609 payment     4       0       -1000     NA
8:     242151           500 20170605    loan     1     500           0  FALSE
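The loop inside setRepaid can probably be replaced by a cumulative sum, since under FIFO a loan is covered exactly when the running total of loans up to and including it does not exceed the total paid. The following is a sketch only: `setRepaidVec` is a hypothetical name, and it assumes loan amounts are positive and the rows of each customer are already in date order.

```r
library(data.table)

# Hypothetical vectorised variant of setRepaid (assumes positive loan amounts)
setRepaidVec <- function(balanceChange, sumPayments) {
  sumPay <- -sumPayments[1]            # total paid by this customer, as a positive number
  sumPay != 0 & cumsum(balanceChange) <= sumPay
}

# Sample data from the question, assumed already sorted by customer and date
data_loans_rec <- data.table(
  customerID    = c(242105L, 242105L, 242105L, 242111L, 242111L, 242111L, 242111L, 242151L),
  balanceChange = c(500, 1500, -1000, 500, -500, 500, -500, 500),
  trxDate       = c(20170605L, 20170605L, 20170607L, 20170605L,
                    20170606L, 20170607L, 20170609L, 20170605L),
  TYPE          = c("loan", "loan", "payment", "loan", "payment", "loan", "payment", "loan")
)
data_loans_rec[, sumPayments := sum(balanceChange[TYPE == "payment"]), by = customerID]

# Flag every loan row in one grouped call; payment rows keep repaid = NA
data_loans_rec[TYPE == "loan", repaid := setRepaidVec(balanceChange, sumPayments), by = customerID]

print(data_loans_rec[TYPE == "loan", .(customerID, balanceChange, trxDate, repaid)])
```

With this variant the separate shortcut for customers whose balance is 0 is no longer strictly necessary, because the cumulative sum handles them too.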

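The advantages are: the final loop applies to far fewer customers, some things are already precomputed, and you rely on data.table to actually replace your outer loop. I hope this approach brings you an improvement; I think it is worth a try.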
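To get from per-loan flags to the per-day table asked for in the question, one possible route is to find, for each loan, the payment at which the customer's cumulative payments first reach the cumulative loan amount, and then aggregate by issue date. The sketch below is an assumption-laden illustration, not a tested production solution: `repaid_on`, `date` and `repaidDate` are hypothetical names, payments are assumed to be negative balanceChange values, and dirty cases (for example payments that precede any loan) are not handled.

```r
library(data.table)

# Sample data from the question
data_loans <- data.table(
  customerID    = c(242105L, 242105L, 242105L, 242111L, 242111L, 242111L, 242111L, 242151L),
  balanceChange = c(500, 1500, -1000, 500, -500, 500, -500, 500),
  trxDate       = c(20170605L, 20170605L, 20170607L, 20170605L,
                    20170606L, 20170607L, 20170609L, 20170605L),
  TYPE          = c("loan", "loan", "payment", "loan", "payment", "loan", "payment", "loan")
)
data_loans[, date := as.Date(as.character(trxDate), format = "%Y%m%d")]
setorder(data_loans, customerID, date)

# Hypothetical helper for ONE customer: for each loan row, the date on which
# cumulative payments first cover the cumulative loans (FIFO); NA otherwise.
repaid_on <- function(amount, date, type) {
  out <- rep(as.Date(NA), length(amount))
  is_loan <- type == "loan"
  if (all(is_loan)) return(out)                 # no payments at all
  cum_loan <- cumsum(amount[is_loan])
  cum_pay  <- cumsum(-amount[!is_loan])
  pay_date <- date[!is_loan]
  # Index of the first payment whose running total reaches each loan's running total
  j <- findInterval(cum_loan, cum_pay, left.open = TRUE) + 1L
  j[j > length(cum_pay)] <- NA_integer_         # not yet fully repaid
  out[is_loan] <- pay_date[j]
  out
}

data_loans[, repaidDate := repaid_on(balanceChange, date, TYPE), by = customerID]

# Aggregate by issue date; timeGap is NaN for a day on which no loan was repaid
summary_by_day <- data_loans[TYPE == "loan", .(
  nLoans      = .N,
  nRepayments = sum(!is.na(repaidDate)),
  timeGap     = mean(as.numeric(repaidDate - date), na.rm = TRUE)
), by = trxDate][order(trxDate)]

print(summary_by_day)
```

Note that only days on which at least one loan was issued appear in this result, so 20170606 from the question's expected table is absent; it would have to be added separately if a row with 0 is wanted.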
