【问题标题】:apply function to be used with a user defined function与用户定义的函数一起使用的应用函数
【发布时间】:2020-06-15 17:18:18
【问题描述】:

我有以下矩阵:

2001 2002 2003 2004 2005 Start End
0.1  0.1 0.5  NA    1    2002 2004
0.2   NA   1   0.1  0.8  2001 2003

有没有一种方法可以将 apply (matrix,1,Fun) 与用户定义的函数一起使用,通过仅考虑 Start 和 End 之间的列返回以下结果来计算每行的最大值:

0.5
1

我想避免使用循环来更快。

输出

输入(矩阵)

    structure(c("0.874019503593445", "0.86563766002655", "0.876677572727203", 
"0.87995857000351", "0.857958316802979", "0.876143276691437", 
"0.862336575984955", "0.853970944881439", "0.846048414707184", 
"0.859661996364594", "0.834502220153809", "0.852427303791046", 
"0.822492599487305", "0.813440501689911", "0.76100218296051", 
"0.820943117141724", "0.764204323291779", "0.785735309123993", 
"0.817545235157013", "0.81278395652771", "0.723827838897705", 
"0.785534679889679", "0.7496058344841", "0.777993083000183", 
"0.817822515964508", "0.812875807285309", "0.712881326675415", 
"0.756305634975433", "0.743959069252014", "0.769496917724609", 
"0.810760021209717", "0.80568391084671", "0.693456709384918", 
"0.727128684520721", "0.729035794734955", "0.756283938884735", 
"0.807579278945923", "0.802182674407959", "0.684139847755432", 
"0.704573571681976", "0.724795341491699", "0.756382524967194", 
"0.804896593093872", "0.799209952354431", "0.678259372711182", 
"0.702164053916931", "0.718267142772675", "0.751578271389008", 
"0.799822628498077", "0.793913602828979", "0.665935218334198", 
"0.694310545921326", "0.708142817020416", "0.743632614612579", 
"0.793001770973206", "0.787713170051575", "0.647054612636566", 
"0.677092432975769", "0.694708645343781", "0.73238867521286", 
"0.789437413215637", "0.784072935581207", "0.63111686706543", 
"0.65886777639389", "0.684714734554291", "0.724371314048767", 
"0.843812942504883", "0.839520990848541", "0.773352861404419", 
"0.779657006263733", "0.788854360580444", "0.809786915779114", 
"0.877531945705414", "0.873724162578583", "0.842803955078125", 
"0.855762183666229", "0.850594639778137", "0.862756550312042", 
"0.880029559135437", "0.873950898647308", "0.870781242847443", 
"0.866772592067719", "0.858940064907074", "0.868877649307251", 
"0.853329718112946", "0.855516672134399", "0.823540806770325", 
"0.830541551113129", "0.830010652542114", "0.819457948207855", 
"0.827873051166534", "0.822244465351105", "0.773614823818207", 
"0.797049939632416", "0.751720309257507", "0.768605947494507", 
"0.766616880893707", "0.756787300109863", "0.566888153553009", 
"0.586601614952087", "0.581688940525055", "0.617175340652466", 
"0.718824028968811", "0.706149041652679", "0.378618478775024", 
"0.391031503677368", "0.446874260902405", "0.480405360460281", 
"0.795396089553833", "0.786651313304901", "0.678426802158356", 
"0.685614347457886", "0.696484625339508", "0.716927468776703", 
"0.810630321502686", "0.803157091140747", "0.759427130222321", 
"0.744760811328888", "0.755291044712067", "0.770642638206482", 
"0.820476651191711", "0.813506543636322", "0.785165846347809", 
"0.786000072956085", "0.795400500297546", "0.804410338401794", 
"0.845328569412231", "0.839278340339661", "0.807336449623108", 
"0.812321186065674", "0.834664046764374", "0.842991709709167", 
"0.860692620277405", "0.855358898639679", "0.866921603679657", 
"0.871038675308228", "0.857144951820374", "0.862971782684326", 
"0.8672935962677", "0.862846493721008", "0.8633052110672", "0.86958372592926", 
"0.854647994041443", "0.86020702123642", "0.865541696548462", 
"0.861103653907776", "0.865747451782227", "0.872776210308075", 
"0.854572534561157", "0.855659008026123", "0.866914212703705", 
"0.862517952919006", "0.872524857521057", "0.878805994987488", 
"0.85886561870575", "0.858219861984253", "0.878345549106598", 
"0.874269723892212", "0.888411402702332", "0.892201542854309", 
"0.874585866928101", "0.876360237598419", "0.884132504463196", 
"0.88018411397934", "0.89205276966095", "0.894862949848175", 
"0.879017233848572", "0.884612858295441", "0.884777724742889", 
"0.880885481834412", "0.896000146865845", "0.901106178760529", 
"0.880709052085876", "0.884517729282379", "0.88497519493103", 
"0.881140291690826", "0.889452517032623", "0.893048226833344", 
"0.875616371631622", "0.879878103733063", "0.879022836685181", 
"0.875194191932678", "0.866267263889313", "0.871312737464905", 
"0.859204828739166", "0.866826951503754", "0.868023276329041", 
"0.86393541097641", "0.841034770011902", "0.847780704498291", 
"0.837899088859558", "0.848678171634674", "0.862271964550018", 
"0.85805755853653", "0.829115211963654", "0.831472992897034", 
"0.824896633625031", "0.836804389953613", "0.860555946826935", 
"0.856406271457672", "0.82395201921463", "0.818446636199951", 
"0.817780017852783", "0.829550802707672", "0.871107757091522", 
"0.867229819297791", "0.838211238384247", "0.848833799362183", 
"0.838103652000427", "0.852475762367249", "0.835780441761017", 
"0.831098616123199", "0.74629545211792", "0.763430953025818", 
"0.768242657184601", "0.796047866344452", "2001.06.09", "2001.06.14", 
"2001.06.13", "2001.07.27", "2001.06.30", "2001.06.10", "2001.10.03", 
"2001.09.27", "2001.09.18", "2001.10.29", "2001.10.01", "2001.10.01"
), .Dim = c(6L, 38L), .Dimnames = list(NULL, c("SM_2001.01.01", 
"SM_2001.01.11", "SM_2001.01.21", "SM_2001.02.01", "SM_2001.02.11", 
"SM_2001.02.21", "SM_2001.03.01", "SM_2001.03.11", "SM_2001.03.21", 
"SM_2001.04.01", "SM_2001.04.11", "SM_2001.04.21", "SM_2001.05.01", 
"SM_2001.05.11", "SM_2001.05.21", "SM_2001.06.01", "SM_2001.06.11", 
"SM_2001.06.21", "SM_2001.07.01", "SM_2001.07.11", "SM_2001.07.21", 
"SM_2001.08.01", "SM_2001.08.11", "SM_2001.08.21", "SM_2001.09.01", 
"SM_2001.09.11", "SM_2001.09.21", "SM_2001.10.01", "SM_2001.10.11", 
"SM_2001.10.21", "SM_2001.11.01", "SM_2001.11.11", "SM_2001.11.21", 
"SM_2001.12.01", "SM_2001.12.11", "SM_2001.12.21", "SOS", "EOS"
)))

【问题讨论】:

  • 在该示例中,startend 是什么? SOSEOS?
  • 没错!赛季开始和赛季结束

标签: r


【解决方案1】:

首先,与使用*apply 相比,循环可以达到相同(或可能更快)的速度。不管怎么说,这些函数只是高效且安全地实现了循环。

至于我们如何实现它,无论如何我们都必须遍历每个元素,因此假设您的数据存储在d(具有上述结构)

d <- as.data.frame(d)
#change column classes (to date and numeric). Note lapply iterates over columns
# d was original a matrix, and because numeric and date columns were present, they are all converted to 'character' columns
nc <- ncol(d)
numCol <- seq_len(nc - 2)
d[, numCol] <- lapply(d[, numCol], as.numeric)
d[, -numCol] <- lapply(d[, -numCol], as.Date, format = '%Y.%m.%d')
# Extract the date value of each column name
colDates <- colnames(d)[numCol]
colDates <- as.Date(gsub('^SM_', '', colDates), format = '%Y.%m.%d')
# define a 'between' function (similar to the one present in the data.table package) 
# Not as general (or fast) though
`%between%` <- function(x, y){
  mi <- min(y)
  ma <- max(y)
  x >= mi & x <= ma
}
# Finaly find the max value between each date, for each row
f <- function(x){
  # Get period (unlist for safety)
  period <- unlist(c(x['SOS'], x['EOS']))
  #Find max using our new %between% function. 
  #Note: 
  # use na.rm = TRUE to remove possible na values.
  # apply also changes d to a matrix, and because of this numeric columns turn into characters (there's a date column). So i use as.numeric to change it back.
  max(as.numeric(x[colDates %between% period]), na.rm = TRUE)
}
(d$max <- apply(d, 1, f))
0.8841325 0.8742697 0.8725249 0.9011062 0.8790172 0.8846129

请注意,这里有 3 个不同的循环。前 2 次申请 2 次,最后一次申请 1 次。

【讨论】:

    【解决方案2】:

    提示:您应该考虑将数据存储在结构整洁的 data.frame 中。

    使用整个tidyverselubridate这种丑陋的方式

    matrix %>%
      as.data.frame() %>%
      mutate(group=row_number()) %>%
      pivot_longer(cols = starts_with("SM_"), names_to="date", names_prefix="SM_") %>%
      mutate(value = as.numeric(as.character(value)),
             across(c(SOS, EOS, date), ymd), 
             ind = date %within% interval(SOS, EOS)) %>%
      group_by(group, ind) %>%
      summarise(maximum = as.factor(max(value)), .groups="drop_last") %>%
      filter(ind | is.na(ind)) %>%
      pull(maximum) %>%
      cbind(as.data.frame(matrix),maximum=.) %>%
      as.matrix()
    

    给出(仅最后四列):

         SM_2001.12.21       SOS          EOS          maximum            
    [1,] "0.835780441761017" "2001.06.09" "2001.10.03" "0.884132504463196"
    [2,] "0.831098616123199" "2001.06.14" "2001.09.27" "0.874269723892212"
    [3,] "0.74629545211792"  "2001.06.13" "2001.09.18" "0.872524857521057"
    [4,] "0.763430953025818" "2001.07.27" "2001.10.29" "0.901106178760529"
    [5,] "0.768242657184601" "2001.06.30" "2001.10.01" "0.879017233848572"
    [6,] "0.796047866344452" "2001.06.10" "2001.10.01" "0.884612858295441"
    

    【讨论】:

      猜你喜欢
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 2019-07-15
      • 2020-04-11
      • 1970-01-01
      • 2013-06-14
      • 1970-01-01
      相关资源
      最近更新 更多