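I think this can be generalized a little, leaving the calling function room to handle other things as needed. I also think replacing the substrings *in place* opens up some interesting possibilities.

Here is a suggestion that replaces human-readable sizes with their full (verbose) numbers, however many times they appear in each string you pass it.

It certainly won't be smaller or faster than your existing solution, but it can be used in other ways.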
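The in-place part relies on base R's `regmatches<-`. `gregexpr()` records where each match sits, and assigning to `regmatches()` writes replacement strings back into exactly those spans, leaving everything else in the string untouched. Below is a toy sketch of just that mechanism. It doubles every integer in a string, is purely illustrative, and is not part of the function itself:

```r
# Toy illustration of in-place substring replacement in base R
x <- c("3 apples and 12 pears", "no numbers here")
m <- gregexpr("[0-9]+", x)          # positions of every run of digits
# one replacement per match; strings without matches get character(0)
regmatches(x, m) <- lapply(regmatches(x, m),
                           function(d) as.character(2 * as.numeric(d)))
x
# [1] "6 apples and 24 pears" "no numbers here"
```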
```r
opp_humanReadable <- function(vec) {
  # multiplier (in bytes) for each recognized unit; see the derivation below
  known <- c(B = 1, kB = 1e+03, MB = 1e+06, GB = 1e+09, TB = 1e+12, PB = 1e+15,
             EB = 1e+18, ZB = 1e+21, YB = 1e+24, KiB = 1024, MiB = 1048576,
             GiB = 1073741824, TiB = 1099511627776, PiB = 1125899906842624,
             EiB = 1152921504606846976, ZiB = 1.18059162071741e+21, YiB = 1.20892581961463e+24,
             b = 1, Kb = 1024, Mb = 1048576, Gb = 1073741824,
             Tb = 1099511627776, Pb = 1125899906842624, KB = 1024
  )
  # a number (optionally signed and/or decimal), optional whitespace, then a known unit
  ptn <- paste0(
    "(-?\\d+\\.?\\d*|\\d*\\.?\\d)",
    "\\s*",
    "(", paste0(names(known), collapse = "|"), ")\\b")
  gre <- gregexpr(ptn, vec)
  matches <- regmatches(vec, gre)
  # split each match into its unit, the whitespace before the unit, and the number
  unit <- lapply(matches, gsub, pattern = "^[-.0-9]*\\s*", replacement = "")
  rest <- lapply(matches, gsub, pattern = "^[-.0-9]*(\\s*)\\S*$", replacement = "\\1")
  num <- lapply(matches, gsub, pattern = "[^-.0-9]", replacement = "")
  newnum <- Map(function(a, p) {
    if (length(a)) {
      sapply(as.numeric(a) * known[p], format, scientific = FALSE)
    } else character(0)
  }, num, unit)
  # write the expanded numbers back into the original strings, in place
  regmatches(vec, gre) <- Map(paste0, newnum, rest, unit)
  vec
}

vec <- c('100.1 MB 2 KiB', '100.1MB', 'foo -100.1 MB quux', '9 kB', '10 kB', '9 xx',
         '.2 GiB', 'hello -.2PB world')
opp_humanReadable(vec)
# [1] "100100000 MB 2048 KiB"          "100100000MB"
# [3] "foo -100100000 MB quux"         "9000 kB"
# [5] "10000 kB"                       "9 xx"
# [7] "214748365 GiB"                  "hello -200000000000000PB world"
```

It tries to preserve whitespace within and around the number/unit.

If you're curious, here's how I derived `known`:

```r
# adapted from utils:::format.object_size
known_units <- list(
  SI     = c("B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"),
  IEC    = c("B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"),
  legacy = c("b", "Kb", "Mb", "Gb", "Tb", "Pb"),
  LEGACY = c("B", "KB", "MB", "GB", "TB", "PB"))
known_bases <- c(SI = 1000, IEC = 1024, legacy = 1024, LEGACY = 1024)
# the first unit of each family is 1 byte (base^0), the next is base^1, and so on
known <- Map(function(un, ba) setNames(ba^(seq_along(un) - 1), un),
             known_units, known_bases)
# drop units already defined by an earlier family (e.g. "B", or "MB" in LEGACY)
for (i in seq_along(known)[-1]) {
  nms <- names(known[[i]])
  known[[i]] <- known[[i]][ nms[ ! nms %in% unlist(lapply(known[1:(i-1)], names)) ] ]
}
known <- unlist(unname(known))
```
Perhaps kludgy, but I know that if I didn't do it programmatically, I'd miss a comma or something.
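If the loop feels kludgy, one possible simplification is to build every family's multipliers at once and then keep only the first definition of each unit name with `duplicated()`. This is a sketch that assumes the first unit in each family is exactly 1 byte (base^0):

```r
# Same unit families as above (adapted from utils:::format.object_size)
known_units <- list(
  SI     = c("B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"),
  IEC    = c("B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"),
  legacy = c("b", "Kb", "Mb", "Gb", "Tb", "Pb"),
  LEGACY = c("B", "KB", "MB", "GB", "TB", "PB"))
known_bases <- c(SI = 1000, IEC = 1024, legacy = 1024, LEGACY = 1024)

# one multiplier per unit: base^0 for the first (byte) unit, base^1 for the next, ...
all_units <- unlist(unname(Map(function(un, ba) setNames(ba^(seq_along(un) - 1), un),
                               known_units, known_bases)))
# keep only the first definition of each name, so e.g. "MB" stays SI (1e6)
known <- all_units[!duplicated(names(all_units))]
```

Because `duplicated()` flags every repeat after its first occurrence, this keeps the same units, in the same order, as dropping names already seen in earlier families.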
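An extension of this function might accept some `format`-like arguments, such as `big.mark=`, `small.mark=`, etc. Better yet would be a companion function that "finds" numbers (presumably after this function has been called) and inserts commas, etc.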
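Here is a minimal sketch of such a companion, using a hypothetical helper named `add_big_marks()` (not part of the answer above). It works only on the already-expanded strings: it finds digit runs of four or more that are not the fractional part of a decimal number, then inserts `big.mark` every three digits counting from the right. It uses the same `gregexpr()`/`regmatches<-` in-place approach, with `perl = TRUE` so lookarounds are available:

```r
# Hypothetical companion helper: insert a thousands separator into every
# run of 4+ digits that is not the fractional part of a decimal number.
add_big_marks <- function(vec, big.mark = ",") {
  # (?<![.\\d]) rejects runs preceded by a "." or by another digit
  gre <- gregexpr("(?<![.\\d])\\d{4,}", vec, perl = TRUE)
  regmatches(vec, gre) <- lapply(regmatches(vec, gre), function(runs) {
    # put big.mark after any digit followed by a multiple of 3 digits up to the end
    gsub("(\\d)(?=(\\d{3})+$)", paste0("\\1", big.mark), runs, perl = TRUE)
  })
  vec
}

add_big_marks(c("100100000 MB 2048 KiB", "foo -100100000 MB quux", "9 xx"))
# returns c("100,100,000 MB 2,048 KiB", "foo -100,100,000 MB quux", "9 xx")
```

Keeping this separate from the conversion means the numeric step never has to know about presentation, and the same helper works on any text.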