Posted: 2020-02-15 11:29:58
Question:
Whenever I run a query against our data warehouse or receive data from colleagues, I get the data in the following format:
df <- structure(list(Year = c(1990, 1991, 1992, 1993), Company = structure(1:4, .Label = c("A",
"B", "C", "D"), class = "factor"), Sales = c("100,1", "101,1",
"102,2", "103,3"), Revenue = c("100,1", "101,1", "102,2", "103,3"
)), row.names = c(NA, -4L), class = "data.frame")
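If these exports arrive as delimited text files, one option worth trying is to let the import step parse the decimal comma directly through the `dec` argument of base R's `read.table()`. This is a minimal sketch that assumes a semicolon-separated file. The sample text and the name `df_read` are made up to mirror the `df` above.

```r
# Hypothetical semicolon-separated export that mirrors the df above
txt <- "Year;Company;Sales;Revenue
1990;A;100,1;100,1
1991;B;101,1;101,1
1992;C;102,2;102,2
1993;D;103,3;103,3"

# dec = "," tells read.table() that the comma is the decimal mark,
# so Sales and Revenue are read as numeric columns straight away
df_read <- read.table(text = txt, header = TRUE, sep = ";", dec = ",")
str(df_read)
```

`read.csv2()` uses the same defaults (`sep = ";"`, `dec = ","`). This only helps when you control the import. For data that is already in a data frame, the conversion still has to be done afterwards.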
The first thing I have to do is replace "," with "." and convert the columns to numeric. So far I have been doing it like this:
df$Sales <- gsub(",", ".", df$Sales)
df$Sales <- as.numeric(df$Sales)
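The two lines above can also be collapsed into a single expression. In this minimal sketch, `comma_to_numeric` is a hypothetical helper name. `fixed = TRUE` makes `gsub()` treat `","` as a literal string rather than a regular expression.

```r
# Hypothetical helper: replace the decimal comma, then convert to numeric
comma_to_numeric <- function(v) {
  as.numeric(gsub(",", ".", v, fixed = TRUE))
}

sales <- comma_to_numeric(c("100,1", "101,1", "102,2", "103,3"))
print(sales)
# [1] 100.1 101.1 102.2 103.3
```

`as.numeric()` returns `NA` (with a warning) for any value it still cannot parse. This makes unexpected formats easy to spot.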
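This gets tedious when there are several columns to convert and their names differ from one dataset to the next. So, after doing this for the nth time on the nth dataset, I decided to write my first R function from scratch. My first attempt does the job in two steps, and it worked!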
# Columns to convert
columns_names <- c("Revenue", "Sales")
# Function to convert "," to "."
convert_to_dot <- function(x, column){
  for (i in column){
    # Modify the column of the data frame passed in as x
    x[[i]] <- gsub(",", ".", x[[i]])
  }
  x
}
# Function to convert to numeric
convert_to_numeric <- function(x, column){
  for (i in column){
    # as.character() guards against factor columns before the numeric conversion
    x[[i]] <- as.numeric(as.character(x[[i]]))
  }
  x
}
df <- convert_to_dot(df, columns_names)
df <- convert_to_numeric(df, columns_names)
structure(list(Year = c(1990, 1991, 1992, 1993), Company = structure(1:4, .Label = c("A",
"B", "C", "D"), class = "factor"), Sales = c(100.1, 101.1, 102.2,
103.3), Revenue = c(100.1, 101.1, 102.2, 103.3)), row.names = c(NA,
-4L), class = "data.frame")
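However, when I try to combine them into a single function, nothing works. I have tried several versions of the following, and they all produce similar results: either the values in the converted columns are replaced with NA, or the whole data frame becomes NULL.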
# Function to replace "," with "." and convert to numeric
convert_dot_numeric <- function(x, column){
  for (i in column){
    df[[i]] <- gsub(",", ".", x[[i]])
    df[[i]] <- as.numeric(as.character(x[[i]]))
  }
  df
}
structure(list(Year = c(1990, 1991, 1992, 1993), Company = structure(1:4, .Label = c("A",
"B", "C", "D"), class = "factor"), Sales = c(NA_real_, NA_real_,
NA_real_, NA_real_), Revenue = c(NA_real_, NA_real_, NA_real_,
NA_real_)), row.names = c(NA, -4L), class = "data.frame")
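The `NA`s most likely come from the second line of the loop. It reads `x[[i]]` again, which is the untouched input column that still contains commas, rather than the result of `gsub()`. As a result, `as.numeric("100,1")` returns `NA`. The function also assigns into `df`, which is not one of its arguments. It only runs because R finds a global `df` and copies it. Below is a minimal corrected sketch that reads from and writes to `x` throughout. The sample data is rebuilt with `data.frame()` so that the block runs on its own.

```r
# Sample data in the same shape as the question's df
df <- data.frame(
  Year = c(1990, 1991, 1992, 1993),
  Company = factor(c("A", "B", "C", "D")),
  Sales = c("100,1", "101,1", "102,2", "103,3"),
  Revenue = c("100,1", "101,1", "102,2", "103,3"),
  stringsAsFactors = FALSE
)

# Replace "," with "." and convert to numeric, working on x only
convert_dot_numeric <- function(x, column) {
  for (i in column) {
    # Modify the column of x itself (as.character() also covers factors)
    x[[i]] <- gsub(",", ".", as.character(x[[i]]), fixed = TRUE)
    # Convert the already-modified column, not the original input
    x[[i]] <- as.numeric(x[[i]])
  }
  x
}

columns_names <- c("Revenue", "Sales")
df <- convert_dot_numeric(df, columns_names)
str(df)
```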
I suspect a for loop is not the most efficient approach, but can anyone give me a hint on how to do this? Thanks in advance!
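To avoid the explicit `for` loop, a common base R idiom is to run the conversion over a subset of columns with `lapply()` and assign the resulting list back into those columns. The loop is not really slow here, because each iteration already processes a whole column at once. The `lapply()` form is mainly more compact. This is a sketch only. `convert_cols` is a hypothetical name, and the sample data is rebuilt so that the block runs on its own.

```r
# Sample data shaped like the question's df
df <- data.frame(
  Year = c(1990, 1991, 1992, 1993),
  Company = factor(c("A", "B", "C", "D")),
  Sales = c("100,1", "101,1", "102,2", "103,3"),
  Revenue = c("100,1", "101,1", "102,2", "103,3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert the listed columns without an explicit loop
convert_cols <- function(x, columns) {
  # lapply() returns a list with one converted vector per column;
  # assigning it to x[columns] replaces exactly those columns
  x[columns] <- lapply(x[columns], function(v) {
    as.numeric(gsub(",", ".", as.character(v), fixed = TRUE))
  })
  x
}

df <- convert_cols(df, c("Revenue", "Sales"))
sapply(df, class)
```

The column names are passed in as a character vector, so the same call works for datasets whose column names differ.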
Comments: