【Question title】: R function that makes several changes to multiple columns
【Posted】: 2020-02-15 11:29:58
【Question】:

Whenever I run a query against the data warehouse or receive data from a colleague, I get data in the following format:

df <- structure(list(Year = c(1990, 1991, 1992, 1993), Company = structure(1:4, .Label = c("A", 
"B", "C", "D"), class = "factor"), Sales = c("100,1", "101,1", 
"102,2", "103,3"), Revenue = c("100,1", "101,1", "102,2", "103,3"
)), row.names = c(NA, -4L), class = "data.frame")

The first thing I have to do is replace "," with "." and convert to numeric. So far I have been doing it like this:

df$Sales <- gsub(",", ".", df$Sales)
df$Sales <- as.numeric(df$Sales)

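This gets tedious when there are several columns to convert and their names differ from one dataset to the next. So, after doing this for the nth time on the nth dataset, I decided to write my first R function from scratch. My first attempt did it in two steps, and it worked!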

# Columns to convert
columns_names <- c("Revenue", "Sales")

# Function to convert "," to "."
convert_to_dot <- function(x, column){
  for (i in column){
    df[[i]] <- gsub(",", ".", x[[i]])
  }
  df
}

# Function to convert to numeric
convert_to_numeric <- function(x, column){
  for (i in column){
    df[[i]] <- as.numeric(as.character(x[[i]]))
  }
  df
}
df <- convert_to_dot(df, columns_names)
df <- convert_to_numeric(df, columns_names)

structure(list(Year = c(1990, 1991, 1992, 1993), Company = structure(1:4, .Label = c("A", 
"B", "C", "D"), class = "factor"), Sales = c(100.1, 101.1, 102.2, 
103.3), Revenue = c(100.1, 101.1, 102.2, 103.3)), row.names = c(NA, 
-4L), class = "data.frame")

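However, when I try to combine them into a single function, nothing works. I have tried several versions like the one below, and they all tend to produce similar results: either the values in the columns are replaced with NA, or the whole data frame becomes NULL.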

# Function to replace "," with "." and convert to numeric
convert_dot_numeric <- function(x, column){
  for (i in column){
    df[[i]] <- gsub(",", ".", x[[i]])
    df[[i]] <- as.numeric(as.character(x[[i]]))
  }
  df
}

structure(list(Year = c(1990, 1991, 1992, 1993), Company = structure(1:4, .Label = c("A", 
"B", "C", "D"), class = "factor"), Sales = c(NA_real_, NA_real_, 
NA_real_, NA_real_), Revenue = c(NA_real_, NA_real_, NA_real_, 
NA_real_)), row.names = c(NA, -4L), class = "data.frame")

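I guess a for loop is not the most efficient way to do this, but can anyone give me a hint on how to get it working? Thanks in advance!

【Comments】:

    Tags: r function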


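    【Solution 1】:

    In the second assignment you need to replace x[[i]] with df[[i]]. By that point gsub has already changed the commas to dots in df[[i]], whereas x[[i]] still holds the original comma strings, which as.numeric turns into NA. So your function should be:

    Using the OP's approach (corrected function):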

    # Function to replace "," with "." and convert to numeric
    convert_dot_numeric <- function(x, column){
        for (i in column){
            df[[i]] <- gsub(",", ".", x[[i]])
            df[[i]] <- as.numeric(as.character(df[[i]]))
        }
        df
    }
    
    convert_dot_numeric(df,c('Sales', 'Revenue') )
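
One caveat with the corrected function above: inside the loop it assigns to df rather than to its argument x, so it appears to depend on an object named df existing in the calling environment (here, the global data frame). The sketch below is a variant under stated assumptions, with the hypothetical name convert_dot_numeric_local and a made-up sales_data data frame, that reads from and writes to the argument only. It also shows why the original combined version returned NA: as.numeric() cannot parse a string that still contains a comma.

```r
# Why the original combined function produced NA:
# a string with a decimal comma is not a valid number for as.numeric().
na_result <- suppressWarnings(as.numeric("100,1"))  # NA

# Hypothetical variant that only touches its own argument `x`.
convert_dot_numeric_local <- function(x, columns) {
  for (i in columns) {
    # as.character() keeps this working if the column is a factor
    with_dots <- gsub(",", ".", as.character(x[[i]]), fixed = TRUE)
    x[[i]] <- as.numeric(with_dots)
  }
  x
}

# Made-up example data, deliberately not named `df`
sales_data <- data.frame(
  Year = c(1990, 1991),
  Sales = c("100,1", "101,1"),
  stringsAsFactors = FALSE
)

# The function returns a modified copy, so assign the result back
sales_data <- convert_dot_numeric_local(sales_data, "Sales")
```

Because R functions work on copies of their arguments, the result has to be assigned back, as in the last line.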
    

    Using lapply:

    You can also do this with lapply, as follows:

    df[, c('Sales', 'Revenue')] <- lapply(df[, c('Sales', 'Revenue')], function(x) as.numeric(gsub(',', '.', x)))
    

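    lapply iterates over every column you pass in, and gsub replaces the commas with dots in each one before the conversion to numeric. This lets you avoid the for loop entirely (if you need to).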
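
As a small sketch of the lapply idea with the column names held in a variable, which is the situation described in the question: the data frame sales_data and the vector cols below are made up for illustration. Indexing with sales_data[cols] rather than sales_data[, cols] is a deliberate choice, since the comma form drops to a plain vector when only one column is selected, and lapply would then iterate over individual values instead of columns.

```r
# Made-up example data for illustration
sales_data <- data.frame(
  Year = c(1990, 1991),
  Sales = c("100,1", "101,1"),
  Revenue = c("200,5", "201,5"),
  stringsAsFactors = FALSE
)

# Column names can differ per dataset, so keep them in a variable
cols <- c("Sales", "Revenue")

# sales_data[cols] always returns a data frame (a list of columns),
# so lapply visits one column at a time, even if cols has length 1
sales_data[cols] <- lapply(sales_data[cols], function(col) {
  as.numeric(gsub(",", ".", col, fixed = TRUE))
})

str(sales_data)
```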

    Output

    #  Year Company Sales Revenue
    #1 1990       A 100.1   100.1
    #2 1991       B 101.1   101.1
    #3 1992       C 102.2   102.2
    #4 1993       D 103.3   103.3
    

    Hope this helps.

    【Comments】:

      【Solution 2】:

      I would use readr::parse_number() with the decimal mark set to a comma. You can use dplyr::mutate_at() to apply the change to multiple variables.

      library(readr)
      library(dplyr)
      
      df %>%
        mutate_at(vars(Sales, Revenue), parse_number, locale = locale(decimal_mark = ","))
      
        Year Company Sales Revenue
      1 1990       A 100.1   100.1
      2 1991       B 101.1   101.1
      3 1992       C 102.2   102.2
      4 1993       D 103.3   103.3
      

      If you like, you can wrap it in a function:

      treat_commas <- function(data, ...) {
      
      data %>%
        mutate_at(vars(...), parse_number, locale = locale(decimal_mark = ","))
      
      }
      
      treat_commas(df, Sales, Revenue)
      
        Year Company Sales Revenue
      1 1990       A 100.1   100.1
      2 1991       B 101.1   101.1
      3 1992       C 102.2   102.2
      4 1993       D 103.3   103.3 
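
In dplyr 1.0.0 and later, mutate_at() is marked as superseded in favour of across(). A minimal sketch of the same idea in that style follows; the function name treat_commas_across, the columns argument (a character vector rather than bare names), and the sales_data data frame are assumptions for illustration, not part of the answer above.

```r
library(dplyr)
library(readr)

# Made-up example data for illustration
sales_data <- data.frame(
  Year = c(1990, 1991),
  Sales = c("100,1", "101,1"),
  Revenue = c("200,5", "201,5"),
  stringsAsFactors = FALSE
)

# Hypothetical wrapper: `columns` is a character vector of column names
treat_commas_across <- function(data, columns) {
  comma_locale <- locale(decimal_mark = ",")
  data %>%
    mutate(across(all_of(columns), ~ parse_number(.x, locale = comma_locale)))
}

converted <- treat_commas_across(sales_data, c("Sales", "Revenue"))
str(converted)
```

Passing the names as a character vector fits the case in the question, where the columns to convert are stored in a variable such as columns_names.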
      

      【Comments】:
