Posted: 2022-01-06 12:47:00
Question:
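I have imported several datasets from different years that share the same variables. I am trying to convert certain columns from factor to numeric. To save time I wrote a function, but it does not seem to work.
I created a list containing the names of the datasets as strings: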
dfs <- list("df1", "df2", "df3", "df4", "df5", "df6", "df7", "df8")
and a second list containing the names of the variables (columns), also as strings:
vars <- list("var1", "var2", "var3", "var4")
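Both lists hold only names, so each string still has to be resolved into the object it refers to before anything can be changed. A minimal sketch using base R's `mget()`, assuming the data frames live in the global environment (the sample data below is hypothetical):

```r
# Hypothetical sample data standing in for the imported datasets
df1 <- data.frame(var1 = factor(c("10", "20")), var2 = factor(c("1.5", "2.5")))
df2 <- data.frame(var1 = factor(c("30", "40")), var2 = factor(c("3.5", "4.5")))
dfs <- list("df1", "df2")

# The list only holds names; mget() looks each name up and returns the objects
df_list <- mget(unlist(dfs))

names(df_list)        # "df1" "df2"
class(df_list$df1)    # "data.frame"
```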
First I tried joining the two lists with a "$" in between, and then passing the results to a function that converts factors to numeric:
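# Convert a factor to numeric via its level labels; return other columns unchanged
to_int <- function(column) {
  if (is.factor(column)) {
    column <- levels(column)[column]  # map each element to its label (character)
    column <- as.numeric(column)
    return(column)
  } else {
    return(column)
  }
}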
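The level lookup inside `to_int` matters because `as.numeric()` applied directly to a factor returns its internal integer codes, not the numbers shown in the labels. A small sketch with a made-up factor, comparing that pitfall with two label-based conversions (the second is the idiom from the R FAQ):

```r
# Hypothetical factor whose labels are numbers stored as text
f <- factor(c("10", "5", "10"))

codes  <- as.numeric(f)                 # internal codes: 1 2 1 (levels are "10", "5")
labels <- as.numeric(as.character(f))   # 10 5 10
lookup <- as.numeric(levels(f))[f]      # 10 5 10; converts each level once, then indexes by code

print(codes)
print(labels)
print(lookup)
```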
Option 1: create a vector of strings joined by "$"
col_names <- vector(mode = "list", length = length(dfs))
# Add the combination of names to each vector
for (df in dfs) {
  for (var in vars) {
    r <- paste(df, var, sep = "$")  # Combine the names in the 2 lists with a $ in the middle
    col_names[[match(df, dfs)]][match(var, vars)] <- r  # Assign result to the pre-set vector
  }
}
# Iterate through the list (col_names) and apply "to_int" to each of the strings in it
for (l in col_names) {
  for (col_name in l) {
    colnm <- eval(parse(text = col_name))
    nmrc <- to_int(colnm)  # Converts each column from factor to numeric. Works!
    assign(col_name, nmrc, envir = globalenv())  # Creates values (RStudio) with the correct names, but the columns in the dfs remain unchanged
  }
}
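The behaviour noted in the last comment is consistent with how `assign()` works: its first argument is used as a plain object name, so "$" is not interpreted and a new object literally called `df1$var1` is created next to `df1`. A minimal sketch, assuming a hypothetical one-column `df1`, that reproduces this and then updates the column by fetching the whole data frame with `get()`, changing it with `[[ ]]`, and assigning the whole data frame back:

```r
# Hypothetical sample data
df1 <- data.frame(var1 = factor(c("10", "20")))

# assign() treats "df1$var1" as a name, not as column access
assign("df1$var1", c(10, 20), envir = globalenv())
exists("df1$var1")     # TRUE: a separate object with a non-syntactic name
class(df1$var1)        # "factor": the column itself is untouched

# Sketch of an update that does reach the column
tmp <- get("df1")
tmp[["var1"]] <- as.numeric(as.character(tmp[["var1"]]))
assign("df1", tmp, envir = globalenv())
class(df1$var1)        # "numeric"
```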
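Then I tried handling the strings from the two lists separately and combining them inside the loop:
Option 2: treat the lists as separate strings and join them inside the loop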
for (df in dfs) {
  for (var in vars) {
    a <- eval(parse(text = df))
    b <- to_int(a[var])  # using $ returns NULL; using [] leaves the original df unchanged (still a factor)
    a[var] <- b
  }
}
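Two things plausibly explain the Option 2 result. `a[var]` returns a one-column data frame rather than the factor, so `is.factor()` is FALSE and `to_int` hands it back unchanged; and `a` is a modified copy, so `df1` itself never changes unless the copy is written back. (`a$var` returns NULL because `$` looks for a column literally named "var".) A sketch of the same loop using `[[ ]]`, `get()` and `assign()`; the compact `to_int` and the sample data are illustrative assumptions:

```r
# Compact version of the question's to_int(): factor -> numeric via the level labels
to_int <- function(column) {
  if (is.factor(column)) as.numeric(levels(column))[column] else column
}

# Hypothetical sample data standing in for the imported datasets
df1 <- data.frame(var1 = factor(c("10", "20")), var2 = factor(c("1.5", "2.5")))
df2 <- data.frame(var1 = factor(c("30", "40")), var2 = factor(c("3.5", "4.5")))
dfs  <- list("df1", "df2")
vars <- list("var1", "var2")

# [ ] keeps the data frame wrapper; [[ ]] extracts the column itself
class(df1["var1"])    # "data.frame"
class(df1[["var1"]])  # "factor"

for (df in dfs) {
  a <- get(df)                        # a is a copy of the data frame
  for (var in vars) {
    a[[var]] <- to_int(a[[var]])      # [[ ]] accepts a column name stored in a variable
  }
  assign(df, a, envir = globalenv())  # write the modified copy back under the original name
}

sapply(df1, class)    # both columns now "numeric"
```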
Finally, I tried writing a new function that also takes the variable as an input:
# with two inputs
to_int2 <- function(df, col) {
  eval(parse(text = df))
  if (is.factor(df[col])) {  # $ OPERATOR IS INVALID FOR ATOMIC VECTORS
    df[col] <- levels(df[col])[df[col]]
    df[col] <- as.numeric(df[col])
    return(df[col])
  } else {
    return(df[col])
  }
}
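In `to_int2`, the value returned by `eval(parse(text = df))` is never stored, so `df` is still the character string (e.g. "df1") when it is indexed, and `df[col]` never refers to a column. A sketch of a two-input variant that takes the data frame itself and returns the modified copy; the parameter name `data` is a hypothetical choice, not from the question:

```r
# Sketch: take the data frame itself (not its name) plus a column name
to_int2 <- function(data, col) {
  if (is.factor(data[[col]])) {
    data[[col]] <- as.numeric(as.character(data[[col]]))
  }
  data  # R functions modify a local copy, so the caller must keep the returned value
}

# Hypothetical sample data
df1 <- data.frame(var1 = factor(c("10", "20")))
df1 <- to_int2(df1, "var1")
class(df1$var1)  # "numeric"
```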
and used it in a third attempt:
Option 3: convert factor to numeric using two inputs
for (df in dfs) {
  for (var in vars) {
    a <- to_int2(df, var)  # $ OPERATOR IS INVALID FOR ATOMIC VECTORS
    b <- eval(parse(text = df))
    b$var <- a  # No effect
  }
}
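In Option 3, `b$var <- a` adds a new column literally named "var" (because `$` does not evaluate `var`), and `b` is again only a copy of the data frame. One common alternative, sketched here under the assumption that the data frames sit in the global environment, is to work on a named list of data frames and copy the results back with `list2env()`. It uses `c()` character vectors; the question's `list()` versions can be converted with `unlist()`. The sample data is hypothetical:

```r
# Hypothetical sample data standing in for the imported datasets
df1 <- data.frame(var1 = factor(c("10", "20")), var2 = factor(c("1.5", "2.5")))
df2 <- data.frame(var1 = factor(c("30", "40")), var2 = factor(c("3.5", "4.5")))
dfs  <- c("df1", "df2")
vars <- c("var1", "var2")

# Fetch the data frames by name into a named list
df_list <- mget(dfs)

df_list <- lapply(df_list, function(d) {
  # d[vars] selects columns by the names stored in vars (unlike d$var);
  # as.character() first so the labels, not the factor codes, are converted
  d[vars] <- lapply(d[vars], function(x) as.numeric(as.character(x)))
  d
})

# Optionally overwrite df1, df2, ... in the global environment
invisible(list2env(df_list, envir = globalenv()))

str(df1)
```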
None of these has any effect on the intended columns of the data frames. Any ideas on how to solve this? Thanks!
Comments: