【Question title】: Automatically "Convert numbers stored as text to numbers"
【Posted】: 2018-08-27 13:53:28
【Question description】:

Let's consider this small example:

df1<- data.frame(A=c(1,NA,"pvalue",0.0003),B=c(0.5,7,"I destroy","numbers all day"),stringsAsFactors = T)

Write it to a file:

openxlsx::write.xlsx(df1,"Test.xlsx")

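In the resulting Excel file, values such as 1 and 7 are text cells. Excel has the "intuition" that these are numbers stored as text, and I can convert them manually.

How can I convert these "flagged" values to numbers automatically, from within R?

In "what I want", I converted the text to numbers by hand. This is an option (red arrow) behind the "green triangle" shown in the "what I get" part.

Regarding @Roland's comment: rearranging the data as a list does not work.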

df1<- as.data.frame(cbind(A=list(1,NA_real_,"pvalue",0.0003),B=list(0.5,7,"I destroy","numbers all day")))
openxlsx::write.xlsx(df1,"Test2.xlsx")

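【Question comments】:

  • You won't be able to convert 1 and 7 to numbers, because they sit in a character variable.
  • Maybe I can set up my df1 differently.
  • I think you need to make sure that each column always contains the same data type. That means storing all the numbers in one column and the text in another.
  • Maybe I was not precise with the word "report". It is an output that I deliver; it should look good and have meaningful cell types, so that others can continue working with it directly.
  • In a data.frame, each column can contain only one data type. If that doesn't work for you, you can't use a data.frame. You could use a list of lists instead (which is less convenient). I agree with the comment above: rearrange your data. Data analysis and data reporting are two different tasks, and you should not let the latter restrict how you do the former.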
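The point made in these comments can be checked directly in base R. The following is a minimal sketch (the variable names `a`, `df` and `a_num` are illustrative only): mixing numbers and strings in `c()` coerces everything to character, and forcing the whole column back to numeric loses the text cells.

```r
# c() coerces mixed input to one common type: here, character
a <- c(1, NA, "pvalue", 0.0003)
class(a)      # "character"

# The data.frame column therefore holds strings, not numbers
df <- data.frame(A = a, stringsAsFactors = FALSE)
class(df$A)   # "character"

# Converting the whole column to numeric turns the text cell into NA
a_num <- suppressWarnings(as.numeric(a))
is.na(a_num)  # FALSE TRUE TRUE FALSE
```

This is why the numeric-looking values arrive in Excel as text: by the time the data.frame is written, they are already strings.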

Tags: r excel openxlsx


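【Solution 1】:

I wrote a small piece of code following the suggestions from @Roland and @phiver. It starts from a tidy data.frame (one that keeps the data type of each cell) and writes the values one by one: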

library(openxlsx)
df1<- as.data.frame(cbind(A=list(1,NA_real_,"pvalue",0.0003),B=list(0.5,7,"I destroy","numbers all day")))

wb <- createWorkbook()
sheet.name <- 'test'
addWorksheet(wb, sheet.name)

for(i in seq_along(df1)){
    writeData(wb, sheet = sheet.name, names(df1)[i], startCol = i, startRow = 1)
    icol <- df1[[i]]
    for(j in seq_along(icol)){
        x <- icol[[j]]
        writeData(wb, sheet = sheet.name, x, startCol = i, startRow = j + 1)
    }
}
saveWorkbook(wb, file = "Test.xlsx")

Hope this works for your data.
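The loop above relies on every column of `df1` being a list, so that `icol[[j]]` returns a single cell with its original type. As a rough check that needs no Excel file, the per-cell classes can be inspected like this (`cell_types` is just an illustrative name):

```r
df1 <- as.data.frame(cbind(A = list(1, NA_real_, "pvalue", 0.0003),
                           B = list(0.5, 7, "I destroy", "numbers all day")))

# Both columns are list columns rather than atomic vectors
sapply(df1, class)

# Class of every individual cell, one matrix column per data.frame column
cell_types <- sapply(df1, function(col) vapply(col, class, character(1)))
cell_types
```

Cells reported as "numeric" are the ones `writeData()` receives as numbers; the "character" cells are passed as strings.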

【Comments】:

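    【Solution 2】:

    Thanks to @mt1022, I added a validator to the helper-functions section so that 000123 stays 000123.

    A solution that does what openxlsx::write.xlsx() can do, plus "looking for meaningful types".

    The function (98% of it is openxlsx::write.xlsx):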

    writeXlsxWithTypes <- function(x, file, asTable = FALSE, ...) {
        library(magrittr);library(openxlsx);
    
        if(T) {
            setTypes <- function(x) {
                x %<>%
                    lapply(function(xX){
                        lapply(xX ,function(u) {
                            if(canConvert(u)) { type.convert(as.character(u), as.is = TRUE) } else { u }
                        })
                    }) %>% do.call(cbind, .) %>% as.data.frame
            } #types fun
    
            validateBorderStyle <- function(borderStyle){
    
    
                valid <- c("none", "thin", "medium", "dashed", "dotted", "thick", "double", "hair", "mediumDashed", 
                           "dashDot", "mediumDashDot", "dashDotDot", "mediumDashDotDot", "slantDashDot")
    
                ind <- match(tolower(borderStyle), tolower(valid))
                if(any(is.na(ind)))
                    stop("Invalid borderStyle", call. = FALSE)
    
                return(valid[ind])
    
            }
    
            validateColour <- function(colour, errorMsg = "Invalid colour!"){
    
                ## check if
                if(is.null(colour))
                    colour = "black"
    
                validColours <- colours()
    
                if(any(colour %in% validColours))
                    colour[colour %in% validColours] <- rgb(t(col2rgb(colour[colour %in% validColours])), maxColorValue = 255)  # base-R conversion; col2hex() is not defined in this scope
    
                if(any(!grepl("^#[A-Fa-f0-9]{6}$", colour)))
                    stop(errorMsg, call.=FALSE)
    
                colour <- gsub("^#", "FF", toupper(colour))
    
                return(colour)
    
            }
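            # Returns FALSE for zero-padded strings such as "0001", so they are kept as text.
            # "^0+\\d" (rather than "^0+\\.?\\d") lets decimals such as "0.5" still convert.
            canConvert <- function(x) {
                return( !grepl("^0+\\d", x) )
            }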
        } # define helper functions
    
        if(T) {
            params <- list(...)
            if (!is.logical(asTable)) 
                stop("asTable must be a logical.")
            creator <- ifelse("creator" %in% names(params), params$creator, 
                              "")
            title <- params$title
            subject <- params$subject
            category <- params$category
            sheetName <- "Sheet 1"
            if ("sheetName" %in% names(params)) {
                if (any(nchar(params$sheetName) > 31)) 
                    stop("sheetName too long! Max length is 31 characters.")
                sheetName <- as.character(params$sheetName)
                if ("list" %in% class(x) & length(sheetName) == length(x)) 
                    names(x) <- sheetName
            }
            tabColour <- NULL
            if ("tabColour" %in% names(params)) 
                tabColour <- validateColour(params$tabColour, "Invalid tabColour!")
            zoom <- 100
            if ("zoom" %in% names(params)) {
                if (is.numeric(params$zoom)) {
                    zoom <- params$zoom
                }
                else {
                    stop("zoom must be numeric")
                }
            }
            gridLines <- TRUE
            if ("gridLines" %in% names(params)) {
                if (all(is.logical(params$gridLines))) {
                    gridLines <- params$gridLines
                }
                else {
                    stop("Argument gridLines must be TRUE or FALSE")
                }
            }
            overwrite <- TRUE
            if ("overwrite" %in% names(params)) {
                if (is.logical(params$overwrite)) {
                    overwrite <- params$overwrite
                }
                else {
                    stop("Argument overwrite must be TRUE or FALSE")
                }
            }
            withFilter <- TRUE
            if ("withFilter" %in% names(params)) {
                if (is.logical(params$withFilter)) {
                    withFilter <- params$withFilter
                }
                else {
                    stop("Argument withFilter must be TRUE or FALSE")
                }
            }
            startRow <- 1
            if ("startRow" %in% names(params)) {
                if (all(params$startRow > 0)) {
                    startRow <- params$startRow
                }
                else {
                    stop("startRow must be a positive integer")
                }
            }
            startCol <- 1
            if ("startCol" %in% names(params)) {
                if (all(params$startCol > 0)) {
                    startCol <- params$startCol
                }
                else {
                    stop("startCol must be a positive integer")
                }
            }
            colNames <- TRUE
            if ("colNames" %in% names(params)) {
                if (is.logical(params$colNames)) {
                    colNames <- params$colNames
                }
                else {
                    stop("Argument colNames must be TRUE or FALSE")
                }
            }
            if ("col.names" %in% names(params)) {
                if (is.logical(params$col.names)) {
                    colNames <- params$col.names
                }
                else {
                    stop("Argument col.names must be TRUE or FALSE")
                }
            }
            rowNames <- FALSE
            if ("rowNames" %in% names(params)) {
                if (is.logical(params$rowNames)) {
                    rowNames <- params$rowNames
                }
                else {
                    stop("Argument rowNames must be TRUE or FALSE")
                }
            }
            if ("row.names" %in% names(params)) {
                if (is.logical(params$row.names)) {
                    rowNames <- params$row.names
                }
                else {
                    stop("Argument row.names must be TRUE or FALSE")
                }
            }
            xy <- NULL
            if ("xy" %in% names(params)) {
                if (length(params$xy) != 2) 
                    stop("xy parameter must have length 2")
                xy <- params$xy
            }
            headerStyle <- NULL
            if ("headerStyle" %in% names(params)) {
                if (length(params$headerStyle) == 1) {
                    if ("Style" %in% class(params$headerStyle)) {
                        headerStyle <- params$headerStyle
                    }
                    else {
                        stop("headerStyle must be a style object.")
                    }
                }
                else {
                    if (all(sapply(params$headerStyle, function(x) "Style" %in% 
                                   class(x)))) {
                        headerStyle <- params$headerStyle
                    }
                    else {
                        stop("headerStyle must be a style object.")
                    }
                }
            }
            borders <- NULL
            if ("borders" %in% names(params)) {
                borders <- tolower(params$borders)
                if (!all(borders %in% c("surrounding", "rows", "columns", 
                                        "all"))) 
                    stop("Invalid borders argument")
            }
            borderColour <- getOption("openxlsx.borderColour", "black")
            if ("borderColour" %in% names(params)) 
                borderColour <- params$borderColour
            borderStyle <- getOption("openxlsx.borderStyle", "thin")
            if ("borderStyle" %in% names(params)) {
                borderStyle <- validateBorderStyle(params$borderStyle)
            }
            keepNA <- FALSE
            if ("keepNA" %in% names(params)) {
                if (!"logical" %in% class(params$keepNA)) {
                    stop("keepNA must be a logical.")
                }
                else {
                    keepNA <- params$keepNA
                }
            }
            tableStyle <- "TableStyleLight9"
            if ("tableStyle" %in% names(params)) 
                tableStyle <- params$tableStyle
            colWidths <- ""
            if ("colWidths" %in% names(params)) 
                colWidths <- params$colWidths
        } # params check
    
        if(inherits(x, "data.frame")) {
            x %<>% setTypes %>% list
        } else {
            lNames <- names(x)
            x %<>% lapply(setTypes)
        }
    
        if(T) {   
            nms <- names(x)
            nSheets <- length(x)
            if (is.null(nms)) {
                nms <- paste("Sheet", 1:nSheets)
            }
            else if (any("" %in% nms)) {
                nms[nms %in% ""] <- paste("Sheet", (1:nSheets)[nms %in% 
                                                                   ""])
            }
            else {
                nms <- make.unique(nms)
            }
            if (any(nchar(nms) > 31)) {
                warning("Truncating list names to 31 characters.")
                nms <- substr(nms, 1, 31)
            }
            if (!is.null(tabColour)) {
                if (length(tabColour) != nSheets) 
                    tabColour <- rep_len(tabColour, length.out = nSheets)
            }
            if (length(zoom) != nSheets) 
                zoom <- rep_len(zoom, length.out = nSheets)
            if (length(gridLines) != nSheets) 
                gridLines <- rep_len(gridLines, length.out = nSheets)
            if (length(withFilter) != nSheets) 
                withFilter <- rep_len(withFilter, length.out = nSheets)
            if (length(colNames) != nSheets) 
                colNames <- rep_len(colNames, length.out = nSheets)
            if (length(rowNames) != nSheets) 
                rowNames <- rep_len(rowNames, length.out = nSheets)
            if (length(startRow) != nSheets) 
                startRow <- rep_len(startRow, length.out = nSheets)
            if (length(startCol) != nSheets) 
                startCol <- rep_len(startCol, length.out = nSheets)
            if (!is.null(headerStyle)) 
                headerStyle <- lapply(1:nSheets, function(x) return(headerStyle))
            if (length(borders) != nSheets & !is.null(borders)) 
                borders <- rep_len(borders, length.out = nSheets)
            if (length(borderColour) != nSheets) 
                borderColour <- rep_len(borderColour, length.out = nSheets)
            if (length(borderStyle) != nSheets) 
                borderStyle <- rep_len(borderStyle, length.out = nSheets)
            if (length(keepNA) != nSheets) 
                keepNA <- rep_len(keepNA, length.out = nSheets)
            if (length(asTable) != nSheets) 
                asTable <- rep_len(asTable, length.out = nSheets)
            if (length(tableStyle) != nSheets) 
                tableStyle <- rep_len(tableStyle, length.out = nSheets)
            if (length(colWidths) != nSheets) 
                colWidths <- rep_len(colWidths, length.out = nSheets)
        }  # setup and validation
    
        wb <- openxlsx::createWorkbook(creator = creator, title = title, subject = subject, 
                             category = category)
    
        for (i in 1:nSheets) {
    
            if(T) {
    
                wb$addWorksheet(nms[[i]], showGridLines = gridLines[i], 
                                tabColour = tabColour[i], zoom = zoom[i])
                if (asTable[i]) {
    
                    for(ii in seq_along(x[[i]])){
                        openxlsx::writeDataTable(wb = wb, sheet = i, x = names(x[[i]])[[ii]],
                                                 startCol = ii, startRow = 1, 
                                                 xy = xy, colNames = colNames[[i]], rowNames = rowNames[[i]], 
                                                 tableStyle = tableStyle[[i]], tableName = NULL, 
                                                 headerStyle = headerStyle[[i]], withFilter = withFilter[[i]], 
                                                 keepNA = keepNA[[i]]
                                                 )
                        icol <- x[[i]][[ii]]
    
                        for(j in seq_along(icol)){
                            dati <- icol[[j]]
    
                            openxlsx::writeData(wb = wb, sheet = i,x = dati,
                                                startCol = ii, startRow = j+1, 
                                                xy = xy, colNames = colNames[[i]], rowNames = rowNames[[i]], 
                                                tableStyle = tableStyle[[i]], tableName = NULL, 
                                                headerStyle = headerStyle[[i]], withFilter = withFilter[[i]], 
                                                keepNA = keepNA[[i]]
                                                )
                        }
                    }
                }
                else {
    
                    for(ii in seq_along(x[[i]])){
    
                        openxlsx::writeData(wb = wb, sheet = i, x = names(x[[i]])[[ii]],
                                            startCol = ii, startRow = 1,
                                            xy = xy, colNames = colNames[[i]], rowNames = rowNames[[i]],
                                            headerStyle = headerStyle[[i]],
                                            borders = borders[[i]], borderColour = borderColour[[i]], borderStyle = borderStyle[[i]],
                                            keepNA = keepNA[[i]]
                        )
                        icol <- x[[i]][[ii]]
    
                        for(j in seq_along(icol)){
                            dati <- icol[[j]]
    
                            openxlsx::writeData(wb = wb, sheet = i,x = dati,
                                                startCol = ii, startRow = j+1, 
                                                xy = xy, colNames = colNames[[i]], rowNames = rowNames[[i]],
                                                headerStyle = headerStyle[[i]],
                                                borders = borders[[i]], borderColour = borderColour[[i]], borderStyle = borderStyle[[i]],
                                                keepNA = keepNA[[i]]
                            )
                        }
                    }
                }
                if (colWidths[i] %in% "auto") 
                    setColWidths(wb, sheet = i, cols = 1:ncol(x[[i]]) + 
                                     startCol[[i]] - 1L, widths = "auto")
    
                } #from list
    
    
    
        }
    
        if(T) {
            freezePanes <- FALSE
            firstActiveRow <- rep_len(1L, length.out = nSheets)
            if ("firstActiveRow" %in% names(params)) {
                firstActiveRow <- params$firstActiveRow
                freezePanes <- TRUE
                if (length(firstActiveRow) != nSheets) 
                    firstActiveRow <- rep_len(firstActiveRow, length.out = nSheets)
            }
            firstActiveCol <- rep_len(1L, length.out = nSheets)
            if ("firstActiveCol" %in% names(params)) {
                firstActiveCol <- params$firstActiveCol
                freezePanes <- TRUE
                if (length(firstActiveCol) != nSheets) 
                    firstActiveCol <- rep_len(firstActiveCol, length.out = nSheets)
            }
            firstRow <- rep_len(FALSE, length.out = nSheets)
            if ("firstRow" %in% names(params)) {
                firstRow <- params$firstRow
                freezePanes <- TRUE
                if ("list" %in% class(x) & length(firstRow) != nSheets) 
                    firstRow <- rep_len(firstRow, length.out = nSheets)
            }
            firstCol <- rep_len(FALSE, length.out = nSheets)
            if ("firstCol" %in% names(params)) {
                firstCol <- params$firstCol
                freezePanes <- TRUE
                if ("list" %in% class(x) & length(firstCol) != nSheets) 
                    firstCol <- rep_len(firstCol, length.out = nSheets)
            }
            if (freezePanes) {
                for (i in 1:nSheets) openxlsx::freezePane(wb = wb, sheet = i, 
                                                firstActiveRow = firstActiveRow[i], firstActiveCol = firstActiveCol[i], 
                                                firstRow = firstRow[i], firstCol = firstCol[i])
            }
        } # additional settings/Options
    
        openxlsx::saveWorkbook(wb = wb, file = file, overwrite = overwrite)
    
        return(invisible(NULL))
    }
    

    Sample data:

    df1 <- mtcars
    
    df1[1,3]<-"ID =====>"
    df1[1,4]<-"00000123"
    df1[3,7]<-NA
    df1[2,6]<-"stringi"
    
    ldf <- list(NOW=df1, WITH=df1, LISTS=df1)
    

    Call:

    writeXlsxWithTypes(df1, "test_normal3.xlsx" , rowNames = TRUE, borders = "surrounding")
    writeXlsxWithTypes(ldf, "test_list3.xlsx", rowNames = TRUE, borders = "surrounding")
    

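    【Comments】:

    • Nice try. Note that type.convert is not always desirable. For example, if I have an ID string such as "00001230" to write to the Excel file, type.convert turns it into the integer 1230. In that case automatic conversion makes no sense.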
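The pitfall described in this comment is easy to reproduce. Below is a minimal sketch of one possible guard; `has_leading_zeros()` and `convert_cell()` are hypothetical helper names made up for this illustration, not part of openxlsx or of the answer above.

```r
# type.convert() drops the leading zeros: the ID becomes the integer 1230
type.convert("00001230", as.is = TRUE)

# Hypothetical guard: TRUE when a string is zero-padded (for example an ID)
has_leading_zeros <- function(s) grepl("^0+[0-9]", s)

# Convert a single cell only when it is neither NA nor zero-padded
convert_cell <- function(u) {
  s <- as.character(u)
  if (is.na(s) || has_leading_zeros(s)) return(u)
  type.convert(s, as.is = TRUE)
}

cells <- lapply(c("00001230", "1230", "0.5", "stringi"), convert_cell)
vapply(cells, class, character(1))
# "character" "integer" "numeric" "character"
```

The guard only looks at zero padding. Other values that should stay text, such as phone numbers or codes like "1e5", would need their own rules.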
    【Solution 3】:

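    In case it helps someone else: I imported an Excel document, did a series of manipulations on the data frames, and then wrote them out as a new Excel document. I did not want to put the conversion from character to numeric into the data frames themselves, because that would break my existing code, so I put it in the writeData step.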

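    library(openxlsx)

    # Assumption (neither object is defined in this answer): listOfDFs holds the
    # sheet names and allDFs is the matching list of data frames.
    wb <- createWorkbook()
    for (sheetName in listOfDFs) addWorksheet(wb, sheetName = sheetName)
    for (n in seq_along(listOfDFs)) {
      sheet <- allDFs[[n]]
      for (row in seq_len(nrow(sheet))) {
        # Re-infer the type of every cell in this row; as.is = TRUE keeps strings as character
        sheetRow <- data.frame(lapply(sheet[row, ], function(x) type.convert(as.character(x), as.is = TRUE)),
                               check.names = FALSE, stringsAsFactors = FALSE)
        if (row == 1) {
          # The header goes into row 1 and the first data row into row 2
          writeData(wb, sheet = n, x = sheetRow, startRow = row, colNames = TRUE)
        } else {
          writeData(wb, sheet = n, x = sheetRow, startRow = row + 1, colNames = FALSE)
        }
      }
    }
    saveWorkbook(wb, file = "test.xlsx", overwrite = TRUE)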
    

    【Comments】:
