Ascii file in R: as.numeric integers are incorrect (answers)

【Question title】: Ascii file in R as.numeric integers are incorrect
【Posted】: 2014-11-01 11:11:00
【Question】:

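I have read an ascii (.spe) file into R. The file contains a single column of mostly integers. However, R interprets these integers incorrectly, probably because I am not specifying the correct format or something similar. The file was generated in the Ortec Maestro software. The code is as follows: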

library(SDMTools)
strontium<-read.table("C:/Users/Hal 2/Desktop/beta_spec/strontium 90 spectrum.spe",header=F,skip=2)
str_spc<-vector(mode="numeric")
for (i in 1:2037)
{
str_spc[i]<-as.numeric(strontium$V1[i+13])
}

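For example, strontium$V1[14] here has the value 0, but R interprets it as 10. I think I may need to convert the data to some other format, but I'm not sure, and I may be googling the wrong search terms.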

Here are the first few lines of the file:

$SPEC_ID:
No sample description was entered.
$SPEC_REM:
DET# 1
DETDESC# MCB 129
AP# Maestro Version 6.08
$DATE_MEA:
10/14/2014 15:13:16
$MEAS_TIM:
1516 1540
$DATA:
0 2047

Here is a link to the file: https://www.dropbox.com/sh/y5x68jen487qnmt/AABBZyC6iXBY3e6XH0XZzc5ba?dl=0

Any help is appreciated.

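【Comments】:

Can you give an example from your data file?
Yes, I have done that now. Thanks for your help.
Possibly also the first few lines of the strontium 90 spectrum.spe file (or a link to the file).
Thanks @hrbrmstr, I have done that now. Adding the link was a good idea!
The first and last lines in the file are strings rather than numbers. read.table will recognize this and treat all entries as strings. Because the default is read.table(..., stringsAsFactors=TRUE), all character values are converted to factors. That's why you get weird numbers: as.numeric on a factor returns its internal level codes, not the printed values. Try stringsAsFactors=FALSE, or skip the first/last lines altogether.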
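
To make the comment above concrete, here is a minimal sketch of the factor effect. The vector `v` is made up for illustration and only stands in for a column that mixes header text with counts. Note that since R 4.0.0 the default is `stringsAsFactors = FALSE`, so the symptom shows up mainly in older R versions or when factors are requested explicitly.

```r
# Hypothetical values standing in for the mixed header/count column
v <- factor(c("$DATA:", "0", "2047", "0", "13"))

# as.numeric() on a factor returns the integer level codes (1, 2, ...),
# so a printed "0" never comes back as 0
codes <- as.numeric(v)

# Going through character first parses the printed values instead;
# non-numeric entries such as "$DATA:" become NA (with a warning)
counts <- suppressWarnings(as.numeric(as.character(v)))

print(codes)
print(counts)
```

The same conversion, `as.numeric(as.character(strontium$V1))`, would apply to the column in the question if it has been read in as a factor.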

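Tags: r ascii

【Solution 1】:

I saw that someone had made a parser for SPE Spectra files in python, and I couldn't let that stand without there being at least a minimally functional R version, so here is one that parses only some of the fields but gets you your data: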

library(stringr)
library(gdata)
library(lubridate)

read.spe <- function(file) {

  tmp <- readLines(file)

  tmp <- paste(tmp, collapse="\n")

  records <- strsplit(tmp, "\\$")[[1]]
  records <- records[records!=""]

  spe <- list()

  spe[["SPEC_ID"]] <- str_match(records[which(startsWith(records, "SPEC_ID"))],
                                "^SPEC_ID:[[:space:]]*([[:print:]]+)[[:space:]]+")[2]

  spe[["SPEC_REM"]] <- strsplit(str_match(records[which(startsWith(records, "SPEC_REM"))],
                                          "^SPEC_REM:[[:space:]]*(.*)")[2], "\n")

  spe[["DATE_MEA"]] <- mdy_hms(str_match(records[which(startsWith(records, "DATE_MEA"))],
                                         "^DATE_MEA:[[:space:]]*(.*)[[:space:]]$")[2])

  spe[["MEAS_TIM"]] <- strsplit(str_match(records[which(startsWith(records, "MEAS_TIM"))],
                                          "^MEAS_TIM:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]

  spe[["ROI"]] <- str_match(records[which(startsWith(records, "ROI"))],
                            "^ROI:[[:space:]]*(.*)[[:space:]]$")[2]

  spe[["PRESETS"]] <- strsplit(str_match(records[which(startsWith(records, "PRESETS"))],
                                         "^PRESETS:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]

  spe[["ENER_FIT"]] <- strsplit(str_match(records[which(startsWith(records, "ENER_FIT"))],
                                          "^ENER_FIT:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]

  spe[["MCA_CAL"]] <- strsplit(str_match(records[which(startsWith(records, "MCA_CAL"))],
                                         "^MCA_CAL:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]

  spe[["SHAPE_CAL"]] <- str_match(records[which(startsWith(records, "SHAPE_CAL"))],
                                  "^SHAPE_CAL:[[:space:]]*(.*)[[:space:]]*$")[2]

  spe_dat <- strsplit(str_match(records[which(startsWith(records, "DATA"))],
                                "^DATA:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]

  spe[["SPE_DAT"]] <- as.numeric(gsub("[[:space:]]", "", spe_dat)[-1])

  return(spe)

}

dat <- read.spe("strontium 90 spectrum.Spe")

str(dat)
## List of 10
##  $ SPEC_ID  : chr "No sample description was entered."
##  $ SPEC_REM :List of 1
##   ..$ : chr [1:3] "DET# 1" "DETDESC# MCB 129" "AP# Maestro Version 6.08"
##  $ DATE_MEA : POSIXct[1:1], format: "2014-10-14 15:13:16"
##  $ MEAS_TIM : chr "1516 1540"
##  $ ROI      : chr "0"
##  $ PRESETS  : chr [1:3] "None" "0" "0"
##  $ ENER_FIT : chr "0.000000 0.002529"
##  $ MCA_CAL  : chr [1:2] "3" "0.000000E+000 2.529013E-003 0.000000E+000 keV"
##  $ SHAPE_CAL: chr "3\n3.100262E+001 0.000000E+000 0.000000E+000"
##  $ SPE_DAT  : num [1:2048] 0 0 0 0 0 0 0 0 0 0 ...

head(dat$SPE_DAT)
## [1] 0 0 0 0 0 0
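
If only the counts are needed, a base R sketch without any regular-expression packages might look like the following. It assumes, based on the file excerpt in the question, that the line after `$DATA:` holds the first and last channel number (e.g. `0 2047`) and that one count per line follows. `read_spe_counts` is a hypothetical helper name, and the small temporary file is made up for the demonstration.

```r
# Sketch: pull only the $DATA block out of a .spe file with base R.
# read_spe_counts is a hypothetical helper, not part of any package.
read_spe_counts <- function(file) {
  lines <- trimws(readLines(file, warn = FALSE))
  start <- which(lines == "$DATA:")
  if (length(start) != 1) stop("expected exactly one $DATA: record")
  # Assumed layout: the line after $DATA: is "<first channel> <last channel>"
  rng <- as.integer(strsplit(lines[start + 1], "[[:space:]]+")[[1]])
  n <- rng[2] - rng[1] + 1
  # The next n lines are the counts, one per channel
  as.numeric(lines[start + 1 + seq_len(n)])
}

# Made-up four-channel file for demonstration
tmp <- tempfile(fileext = ".spe")
writeLines(c("$SPEC_ID:", "No sample description was entered.",
             "$DATA:", "0 3", "       0", "       5", "      12", "       7",
             "$ROI:", "0"), tmp)
counts <- read_spe_counts(tmp)
print(counts)
```

Because the lines are read as plain text and converted directly, no factors are involved at any point.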

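It needs some polish, and there is absolutely no error checking (i.e. for missing fields), but I have no time to deal with that today. Over the next few days I'll finish the parsing and make a minimal package wrapper for it.

【Comments】: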
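
One possible way to handle missing fields, sketched in base R: split the file into records keyed by their `$NAME:` header lines, then look fields up through a getter that falls back to a default. `split_spe_records` and `get_field` are hypothetical names, the header pattern (upper-case letters and underscores) is an assumption drawn from the fields listed above, and the sample lines are made up.

```r
# Sketch: split .spe lines into named records and look fields up safely.
# split_spe_records and get_field are hypothetical helper names.
split_spe_records <- function(lines) {
  # Assumed header shape: "$NAME:" with upper-case letters/underscores
  is_header <- grepl("^\\$[A-Z_]+:", lines)
  if (!any(is_header)) return(list())
  id <- cumsum(is_header)            # record index for every line
  keep <- id > 0 & !is_header        # body lines only
  # Using all record indices as levels keeps empty records as character(0)
  body <- split(lines[keep],
                factor(id[keep], levels = seq_len(sum(is_header))))
  names(body) <- sub("^\\$([A-Z_]+):.*$", "\\1", lines[is_header])
  body
}

get_field <- function(records, name, default = NA_character_) {
  # Return the default when the record is absent or has no body lines
  if (name %in% names(records) && length(records[[name]]) > 0) {
    records[[name]]
  } else {
    default
  }
}

# Made-up excerpt: there is no $ROI: record here
lines <- c("$SPEC_ID:", "No sample description was entered.",
           "$MEAS_TIM:", "1516 1540",
           "$DATA:", "0 1", "0", "3")
rec <- split_spe_records(lines)
print(get_field(rec, "MEAS_TIM"))
print(get_field(rec, "ROI"))
```

Returning `NA` for an absent record lets downstream code test with `is.na()` instead of failing on a zero-length match.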