Answers: How to convert a list of strings in a .txt file into a data frame

【Question title】: How to convert list of strings in .txt file into a dataframe
【Posted】: 2019-11-25 03:38:27
【Question】:

I have a list of values read from a .txt file and am trying to convert it into a data frame in R:

.txt data:

l_arr(0, 1, 1) = 0;
l_dep(0, 1, 1) = 7.36639;
r_arr(0, 1, 1) = 0;
r_dep(0, 1, 1) = 0;
l_arr(0, 1, 2) = 51.9099;
l_dep(0, 1, 2) = 51.9099;
r_arr(0, 1, 2) = 0.4;
r_dep(0, 1, 2) = 0.4;

Corresponding data frame in R:

I currently have this:

df <- data.frame(matrix(ncol = 5))
x <- c("Type", "Angle", "Row", "Boundary", "Timestamp")
colnames(df) <- x

data<-read.csv("SWV_data.txt", header=TRUE, sep = ",")
data<-as.character(data)
temp<-(unlist(strsplit(data,"(")))

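I am struggling to work with the text strings, because as soon as I use as.character the whole structure of the data in the .txt file is lost.

【Comments】:

You need to provide sample data, not an image. You can copy the .txt data and paste it here.

Tags: r string dataframe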

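【Solution 1】:

You can read the text file with readLines, use gsub to replace all the extra characters (,()=;) with spaces, and split on whitespace to get the separate columns. Then use type.convert to convert each column to its appropriate type.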

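# Replace the punctuation with spaces, split each line on whitespace, and bind the pieces into rows
output <- as.data.frame(do.call(rbind, strsplit(gsub("[,()=;]", " ",
                        readLines("demo.txt")), "\\s+")))
# as.is = TRUE keeps Type as character and avoids the warning R >= 4.0 gives when it is omitted
output <- type.convert(output, as.is = TRUE)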
names(output) <- c("Type", "Angle", "Row", "Boundary", "TimeStam")

output
#   Type Angle Row Boundary TimeStam
#1 l_arr     0   1        1     0.00
#2 l_dep     0   1        1     7.37
#3 r_arr     0   1        1     0.00
#4 r_dep     0   1        1     0.00
#5 l_arr     0   1        2    51.91
#6 l_dep     0   1        2    51.91
#7 r_arr     0   1        2     0.40
#8 r_dep     0   1        2     0.40
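
To make the individual steps easier to follow, here is a minimal sketch that runs the same pipeline one stage at a time. It assumes the lines have the exact shape shown in the question; the inline `lines` vector stands in for `readLines("demo.txt")` and the intermediate names (`cleaned`, `parts`) are only illustrative.

```r
# Sample lines copied from the question; in practice these would come from readLines()
lines <- c(
  "l_arr(0, 1, 1) = 0;",
  "l_dep(0, 1, 1) = 7.36639;",
  "r_arr(0, 1, 2) = 0.4;"
)

# Step 1: replace the punctuation , ( ) = ; with spaces
cleaned <- gsub("[,()=;]", " ", lines)

# Step 2: trim the ends and split on runs of whitespace, one character vector per line
parts <- strsplit(trimws(cleaned), "\\s+")

# Step 3: stack the pieces into a character matrix, then convert to a data frame
output <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(output) <- c("Type", "Angle", "Row", "Boundary", "TimeStam")

# Step 4: let R choose a type per column; as.is = TRUE keeps Type as character
output <- type.convert(output, as.is = TRUE)

str(output)
```

Because `rbind` needs every line to split into the same number of pieces, blank or malformed lines in the file would have to be filtered out first.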

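【Comments】:

【Solution 2】:

You can use readLines and then remove all the unnecessary characters. Note that the sub('_', ',', ...) call also replaces the underscore in Type with a comma, which is why the values below print as l,arr rather than l_arr: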

nm <- c("Type", "Angle", "Row", "Boundary", "TimeStam")
read.table(text=sub('_',',',gsub('[^A-Z0-9.a-z_]',' ',readLines("a.txt"))),col.names = nm)
   Type Angle Row Boundary TimeStam
1 l,arr     0   1        1  0.00000
2 l,dep     0   1        1  7.36639
3 r,arr     0   1        1  0.00000
4 r,dep     0   1        1  0.00000
5 l,arr     0   1        2 51.90990
6 l,dep     0   1        2 51.90990
7 r,arr     0   1        2  0.40000
8 r,dep     0   1        2  0.40000
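
If the original `l_arr` style labels should be preserved, a variant without the `sub` call might look like the sketch below. It is an adaptation rather than the answer's exact code: the inline `lines` vector is a stand-in for `readLines("a.txt")`, and `cleaned` and `df` are illustrative names.

```r
nm <- c("Type", "Angle", "Row", "Boundary", "TimeStam")

# Stand-in for readLines("a.txt"), using two lines from the question
lines <- c("l_arr(0, 1, 1) = 0;", "r_dep(0, 1, 2) = 0.4;")

# Replace everything except letters, digits, dots, and underscores with a space
cleaned <- gsub("[^A-Za-z0-9._]", " ", lines)

# read.table splits on whitespace and infers the column types itself
df <- read.table(text = cleaned, col.names = nm, stringsAsFactors = FALSE)
df
```

One caveat of this character whitelist is that a minus sign would also be stripped, so it only suits data without negative values or exponent notation.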

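【Comments】:

【Solution 3】:

If you want to build each column by regular-expression matching, you can use tidyr::extract with capture groups that match the kind of text expected in each column. In this example the file you are starting with is well structured, but the approach may not work as well in other cases.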

txt <- readLines("data.txt")
tidyr::extract(data.frame(txt), txt, 
               into = c("Type", "Angle", "Row", "Boundary", "TimeStam"),
               regex = "(^\\w+)\\((\\d+), (\\d+), (\\d+)\\) = ([\\d.]+);$")

Note that this leaves every column as a character string; if you need to change that, a call like dplyr::mutate_at(vars(-Type), as.numeric) makes quick work of the conversion.
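
For comparison, the same capture-group idea can be sketched in base R with utils::strcapture, which converts each group to the type given in a prototype data frame. This is a different function from the tidyr call above, shown under the assumption that every line matches the pattern; `lines`, `proto`, and `parsed` are illustrative names. (tidyr::extract also accepts convert = TRUE if you prefer to stay with that function.)

```r
# Stand-in for readLines("data.txt"), using two lines from the question
lines <- c("l_arr(0, 1, 1) = 0;", "l_dep(0, 1, 2) = 51.9099;")

# proto supplies both the column names and the target type of each capture group
proto <- data.frame(Type = character(), Angle = integer(), Row = integer(),
                    Boundary = integer(), TimeStam = numeric(),
                    stringsAsFactors = FALSE)

# One capture group per column, in the same order as the columns of proto
parsed <- utils::strcapture(
  "^(\\w+)\\((\\d+), (\\d+), (\\d+)\\) = ([0-9.]+);$",
  lines, proto, perl = TRUE
)

str(parsed)
```

Lines that do not match the pattern come back as rows of NA, which makes malformed input easy to spot.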

【Comments】: