【Question title】: How to split a column by | into multiple columns [duplicate]
【Posted】: 2013-12-21 19:24:27
【Question】:

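In R: I have a data frame with many rows but only one column. Each row holds a long character string that is periodically delimited by |. I want to split the string at every | so that the result has many columns.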

1995-01-01|33.399999999999999|40.299999999999997|35.399999999999999|35.0|37.200000000000003|23.399999999999999|23.199999999999999|47.399999999999999|49.200000000000003|49.200000000000003|48.100000000000001|42.299999999999997|58.200000000000003|17.399999999999999|50.700000000000003|5.2999999999999998|20.600000000000001|38.5|43.299999999999997 etc.

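Each string starts with a date, followed by the numbers for the corresponding cities. The variable names are also given as a single string and need to be split at the "." marks.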

date.abilene_tx.akron_oh.albany_ny.albuquerque_nm.allentown_pa.amarillo_tx.anchorage_ak.asheville_nc.atlanta_ga etc.

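Any help is greatly appreciated!

【Comments】:

  • strsplit, probably, but how did you get this into R? If you read it from a file, you may want to look at the sep argument of read.table.

Tags: regex r dataframe strsplit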


【Solution 1】:

You should have loaded the data from the file with this command:

 dat <- read.table(filename, sep="|")

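This handles lines delimited by "|". However, you also say that the "strings" are delimited by ".", so if the two are somehow mixed in that text file, you may need to preprocess the input with readLines() first.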
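
The preprocessing mentioned above could look roughly like the following sketch. It assumes the file has one header line of "."-separated names followed by "|"-separated data rows; that layout is a guess, not something the question confirms. The `lines` vector stands in for `readLines(filename)`, and its values are shortened from the question's sample row (repeated twice for illustration).

```r
# Stand-in for readLines(filename); the file layout is an assumption.
lines <- c(
  "date.abilene_tx.akron_oh.albany_ny",
  "1995-01-01|33.4|40.3|35.4",
  "1995-01-01|33.4|40.3|35.4"
)

# The header names are separated by ".", so split on a literal dot.
col_names <- strsplit(lines[1], ".", fixed = TRUE)[[1]]

# Parse the remaining lines as "|"-separated records.
dat <- read.table(text = lines[-1], sep = "|", col.names = col_names,
                  stringsAsFactors = FALSE)

# The first field is read as text; convert it to a Date.
dat$date <- as.Date(dat$date)
str(dat)
```

Splitting the header with `fixed = TRUE` matters here because "." in a regular expression matches any character.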

【Comments】:

    【Solution 2】:

    Here is a data.frame with one column and 10 rows, which is probably similar to yours:

    dat <- "1995-01-01|33.399999999999999|40.299999999999997|35.399999999999999|35.0|37.200000000000003|23.399999999999999|23.199999999999999|47.399999999999999|49.200000000000003|49.200000000000003|48.100000000000001|42.299999999999997|58.200000000000003|17.399999999999999|50.700000000000003|5.2999999999999998|20.600000000000001|38.5|43.299999999999997 "
    
    df <- data.frame(col1 = rep(dat, 10))
    

    Here is a data.frame with new columns created by splitting col1:

    foo <- data.frame(do.call('rbind', strsplit(as.character(df$col1),'|',fixed=TRUE)))
    foo
    
               X1                 X2                 X3                 X4   X5                 X6
    1  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
    2  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
    3  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
    4  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
    5  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
    6  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
    7  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
    8  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
    9  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
    10 1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
                       X7                 X8                 X9                X10                X11
    1  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
    2  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
    3  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
    4  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
    5  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
    6  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
    7  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
    8  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
    9  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
    10 23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
                      X12                X13                X14                X15                X16
    1  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
    2  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
    3  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
    4  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
    5  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
    6  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
    7  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
    8  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
    9  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
    10 48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
                      X17                X18  X19                 X20
    1  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
    2  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
    3  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
    4  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
    5  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
    6  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
    7  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
    8  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
    9  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
    10 5.2999999999999998 20.600000000000001 38.5 43.299999999999997
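
One detail in the call above is worth spelling out: `|` is the alternation operator in a regular expression, so without `fixed = TRUE` the pattern matches the empty string and `strsplit` breaks the input into single characters. A small sketch with a shortened value from the question:

```r
x <- "1995-01-01|33.4|40.3"

# Treated as a regex, "|" matches the empty string: one piece per character.
as_regex <- strsplit(x, "|")[[1]]

# Treated as a literal, "|" yields the three fields.
as_fixed <- strsplit(x, "|", fixed = TRUE)[[1]]

# Escaping the pipe in a regex gives the same three fields.
as_escaped <- strsplit(x, "\\|")[[1]]

length(as_regex)
as_fixed
identical(as_fixed, as_escaped)
```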
    

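    【Comments】:

    • Thanks for your help! But I got an error message: non-character argument. Is that because it is a data frame? How can I fix this?
    • I have updated the answer to handle a data frame.
    • This is very useful. I am new to R and still learning. Since my dataset has several thousand rows with different values, do I need to write a for loop (or a function) to repeat this step for every row?
    • Have you tried it? This function should work on your data frame, assuming the long string in each row can be split into the same number of pieces (what those pieces contain does not matter; I only copied the first row for convenience). Why not accept this answer and then ask another question that includes a sample of your data frame (use dput(head(mydata)) to get a sample to paste into your question)? The closer your question is to your actual use case, the more relevant the answers will be.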
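
To illustrate the point made in the last comment: `strsplit` is vectorised over the rows, so no `for` loop is needed. The sketch below assumes a one-column data frame `df` with column `col1`, as in the answer, with shortened values. The column names are taken from the start of the question's header string and are applied here only for illustration.

```r
df <- data.frame(col1 = c("1995-01-01|33.4|40.3",
                          "1995-01-01|33.4|40.3"),
                 stringsAsFactors = FALSE)

# One call splits every row; the result is a list with one vector per row.
parts <- strsplit(as.character(df$col1), "|", fixed = TRUE)

# rbind only gives a clean matrix if every row has the same number of fields.
stopifnot(length(unique(lengths(parts))) == 1)

foo <- data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(foo) <- c("date", "abilene_tx", "akron_oh")

# The split pieces are character strings; convert them to proper types.
foo$date <- as.Date(foo$date)
foo[-1] <- lapply(foo[-1], as.numeric)
str(foo)
```

The `lengths()` check makes the "same number of pieces" assumption explicit, so a malformed row stops the script instead of being silently recycled by `rbind`.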