【Question title】: Stata: importing .txt with inconsistent delimiters
【Posted】: 2021-05-15 16:31:39
【Question】:

I have a .txt file with rather odd delimiters. The data look like this:

|ABC4|,|Name1|,|NameRaw1|,|y|,|XY1|,10000.0,|     |,|FOURTH QUARTER REPORT|,||
|ABC5|,|Name2, extraname|,|NameRaw2|,,|XY2|,266539.0,|pac  |,|MID-YEAR REPORT|,||
|ABC6|,|Name3|,|NameRaw3|,|y|,|X,Y3|,60000.0,|name |,|YEAR-END REPORT|,|XYZ|

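The problem is that some variables have no pipes at all, such as the sixth variable here, which is just a number without pipes, and some variables lack pipes only when they are empty, like the fourth variable here (compare ,, with ,|y|,). Some values also contain commas, so I cannot simply split on commas. So there are basically two problems: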

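  1. The delimiter is a comma, but commas also appear inside string values
  2. Some variables are enclosed in pipes, some are not, and some are enclosed only when they are not empty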

I am looking for a way to handle this in Stata. Does anyone know how?
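The first problem can be made concrete by counting commas per line: a line with nine fields and no embedded commas should contain exactly eight. A minimal Stata sketch using the three sample lines above (the variable names rawline and ncommas are illustrative only, not part of the original data):

```stata
clear
input str100 rawline
"|ABC4|,|Name1|,|NameRaw1|,|y|,|XY1|,10000.0,|     |,|FOURTH QUARTER REPORT|,||"
"|ABC5|,|Name2, extraname|,|NameRaw2|,,|XY2|,266539.0,|pac  |,|MID-YEAR REPORT|,||"
"|ABC6|,|Name3|,|NameRaw3|,|y|,|X,Y3|,60000.0,|name |,|YEAR-END REPORT|,|XYZ|"
end

* Count commas as the length lost when every comma is removed
gen ncommas = length(rawline) - length(subinstr(rawline, ",", "", .))

* Lines with more than 8 commas contain at least one comma inside a value
list ncommas if ncommas > 8
```

In this sample the second and third lines each contain nine commas, so splitting on every comma would yield ten pieces instead of nine.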

【Comments】:

    Tags: import stata delimiter txt


    【Solution 1】:

    If the complete dataset is messier than this example, I really don't want to know. But this seems to get somewhere.
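The example that follows types the three sample lines in with input. For the real file, each line would first have to be read whole into one string variable. A minimal sketch, assuming the file contains no tab characters; example_raw.txt is a hypothetical file name, and the sketch writes that file itself only so that it can run on its own:

```stata
* Write the three sample lines to a hypothetical file (illustration only)
tempname fh
file open `fh' using "example_raw.txt", write text replace
file write `fh' "|ABC4|,|Name1|,|NameRaw1|,|y|,|XY1|,10000.0,|     |,|FOURTH QUARTER REPORT|,||" _n
file write `fh' "|ABC5|,|Name2, extraname|,|NameRaw2|,,|XY2|,266539.0,|pac  |,|MID-YEAR REPORT|,||" _n
file write `fh' "|ABC6|,|Name3|,|NameRaw3|,|y|,|X,Y3|,60000.0,|name |,|YEAR-END REPORT|,|XYZ|" _n
file close `fh'

* Use a tab as the delimiter: assuming the file has no tabs,
* each line ends up whole in a single string variable, v1
import delimited using "example_raw.txt", delimiters("\t") varnames(nonames) bindquotes(nobind) stringcols(_all) clear

* Match the variable name used in the parsing code
rename v1 whatever
```

With the real data, only the import delimited and rename lines would be needed, pointed at the actual file.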

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str100 whatever
    "|ABC4|,|Name1|,|NameRaw1|,|y|,|XY1|,10000.0,|     |,|FOURTH QUARTER REPORT|,||"
    "|ABC5|,|Name2, extraname|,|NameRaw2|,,|XY2|,266539.0,|pac  |,|MID-YEAR REPORT|,||"
    "|ABC6|,|Name3|,|NameRaw3|,|y|,|X,Y3|,60000.0,|name |,|YEAR-END REPORT|,|XYZ|"
    end
    
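    * Work on a copy so the original line is kept for checking
    gen work = whatever
    * Give empty fields their missing pipes, so that ,, becomes ,||,
    replace work = subinstr(work, ",,", ",||,", .)

    * Fields 1-5 are wrapped in pipes: take everything up to and including
    * the first "|," and then remove that piece from the front of work
    forval j = 1/5 {
        gen work`j' = substr(work, 1, strpos(work, "|,") + 1)
        replace work = subinstr(work, work`j', "", 1)
    }

    * Field 6 is a bare number: take everything up to the first comma
    gen work6 = substr(work, 1, strpos(work, ","))
    replace work = subinstr(work, work6, "", 1)

    * Fields 7-8 are wrapped in pipes again
    forval j = 7/8 {
        gen work`j' = substr(work, 1, strpos(work, "|,") + 1)
        replace work = subinstr(work, work`j', "", 1)
    }

    * Whatever is left is field 9
    gen work9 = work
    drop work

    * Strip the pipes, trim blanks, and drop the trailing comma
    forval j = 1/9 {
        replace work`j' = trim(subinstr(work`j', "|", "", .))
        replace work`j' = substr(work`j', 1, length(work`j') - 1) if substr(work`j', -1, 1) == ","
    }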
    
    list 
    
        +-----------------------------------------------------------------------------------+
      1. |                                                                          whatever |
         |    |ABC4|,|Name1|,|NameRaw1|,|y|,|XY1|,10000.0,|     |,|FOURTH QUARTER REPORT|,|| |
         |-----------------------------------------------------------------------------------|
         | work1  |            work2  |    work3  |  work4  |  work5  |     work6  |  work7  |
         |  ABC4  |            Name1  | NameRaw1  |      y  |    XY1  |   10000.0  |         |
         |-----------------------------------------------------------------------------------|
         |                              work8              |              work9              |
         |              FOURTH QUARTER REPORT              |                                 |
         +-----------------------------------------------------------------------------------+
    
         +-----------------------------------------------------------------------------------+
      2. |                                                                          whatever |
         | |ABC5|,|Name2, extraname|,|NameRaw2|,,|XY2|,266539.0,|pac  |,|MID-YEAR REPORT|,|| |
         |-----------------------------------------------------------------------------------|
         | work1  |            work2  |    work3  |  work4  |  work5  |     work6  |  work7  |
         |  ABC5  | Name2, extraname  | NameRaw2  |         |    XY2  |  266539.0  |  pac    |
         |-----------------------------------------------------------------------------------|
         |                              work8              |              work9              |
         |                    MID-YEAR REPORT              |                                 |
         +-----------------------------------------------------------------------------------+
    
         +-----------------------------------------------------------------------------------+
      3. |                                                                          whatever |
         |      |ABC6|,|Name3|,|NameRaw3|,|y|,|X,Y3|,60000.0,|name |,|YEAR-END REPORT|,|XYZ| |
         |-----------------------------------------------------------------------------------|
         | work1  |            work2  |    work3  |  work4  |  work5  |     work6  |  work7  |
         |  ABC6  |            Name3  | NameRaw3  |      y  |   X,Y3  |   60000.0  |  name   |
         |-----------------------------------------------------------------------------------|
         |                              work8              |              work9              |
         |                    YEAR-END REPORT              |                XYZ              |
         +-----------------------------------------------------------------------------------+
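As an alternative sketch, not part of the answer above: if the raw file contains no double quotes of its own and no pipes inside values, turning every pipe into a double quote should leave an ordinary quoted CSV file that import delimited can read directly. The file names example_raw.txt and example_quoted.txt are hypothetical, and the sample file is written by the sketch itself so that it is self-contained:

```stata
* Write the three sample lines to a hypothetical file (illustration only)
tempname fh
file open `fh' using "example_raw.txt", write text replace
file write `fh' "|ABC4|,|Name1|,|NameRaw1|,|y|,|XY1|,10000.0,|     |,|FOURTH QUARTER REPORT|,||" _n
file write `fh' "|ABC5|,|Name2, extraname|,|NameRaw2|,,|XY2|,266539.0,|pac  |,|MID-YEAR REPORT|,||" _n
file write `fh' "|ABC6|,|Name3|,|NameRaw3|,|y|,|X,Y3|,60000.0,|name |,|YEAR-END REPORT|,|XYZ|" _n
file close `fh'

* Assumption: the data contain no double quotes, so pipes can be swapped for them
* (\Q is filefilter's code for a double quote)
filefilter "example_raw.txt" "example_quoted.txt", from("|") to("\Q") replace

* Commas inside quoted values are now protected by the quote binding
import delimited using "example_quoted.txt", delimiters(",") varnames(nonames) bindquotes(strict) stripquotes(yes) stringcols(_all) clear

list v2 v5 v6
```

This should give nine string variables, v1 to v9, with the commas in Name2, extraname and X,Y3 kept inside their values; destring v6, replace would then convert the amount to a number. A value containing a literal pipe or double quote would break this approach, in which case the step-by-step parsing above is the safer route.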
    

    【Comments】:
