【Question title】: Split a text file if two consecutive lines are almost identical
【Posted】: 2023-07-22 12:35:02
【Question description】:

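I need to split a text file according to the string content of the previous line (positions 2 to 13) and the string content of the current line (positions 2 to 13)...

Let me explain:

My file looks like this: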

IA1234567890A         XX33              AZE
bla1                  XX34              DES
bla2                  XX34              DES
bla3                  XX34              DES
FA1234567890A         XX35              AZE
IA1234567890A         XX36              AZE
bla4                  XX34              DES
bla5                  XX34              DES
bla6                  XX34              DES
FA1234567890A         XX37              AZE
IB0987654321A         XX38              AZE
bla7                  XX34              DES
bla8                  XX34              DES
bla9                  XX34              DES
FB0987654321A         XX39              AZE

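I want to split the file when the first 12 characters of a line starting with "I" (not counting the "I") differ from the first 12 characters of the previous line (which always starts with an "F", except for the first line of the file; the "F" must not be counted in the comparison either).

So I would not split the file between these two lines: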

FA1234567890A         XX35              AZE
IA1234567890A         XX36              AZE

But I would split the file between these two lines:

FA1234567890A         XX37              AZE
IB0987654321A         XX38              AZE

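I know how to split a file on a delimiter, but I am completely lost with this comparison...

I would really appreciate it if you could help me with this tricky case...

Thanks!

【Question discussion】:

    Tags: windows batch-file command-line text-files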


    【Solution 1】:

    This reads from data.txt and creates output1.txt, output2.txt, ... outputN.txt.

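    @echo off
    setlocal enabledelayedexpansion
    
    rem outputcount numbers the output files; previousblock holds characters 2-13 of the previous line
    set "outputcount=0"
    set "previousblock="
    
    for /f "delims=" %%s in (data.txt) do (
      set "line=%%s"
      rem Offset 1, length 12: skip the leading I or F marker
      set "currentblock=!line:~1,12!"
    
      rem Start a new output file only when the block of an "I" line differs from the block of the previous line
      if "!line:~0,1!" EQU "I" (
        if "!previousblock!" NEQ "!currentblock!" (
            set /A outputcount+=1
        )
      )
    
      rem Redirection comes first so that a line ending in a digit is not read as a handle number
      >>"output!outputcount!.txt" echo(!line!
      set "previousblock=!currentblock!"
    )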
    

    For example:

    D:\scripts>splitfile.bat
    D:\scripts>type output*
    
    output1.txt
    
    
    IA1234567890A         XX33              AZE
    bla1                  XX34              DES
    bla2                  XX34              DES
    bla3                  XX34              DES
    FA1234567890A         XX35              AZE
    IA1234567890A         XX36              AZE
    bla4                  XX34              DES
    bla5                  XX34              DES
    bla6                  XX34              DES
    FA1234567890A         XX37              AZE
    
    output2.txt
    
    
    IB0987654321A         XX38              AZE
    bla7                  XX34              DES
    bla8                  XX34              DES
    bla9                  XX34              DES
    FB0987654321A         XX39              AZE
    

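    Edit

    Updated the code so that it works correctly.

    【Discussion】:

    • Great. What could he change so that it does not break at Output2.txt, since he does not want a split there?
    • Haha, thank you. I was so focused on getting it to run that I did not notice it was doing the wrong thing. I have edited the code to (hopefully) correct it.
    • Thanks Arun! I did not expect a solution this quickly! But your solution splits my data into 3 output files... as if it only looked at the "I" at the start of a line to split the file before it. I wonder whether there should be a =!line:~1,12! in set previousblock= ?
    • @CHRISTIAN I think I have fixed it; it now outputs only two files. The substring test I did was a bit sloppy and included the first character in the comparison; the first set previousblock= just initializes it to empty, mainly for tidiness.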
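
The offset problem discussed in these comments can be illustrated with a small sketch. It is written for a POSIX shell rather than batch, and the variable names are made up for illustration. Here `cut -c2-13` plays the role of the batch substring `!line:~1,12!` (offset 1, length 12), while `cut -c1-12` stands in for a comparison that, as an assumption about the earlier version, started at the first character.

```shell
#!/bin/sh
# Two consecutive sample lines that must NOT trigger a split.
f='FA1234567890A         XX35              AZE'
i='IA1234567890A         XX36              AZE'

# Characters 1-12 include the F/I marker, so these two blocks can never match.
with_marker_f=$(printf '%s\n' "$f" | cut -c1-12)
with_marker_i=$(printf '%s\n' "$i" | cut -c1-12)

# Characters 2-13 skip the marker and cover the 12-character block.
key_f=$(printf '%s\n' "$f" | cut -c2-13)
key_i=$(printf '%s\n' "$i" | cut -c2-13)

if [ "$with_marker_f" = "$with_marker_i" ]; then
  echo "with marker: same"
else
  echo "with marker: different"
fi

if [ "$key_f" = "$key_i" ]; then
  echo "without marker: same"
else
  echo "without marker: different"
fi
```

With the marker included, every F line followed by an I line looks like a block change, which would explain the extra output file reported above.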
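    【Solution 2】:

    If the input file is large, this method should run faster, because it does not examine every line. It also correctly handles lines that contain special batch characters.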

    @echo off
    setlocal EnableDelayedExpansion
    
    rem Read the first line, and create a dummy previous "endLine" with same name
    set /P "endName=" < test.txt
    set "endName=F%endName:~1%"
    set startLine=1
    set "startName="
    
    rem Redirect the input file to a code block, in order to read it
    < test.txt (
    
       rem Locate all lines that start with "I" or "F"
       for /F "tokens=1,2 delims=: " %%a in ('findstr /N /B "I F" test.txt') do (
          if not defined startName (
             set "startName=%%b"
             if "!startName:~1,12!" neq "!endName:~1,12!" (
                rem New section starts: copy it to its own file
                set /A lines=endLine-startLine+1
                (for /L %%i in (1,1,!lines!) do (
                   set /P "line="
                   echo !line!
                )) > "Part !endName:~1,12!.txt"
                set "endName=F!startName:~1!"
                set "startLine=%%a"
             )
          ) else (
             set "endLine=%%a"
             set "endName=%%b"
             set "startName="
          )
       )
    
       rem Copy last section to its own file
       findstr "^" > "Part !endName:~1,12!.txt"
    )
    

    Output:

    C:\> type Part*.txt
    
    Part A1234567890A.txt
    
    
    IA1234567890A         XX33              AZE
    bla1                  XX34              DES
    bla2                  XX34              DES
    bla3                  XX34              DES
    FA1234567890A         XX35              AZE
    IA1234567890A         XX36              AZE
    bla4                  XX34              DES
    bla5                  XX34              DES
    bla6                  XX34              DES
    FA1234567890A         XX37              AZE
    
    Part B0987654321A.txt
    
    
    IB0987654321A         XX38              AZE
    bla7                  XX34              DES
    bla8                  XX34              DES
    bla9                  XX34              DES
    FB0987654321A         XX39              AZE
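
The idea behind this answer, looking only at the lines that start with "I" or "F" instead of walking through the whole file, can also be sketched outside batch. The following is a minimal POSIX shell sketch, not a port of the script above: the function name `split_points` is hypothetical, and it assumes that the marker line preceding each "I" line is the "F" line directly above it, as in the sample data.

```shell
#!/bin/sh
# split_points FILE: print the line number of every "I" line whose block
# (characters 2-13) differs from the block of the preceding I/F marker line.
split_points() {
  grep -n '^[IF]' "$1" | awk -F: '
    {
      line = substr($0, length($1) + 2)   # text after the "N:" prefix added by grep -n
      key  = substr(line, 2, 12)          # 12-character block after the marker
      if (NR > 1 && substr(line, 1, 1) == "I" && key != prev)
        print $1                          # a new part starts at this line number
      prev = key
    }'
}
```

Calling `split_points test.txt` on the sample file from the question should report a single split point, the line holding `IB0987654321A`; the line numbers could then drive the actual copying.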
    

    【Discussion】:

      【Solution 3】:

      Try this:

      #!/bin/bash
      # bash is required: the script uses (( )), [[ ]] and ${var:offset:length}
      
      ## clean any split files (got created in previous runs)
      rm -f split.*
      
      ## define variables, ct=counter for reading next line, cnt=counter for creating split.X file and file=split filename
      ct=2
      cnt=1
      file="split.$cnt"
      
      ## Read line with spaces, IFS=''
      IFS=''
      while read -r lineP
      do
        ## Read next line and increment ct variable
        lineN="$(sed -n "${ct}p" inputfile.txt)" && ((ct++))
      
        ## Read first character of two lines and the next 12 characters
        lineP121=${lineP:0:1} && lineN121=${lineN:0:1}
        lineP1212=${lineP:1:12} && lineN1212=${lineN:1:12}
      
        ## Match / Condition
        if [[ "$lineP1212" != "$lineN1212" && ( "$lineP121" == "F" && "$lineN121" == "I" ) ]];
        then
         ## Last line of a part: written with a trailing ":" marker, then switch to the next file
         echo "${lineP}:" >> "$file"
         ((++cnt))
         file="split.$cnt"
        else
         ## Any other line: written followed by an extra blank line
         echo -e "$lineP\n" >> "$file"
        fi
      done < inputfile.txt
      
      echo -e "\n\nFile created are (with contents in split.X files):\n\n"
      ls -l split.* && echo && grep -n . split.* && echo
      

      The output: 2 files are created, split.1 and split.2 (for the input file above).

      File created are (with contents in split.X files. Output generated by grep -n command. You can use simple cat command if you want):
      
      
      -rw-r--r-- 1 koba loki 450 Jun  3 19:01 split.1
      -rw-r--r-- 1 koba loki 225 Jun  3 19:01 split.2
      
      split.1:1:IA1234567890A         XX33              AZE
      split.1:3:bla1                  XX34              DES
      split.1:5:bla2                  XX34              DES
      split.1:7:bla3                  XX34              DES
      split.1:9:FA1234567890A         XX35              AZE
      split.1:11:IA1234567890A         XX36              AZE
      split.1:13:bla4                  XX34              DES
      split.1:15:bla5                  XX34              DES
      split.1:17:bla6                  XX34              DES
      split.1:19:FA1234567890A         XX37              AZE:
      
      split.2:1:IB0987654321A         XX38              AZE
      split.2:3:bla7                  XX34              DES
      split.2:5:bla8                  XX34              DES
      split.2:7:bla9                  XX34              DES
      split.2:9:FB0987654321A         XX39              AZE
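
The script above re-reads inputfile.txt with sed once per line to fetch the next line. A single-pass variant is sketched below, assuming a POSIX awk is available; the function name `split_file` and its `prefix` argument are hypothetical. Unlike the script above, it copies each line unchanged, without the extra blank lines or the trailing ":" marker.

```shell
#!/bin/sh
# split_file FILE PREFIX: copy FILE into PREFIX.1, PREFIX.2, ... and start a
# new part at each "I" line whose characters 2-13 differ from characters 2-13
# of the line directly before it.
split_file() {
  awk -v prefix="$2" '
    BEGIN { n = 1 }
    {
      key = substr($0, 2, 12)             # 12-character block after the marker
      if (NR > 1 && substr($0, 1, 1) == "I" && key != prev) {
        close(prefix "." n)               # finished with the current part
        n++
      }
      print > (prefix "." n)
      prev = key
    }' "$1"
}
```

For example, `split_file inputfile.txt split` would write split.1 and split.2 for the sample data in the question.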
      

      【Discussion】:

      • But this is not a Windows batch file... (see the question tags)
      • Ah, I missed that. Upvoted you.