【Question title】: Remove lines between two line numbers (or patterns) with sed [not a duplicate]
【Posted】: 2018-03-07 08:59:56
【Question】:

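I have seen and read many threads related to my problem, but they did not help me.

One thread that is closely related to my problem, but still did not help: removing lines between two patterns (not inclusive) with sed

Problem: I have a text file and I want to delete the lines between two patterns.

note1: Between these patterns, I do not want to delete the lines that contain a specific string, which I call the key-pattern.

pattern-1 can be a line number (line 2; this is always static) or a word such as Speed (this word is also always static).

pattern-2 can be a line number (line X; this is not static, it is dynamic) or words such as Station MAC (if your solution is word-based, fortunately Station MAC is static).

If your solution is based on line numbers, I wrote an AWK command that gets the line number to use for the second pattern: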

awk '/Station MAC/ {print NR}'  david.txt
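
For comparison, the same lookup can be sketched in Python. The helper name `find_line_numbers` and the shortened sample lines are hypothetical; the sketch assumes the text is already available as a list of lines and, like awk's `NR`, counts from 1.

```python
def find_line_numbers(lines, pattern):
    """Return the 1-based numbers of the lines that contain pattern (like awk's NR)."""
    return [nr for nr, line in enumerate(lines, start=1) if pattern in line]


# Shortened, made-up sample in the shape of the input shown in this post.
sample = [
    "BSSID, First time seen, Last time seen, channel, Speed,",
    "84:C9:B2:A6:0B:28, 18:51:36, 18:54:40,  7,  54,",
    "",
    "Station MAC, First time seen, Last time seen, Power,",
]
print(find_line_numbers(sample, "Station MAC"))
```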

note2: As stated in note1, sed or any other tool must not delete the lines that contain my key-pattern.




Example: keep the lines that contain the word Max or sms

Here, Max and sms are the key-patterns.

Input: https://pastebin.com/cztQgm9m

BSSID, First time seen, Last time seen, channel, Speed, Power, # beacons, # IV, LAN IP, ID-length,
84:C9:B2:A6:0B:28, 18:51:36, 18:54:40,  7,  54, PA2,        2,   0.  0.  0.  0,   6, Maryam,
00:1E:E3:EB:2F:4E, 18:50:55, 18:54:36,  1,  54, W.  0.  0.  0,   8, Broadcom,
1C:BD:B9:79:91:C3, 18:50:17, 18:54:13, 11,  54, WP     0,   0.  0.  0.  0,   4, Home,
6C:AD:EF:1F:77:1F, 18:52:15, 18:54:17,  5,  54,    TP,SK,    6,        0,   0.  0.  0.  0,  12, MobinNet771F,
10:C6:1F:E9:90:6E, 18:50:36, 18:54:17,  6,  54,     7,        4,   0.  0.  0.  0,   9, ITIS_9162,
B0:48:7A:CF:BA:12, 18:52:09, 18:53:41,  7,  54,  TP,SK,     3,        0,   0.  0.  0.  0,   3, sms,
6C:19:8F:65:42:CB, 18:53:15, 18:53:15,  1,  54, , -62,        1,        0,   0.  0.  0.  0,  11, Rahmanzadeh,
.....
..skipped..
..skipped..
..skipped..
..
...
......
..skipped..
..skipped..
..skipped..
....
28:10:7B:93:BB:2E, 18:53:15, 18:53:15,  1,  -1,      0,        1,   0.  0.  0.  0,   0, ,
70:79:90:41:62:50, 18:50:17, 18:55:00,  4,  54, A, CP TP,SK, -19,      8,      9,   0.  0.  0.  0,  12, WiFi-Max-MTN,
EC:08:6B:6F:DF:C4, 18:52:52, 18:52:52,  6,  54, WP 2a, MP,SK, -66,        1,        0,   0.  0.  0.  0,   8, senator2,
6E:AD:EF:B4:CB:B6, 18:52:14, 18:52:14,  9,  54, A2, MP,PSK, -70,        1,        0,   0.  0.  0.  0,   6, Mohsen,
A8:F7:E0:06:1F:28, 18:52:44, 18:52:44,  9,  54, P,PSK, -70,        0,        0,   0.  0.  0.  0,  12, Borsa_Donne+,

Station MAC, First time seen, Last time seen, Power, # packets, BSSID, Probed
04:C2:3E:FC:1E:BB, 18:53:00, 18:53:00,  -1,        1, 3C:1E:04:8F:12:83,
F0:79:60:9E:13:4E, 18:52:56, 18:52:56,  -1,        1, 10:C6:1F:E9:90:6E,
40:E2:30:D9:E8:4B, 18:50:53, 18:52:25, -60,        2, F4:F2:6D:DA:27:2F,
D0:65:CA:BD:93:EC, 18:52:12, 18:52:12,  -1,        1, B0:55:08:18:FC:0A,
B8:57:D8:46:86:D4, 18:51:58, 18:51:58, -74,        1, F8:D1:11:C5:0F:72,
28:5A:EB:87:CD:BA, 18:50:28, 18:51:20, -54,       12, 00:23:B1:7C:75:48,
E0:C7:67:88:19:0E, 18:51:08, 18:51:08,  -1,        7, 98:42:46:08:58:F4,

Desired output: https://pastebin.com/gSv74mcZ

BSSID, First time seen, Last time seen, channel, Speed, Power, # beacons, # IV, LAN IP, ID-length,
B0:48:7A:CF:BA:12, 18:52:09, 18:53:41,  7,  54,  TP,SK,     3,        0,   0.  0.  0.  0,   3, sms,
70:79:90:41:62:50, 18:50:17, 18:55:00,  4,  54, A, CP TP,SK, -19,      8,      9,   0.  0.  0.  0,  12, WiFi-Max-MTN,

Station MAC, First time seen, Last time seen, Power, # packets, BSSID, Probed
04:C2:3E:FC:1E:BB, 18:53:00, 18:53:00,  -1,        1, 3C:1E:04:8F:12:83,
F0:79:60:9E:13:4E, 18:52:56, 18:52:56,  -1,        1, 10:C6:1F:E9:90:6E,
40:E2:30:D9:E8:4B, 18:50:53, 18:52:25, -60,        2, F4:F2:6D:DA:27:2F,
D0:65:CA:BD:93:EC, 18:52:12, 18:52:12,  -1,        1, B0:55:08:18:FC:0A,
B8:57:D8:46:86:D4, 18:51:58, 18:51:58, -74,        1, F8:D1:11:C5:0F:72,
28:5A:EB:87:CD:BA, 18:50:28, 18:51:20, -54,       12, 00:23:B1:7C:75:48,
E0:C7:67:88:19:0E, 18:51:08, 18:51:08,  -1,        7, 98:42:46:08:58:F4,

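【Question discussion】:

  • Please post the sample expected output and the sample input in your post.
  • @RavinderSingh13 I uploaded them to PasteBin as direct text links.
  • Please do not use links; post the samples in the post itself.
  • @RavinderSingh13 I have updated it.
  • Please explain the problem and the requirements more clearly in your post.

Tags: python linux bash awk sed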


【Solution 1】:

You can try this sed command.

It is not perfect, but this is busybox on Win7!

sed '/Speed/,/^$/{!d;/sms\|Max\|Speed\|^$/!d}' infile
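
As I read the one-liner, the address range `/Speed/,/^$/` opens on the first line matching `Speed` and closes on the next empty line, and inside that range every line that matches none of `sms`, `Max`, `Speed` or the empty-line pattern is deleted. The Python sketch below mirrors that logic purely as an illustration. The function name `sed_like_filter` and the sample lines are hypothetical, and it is not meant as a replacement for sed.

```python
import re

# Lines inside the range survive only if they match one of these alternatives.
KEEP = re.compile(r"sms|Max|Speed|^$")


def sed_like_filter(lines):
    """Yield the lines that the sed range command above would leave in place."""
    in_range = False
    for raw in lines:
        line = raw.rstrip("\n")
        closes = in_range and line == ""      # /^$/ ends an open range
        if not in_range and "Speed" in line:  # /Speed/ opens the range
            in_range = True
        if not in_range or KEEP.search(line):
            yield raw
        if closes:
            in_range = False


# Made-up sample in the shape of the input from the question.
sample = [
    "BSSID, channel, Speed,\n",
    "84:C9:B2:A6:0B:28, Maryam,\n",
    "B0:48:7A:CF:BA:12, sms,\n",
    "70:79:90:41:62:50, WiFi-Max-MTN,\n",
    "\n",
    "Station MAC, Probed\n",
    "04:C2:3E:FC:1E:BB, 3C:1E:04:8F:12:83,\n",
]
for kept in sed_like_filter(sample):
    print(kept, end="")
```

Because the range closes on the empty line, that separator survives and everything from `Station MAC` onward is passed through untouched.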

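【Discussion】:

    【Solution 2】:

    Since Python allows explicit processing in a loop, it is easy to build a function that filters a file object. It may not be optimal, but it is simple to write, read and maintain.

    It could be: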

    def filter(fdin, fdout, pat1, pat2, *keys):
        """
        Remove lines between a line containing pat1 and a line containing pat2,
        but keep any line in between that contains a string from keys.

        fdin:  input file object
        fdout: output file object
        pat1:  marks the beginning of the removed lines (kept in output)
        pat2:  marks the end of the removed lines (also kept in output)
        keys:  any number of key-patterns - a line containing one is not removed
        """
        def keypresent(line):    # internal function to test for key patterns
            return any(k in line for k in keys)
        keep = True              # will be False after pat1 and before pat2
        # finished = False         # will be True after pat2
        for line in fdin:
            if keep or keypresent(line):
                fdout.write(line)
                # if finished: continue
            if pat1 in line:
                keep = False
            elif not keep and pat2 in line:
                if not keypresent(line):  # already written above if it holds a key
                    fdout.write(line)
                keep = True
                # finished = True
    

    If another line containing pat1 appears after pat2, the function will start removing lines again. If that is not wanted, just uncomment the 3 lines about finished.
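
    With those three lines enabled, the function would look roughly like the sketch below. It is given the hypothetical name `filter_once` to keep it apart from the original, and the sample text is made up to show a second `Speed` line after the first range has ended.

```python
import io


def filter_once(fdin, fdout, pat1, pat2, *keys):
    """Like filter(), but only the first pat1..pat2 range is removed."""
    def keypresent(line):
        return any(k in line for k in keys)

    keep = True
    finished = False             # True once pat2 has closed the first range
    for line in fdin:
        if keep or keypresent(line):
            fdout.write(line)
            if finished:
                continue         # past the first range: never start removing again
        if pat1 in line:
            keep = False
        elif not keep and pat2 in line:
            if not keypresent(line):   # avoid writing the line twice
                fdout.write(line)
            keep = True
            finished = True


# Made-up input with a second "Speed" line after the first range.
text = "h Speed\na\nb sms\nStation MAC\nc Speed\nd\nStation MAC\ne\n"
out = io.StringIO()
filter_once(io.StringIO(text), out, "Speed", "Station MAC", "sms")
print(out.getvalue(), end="")
```

    In this sample only `a` is dropped; the line `d` after the second `Speed` survives because removal is not restarted.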

    It can be used like this:

    with open(inputfilename) as fdin, open(outputfilename, 'w') as fdout:
        filter(fdin, fdout, "Speed", "Station MAC", "Max", "sms")
    

    The only catch is that it does not keep the empty line before "Station MAC", but that is simple to fix...
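
    One possible way to make that fix, offered only as a sketch: remember blank lines that were dropped immediately before the pat2 line and write them back when pat2 is found. The name `filter_keep_blank` and the sample text are hypothetical, and this version assumes that only blank lines directly preceding pat2 should be restored.

```python
import io


def filter_keep_blank(fdin, fdout, pat1, pat2, *keys):
    """Variant of filter() that also keeps blank lines directly before pat2."""
    keep = True
    pending = ""                      # blank lines dropped right before this line
    for line in fdin:
        if not keep and pat2 in line:
            fdout.write(pending)      # restore the separator in front of pat2
            keep = True
        if keep or any(k in line for k in keys):
            fdout.write(line)
        if not keep and not line.strip():
            pending += line           # dropped blank line, may precede pat2
        else:
            pending = ""              # any other line breaks the run of blanks
        if pat1 in line:
            keep = False


# Made-up input in the shape of the question's sample.
text = "h Speed\na\nb sms\n\nStation MAC\nx\n"
out = io.StringIO()
filter_keep_blank(io.StringIO(text), out, "Speed", "Station MAC", "Max", "sms")
print(out.getvalue(), end="")
```

    Blank lines elsewhere inside the removed range are still dropped; only the run that sits directly in front of the pat2 line is written back.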

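    【Discussion】:

    • Thank you for your help. I could write a Python script to solve the problem myself, but I need a C-based tool, because the file is large and has many lines. Still, thanks, you helped a lot. I share, you share = we learn. Thanks again. The benchmark of @ctac_'s solution is excellent.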