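【Posted】: 2026-02-20 08:30:03
【Question】:
I am completely new to Python.
I am trying to reproduce this bash command: cat domains.txt |sort -u|sed 's/^*.//g' > domains2.txt
The file domains.txt contains a list of domains, some with and some without the wildcard prefix *., for example: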
*.example.com
example2.org
It has about 300k+ lines.
I wrote this code:
infile = "domains.txt"
outfile = "2"
outfile2 = "3"
with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        line = line.replace('*.', "")
        fout.write(line)
with open('2', 'r') as r, open(outfile2, "w") as fout2:
    for line in sorted(r):
        print(line, end='', file=fout2)
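It strips the *. as intended and sorts the list, but it does not remove duplicate lines.
I was advised to use re.sub instead of replace so that the pattern is stricter (anchored to the start of the line, as in my sed command), but when I tried this: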
import re
infile = "domains.txt"
outfile = "2"
outfile2 = "3"
with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        newline = re.sub('^*.', '', line)
        fout.write(newline)
with open('2', 'r') as r, open(outfile2, "w") as fout2:
    for line in sorted(r):
        print(line, end='', file=fout2)
it does not work at all; it fails with an error that I do not understand.
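For reference, here is a minimal sketch of one way the two problems could be addressed. It is an illustration under stated assumptions, not a definitive fix. In Python's re module, * is a quantifier, so the pattern '^*.' has nothing for the * to repeat and is rejected when it is compiled; escaping both characters (r'^\*\.') matches a literal *. at the start of the line only. Duplicates are dropped by collecting the cleaned lines in a set before sorting. The names clean_domains and convert, the output filename domains2.txt, and the choice to skip blank lines are assumptions made for this example.

```python
import re

# Literal "*." anchored to the start of the line; both characters are escaped
# because "*" and "." are regex metacharacters.
WILDCARD_PREFIX = re.compile(r"^\*\.")


def clean_domains(lines):
    """Strip a leading '*.' from each line, drop duplicates, return a sorted list.

    Hypothetical helper for illustration. Blank lines are skipped, which is an
    assumption of this sketch and not something the bash pipeline does.
    """
    unique = set()
    for line in lines:
        domain = WILDCARD_PREFIX.sub("", line.strip())
        if domain:
            unique.add(domain)
    return sorted(unique)


def convert(infile="domains.txt", outfile="domains2.txt"):
    """Read infile, clean it, and write one domain per line to outfile."""
    with open(infile) as fin:
        domains = clean_domains(fin)
    with open(outfile, "w") as fout:
        fout.writelines(d + "\n" for d in domains)
```

Calling convert() would read domains.txt and write the result in one pass, with no intermediate file. Note that this sketch removes duplicates after stripping the prefix, so *.example.com and example.com collapse into one entry; the bash pipeline runs sort -u before sed and can therefore still leave such pairs in its output. Python's sorted also orders by code point, which may differ from the locale-aware ordering of sort.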
【Comments】: