Using sed through subprocess.call in Python to perform in-file replacements

【Question title】: Using sed through subprocess.call in Python to perform in-file replacements
【Posted】: 2014-06-10 01:24:25
【Question description】:

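I want to replace a column in one file with a column from another file. I am trying to do this with sed from inside Python, but I am not sure I am doing it correctly. Maybe the code will make things clearer: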

for line in infile1.readlines()[1:]:
    element = re.split("\t", line)
    IID.append(element[1])
    FID.append(element[0])

os.chdir(binary_dir)

for files in os.walk(binary_dir):
    for file in files:
        for name in file:
            if name.endswith(".fam"):
                infile2 = open(name, 'r+')

for line in infile2.readlines():
    parts = re.split(" ", line)
    Part1.append(parts[0])
    Part2.append(parts[1])

for i in range(len(Part2)):
    if Part2[i] in IID:
        regex = '"s/\.*' + Part2[i] + '/' + Part1[i] + ' ' + Part2[i] + '/"' + ' ' + phenotype
        print regex
        subprocess.call(["sed", "-i.orig", regex], shell=True)

Below is what printing regex produces. The system seems to hang in the sed process: it sits there for a long time without doing anything.

"s/\.*131006/201335658-01 131006/" /Users/user1/Desktop/phenotypes2

Thanks for your help, and let me know if you need any further clarification!

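【Comments】:

If you have the full power of Python available, are you sure you want to use sed?
To be honest, I don't see how to do this in Python. The reason I am using sed is that the phenotype file I am working with has 9 columns, and I only want to replace the first one without overwriting the whole file.
Try Python regular expressions: docs.python.org/2/library/re.html. For example, the re.sub() method... Do you need an example?
@AndreiBoyanov An example would be much appreciated. Thanks for pointing out re.sub()!
Unrelated: use for line in infile2: instead of for line in infile2.readlines():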

Tags: python bash sed subprocess

【Solution 1】:

If you have Python and the re module, you don't need sed. Here is an example of how to use re to replace a given pattern in a string.

>>> import re
>>> line = "abc def ghi"
>>> new_line = re.sub("abc", "123", line)
>>> new_line
'123 def ghi'
>>>

Of course, this is only one way to do it in Python. I think str.replace() could do the job as well.
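To connect this to the question's goal (replacing only the first column when the second column matches a known ID), here is a minimal sketch. The helper name replace_first_column and the fid_by_iid dictionary are hypothetical names introduced for illustration, and the sketch assumes the columns are separated by whitespace. It is written for Python 3.

```python
import re

# Matches the first column, the separator after it, and the second column.
_FIRST_TWO_COLUMNS = re.compile(r"^(\S+)(\s+)(\S+)")


def replace_first_column(line, fid_by_iid):
    """Return line with its first column replaced when the second column
    is a key of fid_by_iid; otherwise return line unchanged."""
    match = _FIRST_TWO_COLUMNS.match(line)
    if match is None:
        return line  # fewer than two columns: nothing to do
    iid = match.group(3)
    if iid not in fid_by_iid:
        return line
    # Keep the original separator and everything after the second column.
    return fid_by_iid[iid] + match.group(2) + iid + line[match.end():]


if __name__ == "__main__":
    mapping = {"131006": "201335658-01"}
    print(replace_first_column("X 131006 0 0 1 2\n", mapping), end="")
```

Because only the text before the second column is rebuilt, the remaining columns pass through untouched.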

【Comments】:

【Solution 2】:

The first problem is shell=True being used together with a list argument. Either drop shell=True or pass a string argument (the complete shell command):

from subprocess import check_call

check_call(["sed", "-i.orig", regex])

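Otherwise only the first list item is run as the shell command, and the remaining arguments ('-i.orig' and regex) are passed to /bin/sh instead of sed.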
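The effect is easy to see with a harmless command. The sketch below assumes a POSIX system where echo is available, and uses Python 3's subprocess.run; the function name run_echo is made up for illustration.

```python
import subprocess


def run_echo(use_shell):
    """Run ["echo", "hello"] with or without shell=True and return stdout."""
    result = subprocess.run(
        ["echo", "hello"],
        shell=use_shell,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Without a shell, "hello" is passed to echo as its argument.
    print(repr(run_echo(False)))
    # With shell=True, the command becomes /bin/sh -c echo hello:
    # "hello" goes to the shell, so echo receives no arguments.
    print(repr(run_echo(True)))
```

On such a system the first call returns the word followed by a newline, while the second returns only a newline, because echo never sees its argument.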

The second problem is that you never give sed an input file, so it waits for data on standard input, which is why it appears to hang.
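Putting both fixes together: without a shell, the sed script needs no surrounding quotes, and the file path has to be its own list element. The following is a sketch only. The helper names build_sed_command and sed_in_place are hypothetical, and it assumes that neither the pattern nor the replacement contains a / character.

```python
import subprocess


def build_sed_command(pattern, replacement, path, backup_suffix=".orig"):
    """Build the argument list for an in-place sed substitution.

    Assumes pattern and replacement contain no '/' characters.
    """
    script = "s/{}/{}/".format(pattern, replacement)
    # Each argument is a separate list item; no shell quoting is needed.
    return ["sed", "-i" + backup_suffix, script, path]


def sed_in_place(pattern, replacement, path):
    """Run sed on path, raising CalledProcessError if sed fails."""
    subprocess.check_call(build_sed_command(pattern, replacement, path))


if __name__ == "__main__":
    print(build_sed_command(r"\.*131006", "201335658-01 131006",
                            "/Users/user1/Desktop/phenotypes2"))
```

Calling sed_in_place once per matching ID still rewrites the file on every call, so for many IDs a single pass in Python is likely to be cheaper.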

If you want to change the file in place, you can use the fileinput module:

#!/usr/bin/env python
import fileinput
import re

files = ['/Users/user1/Desktop/phenotypes2'] # if it is None it behaves like sed
for line in fileinput.input(files, backup='.orig', inplace=True):
    # Python 2 print statement; whatever is printed is written back to the file
    print re.sub(r'\.*131006', '201335658-01 131006', line),

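fileinput.input() redirects standard output to the current file, so print is what changes the file.

The trailing comma after the print statement (Python 2) sets sys.stdout.softspace to avoid duplicated newlines.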
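For readers on Python 3, where print is a function, the same idea can be sketched as follows. The function name replace_in_place is made up for illustration; end="" takes over the role of the trailing comma.

```python
import fileinput
import re


def replace_in_place(path, pattern, replacement, backup=".orig"):
    """Apply re.sub to every line of path, editing the file in place.

    The original content is kept in path + backup.
    """
    with fileinput.input(files=[path], inplace=True, backup=backup) as stream:
        for line in stream:
            # While inplace=True is active, print writes into the file.
            # end="" avoids doubling the newline already present in line.
            print(re.sub(pattern, replacement, line), end="")
```

A call such as replace_in_place('/Users/user1/Desktop/phenotypes2', r'\.*131006', '201335658-01 131006') would mirror the Python 2 snippet above.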

【Comments】: